[Cherry-Pick][Optimization] merge matmul and add(#6986) #7184

Merged
zoooo0820 merged 2 commits into PaddlePaddle:release/2.5 from BingooYang:2.5/linear_opt
Apr 9, 2026

@BingooYang
Contributor

Motivation

Performance optimization.

Modifications

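Replace the separate matmul and add in UnquantizedLinearMethod with a single linear call.
With a bias, this is generally faster. Without a bias, small shapes see a slowdown, mainly from Python-level scheduling overhead such as the extra if branch (internally, linear is also implemented with matmul).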
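As a minimal sketch of why the rewrite should not change numerics, the pure-Python model below (no Paddle required) shows that a fused linear computes exactly x @ W + b, the same result as the old matmul-then-add path. It assumes the [in_features, out_features] weight layout that paddle.nn.functional.linear documents. The helpers matmul, add_bias, apply_before, apply_after and linear are hypothetical stand-ins, not FastDeploy or Paddle code, and they model only the arithmetic, not kernel performance.

```python
from typing import List, Optional

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """x[M][K] @ w[K][N] -> [M][N], with w laid out as [in_features, out_features]."""
    k, n = len(w), len(w[0])
    return [[sum(row[i] * w[i][j] for i in range(k)) for j in range(n)] for row in x]


def add_bias(y: Matrix, b: List[float]) -> Matrix:
    """Broadcast a 1-D bias of length N across every row of y[M][N]."""
    return [[v + b[j] for j, v in enumerate(row)] for row in y]


def apply_before(x: Matrix, w: Matrix, b: Optional[List[float]]) -> Matrix:
    # Old path: a matmul, then a separate add only when a bias is present.
    out = matmul(x, w)
    if b is not None:
        out = add_bias(out, b)
    return out


def linear(x: Matrix, w: Matrix, b: Optional[List[float]] = None) -> Matrix:
    # Hypothetical stand-in for a fused linear: x @ w (+ b) in one call.
    # Shape rule: a bias must be 1-D with length equal to w's last dim.
    if b is not None and len(b) != len(w[0]):
        raise ValueError(f"bias length {len(b)} != out_features {len(w[0])}")
    out = matmul(x, w)
    return out if b is None else add_bias(out, b)


def apply_after(x: Matrix, w: Matrix, b: Optional[List[float]]) -> Matrix:
    # New path: one call whether or not a bias exists (b=None means no add).
    return linear(x, w, b)


x = [[1.0, 2.0], [3.0, 4.0]]            # M=2, K=2
w = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]  # K=2, N=3
b = [0.5, -1.0, 0.0]                    # N=3

print(apply_before(x, w, b) == apply_after(x, w, b))  # True
print(apply_after(x, w, b))  # [[1.5, 1.0, 4.0], [3.5, 3.0, 10.0]]
```

In Paddle terms this corresponds to one paddle.nn.functional.linear(x, weight, bias) call (bias may be None) in place of paddle.matmul followed by paddle.add. Whether it is faster depends on Paddle's kernels and dispatch cost, which this model does not capture.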

Usage or Command

Accuracy Tests

Accuracy is unchanged.

Checklist

  • [ x ] Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • [ x ] Format your code, run pre-commit before commit.
  • [ x ] Add unit tests. Please write the reason in this PR if no unit tests.
  • [ x ] Provide accuracy results.
  • [ x ] If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot

paddle-bot bot commented Apr 3, 2026

Thanks for your contribution!

@codecov-commenter

codecov-commenter commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (release/2.5@bccf388).

Additional details and impacted files
@@              Coverage Diff               @@
##             release/2.5    #7184   +/-   ##
==============================================
  Coverage               ?   69.19%           
==============================================
  Files                  ?      390           
  Lines                  ?    54384           
  Branches               ?     8574           
==============================================
  Hits                   ?    37631           
  Misses                 ?    14041           
  Partials               ?     2712           
Flag Coverage Δ
GPU 69.19% <100.00%> (?)

Flags with carried forward coverage won't be shown.


@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-03 17:30 CST

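📋 Review Summary

PR overview: Replaces the matmul + add in UnquantizedLinearMethod.apply() with paddle.nn.functional.linear, speeding up the with-bias case.

Scope of changes: model_executor/layers/linear.py (core optimization), tests/e2e/utils/ (test baseline path update)

Impact tag: OP

Issues

Level | File | Summary
🟡 Suggestion | rollout_routing_replay_test_utils.py:160 | The baseline path hard-codes the date 0403; consider a semantic name

Overall assessment

The core optimization logic is correct, and the newly added assert on the bias dimensions makes the code more robust. For the baseline path change in the test file, a more semantic, version-based naming scheme is recommended.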

 if model_path:
-    baseline_path = os.path.join(model_path, f"R3_BaseLine_25_uint8/routing_replay_output_baseline_{model_name}")
+    baseline_path = os.path.join(
+        model_path, f"R3_BaseLine_dev_uint8_0403/routing_replay_output_baseline_{model_name}"

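🟡 Suggestion: the baseline path hard-codes the date 0403

This naming scheme may make future maintenance harder:

  1. Every baseline update requires changing the path in the code
  2. The meaning of the date is unclear: it is not obvious what 0403 in dev_uint8_0403 refers to

Suggestions:

  1. Use a more semantic version name, such as R3_BaseLine_v2.5_uint8
  2. Alternatively, make the baseline path configurable and manage it through an environment variable or a config file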
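The second suggestion could look something like the sketch below. The environment variable R3_BASELINE_DIR, the default directory name and the function resolve_baseline_path are all hypothetical (nothing in the PR defines them); the point is that refreshing a baseline changes an environment setting rather than test code.

```python
import os

# Hypothetical names: not defined anywhere in the PR.
BASELINE_ENV_VAR = "R3_BASELINE_DIR"
DEFAULT_BASELINE_DIR = "R3_BaseLine_v2.5_uint8"  # semantic, version-based name


def resolve_baseline_path(model_path: str, model_name: str) -> str:
    """Build the routing-replay baseline path for model_name.

    The baseline directory is read from the environment when set, falling back
    to a semantic default, so a baseline refresh needs no code change.
    """
    baseline_dir = os.environ.get(BASELINE_ENV_VAR, DEFAULT_BASELINE_DIR)
    return os.path.join(model_path, baseline_dir, f"routing_replay_output_baseline_{model_name}")


if __name__ == "__main__":
    print(resolve_baseline_path("/models", "demo"))
```

Running the tests with R3_BASELINE_DIR=R3_BaseLine_dev_uint8_0403 would then select the current baseline without touching the test file.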

@EmmonsCurse changed the title from "[Optimization] merge matmul and add" to "[Cherry-Pick][Optimization] merge matmul and add(#6986)" on Apr 3, 2026

@fastdeploy-bot fastdeploy-bot left a comment

🤖 AI Code Review | 2026-04-08 16:57 CST

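📋 Review Summary

PR overview: Fuses the matmul + add in UnquantizedLinearMethod into a single paddle.nn.functional.linear call to improve inference performance.

Scope of changes: model_executor/layers/linear.py, test files

Impact tags: [Optimization] [OP]

Issues

Level | File | Summary
🔴 Bug | linear.py:87 | The assert statement runs on every forward pass, adding extra performance overhead

Overall assessment

The direction of the optimization is sound: fusing matmul and add with paddle.nn.functional.linear is a reasonable strategy. However, placing an assert on the hot path (the apply method) means the check runs on every inference call, which works against the purpose of the optimization. Consider moving the shape validation to initialization or the weight-loading stage. The updated expected outputs and the baseline path change in the test files are routine maintenance.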

-            linear_out = paddle.add(linear_out, layer.bias)
-        return linear_out
+        bias = layer.bias
+        assert bias.dim() == 1 and bias.shape[-1] == layer.weight.shape[-1], (

🔴 Bugapply 方法中每次 forward 都执行 assert 语句检查 bias 形状,这会引入不必要的性能开销,违背了性能优化的初衷。

影响分析

  • assert 语句在每次调用 forward_cuda 时都会执行,增加了 Python 层的开销
  • bias 的形状在初始化时就已经确定,无需在每次调用时验证
  • 这会部分抵消使用 paddle.nn.functional.linear 带来的性能收益

建议修复方式
将形状检查移到权重加载阶段,例如在 process_loaded_weights 中验证:

def process_loaded_weights(self, layer, weights) -> None:
    if layer.weight.dtype != weights.dtype:
        weights = weights.cast(layer.weight.dtype)
    layer.weight.set_value(weights)
    # Validate that the bias and weight shapes match here (runs only once, at weight-loading time)
    if layer.with_bias:
        assert layer.bias.dim() == 1 and layer.bias.shape[0] == layer.weight.shape[-1], (
            f"bias must be 1D with size equal to the last dim of weight, "
            f"but got bias.shape={layer.bias.shape}, weight.shape[-1]={layer.weight.shape[-1]}"
        )

This way the validation runs only once, when the model is loaded, rather than on every inference call.
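To make the trade-off concrete, here is a minimal, self-contained sketch of the validate-once pattern. FakeLayer, ValidatingLinearMethod, checks_run and forward_calls are hypothetical stand-ins (no Paddle, not FastDeploy's actual classes); the sketch simply counts how often the shape check executes: once at load time, however many forward calls follow. It raises ValueError instead of using assert, a deliberate choice because Python strips assert statements when run with -O.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeLayer:
    # Hypothetical layer: only the shapes matter for this sketch.
    weight_shape: List[int]                 # [in_features, out_features]
    bias_shape: Optional[List[int]] = None  # None means the layer has no bias

    @property
    def with_bias(self) -> bool:
        return self.bias_shape is not None


@dataclass
class ValidatingLinearMethod:
    checks_run: int = 0     # how many times the bias shape check executed
    forward_calls: int = 0  # how many times the hot path ran

    def process_loaded_weights(self, layer: FakeLayer) -> None:
        # Runs once per layer at load time: bias must be 1-D and match weight.shape[-1].
        if layer.with_bias:
            self.checks_run += 1
            bias_shape = layer.bias_shape
            if len(bias_shape) != 1 or bias_shape[0] != layer.weight_shape[-1]:
                raise ValueError(
                    f"bias must be 1-D with size {layer.weight_shape[-1]}, got shape {bias_shape}"
                )

    def apply(self, layer: FakeLayer) -> None:
        # Hot path: no shape validation. In the real layer, the single
        # linear call would run here.
        self.forward_calls += 1


method = ValidatingLinearMethod()
layer = FakeLayer(weight_shape=[4, 8], bias_shape=[8])
method.process_loaded_weights(layer)
for _ in range(1000):
    method.apply(layer)
print(method.checks_run, method.forward_calls)  # 1 1000
```

The per-call cost of a single Python check is small in absolute terms, but on a path executed for every layer of every forward pass it adds up, which is the reviewer's concern; moving it to load time removes it from that path entirely.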

Collaborator

@zoooo0820 zoooo0820 left a comment

LGTM

@zoooo0820 zoooo0820 merged commit 324f083 into PaddlePaddle:release/2.5 Apr 9, 2026
35 of 38 checks passed

Labels

None yet

Projects

None yet

4 participants