arm multiheadattention bf16 storage #6717
@codex review
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage diff, master vs. #6717:

| Metric | master | #6717 | +/- |
|---|---|---|---|
| Coverage | 93.90% | 93.97% | +0.07% |
| Files | 933 | 930 | -3 |
| Lines | 310041 | 310141 | +100 |
| Hits | 291140 | 291466 | +326 |
| Misses | 18901 | 18675 | -226 |
Pull request overview
This PR enables bf16 storage support for the ARM MultiHeadAttention implementation by allowing bf16-backed pipelines while forcing specific intermediate GEMM outputs to fp32, and extends the ARM bf16s GEMM path to support fp32 output (via output_elemtype).
Changes:
- Enable `support_bf16_storage` for `MultiHeadAttention_arm` when `NCNN_BF16` is enabled, with special handling/disablement under `int8_scale_term`.
- Force qk/qkv GEMM intermediates to output fp32 (`output_elemtype=fp32`) and add casting for bf16 `v_affine` where needed.
- Extend ARM bf16s GEMM unpack/output handling to optionally write fp32 output (`output_elemtype == 1`).
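To illustrate why an intermediate GEMM result might be kept in fp32 even when inputs are stored as bf16, here is a minimal scalar sketch. It is not the ncnn implementation: `float_to_bf16`, `bf16_to_float`, `GemmOutput`, and `gemm_bf16s_sketch` are hypothetical names, the conversion uses plain truncation for brevity, and only the meaning `output_elemtype == 1` (fp32 output) is taken from the overview above.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Assumption: fp32 -> bf16 by keeping the upper 16 bits (truncation, no rounding).
static inline uint16_t float_to_bf16(float v)
{
    uint32_t bits;
    std::memcpy(&bits, &v, sizeof(bits));
    return static_cast<uint16_t>(bits >> 16);
}

// bf16 -> fp32 by placing the 16 stored bits in the upper half of the word.
static inline float bf16_to_float(uint16_t v)
{
    uint32_t bits = static_cast<uint32_t>(v) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Hypothetical container: exactly one of the two vectors is filled.
struct GemmOutput
{
    std::vector<float> fp32;
    std::vector<uint16_t> bf16;
};

// Hypothetical reference GEMM: C = A (M x K) * B (K x N), row-major.
// Inputs are bf16, accumulation is fp32.
// output_elemtype == 1 writes fp32; any other value writes bf16.
static GemmOutput gemm_bf16s_sketch(const std::vector<uint16_t>& A,
                                    const std::vector<uint16_t>& B,
                                    int M, int N, int K, int output_elemtype)
{
    GemmOutput out;
    if (output_elemtype == 1)
        out.fp32.resize(static_cast<size_t>(M) * N);
    else
        out.bf16.resize(static_cast<size_t>(M) * N);

    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < N; j++)
        {
            float sum = 0.f;
            for (int k = 0; k < K; k++)
                sum += bf16_to_float(A[i * K + k]) * bf16_to_float(B[k * N + j]);

            if (output_elemtype == 1)
                out.fp32[i * N + j] = sum; // keep the full fp32 accumulator
            else
                out.bf16[i * N + j] = float_to_bf16(sum); // drop the low 16 bits
        }
    }
    return out;
}
```

With `A = {1, 2^-8}` and `B = {1, 1}` (a 1x2 by 2x1 product), the fp32 path keeps `1.00390625`, while the bf16 path stores `1.0`, since bf16 has too few mantissa bits for that sum. Losses of this kind in the qk scores would be carried into the softmax, which is one plausible reason to request fp32 for those intermediates.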
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `src/layer/arm/multiheadattention_arm.cpp` | Enables bf16 storage support and adjusts GEMM options/intermediate buffers for fp32 outputs. |
| `src/layer/arm/gemm_bf16s.h` | Adds `output_elemtype` handling to allow bf16s GEMM unpack to write fp32 output. |
| `src/layer/arm/gemm_arm.cpp` | Plumbs `output_elemtype` through bf16s GEMM helpers and allocates output buffer size accordingly. |
| `src/layer/arm/gemm_arm_bf16.cpp` | Updates the bf16 wrapper signature to forward `output_elemtype`. |
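
As a rough illustration of what sizing the output buffer by `output_elemtype` could look like, the sketch below picks the element size from the storage mode and the requested output type. The helpers `output_elemsize` and `output_bytes` are hypothetical and do not appear in the PR; the sketch also assumes that `output_elemtype` 0 means "follow the storage type" and 1 means fp32.

```cpp
#include <cstddef>

// Hypothetical helper, not part of ncnn: bytes per output element.
// Assumption: output_elemtype 0 follows the storage type, 1 forces fp32.
static inline std::size_t output_elemsize(bool use_bf16_storage, int output_elemtype)
{
    if (output_elemtype == 1)
        return 4u; // fp32 requested explicitly
    return use_bf16_storage ? 2u : 4u; // bf16 is 2 bytes, fp32 is 4 bytes
}

// Hypothetical helper: total bytes for an M x N output matrix.
static inline std::size_t output_bytes(int M, int N, bool use_bf16_storage, int output_elemtype)
{
    return static_cast<std::size_t>(M) * static_cast<std::size_t>(N)
           * output_elemsize(use_bf16_storage, output_elemtype);
}
```

For a 4x8 output under bf16 storage, this gives 64 bytes by default and 128 bytes when fp32 output is requested, so a buffer sized for bf16 would be too small for the fp32 path.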
Codex Review: Didn't find any major issues. Nice work!
Dimensity 9000 bf16s
No description provided.