Replace custom bf16_to_fp32 with Arm NEON vcvtah_f32_bf16 #5181

Merged
martin-frbg merged 1 commit into OpenMathLib:develop from taoye9:change_sbgemn_cast_bf16
Mar 13, 2025

Conversation

@taoye9 (Contributor) commented Mar 13, 2025

This PR replaces the hand-written helper that casts bf16 to fp32 in the arm64 sbgemv_n kernel (added in the previous PR, #5160) with the standard Arm NEON intrinsic vcvtah_f32_bf16.

This PR may also slightly improve performance by reducing each cast from two assembly instructions (UMOV, UBFIZ) to one (SHL).
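For readers unfamiliar with the conversion, here is a minimal sketch of the two approaches discussed above. It is not the code from this PR or from #5160: the function names `bf16_bits_to_fp32_portable` and `bf16_bits_to_fp32` are hypothetical, and the sketch assumes the bf16 value is stored as raw `uint16_t` bits. The underlying idea is that a bf16 value is the upper 16 bits of an IEEE-754 binary32, so widening amounts to a 16-bit left shift. The intrinsic path is only compiled when the compiler advertises BF16 support; otherwise the portable path is used.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the intrinsic path is enabled only when the compiler reports
 * NEON plus BF16 support (e.g. building with -march=armv8.2-a+bf16). */
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
#include <arm_neon.h>
#define SKETCH_HAVE_NEON_BF16 1
#else
#define SKETCH_HAVE_NEON_BF16 0
#endif

/* Hand-written cast: place the 16 bf16 bits in the upper half of a
 * 32-bit word and reinterpret that word as a float. */
static float bf16_bits_to_fp32_portable(uint16_t bits)
{
    uint32_t wide = (uint32_t)bits << 16;
    float out;
    memcpy(&out, &wide, sizeof out); /* bit-exact reinterpretation */
    return out;
}

/* Same conversion through the standard intrinsic where available. */
static float bf16_bits_to_fp32(uint16_t bits)
{
#if SKETCH_HAVE_NEON_BF16
    bfloat16_t b;
    memcpy(&b, &bits, sizeof b);     /* reinterpret raw bits as bfloat16_t */
    return vcvtah_f32_bf16(b);       /* scalar bf16 -> fp32 widening */
#else
    return bf16_bits_to_fp32_portable(bits);
#endif
}
```

For example, the bf16 bit pattern `0x3F80` widens to `0x3F800000`, which is `1.0f`. Both paths should give bit-identical results, since the widening is exact.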

@martin-frbg added this to the 0.3.30 milestone Mar 13, 2025
@martin-frbg merged commit 2f77855 into OpenMathLib:develop Mar 13, 2025
84 of 86 checks passed

2 participants