
[RVV] add rvv f16 kernels for velu, vgelu, vapproxgelu, vsigmoid, vtanh#9986

Open
velonica0 wants to merge 1 commit into google:master from velonica0:rvv-f16-elementwise

Conversation

@velonica0

Add RVV f16 kernels for velu, vgelu, vapproxgelu, vsigmoid, and vtanh.

Tested on a SpacemiT K3 CPU (VLEN=256, with the Zvfh/Zvfhmin extensions).

K3 FP16 Activation Benchmark Results

| Operator | N=7680 (Scalar) | N=7680 (RVV) | N=65280 (Scalar) | N=65280 (RVV) | Speedup (Max) |
| --- | --- | --- | --- | --- | --- |
| f16-velu | 83,954 | 5,601 | 713,689 | 47,433 | 15.0x |
| f16-vgelu | 284,363 | 7,785 | 2,408,980 | 66,021 | 36.5x |
| f16-vapproxgelu | 237,706 | 7,797 | 2,019,673 | 66,018 | 30.6x |
| f16-vsigmoid | 289,988 | 16,620 | 2,465,351 | 141,139 | 17.5x |
| f16-vtanh | 170,980 | 8,114 | 1,435,735 | 69,170 | 20.8x |
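As a sanity check on the Speedup column, the ratios can be recomputed directly from the scalar and RVV timings at $N=65280$ in the table above (this snippet is illustrative and not part of the PR):

```python
# Recompute the reported speedups from the N=65280 columns of the
# benchmark table (scalar time / RVV time, units as reported in the PR).
timings = {
    "f16-velu":        (713_689, 47_433),
    "f16-vgelu":       (2_408_980, 66_021),
    "f16-vapproxgelu": (2_019_673, 66_018),
    "f16-vsigmoid":    (2_465_351, 141_139),
    "f16-vtanh":       (1_435_735, 69_170),
}

speedups = {op: scalar / rvv for op, (scalar, rvv) in timings.items()}
for op, s in speedups.items():
    print(f"{op}: {s:.1f}x")
# Matches the table: 15.0x, 36.5x, 30.6x, 17.5x, 20.8x.
```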

Notes:
1. Speedup is calculated at $N=65280$.
2. The scalar FP16 baseline is very slow: it either uses software emulation or widens to f32, incurring extra conversion overhead.
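To illustrate the widen-to-f32 pattern mentioned in note 2, here is a minimal NumPy sketch of an f16 sigmoid evaluated that way; this is an assumed illustration of the general pattern, not the PR's scalar baseline code:

```python
import numpy as np

def sigmoid_f16_widened(x_f16: np.ndarray) -> np.ndarray:
    """Sigmoid on f16 inputs via the widen-to-f32 pattern (illustrative)."""
    x = x_f16.astype(np.float32)      # widen: f16 -> f32 (conversion cost)
    y = 1.0 / (1.0 + np.exp(-x))      # evaluate the activation in f32
    return y.astype(np.float16)       # narrow: f32 -> f16 (conversion cost)

x = np.array([-2.0, 0.0, 2.0], dtype=np.float16)
print(sigmoid_f16_widened(x))
```

Each element pays two conversions on top of the f32 math, which is the overhead the RVV f16 kernels avoid by computing in f16 throughout.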

@ken-unger
Contributor

Thanks. BTW I have f16-sin, cos, exp done and will post a PR tomorrow.

