[CUDA] Branchless NF4/FP4 kDequantizeBlockwise kernel for faster dequantization #1561

Status Success
Total duration 21s

lint.yml

on: pull_request