Heya, quick note that finegrained-fp8's act_quant kernel computes the per-block FP8 scale as s = max(abs(x)) / 448 with no eps, so we get s = 0 when x is a block of zeros, and the subsequent x / s division is 0/0, which yields NaN.
I opened a PR for it earlier but didn't realize that it was contributor-only: #680
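In case it helps, here is a minimal NumPy sketch of the failure mode and one possible fix. It is not the actual kernel: the `act_quant_ref` name, the block size of 128, and the `eps=1e-4` clamp are all assumptions made for illustration.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def act_quant_ref(x, block_size=128, eps=0.0):
    """Hypothetical reference for per-block scaling (not the real kernel).

    Returns the scaled values (before any FP8 cast) and the per-block scales.
    With eps=0.0 this mirrors s = max(abs(x)) / 448 with no clamp.
    """
    blocks = x.astype(np.float32).reshape(-1, block_size)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    # Clamping amax from below keeps the scale strictly positive when eps > 0.
    amax = np.maximum(amax, eps)
    s = amax / FP8_E4M3_MAX
    # Silence NumPy's warnings so the 0/0 case quietly produces NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        y = blocks / s
    return y.reshape(x.shape), s.squeeze(1)


# Two blocks: the first is all zeros, the second is all ones.
x = np.zeros(256, dtype=np.float32)
x[128:] = 1.0

y_bad, s_bad = act_quant_ref(x)           # no eps: first block has s = 0
y_ok, s_ok = act_quant_ref(x, eps=1e-4)   # clamped: first block has s > 0

print("NaN without eps:", bool(np.isnan(y_bad).any()))
print("NaN with eps:   ", bool(np.isnan(y_ok).any()))
```

Without the clamp, the all-zero block gets s = 0 and every element in it becomes NaN. With the clamp, that block quantizes to zeros and the non-zero block is unaffected, as long as eps is well below its max(abs(x)).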