Commit 5c7b7ec
[rocm-libraries] ROCm/rocm-libraries#7272 (commit d02f3c0)
[ck_tile][fmha_bwd] Fix sink_host OOB in group mode reference
runner (#7272)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
## Summary
In `fmha_bwd_runner.hpp`, the `sink_host` `HostTensor` is allocated with
first
dimension `shape_batch` (= 1 in group mode), but the reference forward
loop
accesses `sink_host(wb, i_h)` with `wb ∈ [0, batch-1]`. For any `wb >=
1` this
is an out-of-bounds heap read, silently corrupting the reference forward
math
chain (`lse_host`, `o_host`) and turning the bwd-side `d_sink_head_acc`
reference into non-deterministic garbage.
`HostTensor::operator()` does not bounds check, so the OOB is not caught
at
runtime. This manifests as intermittent `tile_example_fmha_bwd` failures
(25–67% fail rate) when `-sink_grad=1` is combined with `-mode=1` (group
mode),
with bit-exact but spurious `max_err` values like 4.27 / 14.6.
## Fix
One-line: allocate `sink_host` with `batch` (the real per-batch dim)
instead of
`shape_batch`, mirroring how `sink_host` is accessed by the loop.
```diff
- sink_grad ? std::array<ck_tile::index_t, 2>{shape_batch, nhead}
+ sink_grad ? std::array<ck_tile::index_t, 2>{batch, nhead}
Repro
tile_example_fmha_bwd -b=2 -h=2 -s=516 -s_k=253 -prec=bf16 -d=72 \
-bias=n -dbias=0 -p_drop=0 -iperm=1 -operm=1 -deterministic=0 \
-v=3 -mode=1 -kname=1 -sink_grad=1
Verification
- 0/30 fail on the repro config after fix
- Baselines (before fix):
- sink=1, mask=n: 25% fail rate (p ≈ 1.8e-4)
- sink=1, mask=t: 67% fail rate (p ≈ 6e-15)
Attribution
Shape bug introduced together with sink_grad in #5504. Unrelated to
#6914
(which is a fwd-only fix on a different code path)
```
## Submission Checklist
- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.1 parent 6989cf8 commit 5c7b7ec
2 files changed
Lines changed: 81 additions & 1 deletion
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
264 | 264 | | |
265 | 265 | | |
266 | 266 | | |
267 | | - | |
| 267 | + | |
268 | 268 | | |
269 | 269 | | |
270 | 270 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
995 | 995 | | |
996 | 996 | | |
997 | 997 | | |
| 998 | + | |
| 999 | + | |
| 1000 | + | |
| 1001 | + | |
| 1002 | + | |
| 1003 | + | |
| 1004 | + | |
| 1005 | + | |
| 1006 | + | |
| 1007 | + | |
| 1008 | + | |
| 1009 | + | |
| 1010 | + | |
| 1011 | + | |
| 1012 | + | |
| 1013 | + | |
| 1014 | + | |
| 1015 | + | |
| 1016 | + | |
| 1017 | + | |
| 1018 | + | |
| 1019 | + | |
| 1020 | + | |
| 1021 | + | |
| 1022 | + | |
| 1023 | + | |
| 1024 | + | |
| 1025 | + | |
| 1026 | + | |
| 1027 | + | |
| 1028 | + | |
| 1029 | + | |
| 1030 | + | |
| 1031 | + | |
| 1032 | + | |
| 1033 | + | |
| 1034 | + | |
| 1035 | + | |
| 1036 | + | |
| 1037 | + | |
| 1038 | + | |
| 1039 | + | |
| 1040 | + | |
| 1041 | + | |
| 1042 | + | |
| 1043 | + | |
| 1044 | + | |
| 1045 | + | |
| 1046 | + | |
| 1047 | + | |
| 1048 | + | |
| 1049 | + | |
| 1050 | + | |
| 1051 | + | |
| 1052 | + | |
| 1053 | + | |
| 1054 | + | |
| 1055 | + | |
| 1056 | + | |
| 1057 | + | |
| 1058 | + | |
| 1059 | + | |
| 1060 | + | |
| 1061 | + | |
| 1062 | + | |
| 1063 | + | |
| 1064 | + | |
| 1065 | + | |
| 1066 | + | |
| 1067 | + | |
| 1068 | + | |
| 1069 | + | |
| 1070 | + | |
| 1071 | + | |
| 1072 | + | |
| 1073 | + | |
| 1074 | + | |
| 1075 | + | |
| 1076 | + | |
| 1077 | + | |
0 commit comments