Found while fuzzing upstream LLVM HEAD with llvm/llvm-project#198373, llvm/llvm-project#196418, llvm/llvm-project#198412, and llvm/llvm-project#198419 applied. The original oracle finding was:
kind=oracle
index=0
input=0x0
o0=0xFFFFFFFF
o2=0xFF22DD00
expected=0xFF22DD00
Run the reproducer with:
known-miscompiles/run_ll_reproducer.sh \
known-miscompiles/m055-i64byteperm-loop-readfirstlane/reduced.llObserved result on LLVM HEAD with the local PR patches:
input=0x00000000
O0=0xffffffff
O2=0xff22dd00
mismatch=true
The minimized kernel keeps the normal idx < n guard so the one-element
runner does not execute out-of-bounds workitems. llvm-reduce got the original
680-line generated program down to 437 lines while preserving that guard and
the exact output mismatch; further instruction-level reduction stalled on
GPU-running candidates.
The remaining shape has a loop-carried i32 value whose backedge depends on a
branch between a SWAR-style value and an i64 byte-permutation fold:
%wide = or i64 ...
%ctpop = call i64 @llvm.ctpop.i64(i64 %wide)
%add.pop = add i64 %wide, shl i64 %ctpop, 8
%hi = trunc i64 (lshr i64 %add.pop, 32) to i32
%lo = trunc i64 %add.pop to i32
%fold = xor i32 %hi, %lo
%loop.phi = phi i32 [ %fold, %then ], [ %swar, %else ]For input 0, the LLVM interpreter and -O2 agree on 0xff22dd00, while
LLVM HEAD -O0 returns 0xffffffff.
LLVM HEAD -O0 emits a long divergent loop sequence with s_or_saveexec_b64,
v_readfirstlane_b32, VALU bit operations, and a final global_store_dword.
The bad assembly differs from ROCm HEAD in the local lowering around the
overflow/bitselect part of the loop body:
v_cndmask_b32_e64 v5, 0, -1, s[0:1]
v_and_b32_e64 v1, v0, v5
v_bitop3_b32 v4, v3, v5, v3 bitop3:0x30
v_bfi_b32 v2, v2, v6, v7
v_bitop3_b32 v0, v0, v3, v5 bitop3:0xf4ROCm HEAD compiles the same region differently and produces the expected
result. This looks like an LLVM HEAD-only -O0 lowering regression in the
divergent loop/bitselect sequence rather than a source-IR semantics issue.
| Toolchain | Result |
|---|---|
ROCm 7.2.3 source build from tag rocm-7.2.3, commit f58b06dce1f9c15707c5f808fd002e18c2accf7e, Release, sanitizer coverage, no ASan |
Passes: O0=0xff22dd00, O2=0xff22dd00. |
LLVM HEAD, commit 0dd29960cd6102b37651cc3f58f872652099b83b, with llvm/llvm-project#198373, llvm/llvm-project#196418, llvm/llvm-project#198412, and llvm/llvm-project#198419 applied locally |
Reproduces: O0=0xffffffff, O2=0xff22dd00. |
ROCm HEAD, commit a5de13684ba84db953b28e632ea304080a4318d0, with llvm/llvm-project#198373, llvm/llvm-project#196418, llvm/llvm-project#198412, and llvm/llvm-project#198419 applied locally |
Passes: O0=0xff22dd00, O2=0xff22dd00. |
The fuzzer now rejects loop-carried values depending on generated i64
byte-permutation idioms by default. Set
FUZZX_ALLOW_M055_I64BYTEPERM_LOOP=1 to re-enable this bug class.