Commit e0269c4
fix: use rotated mask register in scalar byte tails

The scalar tails (.m_bytes, .u_bytes, .gf_bytes) XORed the second and third
tail bytes with mask bytes loaded from [rsi+1] and [rsi+2]. Those addresses
point into the original, unrotated mask, but the correct bytes are in the
rotated r8d register. When an AVX2 alignment prologue has rotated the mask
because the offset is not a multiple of 4, those loads pick the wrong mask
bytes and the tail produces wrong XOR output.

Fix all three tails to advance r8d via ror and take every mask byte from r8b.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 8d632cf
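To make the failure mode concrete, here is a minimal Python model of the masking scheme. It is a sketch, not the commit's code, and it assumes the following: the 4-byte mask is held little-endian in a 32-bit register (mask byte 0 in r8b), and the scalar tail covers at most 3 bytes. It also omits the AVX2 body, on the assumption that the body consumes multiples of 4 bytes and so leaves the rotation unchanged. The function names `ror32`, `xor_mask_reference` and `xor_mask_split` are hypothetical. With `buggy=True`, the second and third tail bytes read the mask from the unrotated array, mirroring the old [rsi+1]/[rsi+2] loads.

```python
MASK32 = 0xFFFFFFFF


def ror32(x: int, n: int) -> int:
    """Rotate a 32-bit value right by n bits (models x86 `ror r32, n`)."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32


def xor_mask_reference(data: bytes, mask: bytes) -> bytes:
    """Reference result: byte i is XORed with mask[i % 4]."""
    return bytes(b ^ mask[i % 4] for i, b in enumerate(data))


def xor_mask_split(data: bytes, mask: bytes, prologue: int, buggy: bool = False) -> bytes:
    """Hypothetical model: a byte-wise prologue, a mask rotation, then a scalar tail.

    The vector body is omitted; it is assumed to consume multiples of 4 bytes,
    which would leave the mask rotation unchanged.
    """
    assert len(mask) == 4
    assert 0 <= len(data) - prologue <= 3, "the scalar tail handles at most 3 bytes"
    out = bytearray(data)
    for i in range(prologue):
        out[i] ^= mask[i % 4]
    # Register model: mask byte 0 sits in the low byte (r8b).
    r8d = int.from_bytes(mask, "little")
    # Rotate so that r8b holds mask[prologue % 4], the byte for the next data byte.
    r8d = ror32(r8d, 8 * (prologue % 4))
    for j, i in enumerate(range(prologue, len(data))):
        if buggy and j > 0:
            # Old behaviour: [rsi+1], [rsi+2] read the unrotated mask.
            out[i] ^= mask[j]
        else:
            # Fixed behaviour: always XOR with r8b, then rotate the next mask byte in.
            out[i] ^= r8d & 0xFF
            r8d = ror32(r8d, 8)
    return bytes(out)


mask = bytes([0x11, 0x22, 0x33, 0x44])
data = bytes(range(0x40, 0x48))  # 8 bytes: 5-byte prologue + 3-byte tail
print(xor_mask_split(data, mask, 5) == xor_mask_reference(data, mask))              # True
print(xor_mask_split(data, mask, 5, buggy=True) == xor_mask_reference(data, mask))  # False
```

With a 5-byte prologue the mask is rotated by one byte. The buggy tail therefore XORs the data byte at index 6 with mask[1] where mask[2] is required. With a 4-byte prologue the rotation is zero and both paths agree, which is why the bug only appears at offsets that are not a multiple of 4.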
1 file changed
Lines changed: 12 additions & 8 deletions
| Hunk | Original lines | New lines | Removed (original line) | Added (new line) |
|---|---|---|---|---|
| 1 | 500-512 | 500-514 | 504, 509 | 503, 505, 509, 511 |
| 2 | 928-939 | 930-941 | 931, 932, 935, 936 | 933, 934, 937, 938 |
| 3 | 2098-2110 | 2100-2114 | 2102, 2107 | 2103, 2105, 2109, 2111 |