
Commit 36f0b74

Authored and committed by Jeff Law
[RISC-V][PR target/123838] Improve code generated for shifts with counts 31-N or 63-N
A shift count expressed as 31 - n ends up generating code like this:

        li      a5,31
        subw    a5,a5,a1
        sllw    a0,a0,a5
        ret

Note how we had to load the constant 31 into a register for the subtraction. But instead of using 31 - n we can use a bit-not, as it does precisely what we need in the bits that the shift instruction actually uses. This results in:

        not     a1,a1
        sllw    a0,a0,a1
        ret

The core idea we're exploiting here is that the processor implements SHIFT_COUNT_TRUNCATED semantics, so an SI shift only cares about the low 5 bits of the shift count and a DI shift only about the low 6 bits. If we think about what bit pattern -1 has in those low bits, we get 31 and 63 respectively. We then exploit the identity

        -x = ~x + 1       // identity
        -1 - x = ~x       // a tiny bit of algebra

So in these limited cases we can replace -1 - x with ~x.

I didn't implement this in simplify-rtx. It wasn't actually going to help because, while the RISC-V chip implements SHIFT_COUNT_TRUNCATED semantics, the port doesn't define SHIFT_COUNT_TRUNCATED for "reasons".

So there are two patterns. The first is for an X mode destination; naturally the shift count is 31 - n or 63 - n for SI or DI respectively. It's a bit odd that the subtraction is always SImode, but that's probably narrowing happening somewhere. The second pattern covers the "w" forms for rv64.

This trick probably works for the zbs instructions as well. That's going to be a whole lot more patterns, and I haven't seen this idiom show up anywhere in practice, so the cost/benefit doesn't look worthwhile.

This spun overnight on riscv32-elf and riscv64-elf and on the Pioneer without regressions. I'll wait for pre-commit CI to do its thing before pushing.

	PR target/123838

gcc/
	* config/riscv/riscv.md: Use splitters to simplify shifts where the
	shift count is 31-N or 63-N.

gcc/testsuite/
	* gcc.target/riscv/pr123838.c: New test.

Co-authored-by: Austin Law <austinklaw@gmail.com>
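A quick way to convince yourself of the identity is to model a truncating shift in C and compare the two forms of the count. This is a minimal sketch that assumes a shift reading only the low 5 (32-bit) or 6 (64-bit) bits of the count, as described above. The helper names `sll32`, `srl32`, `sll64`, `srl64` and `count_identity_holds` are hypothetical illustrations, not GCC code.

```c
#include <assert.h> /* assert, for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical models of shifts that, like the RISC-V base ISA, read only
   the low 5 bits (32-bit shifts) or low 6 bits (64-bit shifts) of the
   count.  These are illustrations, not GCC or ISA-provided functions.  */
static uint32_t sll32 (uint32_t x, uint64_t count) { return x << (count & 31); }
static uint32_t srl32 (uint32_t x, uint64_t count) { return x >> (count & 31); }
static uint64_t sll64 (uint64_t x, uint64_t count) { return x << (count & 63); }
static uint64_t srl64 (uint64_t x, uint64_t count) { return x >> (count & 63); }

/* Return true if replacing the count 31 - n (or 63 - n) with ~n leaves
   every shift result unchanged for all n in [0, limit).  Unsigned
   arithmetic wraps, so 31 - n and ~n are well defined for any n.  */
static bool count_identity_holds (uint64_t x, uint64_t limit)
{
  for (uint64_t n = 0; n < limit; n++)
    {
      /* -1 - n == ~n, and 31 / 63 are the all-ones (-1) pattern once
         truncated to 5 / 6 bits.  */
      if (((31 - n) & 31) != (~n & 31) || ((63 - n) & 63) != (~n & 63))
        return false;
      if (sll32 ((uint32_t) x, 31 - n) != sll32 ((uint32_t) x, ~n)
          || srl32 ((uint32_t) x, 31 - n) != srl32 ((uint32_t) x, ~n)
          || sll64 (x, 63 - n) != sll64 (x, ~n)
          || srl64 (x, 63 - n) != srl64 (x, ~n))
        return false;
    }
  return true;
}
```

Calling `count_identity_holds (x, 256)` for any `x` should return true. Values of n up to 256 cover every residue mod 32 and mod 64 several times over.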
1 parent: 7c3e6df, commit: 36f0b74

2 files changed

Lines changed: 57 additions & 0 deletions


gcc/config/riscv/riscv.md

Lines changed: 42 additions & 0 deletions
@@ -4995,6 +4995,48 @@
   { operands[3] = GEN_INT (BITS_PER_WORD
                            - exact_log2 (INTVAL (operands[3]) + 1)); })
 
+;; If a shift count is BITS_PER_WORD - 1 - N, then we can exploit the identity
+;; that -x = ~x + 1 which is equivalent to (-1 - x) = ~x. When shifting only
+;; low bits of X matter (5 for SI, 6 for DI). So 31/63 are equivalent to -1
+;; for SI/DI shifts.
+;;
+;; Strangely, even for rv64, the shift computation is done in SI, presumably
+;; something narrowed the arithmetic prior to gimple->rtl expansion.
+;; Ultimately it gets wrapped with a SUBREG narrowing to QI.
+(define_split
+  [(set (match_operand:X 0 "register_operand")
+        (any_shift_rotate:X
+          (match_operand:X 1 "register_operand")
+          (subreg:QI (minus:SI (match_operand 2 "bitpos_mask_operand")
+                               (match_operand:SI 3 "register_operand")) 0)))
+   (clobber (match_operand:X 4 "register_operand"))]
+  ""
+  [(set (match_dup 4) (not:X (match_dup 6)))
+   (set (match_dup 0) (any_shift_rotate:X (match_dup 1) (match_dup 5)))]
+{
+  operands[5] = gen_lowpart (QImode, operands[4]);
+  operands[6] = gen_lowpart (word_mode, operands[3]);
+})
+
+;; This is the same thing as the prior pattern, but for 32 bit shifts on rv64.
+(define_split
+  [(set (match_operand:DI 0 "register_operand")
+        (sign_extend:DI
+          (any_shift_rotate:SI
+            (match_operand:SI 1 "register_operand")
+            (subreg:QI (minus:SI (const_int 31)
+                                 (match_operand:SI 2 "register_operand")) 0))))
+   (clobber (match_operand:DI 3 "register_operand"))]
+  "TARGET_64BIT"
+  [(set (match_dup 3) (not:DI (match_dup 2)))
+   (set (match_dup 0)
+        (sign_extend:DI (any_shift_rotate:SI (match_dup 1)
+                                             (match_dup 4))))]
+{
+  operands[2] = gen_lowpart (DImode, operands[2]);
+  operands[4] = gen_lowpart (QImode, operands[3]);
+})
+
 ;; Standard extensions and pattern for optimization
 (include "bitmanip.md")
 (include "crypto.md")
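The second pattern relies on the same fact for the rv64 "w" instructions. The full 64-bit `not` also flips bits 5 through 63 of the count register, but the w-shift reads only the low 5 bits. The sketch below models the before and after sequences from the commit message in C. It assumes that sllw/srlw shift the low 32 bits of rs1 by rs2 & 31 and sign-extend the result. `sext32`, `sllw`, `srlw`, `old_count`, `new_count` and `w_forms_agree` are hypothetical helpers, not GCC code. sraw is left out only to keep the model short.

```c
#include <assert.h> /* assert, for the usage checks */
#include <stdbool.h>
#include <stdint.h>

/* Portable 32 -> 64 bit sign extension (avoids an implementation-defined
   unsigned-to-signed conversion).  */
static uint64_t sext32 (uint32_t v)
{
  return (uint64_t) ((int64_t) (v ^ 0x80000000u) - INT64_C (0x80000000));
}

/* Hypothetical models of rv64 sllw/srlw: shift the low 32 bits of rs1 by
   rs2 & 31, then sign-extend the 32-bit result to 64 bits.  */
static uint64_t sllw (uint64_t rs1, uint64_t rs2)
{
  return sext32 ((uint32_t) rs1 << (rs2 & 31));
}

static uint64_t srlw (uint64_t rs1, uint64_t rs2)
{
  return sext32 ((uint32_t) rs1 >> (rs2 & 31));
}

/* Old count: "li a5,31; subw a5,a5,a1" (32-bit subtract, sign-extended).  */
static uint64_t old_count (uint64_t n)
{
  return sext32 (31u - (uint32_t) n);
}

/* New count: "not a1,a1" complements all 64 bits of the register.  */
static uint64_t new_count (uint64_t n)
{
  return ~n;
}

/* True if both count sequences give the same sllw and srlw results.  */
static bool w_forms_agree (uint64_t x, uint64_t n)
{
  return sllw (x, old_count (n)) == sllw (x, new_count (n))
         && srlw (x, old_count (n)) == srlw (x, new_count (n));
}
```

old_count and new_count can disagree in bits 5 through 63. For example, n = 0 gives 31 for the old count and all ones for the new count. That is why the pattern can get away with a full-width not: the w-shift never looks at those bits.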
gcc/testsuite/gcc.target/riscv/pr123838.c

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og"} } */
+
+#define F(NAME, OP, TYPE) TYPE f##NAME##TYPE (TYPE x, TYPE n) { return x OP (sizeof (TYPE) * 8 - 1 - n); }
+
+F(RSHIFT, >>, int)
+F(LSHIFT, <<, int)
+F(RSHIFT, >>, long)
+F(LSHIFT, <<, long)
+
+/* { dg-final { scan-assembler-times "not\t" 4 } } */
+/* { dg-final { scan-assembler-not "li\t" } } */
+/* { dg-final { scan-assembler-not "sub\t" } } */
+/* { dg-final { scan-assembler-not "subw\t" } } */
+
