author | Roger Sayle <roger@nextmovesoftware.com> | 2023-10-04 17:17:03 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2023-10-04 17:17:03 +0100 |
commit | 3ceb109fcb045de9d78b650112399bbb7df78cdc (patch) | |
tree | fc5f66f208c42f61ddee8a422f61852bda8d4b10 /gcc/config/arc/arc.md | |
parent | f4e7bba98c1ad143d253d57bcf39dade2a3a868e (diff) | |
ARC: Split SImode shifts pre-reload on !TARGET_BARREL_SHIFTER.
This patch splits SImode shifts, for !TARGET_BARREL_SHIFTER targets,
after combine and before reload, in the split1 pass, as suggested by
the FIXME comment above output_shift in arc.cc. To do this I've
copied the implementation of the ix86_pre_reload_split function from
the i386 backend, and renamed it arc_pre_reload_split.
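The arc.cc side of the change is not shown on this page, so the following is only a rough standalone model of what such a pre-reload predicate checks, assuming it behaves like the i386 one it was copied from: it holds while new pseudo registers can still be created and the first split pass has not yet run. The struct pass_state, the PROP_SPLIT_DONE flag and pre_reload_split_model are hypothetical stand-ins, not GCC's real types or names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the compiler's per-function pass state.  */
struct pass_state
{
  bool can_create_pseudo;    /* True until register allocation starts.  */
  unsigned curr_properties;  /* Bitmask of properties provided so far.  */
};

/* Hypothetical flag: set once the first instruction-splitting pass ran.  */
#define PROP_SPLIT_DONE 0x1u

/* Model of the predicate: true only before split1 and before reload.  */
static bool
pre_reload_split_model (const struct pass_state *s)
{
  return s->can_create_pseudo
	 && !(s->curr_properties & PROP_SPLIT_DONE);
}

/* Self-check covering the three interesting states.  */
static void
pre_reload_split_selfcheck (void)
{
  struct pass_state after_combine = { true, 0u };
  struct pass_state after_split1 = { true, PROP_SPLIT_DONE };
  struct pass_state after_reload = { false, PROP_SPLIT_DONE };

  assert (pre_reload_split_model (&after_combine));
  assert (!pre_reload_split_model (&after_split1));
  assert (!pre_reload_split_model (&after_reload));
}
```

Under this model, an insn condition that includes the predicate can match during combine but stops matching once split1 has run, which is what forces the define_insn_and_split patterns to be split in that pass.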
Although the actual implementations of shifts remain the same
(as in output_shift), having them as explicit instructions in
the RTL stream allows better scheduling and use of compact forms
when available. The benefits can be seen in two short examples
below.
For the function:
unsigned int foo(unsigned int x, unsigned int y) {
  return y << 2;
}
GCC with -O2 -mcpu=em would previously generate:
foo:    add     r1,r1,r1
        add     r1,r1,r1
        j_s.d   [blink]
        mov_s   r0,r1   ;4
and with this patch now generates:
foo:    asl_s   r0,r1
        j_s.d   [blink]
        asl_s   r0,r0
Notice the original (from shift_si3's output_shift) requires the
shift sequence to be monolithic with the same destination register
as the source (requiring an extra mov_s). The new version can
eliminate this move, and schedule the second asl in the branch
delay slot of the return.
For the function:
int x,y,z;
void bar()
{
  x <<= 3;
  y <<= 3;
  z <<= 3;
}
GCC -O2 -mcpu=em currently generates:
bar:    push_s  r13
        ld.as   r12,[gp,@x@sda] ;23
        ld.as   r3,[gp,@y@sda]  ;23
        mov     r2,0
        add3    r12,r2,r12
        mov     r2,0
        add3    r3,r2,r3
        ld.as   r2,[gp,@z@sda]  ;23
        st.as   r12,[gp,@x@sda] ;26
        mov     r13,0
        add3    r2,r13,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s.d   [blink]
        pop_s   r13
where each shift by 3 uses ARC's add3 instruction, which is similar
to x86's lea and implements x = (y<<3) + z, but requires the value zero
to be placed in a temporary register "z". Splitting this before reload
allows these pseudos to be shared/reused. With this patch, we get:
bar:    ld.as   r2,[gp,@x@sda]  ;23
        mov_s   r3,0    ;3
        add3    r2,r3,r2
        ld.as   r3,[gp,@y@sda]  ;23
        st.as   r2,[gp,@x@sda]  ;26
        ld.as   r2,[gp,@z@sda]  ;23
        mov_s   r12,0   ;3
        add3    r3,r12,r3
        add3    r2,r12,r2
        st.as   r3,[gp,@y@sda]  ;26
        st.as   r2,[gp,@z@sda]  ;26
        j_s     [blink]
Unfortunately, register allocation means that we only share two of the
three "mov_s z,0", but this is sufficient to reduce register pressure
enough to avoid spilling r13 in the prologue/epilogue.
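As a sanity check on the add3-based sequences, here is a small standalone C model of how the *ashlsi3_nobs splitter in this patch decomposes constant left shifts of 0 to 9 bits. It is a sketch, not compiler code: add_shift, asl1, ashl_small and ashl_small_mismatches are illustrative helpers, with add_shift modelling the add1/add2/add3 family as dst = base + (src << k) and the variable zero playing the role of the shared zero temporary.

```c
#include <assert.h>
#include <stdint.h>

/* Model of add1/add2/add3: dst = base + (src << k), with k in 1..3.  */
static uint32_t
add_shift (uint32_t src, unsigned k, uint32_t base)
{
  return (uint32_t) (base + (uint32_t) (src << k));
}

/* Model of a single-bit left shift (asl by one).  */
static uint32_t
asl1 (uint32_t x)
{
  return (uint32_t) (x << 1);
}

/* Mirror the splitter's instruction sequence for counts 0..9.  */
static uint32_t
ashl_small (uint32_t x, unsigned count)
{
  uint32_t zero = 0;  /* The zero temporary that add_shift needs.  */
  uint32_t r;
  int n = (int) count;

  assert (count <= 9);
  if (n == 0)
    return x;  /* Plain move.  */
  if (n <= 2)
    {
      r = asl1 (x);
      if (n == 2)
	r = asl1 (r);
      return r;
    }
  r = add_shift (x, 3, zero);  /* First add3.  */
  for (n -= 3; n >= 3; n -= 3)
    r = add_shift (r, 3, zero);  /* Further add3 steps.  */
  if (n == 2)
    r = add_shift (r, 2, zero);  /* Remainder of two: add2.  */
  else if (n)
    r = asl1 (r);  /* Remainder of one: asl.  */
  return r;
}

/* Count disagreements with a plain C shift over a few sample values.  */
static int
ashl_small_mismatches (void)
{
  static const uint32_t samples[] =
    { 0u, 1u, 0x12345678u, 0x80000001u, 0xffffffffu };
  int bad = 0;

  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    for (unsigned n = 0; n <= 9; n++)
      if (ashl_small (samples[i], n) != (uint32_t) (samples[i] << n))
	bad++;
  return bad;
}
```

In the model a single zero value feeds every add_shift step, which corresponds to the zero pseudo that the splitter creates once per shift and that later passes may share between shifts.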
2023-10-04 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/arc/arc-protos.h (emit_shift): Delete prototype.
(arc_pre_reload_split): New function prototype.
* config/arc/arc.cc (emit_shift): Delete function.
(arc_pre_reload_split): New predicate function, copied from i386,
to schedule define_insn_and_split splitters to the split1 pass.
* config/arc/arc.md (ashlsi3): Expand RTL template unconditionally.
(ashrsi3): Likewise.
(lshrsi3): Likewise.
(shift_si3): Move after other shift patterns, and disable when
operands[2] is one (which is handled by its own define_insn).
Use shiftr4_operator, instead of shift4_operator, as this is no
longer used for left shifts.
(shift_si3_loop): Likewise. Additionally remove match_scratch.
(*ashlsi3_nobs): New pre-reload define_insn_and_split.
(*ashrsi3_nobs): Likewise.
(*lshrsi3_nobs): Likewise.
(rotrsi3_cnt1): Rename define_insn from *rotrsi3_cnt1.
(add_shift): Rename define_insn from *add_shift.
* config/arc/predicates.md (shiftl4_operator): Delete.
(shift4_operator): Delete.
gcc/testsuite/ChangeLog
* gcc.target/arc/ashrsi-1.c: New TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/ashrsi-2.c: New !TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/ashrsi-3.c: Likewise.
* gcc.target/arc/ashrsi-4.c: Likewise.
* gcc.target/arc/ashrsi-5.c: Likewise.
* gcc.target/arc/lshrsi-1.c: New TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/lshrsi-2.c: New !TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/lshrsi-3.c: Likewise.
* gcc.target/arc/lshrsi-4.c: Likewise.
* gcc.target/arc/lshrsi-5.c: Likewise.
* gcc.target/arc/shlsi-1.c: New TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/shlsi-2.c: New !TARGET_BARREL_SHIFTER test case.
* gcc.target/arc/shlsi-3.c: Likewise.
* gcc.target/arc/shlsi-4.c: Likewise.
* gcc.target/arc/shlsi-5.c: Likewise.
Diffstat (limited to 'gcc/config/arc/arc.md')
-rw-r--r-- | gcc/config/arc/arc.md | 238 |
1 files changed, 182 insertions, 56 deletions
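The arc.md change also special-cases left shifts by 29 to 31 (mask the low bits, then rotate right by one bit one to three times) and right shifts by up to 4 (repeated single-bit shifts). The standalone C model below sketches those sequences and compares them with reference shifts. All helper names are illustrative only, and the arithmetic right shift is modelled on unsigned values so that the example does not depend on implementation-defined behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Single-bit primitives modelling ror, lsr and asr by one.  */
static uint32_t ror1 (uint32_t x) { return (x >> 1) | (uint32_t) (x << 31); }
static uint32_t lsr1 (uint32_t x) { return x >> 1; }
static uint32_t asr1 (uint32_t x) { return (x >> 1) | (x & 0x80000000u); }

/* Left shift by 29, 30 or 31: and with 7, 3 or 1, then rotate right
   by one bit 3, 2 or 1 times.  */
static uint32_t
ashl_large (uint32_t x, unsigned n)
{
  assert (n >= 29 && n <= 31);
  uint32_t r = x & ((1u << (32 - n)) - 1u);
  for (unsigned i = 0; i < 32 - n; i++)
    r = ror1 (r);
  return r;
}

/* Right shift by 0..4 as repeated single-bit shifts.  */
static uint32_t
shr_small (uint32_t x, unsigned n, int arithmetic)
{
  assert (n <= 4);
  for (unsigned i = 0; i < n; i++)
    x = arithmetic ? asr1 (x) : lsr1 (x);
  return x;
}

/* Portable reference for an arithmetic right shift on 32-bit values.  */
static uint32_t
asr_ref (uint32_t x, unsigned n)
{
  if (x & 0x80000000u)
    return (uint32_t) ~((uint32_t) ~x >> n);
  return x >> n;
}

/* Count disagreements between the modelled sequences and references.  */
static int
shift_model_mismatches (void)
{
  static const uint32_t samples[] =
    { 0u, 1u, 5u, 0x12345678u, 0x80000001u, 0xffffffffu };
  int bad = 0;

  for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
    {
      uint32_t x = samples[i];
      for (unsigned n = 29; n <= 31; n++)
	if (ashl_large (x, n) != (uint32_t) (x << n))
	  bad++;
      for (unsigned n = 0; n <= 4; n++)
	{
	  if (shr_small (x, n, 0) != x >> n)
	    bad++;
	  if (shr_small (x, n, 1) != asr_ref (x, n))
	    bad++;
	}
    }
  return bad;
}
```

The rotate trick works because after masking only the low 32-n bits are set, so rotating right by 32-n positions moves exactly those bits to the top of the word, which is the same result as shifting left by n.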
diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
index 4af7332..9d5c659 100644
--- a/gcc/config/arc/arc.md
+++ b/gcc/config/arc/arc.md
@@ -3401,70 +3401,19 @@ archs4x, archs4xd"
   [(set (match_operand:SI 0 "dest_reg_operand" "")
         (ashift:SI (match_operand:SI 1 "register_operand" "")
                    (match_operand:SI 2 "nonmemory_operand" "")))]
-  ""
-  "
-{
-  if (!TARGET_BARREL_SHIFTER)
-    {
-      emit_shift (ASHIFT, operands[0], operands[1], operands[2]);
-      DONE;
-    }
-}")
+  "")
 
 (define_expand "ashrsi3"
   [(set (match_operand:SI 0 "dest_reg_operand" "")
         (ashiftrt:SI (match_operand:SI 1 "register_operand" "")
                      (match_operand:SI 2 "nonmemory_operand" "")))]
-  ""
-  "
-{
-  if (!TARGET_BARREL_SHIFTER)
-    {
-      emit_shift (ASHIFTRT, operands[0], operands[1], operands[2]);
-      DONE;
-    }
-}")
+  "")
 
 (define_expand "lshrsi3"
   [(set (match_operand:SI 0 "dest_reg_operand" "")
         (lshiftrt:SI (match_operand:SI 1 "register_operand" "")
                      (match_operand:SI 2 "nonmemory_operand" "")))]
-  ""
-  "
-{
-  if (!TARGET_BARREL_SHIFTER)
-    {
-      emit_shift (LSHIFTRT, operands[0], operands[1], operands[2]);
-      DONE;
-    }
-}")
-
-(define_insn "shift_si3"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=r")
-        (match_operator:SI 3 "shift4_operator"
-                           [(match_operand:SI 1 "register_operand" "0")
-                            (match_operand:SI 2 "const_int_operand" "n")]))
-   (clobber (match_scratch:SI 4 "=&r"))
-   (clobber (reg:CC CC_REG))
-  ]
-  "!TARGET_BARREL_SHIFTER"
-  "* return output_shift (operands);"
-  [(set_attr "type" "shift")
-   (set_attr "length" "16")])
-
-(define_insn "shift_si3_loop"
-  [(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
-        (match_operator:SI 3 "shift_operator"
-                           [(match_operand:SI 1 "register_operand" "0,0")
-                            (match_operand:SI 2 "nonmemory_operand" "rn,Cal")]))
-   (clobber (match_scratch:SI 4 "=X,X"))
-   (clobber (reg:SI LP_COUNT))
-   (clobber (reg:CC CC_REG))
-  ]
-  "!TARGET_BARREL_SHIFTER"
-  "* return output_shift (operands);"
-  [(set_attr "type" "shift")
-   (set_attr "length" "16,20")])
+  "")
 
 ; asl, asr, lsr patterns:
 ; There is no point in including an 'I' alternative since only the lowest 5
@@ -3512,6 +3461,183 @@ archs4x, archs4xd"
    (set_attr "predicable" "no,no,no,yes,no,no")
    (set_attr "cond" "canuse,nocond,canuse,canuse,nocond,nocond")])
 
+(define_insn_and_split "*ashlsi3_nobs"
+  [(set (match_operand:SI 0 "dest_reg_operand")
+        (ashift:SI (match_operand:SI 1 "register_operand")
+                   (match_operand:SI 2 "nonmemory_operand")))]
+  "!TARGET_BARREL_SHIFTER
+   && operands[2] != const1_rtx
+   && arc_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 9)
+        {
+          if (n == 0)
+            emit_move_insn (operands[0], operands[1]);
+          else if (n <= 2)
+            {
+              emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[1]));
+              if (n == 2)
+                emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[0]));
+            }
+          else
+            {
+              rtx zero = gen_reg_rtx (SImode);
+              emit_move_insn (zero, const0_rtx);
+              emit_insn (gen_add_shift (operands[0], operands[1],
+                                        GEN_INT (3), zero));
+              for (n -= 3; n >= 3; n -= 3)
+                emit_insn (gen_add_shift (operands[0], operands[0],
+                                          GEN_INT (3), zero));
+              if (n == 2)
+                emit_insn (gen_add_shift (operands[0], operands[0],
+                                          const2_rtx, zero));
+              else if (n)
+                emit_insn (gen_ashlsi3_cnt1 (operands[0], operands[0]));
+            }
+          DONE;
+        }
+      else if (n >= 29)
+        {
+          if (n < 31)
+            {
+              if (n == 29)
+                {
+                  emit_insn (gen_andsi3_i (operands[0], operands[1],
+                                           GEN_INT (7)));
+                  emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+                }
+              else
+                emit_insn (gen_andsi3_i (operands[0], operands[1],
+                                         GEN_INT (3)));
+              emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+            }
+          else
+            emit_insn (gen_andsi3_i (operands[0], operands[1], const1_rtx));
+          emit_insn (gen_rotrsi3_cnt1 (operands[0], operands[0]));
+          DONE;
+        }
+    }
+
+  rtx shift = gen_rtx_fmt_ee (ASHIFT, SImode, operands[1], operands[2]);
+  emit_insn (gen_shift_si3_loop (operands[0], operands[1],
+                                 operands[2], shift));
+  DONE;
+})
+
+(define_insn_and_split "*ashlri3_nobs"
+  [(set (match_operand:SI 0 "dest_reg_operand")
+        (ashiftrt:SI (match_operand:SI 1 "register_operand")
+                     (match_operand:SI 2 "nonmemory_operand")))]
+  "!TARGET_BARREL_SHIFTER
+   && operands[2] != const1_rtx
+   && arc_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 4)
+        {
+          if (n != 0)
+            {
+              emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[1]));
+              while (--n > 0)
+                emit_insn (gen_ashrsi3_cnt1 (operands[0], operands[0]));
+            }
+          else
+            emit_move_insn (operands[0], operands[1]);
+          DONE;
+        }
+    }
+
+  rtx pat;
+  rtx shift = gen_rtx_fmt_ee (ASHIFTRT, SImode, operands[1], operands[2]);
+  if (shiftr4_operator (shift, SImode))
+    pat = gen_shift_si3 (operands[0], operands[1], operands[2], shift);
+  else
+    pat = gen_shift_si3_loop (operands[0], operands[1], operands[2], shift);
+  emit_insn (pat);
+  DONE;
+})
+
+(define_insn_and_split "*lshrsi3_nobs"
+  [(set (match_operand:SI 0 "dest_reg_operand")
+        (lshiftrt:SI (match_operand:SI 1 "register_operand")
+                     (match_operand:SI 2 "nonmemory_operand")))]
+  "!TARGET_BARREL_SHIFTER
+   && operands[2] != const1_rtx
+   && arc_pre_reload_split ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  if (CONST_INT_P (operands[2]))
+    {
+      int n = INTVAL (operands[2]) & 0x1f;
+      if (n <= 4)
+        {
+          if (n != 0)
+            {
+              emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[1]));
+              while (--n > 0)
+                emit_insn (gen_lshrsi3_cnt1 (operands[0], operands[0]));
+            }
+          else
+            emit_move_insn (operands[0], operands[1]);
+          DONE;
+        }
+    }
+
+  rtx pat;
+  rtx shift = gen_rtx_fmt_ee (LSHIFTRT, SImode, operands[1], operands[2]);
+  if (shiftr4_operator (shift, SImode))
+    pat = gen_shift_si3 (operands[0], operands[1], operands[2], shift);
+  else
+    pat = gen_shift_si3_loop (operands[0], operands[1], operands[2], shift);
+  emit_insn (pat);
+  DONE;
+})
+
+;; shift_si3 appears after {ashr,lshr}si3_nobs
+(define_insn "shift_si3"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=r")
+        (match_operator:SI 3 "shiftr4_operator"
+                           [(match_operand:SI 1 "register_operand" "0")
+                            (match_operand:SI 2 "const_int_operand" "n")]))
+   (clobber (match_scratch:SI 4 "=&r"))
+   (clobber (reg:CC CC_REG))
+  ]
+  "!TARGET_BARREL_SHIFTER
+   && operands[2] != const1_rtx"
+  "* return output_shift (operands);"
+  [(set_attr "type" "shift")
+   (set_attr "length" "16")])
+
+;; shift_si3_loop appears after {ashl,ashr,lshr}si3_nobs
+(define_insn "shift_si3_loop"
+  [(set (match_operand:SI 0 "dest_reg_operand" "=r,r")
+        (match_operator:SI 3 "shift_operator"
+                           [(match_operand:SI 1 "register_operand" "0,0")
+                            (match_operand:SI 2 "nonmemory_operand" "rn,Cal")]))
+   (clobber (reg:SI LP_COUNT))
+   (clobber (reg:CC CC_REG))
+  ]
+  "!TARGET_BARREL_SHIFTER
+   && operands[2] != const1_rtx"
+  "* return output_shift (operands);"
+  [(set_attr "type" "shift")
+   (set_attr "length" "16,20")])
+
+;; Rotate instructions.
+
 (define_insn "rotrsi3"
   [(set (match_operand:SI 0 "dest_reg_operand" "=r, r, r")
         (rotatert:SI (match_operand:SI 1 "arc_nonmemory_operand" " 0,rL,rCsz")
@@ -5923,7 +6049,7 @@ archs4x, archs4xd"
                    (zero_extract:SI (match_dup 1) (match_dup 5) (match_dup 7)))])
    (match_dup 1)])
 
-(define_insn "*rotrsi3_cnt1"
+(define_insn "rotrsi3_cnt1"
   [(set (match_operand:SI 0 "dest_reg_operand" "=r")
         (rotatert:SI (match_operand:SI 1 "nonmemory_operand" "rL")
                      (const_int 1)))]
@@ -6342,7 +6468,7 @@ archs4x, archs4xd"
    (set_attr "type" "multi")
    (set_attr "predicable" "yes")])
 
-(define_insn "*add_shift"
+(define_insn "add_shift"
   [(set (match_operand:SI 0 "register_operand" "=q,r,r")
         (plus:SI (ashift:SI (match_operand:SI 1 "register_operand" "q,r,r")
                             (match_operand:SI 2 "_1_2_3_operand" ""))