author     Roger Sayle <roger@nextmovesoftware.com>   2023-10-30 16:17:42 +0000
committer  Roger Sayle <roger@nextmovesoftware.com>   2023-10-30 16:17:42 +0000
commit     31cc9824d1cd5e0f7fa145d0831a923479333cd6 (patch)
tree       f6aa115934bf827a620e69a88e1df8600fd6ab0a /libcpp/line-map.cc
parent     d24c3c533454449f4bfe3db5a4c7d0eaff08e3c7 (diff)
ARC: Improved ARC rtx_costs/insn_cost for SHIFTs and ROTATEs.
This patch overhauls the ARC backend's insn_cost target hook, and makes
some related improvements to rtx_costs, BRANCH_COST, etc. The primary
goal is to allow the backend to indicate that shifts and rotates are
slow (discouraged) when the CPU doesn't have a barrel shifter. I should
also acknowledge Richard Sandiford for inspiring the use of set_cost
in this rewrite of arc_insn_cost; this implementation borrows heavily
from the target hooks for AArch64 and ARM.
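As a rough illustration of the costing idea (my numbers, not the patch's): with a barrel shifter a constant shift is a single instruction, but without one its cost should grow with the shift count, so that the middle-end is steered away from long shifts. A minimal standalone sketch:

#include <stdio.h>

/* Same scaling as COSTS_N_INSNS in GCC's rtl.h: one insn == 4 units.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Illustrative cost of a shift/rotate by constant COUNT.  Without a
   barrel shifter the ARC must emit COUNT single-bit shifts (or a
   COUNT-iteration zero-overhead loop), so price it linearly.  */
static int
shift_cost (int count, int have_barrel_shifter)
{
  return COSTS_N_INSNS (have_barrel_shifter ? 1 : count);
}

int main (void)
{
  printf ("shift by 27, barrel shifter:    %d\n", shift_cost (27, 1));
  printf ("shift by 27, no barrel shifter: %d\n", shift_cost (27, 0));
  printf ("shift by 3,  no barrel shifter: %d\n", shift_cost (3, 0));
  return 0;
}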
The motivating example is derived from PR rtl-optimization/110717.
struct S { int a : 5; };
unsigned int foo (struct S *p) {
  return p->a;
}
With a barrel shifter, GCC -O2 generates the reasonable:
foo:    ldb_s   r0,[r0]
        asl_s   r0,r0,27
        j_s.d   [blink]
        asr_s   r0,r0,27
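The asl/asr pair is the usual open-coded sign extension of a 5-bit field (27 = 32 - 5). A standalone C model of the same computation (names are mine, not from the patch; converting an out-of-range unsigned value to int is implementation-defined in ISO C, but GCC defines it as wraparound):

#include <stdio.h>

/* Sign-extend the low 5 bits of X: shift the field up to the sign
   bit, then arithmetic-shift it back down.  */
static int
sext5 (unsigned int x)
{
  return (int) (x << 27) >> 27;
}

int main (void)
{
  printf ("%d\n", sext5 (0x1f)); /* field 11111 -> -1 */
  printf ("%d\n", sext5 (0x0f)); /* field 01111 -> 15 */
  return 0;
}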
What's interesting is that during combine, the middle-end actually
has two shifts by three bits, and a sign-extension from QI to SI.
Trying 8, 9 -> 11:
    8: r158:SI=r157:QI#0<<0x3
      REG_DEAD r157:QI
    9: r159:SI=sign_extend(r158:SI#0)
      REG_DEAD r158:SI
   11: r155:SI=r159:SI>>0x3
      REG_DEAD r159:SI
Whilst it's reasonable to simplify this to two shifts by 27 bits when
the CPU has a barrel shifter, it's actually a significant pessimization
when these shifts are implemented by loops. This combination can be
prevented if the backend provides accurate-ish estimates for insn_cost.
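Concretely, combine only keeps a substitution when the summed insn_cost of the replacement insns does not exceed that of the originals (see combine_validate_cost in combine.cc). A standalone model of that comparison for this example, reusing the illustrative linear shift costs sketched earlier:

#include <stdio.h>

#define COSTS_N_INSNS(N) ((N) * 4)

static int
shift_cost (int count, int barrel)
{
  return COSTS_N_INSNS (barrel ? 1 : count);
}

int main (void)
{
  for (int barrel = 0; barrel <= 1; barrel++)
    {
      /* Original insns 8, 9 and 11: shift by 3, sign-extend, shift by 3.  */
      int oldc = shift_cost (3, barrel) + COSTS_N_INSNS (1)
                 + shift_cost (3, barrel);
      /* Proposed replacement: two shifts by 27.  */
      int newc = 2 * shift_cost (27, barrel);
      printf ("barrel=%d: old=%d new=%d -> %s\n", barrel, oldc, newc,
              newc <= oldc ? "combine" : "reject");
    }
  return 0;
}

Without a barrel shifter this rejects the rewrite (216 > 28); with one it accepts it (8 <= 12), matching the behavior described above.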
Previously, without a barrel shifter, GCC -O2 -mcpu=em generates:
foo:    ldb_s   r0,[r0]
        mov     lp_count,27
        lp      2f
        add     r0,r0,r0
        nop
2:      # end single insn loop
        mov     lp_count,27
        lp      2f
        asr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]
which contains two zero-overhead loops and requires ~113 cycles to execute (the two 27-iteration loops dominate the count).
With this patch to rtx_cost/insn_cost, GCC -O2 -mcpu=em generates:
foo:    ldb_s   r0,[r0]
        mov_s   r2,0    ;3
        add3    r0,r2,r0
        sexb_s  r0,r0
        asr_s   r0,r0
        asr_s   r0,r0
        j_s.d   [blink]
        asr_s   r0,r0
which requires only ~6 cycles, as the shorter shifts by 3 and the sign
extension are retained instead of being combined into two shifts by 27.
2023-10-30  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/arc/arc.cc (arc_rtx_costs): Improve cost estimates.
        Provide reasonable values for SHIFTs and ROTATEs by constant
        bit counts depending upon TARGET_BARREL_SHIFTER.
        (arc_insn_cost): Use insn attributes if the instruction is
        recognized.  Avoid calling get_attr_length for type "multi",
        i.e. define_insn_and_split patterns without an explicit type.
        Fall back to set_rtx_cost for single_set and to pattern_cost
        otherwise.
        * config/arc/arc.h (COSTS_N_BYTES): Define helper macro.
        (BRANCH_COST): Improve/correct definition.
        (LOGICAL_OP_NON_SHORT_CIRCUIT): Preserve previous behavior.
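For reference, here is a hedged sketch of the control flow the arc_insn_cost entry above describes. This is my reconstruction from the ChangeLog, not the committed function: it only compiles inside the GCC source tree, the helpers used (recog_memoized, get_attr_type, get_attr_length, single_set, set_rtx_cost, pattern_cost) are real GCC internals, and the COSTS_N_BYTES scaling shown is an assumption mirroring the i386 backend.

/* Assumed definition; the i386 backend uses the same scaling.  */
#define COSTS_N_BYTES(N) ((N) * 2)

static int
sketch_insn_cost (rtx_insn *insn, bool speed)
{
  if (recog_memoized (insn) >= 0)
    {
      /* Recognized insn: attributes are available.  Don't query the
         length of type "multi" insns (define_insn_and_split patterns
         without an explicit type), where it is not meaningful.  */
      if (!speed && get_attr_type (insn) != TYPE_MULTI)
        return COSTS_N_BYTES (get_attr_length (insn));
    }
  /* Fall back to rtx costs: set_rtx_cost for a single SET,
     pattern_cost for anything else.  */
  if (rtx set = single_set (insn))
    return set_rtx_cost (set, speed);
  return pattern_cost (PATTERN (insn), speed);
}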