path: root/gcc/cppdefault.c
author     Xionghu Luo <luoxhu@linux.ibm.com>  2020-08-03 22:09:15 -0500
committer  Xionghu Luo <luoxhu@linux.ibm.com>  2020-08-03 22:09:15 -0500
commit     265d817b1eb4644c7a9613ad6920315d98e2e0a4 (patch)
tree       10b0950642c2dc9f5e0468fcfd882a895396e817 /gcc/cppdefault.c
parent     6a1ad710ad20ef05296013679dd42724865a0396 (diff)
dse: Remove partial load after full store for high part access [PR71309]
v5 update as comments:
1. Move const_rhs out of the loop;
2. Iterate from int size for read_mode.

This patch optimizes the following (works for char/short/int/void*):

  6: r119:TI=[r118:DI+0x10]
  7: [r118:DI]=r119:TI
  8: r121:DI=[r118:DI+0x8]

=>

  6: r119:TI=[r118:DI+0x10]
  16: r122:DI=r119:TI#8

The final asm is as below, without the partial load after the full
store (stxv+ld):

  ld 10,16(3)
  mr 9,3
  ld 3,24(3)
  std 10,0(9)
  std 3,8(9)
  blr

This achieves a ~25% performance improvement for typical cases on
Power9.  Bootstrapped and regression tested on Power9-LE.

For AArch64, one ldr is replaced by a mov with this patch:

  ldp x2, x3, [x0, 16]
  stp x2, x3, [x0]
  ldr x0, [x0, 8]

=>

  mov x1, x0
  ldp x2, x0, [x0, 16]
  stp x2, x0, [x1]

gcc/ChangeLog:

2020-08-04  Xionghu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/71309
	* dse.c (find_shift_sequence): Use a subreg of the shifted
	high-part register to avoid loading from the address.

gcc/testsuite/ChangeLog:

2020-08-04  Xionghu Luo  <luoxhu@linux.ibm.com>

	PR rtl-optimization/71309
	* gcc.target/powerpc/pr71309.c: New test.
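For illustration, here is a minimal C sketch of the source pattern this
change targets.  The struct layout and names are hypothetical, chosen
only to reproduce the RTL sequence above; this is not the actual
gcc.target/powerpc/pr71309.c test:

/* Illustrative only.  On LP64, 'path' occupies offsets 0/8 and 'root'
   offsets 16/24, so the copy below is a TImode load from [p+16]
   followed by a TImode store to [p] (insns 6 and 7), and the return
   reads the DImode high part at [p+8] (insn 8).  With the patch, that
   last read becomes a subreg of the TImode register holding the stored
   value instead of a load from memory.  */
struct two_ptrs { void *a; void *b; };
struct nd { struct two_ptrs path; struct two_ptrs root; };

void *
read_high_after_copy (struct nd *p)
{
  p->path = p->root;  /* full-width copy: 16-byte load then store */
  return p->path.b;   /* partial read of the high 8 bytes just stored */
}

Per the ChangeLog entry, the fix lives in dse.c's find_shift_sequence,
which now substitutes a subreg of the shifted high-part register rather
than reloading the value from its address.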
Diffstat (limited to 'gcc/cppdefault.c')
0 files changed, 0 insertions, 0 deletions