author     Xionghu Luo <luoxhu@linux.ibm.com>  2020-08-03 22:09:15 -0500
committer  Xionghu Luo <luoxhu@linux.ibm.com>  2020-08-03 22:09:15 -0500
commit     265d817b1eb4644c7a9613ad6920315d98e2e0a4
tree       10b0950642c2dc9f5e0468fcfd882a895396e817
parent     6a1ad710ad20ef05296013679dd42724865a0396
dse: Remove partial load after full store for high part access [PR71309]
v5 updates, per review comments:
1. Move const_rhs out of the loop.
2. Iterate from int size for read_mode.
This patch optimizes the following RTL (it works for char/short/int/void*):
6: r119:TI=[r118:DI+0x10]
7: [r118:DI]=r119:TI
8: r121:DI=[r118:DI+0x8]
=>
6: r119:TI=[r118:DI+0x10]
16: r122:DI=r119:TI#8
The final assembly is as below, with no partial load after the full store (stxv+ld):
ld 10,16(3)
mr 9,3
ld 3,24(3)
std 10,0(9)
std 3,8(9)
blr
This achieves a ~25% performance improvement for typical cases on
Power9. Bootstrapped and regression tested on Power9-LE.
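For reference, the source pattern that triggers this is a full structure copy
immediately followed by a read of part of the destination. A minimal C sketch
of such a case (my own reduction; the committed gcc.target/powerpc/pr71309.c
test may differ in details):

/* Hypothetical reduction: the 16-byte structure copy becomes a full
   TImode store, and reading one pointer-sized field right after it
   used to be emitted as a partial load from the just-stored address.  */
struct path { void *mnt; void *dentry; };
struct nameidata { struct path path; struct path root; };

void *
get_dentry (struct nameidata *nd)
{
  nd->path = nd->root;      /* TImode load from nd+16, TImode store to nd+0 */
  return nd->path.dentry;   /* read at nd+8; now a subreg of the stored register */
}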
For AArch64, one ldr is replaced by a mov with this patch:
ldp x2, x3, [x0, 16]
stp x2, x3, [x0]
ldr x0, [x0, 8]
=>
mov x1, x0
ldp x2, x0, [x0, 16]
stp x2, x0, [x1]
gcc/ChangeLog:
2020-08-04 Xionghu Luo <luoxhu@linux.ibm.com>
PR rtl-optimization/71309
* dse.c (find_shift_sequence): Use a subreg of the stored register
when reading back its high part, to avoid loading from the address.
gcc/testsuite/ChangeLog:
2020-08-04 Xionghu Luo <luoxhu@linux.ibm.com>
PR rtl-optimization/71309
* gcc.target/powerpc/pr71309.c: New test.
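The dse.c change relies on the value being read back still being live in the
register that was just stored: find_shift_sequence can then take a subreg of
that register instead of emitting a load. A standalone C analogy of that
register-level extraction (a sketch of the idea only, not the GCC
implementation):

#include <stdint.h>

/* Analogy only: once a 128-bit value is held in a register, its high
   64-bit half (the r119:TI#8 subreg in the RTL above, on little-endian)
   can be produced from that register rather than reloaded from the
   memory it was just stored to.  */
static inline uint64_t
high_half (unsigned __int128 v)
{
  return (uint64_t) (v >> 64);
}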