aboutsummaryrefslogtreecommitdiff
path: root/gcc/ginclude/float.h
diff options
context:
space:
mode:
authorRoger Sayle <roger@nextmovesoftware.com>2020-11-16 16:55:29 -0700
committerJeff Law <law@redhat.com>2020-11-16 16:56:45 -0700
commitbe39636d9f68c437c8a2c2e7a225c4aed4663e78 (patch)
treebfeeaf670c92a0c70dd4c13ff9043d47dbba093c /gcc/ginclude/float.h
parent814e016318646d06b1662219cc716d502b76d8ce (diff)
downloadgcc-be39636d9f68c437c8a2c2e7a225c4aed4663e78.zip
gcc-be39636d9f68c437c8a2c2e7a225c4aed4663e78.tar.gz
gcc-be39636d9f68c437c8a2c2e7a225c4aed4663e78.tar.bz2
Improve code generation for x86_64 [PR 92180]
This patch catches a missed optimization opportunity where GCC currently generates worse code than LLVM. The issue, as nicely analyzed in bugzilla, boils down to the following three insns in combine: (insn 6 5 7 2 (parallel [ (set (reg:DI 85) (ashift:DI (reg:DI 85) (const_int 32 [0x20]))) (clobber (reg:CC 17 flags)) ]) "pr92180.c":4:10 564 {*ashldi3_1} (expr_list:REG_UNUSED (reg:CC 17 flags) (nil))) (insn 7 6 14 2 (parallel [ (set (reg:DI 84) (ior:DI (reg:DI 84) (reg:DI 85))) (clobber (reg:CC 17 flags)) ]) "pr92180.c":4:10 454 {*iordi_1} (expr_list:REG_DEAD (reg:DI 85) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil)))) (insn 14 7 15 2 (set (reg/i:SI 0 ax) (subreg:SI (reg:DI 84) 0)) "pr92180.c":5:1 67 {*movsi_internal} (expr_list:REG_DEAD (reg:DI 84) (nil))) Normally, combine/simplify-rtx would notice that insns 6 and 7 (which update highpart bits) are unnecessary as the final insn 14 only requires to lowpart bits. The complication is that insn 14 sets a hard register in targetm.class_likely_spilled_p which prevents combine from performing its simplifications, and removing the redundant instructions. At first glance a fix would appear to require changes to combine, potentially affecting code generation on all small register class targets... An alternate (and I think clever) solution is to spot that this problematic situation can be avoided by the backend. At RTL expansion time, the middle-end has a clear separation between pseudos and hard registers, so the RTL initially contains: (insn 9 8 10 2 (set (reg:SI 86) (subreg:SI (reg:DI 82 [ _1 ]) 0)) "pr92180.c":6:10 -1 (nil)) (insn 10 9 14 2 (set (reg:SI 83 [ <retval> ]) (reg:SI 86)) "pr92180.c":6:10 -1 (nil)) (insn 14 10 15 2 (set (reg/i:SI 0 ax) (reg:SI 83 [ <retval> ])) "pr92180.c":7:1 -1 (nil)) which can be optimized without problems by combine; it is only the intervening passes (initially fwprop1) that propagate computations into sets of hard registers, and disable those opportunities. The solution proposed here is to have the x86 backend/recog prevent early RTL passes composing instructions (that set likely_spilled hard registers) that they (combine) can't simplify, until after reload. We allow sets from pseudo registers, immediate constants and memory accesses, but anything more complicated is performed via a temporary pseudo. Not only does this simplify things for the register allocator, but any remaining register-to-register moves are easily cleaned up by the late optimization passes after reload, such as peephole2 and cprop_hardreg. This patch has been tested on x86_64-pc-linux-gnu with a "make bootstrap" and a "make -k check" with no new failures. Ok for mainline? gcc PR rtl-optimization/92180 * config/i386/i386.c (ix86_hardreg_mov_ok): New function to determine whether (set DST SRC) should be allowed at this point. * config/i386/i386-protos.h (ix86_hardreg_mov_ok): Prototype here. * config/i386/i386-expand.c (ix86_expand_move): Check whether this is a complex set of a likely spilled hard register, and if so place the value in a pseudo, and load the hard reg from it. * config/i386/i386.md (*movdi_internal, *movsi_internal) (*movhi_internal, *movqi_internal): Make these instructions conditional on ix86_hardreg_mov_ok. (*lea<mode>): Make this define_insn_and_split conditional on ix86_hardreg_mov_ok. gcc/testsuite PR rtl-optimization/92180 * gcc.target/i386/pr92180.c: New test.
Diffstat (limited to 'gcc/ginclude/float.h')
0 files changed, 0 insertions, 0 deletions