author:    Roger Sayle <roger@nextmovesoftware.com>  2023-12-19 11:24:36 +0000
committer: Roger Sayle <roger@nextmovesoftware.com>  2023-12-19 11:24:36 +0000
commit:    7e1c440bc84c02e67b1cf338579a3274cdc337e0
tree:      f9312aa153a0a9e62369c1f72b8f3fab7e08f03d
parent:    cf840a7f7c14242ab7018071310851486a557d4f
i386: Improved TImode (128-bit) integer constants on x86_64.
This patch fixes two issues with the handling of 128-bit TImode integer
constants in the x86_64 backend.

The main issue is that GCC always tries to load 128-bit integer constants
via broadcasts to vector SSE registers, even if the result is required in
general registers.  This is seen in the two closely related functions
below:

    __int128 m;
    void foo() { m &= CONST; }
    void bar() { m = CONST; }

When compiled with -O2 -mavx, we currently generate:

    foo:    movabsq     $81985529216486895, %rax
            vmovq       %rax, %xmm0
            vpunpcklqdq %xmm0, %xmm0, %xmm0
            vmovq       %xmm0, %rax
            vpextrq     $1, %xmm0, %rdx
            andq        %rax, m(%rip)
            andq        %rdx, m+8(%rip)
            ret

    bar:    movabsq     $81985529216486895, %rax
            vmovq       %rax, %xmm1
            vpunpcklqdq %xmm1, %xmm1, %xmm0
            vpextrq     $1, %xmm0, %rdx
            vmovq       %xmm0, m(%rip)
            movq        %rdx, m+8(%rip)
            ret

With this patch, we defer the decision to use a vector broadcast for
TImode until we know that we actually want an SSE register result, by
moving the call to ix86_convert_const_wide_int_to_broadcast from the RTL
expansion pass to the scalar-to-vector (STV) pass.  With this change (and
a minor tweak described below) we now generate:

    foo:    movabsq     $81985529216486895, %rax
            andq        %rax, m(%rip)
            andq        %rax, m+8(%rip)
            ret

    bar:    movabsq     $81985529216486895, %rax
            vmovq       %rax, %xmm0
            vpunpcklqdq %xmm0, %xmm0, %xmm0
            vmovdqa     %xmm0, m(%rip)
            ret

showing that we now correctly use vector mode broadcasts (only) where
appropriate.

The one minor tweak mentioned above is to enable the un-cprop hi/lo
optimization, which I originally contributed back in September 2004,
https://gcc.gnu.org/pipermail/gcc-patches/2004-September/148756.html
even when not optimizing for size.
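For reference, the foo above can be exercised with a small self-contained
program; this is a sketch, not part of the patch, and it assumes a target
with __int128 support.  Note that the movabsq immediate 81985529216486895
is 0x0123456789abcdef, and the broadcast duplicates it into both 64-bit
halves, so CONST here is that pattern repeated:

    #include <stdio.h>

    /* Sketch of the commit's foo() test case: AND a 128-bit constant
       whose two 64-bit halves are identical (a "broadcast" constant)
       into a global __int128.  */
    __int128 m;

    static const __int128 CONST128 =
        ((__int128)0x0123456789abcdefLL << 64) | 0x0123456789abcdefULL;

    void foo(void) { m &= CONST128; }

    int main(void)
    {
        m = ~(__int128)0;               /* all bits set */
        foo();                          /* m becomes CONST128 */
        unsigned long long lo = (unsigned long long)m;
        unsigned long long hi = (unsigned long long)(m >> 64);
        printf("%llx %llx\n", lo, hi);  /* both halves show the pattern */
        return 0;
    }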
Without this (and currently with just -O2), the function foo above
generates:

    foo:    movabsq $81985529216486895, %rax
            movabsq $81985529216486895, %rdx
            andq    %rax, m(%rip)
            andq    %rdx, m+8(%rip)
            ret

I'm not sure why (back in 2004) I thought that using a second load to
avoid the implicit "movq %rax, %rdx" was faster, perhaps to break a
dependency and allow better scheduling, but nowadays "movq %rax, %rdx"
is either eliminated by GCC's hardreg cprop pass, or special-cased by
modern hardware, making the first foo preferable: not only shorter but
also faster.

2023-12-19  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386-expand.cc
        (ix86_convert_const_wide_int_to_broadcast): Remove static.
        (ix86_expand_move): Don't attempt to convert wide constants
        to SSE using ix86_convert_const_wide_int_to_broadcast here.
        (ix86_split_long_move): Always un-cprop multi-word constants.
        * config/i386/i386-expand.h
        (ix86_convert_const_wide_int_to_broadcast): Prototype here.
        * config/i386/i386-features.cc: Include i386-expand.h.
        (timode_scalar_chain::convert_insn): When converting TImode to
        V1TImode, try ix86_convert_const_wide_int_to_broadcast.

gcc/testsuite/ChangeLog
        * gcc.target/i386/movti-2.c: New test case.
        * gcc.target/i386/movti-3.c: Likewise.
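The un-cprop idea above can be sketched in plain C (a hypothetical model,
not the actual ix86_split_long_move code): when a multi-word constant is
split into 64-bit words and the two words are identical, the second word
can reuse the register already holding the first, so only one expensive
movabsq-style load is needed:

    #include <stdio.h>

    typedef unsigned long long u64;

    /* Model of splitting a 128-bit constant move into two 64-bit word
       moves; counts how many constant loads would be emitted.  */
    static int split_wide_const(u64 hi, u64 lo, u64 out[2])
    {
        int loads = 1;
        out[0] = lo;            /* first word: one constant load */
        if (hi == lo)
            out[1] = out[0];    /* reuse the loaded value (reg-reg move) */
        else {
            out[1] = hi;        /* distinct halves need a second load */
            loads++;
        }
        return loads;
    }

    int main(void)
    {
        u64 w[2];
        /* Broadcast-style constant: both halves equal, one load.  */
        printf("%d\n", split_wide_const(0x0123456789abcdefULL,
                                        0x0123456789abcdefULL, w));
        /* Distinct halves: two loads.  */
        printf("%d\n", split_wide_const(1ULL, 2ULL, w));
        return 0;
    }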