author | Roger Sayle <roger@nextmovesoftware.com> | 2022-08-09 18:59:55 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2022-08-09 19:02:44 +0100 |
commit | a56c1641e9d25e46059168e811b4a2f185f07b6b (patch) | |
tree | bd0b6af060cf38ca5546b7466c96135c60096e16 /gcc/gcc.cc | |
parent | 6fc14f1963dfefead588a4cd8902d641ed69255c (diff) | |
Use PTEST to perform AND in TImode STV of (A & B) != 0 on x86_64.
This x86_64 backend patch allows TImode STV to take advantage of the
fact that the PTEST instruction performs an AND operation. Previously
PTEST was (mostly) used for comparison against zero, by passing the same
register as both operands. The benefits are demonstrated by the new test case:
    __int128 a,b;
    int foo()
    {
      return (a & b) != 0;
    }
Currently with -O2 -msse4 we generate:
        movdqa  a(%rip), %xmm0
        pand    b(%rip), %xmm0
        xorl    %eax, %eax
        ptest   %xmm0, %xmm0
        setne   %al
        ret
with this patch we now generate:
        movdqa  a(%rip), %xmm0
        xorl    %eax, %eax
        ptest   b(%rip), %xmm0
        setne   %al
        ret
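The equivalence of the two sequences can be checked with a small portable
model. This is only a sketch: the u128_model type and the helper names
(mk128, and128, ptest_zf, foo_old, foo_new, check_samples) are hypothetical
and not part of the patch, and only the zero flag of PTEST is modelled
(ZF is set when the AND of the two operands is all zeros).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit value as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_model;

/* Build a modelled 128-bit value from its low and high halves. */
u128_model mk128(uint64_t lo, uint64_t hi) {
    u128_model r = { lo, hi };
    return r;
}

/* Bitwise AND of two modelled values (the effect of PAND). */
u128_model and128(u128_model x, u128_model y) {
    return mk128(x.lo & y.lo, x.hi & y.hi);
}

/* Model of PTEST's zero flag: true when (x AND y) is all zeros. */
bool ptest_zf(u128_model x, u128_model y) {
    return ((x.lo & y.lo) | (x.hi & y.hi)) == 0;
}

/* Old sequence: PAND, then PTEST of the result with itself, then SETNE. */
int foo_old(u128_model a, u128_model b) {
    u128_model t = and128(a, b);
    return !ptest_zf(t, t);
}

/* New sequence: a single PTEST of a against b, then SETNE. */
int foo_new(u128_model a, u128_model b) {
    return !ptest_zf(a, b);
}

/* Check that both sequences agree on a few sample inputs. */
void check_samples(void) {
    const u128_model samples[] = {
        { 0, 0 }, { 1, 0 }, { 0, 1 }, { 0xF0, 0x0F }, { ~0ULL, ~0ULL }
    };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            assert(foo_old(samples[i], samples[j]) ==
                   foo_new(samples[i], samples[j]));
}
```

Since t & t == t, testing the PAND result against itself yields the same
flag as testing the original operands against each other, which is why the
separate PAND can be dropped.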
Technically, the magic happens using new define_insn_and_split patterns.
Using two patterns allows this transformation to be performed independently
of whether TImode STV is run before or after combine. The one tricky
case is that immediate constant operands of the AND behave slightly
differently between TImode and V1TImode: All V1TImode immediate operands
become loads, but for TImode only values that are not hilo_operands
need to be loaded. Hence the new *testti_doubleword accepts any
general_operand, but internally during split calls force_reg whenever
the second operand is not x86_64_hilo_general_operand. This required
(or at least benefits from) some tweaks to TImode STV to support
CONST_WIDE_INT in more places, using CONST_SCALAR_INT_P instead of just
CONST_INT_P.
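As a rough illustration of the constant case, the sketch below assumes that
a "hilo" constant is one whose low and high 64-bit halves can each be
encoded as a sign-extended 32-bit immediate. That reading of the predicate
name is an assumption, and fits_simm32 and is_hilo_constant are
hypothetical helpers, not GCC functions. The real predicate may also accept
other operand kinds.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when v, viewed as a signed 64-bit value, lies in
   [-2^31, 2^31 - 1], i.e. fits a sign-extended 32-bit immediate.
   Uses wrapping unsigned arithmetic to avoid signed conversions. */
bool fits_simm32(uint64_t v) {
    return v + 0x80000000ULL <= 0xFFFFFFFFULL;
}

/* Hypothetical model of a "hilo" 128-bit constant: both halves are
   usable as immediates, so no load of the constant is needed. */
bool is_hilo_constant(uint64_t lo, uint64_t hi) {
    return fits_simm32(lo) && fits_simm32(hi);
}
```

Under this model a constant such as 1 or -1 could stay an immediate in the
scalar form, while one with a half like 0x100000000 would first be forced
into a register.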
2022-08-09 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain::convert_compare):
Create new pseudos only when/if needed. Add support for TEST,
i.e. (COMPARE (AND x y) (const_int 0)), using UNSPEC_PTEST.
When broadcasting V2DImode and V4SImode use new pseudo register.
(timode_scalar_chain::convert_op): Do nothing if operand is
already V1TImode. Avoid generating useless SUBREG conversions,
i.e. (SUBREG:V1TImode (REG:V1TImode) 0). Handle CONST_WIDE_INT
in addition to CONST_INT by using CONST_SCALAR_INT_P.
(convertible_comparison_p): Use CONST_SCALAR_INT_P to match both
CONST_WIDE_INT and CONST_INT. Recognize new *testti_doubleword
pattern as an STV candidate.
(timode_scalar_to_vector_candidate_p): Allow CONST_SCALAR_INT_P
operands in binary logic operations.
* config/i386/i386.cc (ix86_rtx_costs) <case UNSPEC>: Add costs
for UNSPEC_PTEST; a PTEST that performs an AND has the same cost
as regular PTEST, i.e. cost->sse_op.
* config/i386/i386.md (*testti_doubleword): New pre-reload
define_insn_and_split that recognizes comparison of TI mode AND
against zero.
* config/i386/sse.md (*ptest<mode>_and): New pre-reload
define_insn_and_split that recognizes UNSPEC_PTEST of identical
AND operands.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-8.c: New test case.