author | Roger Sayle <roger@nextmovesoftware.com> | 2023-07-12 14:09:54 +0100 |
---|---|---|
committer | Roger Sayle <roger@nextmovesoftware.com> | 2023-07-12 14:10:58 +0100 |
commit | 46ade8c9cc860170ab4253cffd24169efa46ca70 (patch) | |
tree | 745f0049d0ff3bf4148b00d13a8db44ea4551cc4 | |
parent | a454325bea77a0dd79415480d48233a7c296bc0a (diff) | |
i386: Tweak ix86_expand_int_compare to use PTEST for vector equality.
I've come up with an alternative (or complementary) fix to the patch
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622706.html
that generates the PTEST during RTL expansion, rather than relying on
this being caught and optimized later during STV.
You'll notice in this patch, the tests for TARGET_SSE4_1 and TImode
appear last. When I was writing this, I initially also added support
for AVX VPTEST and OImode, before realizing that x86 doesn't (yet)
support 256-bit OImode (which also explains why we don't have an OImode
to V1OImode scalar-to-vector pass). Retaining this clause ordering
should minimize the lines changed if things change in future.
2023-07-12  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-expand.cc (ix86_expand_int_compare): If
	testing a TImode SUBREG of a 128-bit vector register against
	zero, use a PTEST instruction instead of first moving it to
	a pair of scalar registers.
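For illustration only, here is a minimal sketch of the kind of source that can lead to a TImode comparison of a 128-bit vector against zero. It is not the testcase from this commit: the type name `v2di` and both function names are hypothetical, and it assumes GCC (or a compatible compiler) targeting x86-64, where `__int128` and 128-bit generic vectors are available.

```c
/* Hypothetical example, not taken from the commit or its testsuite.
   Assumes GCC on x86-64 (__int128 and vector_size (16) supported).  */
typedef long long v2di __attribute__ ((vector_size (16)));

/* Reinterpret the 128-bit vector as a 128-bit integer and compare it
   with zero.  This is the shape of comparison the ChangeLog entry
   describes: a TImode view of a vector value tested against zero.  */
int
v2di_is_zero (v2di x)
{
  return (__int128) x == 0;
}

/* The inequality form of the same test.  */
int
v2di_is_nonzero (v2di x)
{
  return (__int128) x != 0;
}
```

With a compiler that contains this change, building with something like `-O2 -msse4.1` would be expected to test the vector register directly rather than first moving it into two scalar registers, though the exact code generated depends on the compiler version and flags.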
-rw-r--r-- | gcc/config/i386/i386-expand.cc | 19 |
1 files changed, 18 insertions, 1 deletions
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index fd5d103..648d609 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2987,9 +2987,26 @@ ix86_expand_int_compare (enum rtx_code code, rtx op0, rtx op1)
   cmpmode = SELECT_CC_MODE (code, op0, op1);
   flags = gen_rtx_REG (cmpmode, FLAGS_REG);
 
+  /* Attempt to use PTEST, if available, when testing vector modes for
+     equality/inequality against zero.  */
+  if (op1 == const0_rtx
+      && SUBREG_P (op0)
+      && cmpmode == CCZmode
+      && SUBREG_BYTE (op0) == 0
+      && REG_P (SUBREG_REG (op0))
+      && VECTOR_MODE_P (GET_MODE (SUBREG_REG (op0)))
+      && TARGET_SSE4_1
+      && GET_MODE (op0) == TImode
+      && GET_MODE_SIZE (GET_MODE (SUBREG_REG (op0))) == 16)
+    {
+      tmp = SUBREG_REG (op0);
+      tmp = gen_rtx_UNSPEC (CCZmode, gen_rtvec (2, tmp, tmp), UNSPEC_PTEST);
+    }
+  else
+    tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
+
   /* This is very simple, but making the interface the same as in the
      FP case makes the rest of the code easier.  */
-  tmp = gen_rtx_COMPARE (cmpmode, op0, op1);
   emit_insn (gen_rtx_SET (flags, tmp));
 
   /* Return the test that should be put into the flags user, i.e.
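
The UNSPEC_PTEST above is built with the same register as both operands. A scalar model of the instruction's zero flag may help show why that amounts to an equality test against zero. This is only a sketch of the architectural behaviour, not GCC code: `struct u128`, `ptest_zf` and `is_all_zero` are hypothetical names, and the 128-bit value is modelled as two 64-bit halves.

```c
#include <stdint.h>

/* Hypothetical scalar model of a 128-bit value as two 64-bit halves.  */
struct u128 { uint64_t lo, hi; };

/* Model of the ZF result of PTEST a, b: ZF is set when the bitwise
   AND of the two operands is all zeros.  */
static int
ptest_zf (struct u128 a, struct u128 b)
{
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

/* Passing the same value twice gives x AND x, which is x itself,
   so ZF is set exactly when x is zero.  */
static int
is_all_zero (struct u128 x)
{
  return ptest_zf (x, x);
}
```

Because only the zero flag carries the result here, the new code path is guarded by `cmpmode == CCZmode` and is not taken for other comparison modes.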