field | value | date |
---|---|---|
author | Roger Sayle <roger@nextmovesoftware.com> | 2023-06-28 11:11:34 +0100 |
committer | Roger Sayle <roger@nextmovesoftware.com> | 2023-06-28 11:11:34 +0100 |
commit | 4afbebcdc5780d28e52b7d65643e462c7c3882ce | |
tree | 5cacff498c2174e7ca21234c75728371b04dcf76 /gcc/fortran/trans-decl.cc | |
parent | c027592d39b2968005aa28bc84a946bab2668db8 | |
i386: Add cbranchti4 pattern to i386.md (for -m32 compare_by_pieces).
This patch fixes some very odd (unanticipated) code generation by
compare_by_pieces with -m32 -mavx, introduced by the recent addition of
the cbranchoi4 pattern. The issue is that cbranchoi4 is available with
TARGET_AVX, but cbranchti4 is currently conditional on TARGET_64BIT.
As a result (thanks to OPTAB_WIDEN), with -m32 -mavx compare_by_pieces
ends up inefficiently widening 128-bit comparisons to 256 bits before
performing PTEST.
This patch fixes this by providing a cbranchti4 pattern that's available
with either TARGET_64BIT or TARGET_SSE4_1.
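The widening described above can be sketched with a toy model. This is not GCC code: `struct target_flags`, `have_cbranch` and `pick_compare_width` are hypothetical names invented for illustration, and only the 128-bit (TImode) and 256-bit (OImode) cases mentioned in this message are modelled. The `patched` argument switches between the old cbranchti4 condition and the new one.

```c
#include <stdbool.h>

/* Hypothetical target description: only the flags this message names. */
struct target_flags {
    bool m64;     /* stands in for TARGET_64BIT */
    bool sse4_1;  /* stands in for TARGET_SSE4_1 */
    bool avx;     /* stands in for TARGET_AVX */
};

/* Toy stand-in for "does a cbranch pattern exist at this width?".
   Only the 128-bit (TImode) and 256-bit (OImode) cases are modelled. */
static bool have_cbranch(const struct target_flags *t, int bits, bool patched)
{
    if (bits == 128)
        return patched ? (t->m64 || t->sse4_1) : t->m64;
    if (bits == 256)
        return t->avx;
    return false;
}

/* Widening search: try the requested width, then each wider one.
   Returns the width actually used, or 0 if no pattern is available. */
static int pick_compare_width(const struct target_flags *t, int bits,
                              bool patched)
{
    static const int widths[] = { 128, 256 };
    for (unsigned i = 0; i < sizeof widths / sizeof widths[0]; i++)
        if (widths[i] >= bits && have_cbranch(t, widths[i], patched))
            return widths[i];
    return 0;
}
```

Under these assumptions, a target with `m64` false and both `sse4_1` and `avx` true gets a 256-bit comparison for a 128-bit request when `patched` is false, and a 128-bit comparison when it is true.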
For the test case below (again from PR 104610):
int foo(char *a)
{
  static const char t[] = "0123456789012345678901234567890";
  return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
}
GCC with -m32 -O2 -mavx currently produces the bonkers:
foo:    pushl %ebp
        movl %esp, %ebp
        andl $-32, %esp
        subl $64, %esp
        movl 8(%ebp), %eax
        vmovdqa .LC0, %xmm4
        movl $0, 48(%esp)
        vmovdqu (%eax), %xmm2
        movl $0, 52(%esp)
        movl $0, 56(%esp)
        movl $0, 60(%esp)
        movl $0, 16(%esp)
        movl $0, 20(%esp)
        movl $0, 24(%esp)
        movl $0, 28(%esp)
        vmovdqa %xmm2, 32(%esp)
        vmovdqa %xmm4, (%esp)
        vmovdqa (%esp), %ymm5
        vpxor 32(%esp), %ymm5, %ymm0
        vptest %ymm0, %ymm0
        jne .L2
        vmovdqu 16(%eax), %xmm7
        movl $0, 48(%esp)
        movl $0, 52(%esp)
        vmovdqa %xmm7, 32(%esp)
        vmovdqa .LC1, %xmm7
        movl $0, 56(%esp)
        movl $0, 60(%esp)
        movl $0, 16(%esp)
        movl $0, 20(%esp)
        movl $0, 24(%esp)
        movl $0, 28(%esp)
        vmovdqa %xmm7, (%esp)
        vmovdqa (%esp), %ymm1
        vpxor 32(%esp), %ymm1, %ymm0
        vptest %ymm0, %ymm0
        je .L6
.L2:    movl $1, %eax
        xorl $1, %eax
        vzeroupper
        leave
        ret
.L6:    xorl %eax, %eax
        xorl $1, %eax
        vzeroupper
        leave
        ret
With this patch, we now generate the (slightly) more sensible:
foo:    vmovdqa .LC0, %xmm0
        movl 4(%esp), %eax
        vpxor (%eax), %xmm0, %xmm0
        vptest %xmm0, %xmm0
        jne .L2
        vmovdqa .LC1, %xmm0
        vpxor 16(%eax), %xmm0, %xmm0
        vptest %xmm0, %xmm0
        je .L5
.L2:    movl $1, %eax
        xorl $1, %eax
        ret
.L5:    xorl %eax, %eax
        xorl $1, %eax
        ret
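For readers who do not read AVX assembly, the shape of the 128-bit sequence can be approximated in portable C. This is only an illustrative sketch, not what GCC emits internally: `piece_differs` and `equal32_by_pieces` are hypothetical helper names, and two 64-bit words stand in for one XMM register. Each piece is XORed against the reference and tested for all-zero, which is the role the vpxor and vptest pair plays in the listing.

```c
#include <stdint.h>
#include <string.h>

/* One 16-byte piece: XOR the two blocks and report whether any bit
   differs, mirroring a vpxor followed by vptest. */
static int piece_differs(const unsigned char *a, const unsigned char *b)
{
    uint64_t x[2], y[2];
    memcpy(x, a, sizeof x);   /* memcpy avoids alignment assumptions */
    memcpy(y, b, sizeof y);
    return ((x[0] ^ y[0]) | (x[1] ^ y[1])) != 0;
}

/* Equality test over 32 bytes as two 16-byte pieces, leaving early
   when the first piece already differs (the "jne .L2" path). */
static int equal32_by_pieces(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    if (piece_differs(p, q))
        return 0;
    return !piece_differs(p + 16, q + 16);
}
```

Calling `equal32_by_pieces(a, t)` with the 32-byte array `t` from the test case should give the same result as `__builtin_memcmp(a, &t[0], sizeof(t)) == 0`.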
2023-06-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-expand.cc (ix86_expand_branch): Also use ptest
for TImode comparisons on 32-bit architectures.
* config/i386/i386.md (cbranch<mode>4): Change from SDWIM to
SWIM1248x to exclude TImode, which was conditional on -m64.
(cbranchti4): New define_expand for TImode, enabled with either
TARGET_64BIT or TARGET_SSE4_1.
* config/i386/predicates.md (ix86_timode_comparison_operator):
New predicate that depends upon TARGET_64BIT.
(ix86_timode_comparison_operand): Likewise.
gcc/testsuite/ChangeLog
* gcc.target/i386/pieces-memcmp-2.c: New test case.