Age | Commit message (Collapse) | Author | Files | Lines |
|
Fixes: 77eccbf39ed5
rs6000.h has
#define PROCESSOR_POWERPC PROCESSOR_PPC604
#define PROCESSOR_POWERPC64 PROCESSOR_RS64A
which means that if you use things like -mcpu=powerpc -mvsx it will no
longer work after my latest .machine patch. This causes GCC build errors
in some cases, not a good idea (even if the errors are actually
pre-existing: using -mvsx with a machine that does not have VSX cannot
work properly).
2022-03-11 Segher Boessenkool <segher@kernel.crashing.org>
PR target/104829
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Don't output
"ppc" and "ppc64" based on rs6000_cpu.
|
|
PR target/104868 had had an issue where my code that updated the DImode to
TImode sign extension for power10 failed. In looking at the failure
message, the reason is when extendditi2 tries to split the insn, it
generates an insn that does not satisfy its constraints:
(set (reg:V2DI 65 1)
(vec_duplicate:V2DI (reg:DI 0)))
The reason is vsx_splat_v2di does not allow GPR register 0 when the will
be generating a mtvsrdd instruction. In the definition of the mtvsrdd
instruction, if the RA register is 0, it means clear the upper 64 bits of
the vector instead of moving register GPR 0 to those bits.
When I wrote the extendditi2 pattern, I forgot that mtvsrdd had that
behavior so I used a 'r' constraint instead of 'b'. In the rare case
where the value is in GPR register 0, this split will fail.
This patch uses the right constraint for extendditi2.
2022-03-11 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/104868
* config/rs6000/vsx.md (extendditi2): Use a 'b' constraint when
moving from a GPR register to an Altivec register.
|
|
This patch is the backend piece of my proposed fix to PR tree-opt/98335,
to allow C++ partial struct initialization to be as efficient/optimized
as full struct initialization.
With the middle-end patch just posted to gcc-patches, the test case
in the PR compiles on x86_64-pc-linux-gnu with -O2 to:
xorl %eax, %eax
movb c(%rip), %al
ret
with this additional peephole2 (actually four peephole2s):
movzbl c(%rip), %eax
ret
2022-03-11 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/98335
* config/i386/i386.md (peephole2): Eliminate redundant insv.
Combine movl followed by movb. Transform xorl followed by
a suitable movb or movw into the equivalent movz[bw]l.
gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.target/i386/pr98335.C: New test case.
* gcc.target/i386/pr98335.c: New test case.
|
|
After accounting for GPR -> XMM move cost for vec_construct the
base cost needs adjustments to not double-cost those. This also
lowers the cost when such move is not necessary.
2022-03-11 Richard Biener <rguenther@suse.de>
PR target/104762
* config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not
cost the first lane of SSE pieces as inserts for vec_construct.
|
|
The documentation states about the predicable instruction attribute:
...
This attribute must be a boolean (i.e. have exactly two elements in its
list-of-values), with the possible values being no and yes.
...
The nvptx port has instead:
...
(define_attr "predicable" "false,true"
(const_string "true"))
...
Fix this by updating to:
...
(define_attr "predicable" "no,yes"
(const_string "yes"))
...
Tested on nvptx.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104840
* config/nvptx/nvptx.md (define_attr "predicable"): Use no,yes instead
of false,true.
|
|
I ran into a hang for this code:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
This has to do with the nature of -muniform-simt. It has two modes of
operation: inside and outside an SIMT region.
Outside an SIMT region, a warp pretends to execute a single thread, but
actually executes in all threads, to keep the local registers in all threads
consistent. This approach works unless the insn that is executed is a syscall
or an atomic insn. In that case, the insn is predicated, such that it
executes in only one thread. If the predicated insn writes a result to a
register, then that register is propagated to the other threads, after which
the local registers in all threads are consistent again.
Inside an SIMT region, a warp executes in all threads. However, the
predication and propagation for syscalls and atomic insns is also present
here, because nvptx_reorg_uniform_simt works on all code. Care has been taken
though to ensure that the predication and propagation is a nop. That is,
inside an SIMT region:
- the predicate evalutes to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.
That works fine, until we use -mptx=6.0, and instead of using the deprecated
warp propagation insn shfl, we start using shfl.sync:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
The shfl.sync specifies a member mask indicating all threads, but given that
the loop only has a single iteration, only thread 0 will execute the insn,
where it will hang waiting for the other threads.
Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the
uniform warp check) such that it only executes outside the SIMT region.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-03-08 Tom de Vries <tdevries@suse.de>
PR target/104783
* config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate)
(nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate.
(nvptx_get_unisimt_outside_simt_predicate): New function.
(predicate_insn): New function, factored out of ...
(nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns.
* config/nvptx/nvptx.h (struct machine_function): Add
unisimt_outside_simt_predicate field.
* config/nvptx/nvptx.md (define_insn "nvptx_warpsync")
(define_insn "nvptx_uniform_warp_check"): Make predicable.
libgomp/ChangeLog:
2022-03-10 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.c/pr104783.c: New test.
|
|
For an example:
...
#pragma omp target map(tofrom: counter_N0)
#pragma omp simd
for (int i = 0 ; i < 1 ; i++ )
{
#pragma omp atomic update
counter_N0 = counter_N0 + 1 ;
}
...
I noticed that the result of the atomic update (%r30) is propagated:
...
@%r33 atom.add.u32 _, [%r29], 1;
shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
even though it is unused (which is why the bit bucket operand _ is used).
Fix this by not emitting the shuffle in this case, such that we have instead:
...
@%r33 atom.add.u32 _, [%r29], 1;
bar.warp.sync 0xffffffff;
...
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Handle unused
result.
gcc/testsuite/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/uniform-simt-4.c: New test.
|
|
For an atomic fetch operation that doesn't use the result:
...
__atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
...
we currently emit:
...
atom.add.u64 %r26, [%r25], %r27;
...
Detect the REG_UNUSED reg-note for %r26, and emit instead:
...
atom.add.u64 _, [%r25], %r27;
...
Likewise for all atom insns.
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
PR target/104815
* config/nvptx/nvptx.cc (nvptx_print_operand): Handle 'x' operand
modifier.
* config/nvptx/nvptx.md: Use %x0 destination operand in atom insns.
gcc/testsuite/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
PR target/104815
* gcc.target/nvptx/atomic-bit-bucket-dest.c: New test.
|
|
The ptx manual prescribes the instruction format atom{.space}.op.type but the
compiler currently emits:
...
atom.b64.and %r31, [%r30], %r32;
...
which uses the instruction format atom{.space}.type.op.
Fix this by emitting instead:
...
atom.and.b64 %r31, [%r30], %r32;
...
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.md (define_insn "atomic_fetch_<logic><mode>"):
Emit atom.and.b64 instead of atom.b64.and.
gcc/testsuite/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* gcc.target/nvptx/atomic_fetch-1.c: Update.
* gcc.target/nvptx/atomic_fetch-2.c: Update.
|
|
With commit 5b5e456f018 ("[nvptx] Build libraries with mptx=3.1") the
intention was that the ptx isa version for all libraries was switched back to
3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0.
Further testing revealed that this is not the case, and some libs were still
build with 6.0.
Fix this by introducing an mptx=3.1 multilib.
Adding a multilib should be avoided if possible, because it adds build time.
But I think it's a reasonable trade-off. With --disable-multilib, the default
lib with misa=sm_30 and mptx=6.0 should be usable in most scenarios. With
--enable-multilib, we can enable older drivers, as well as generate code
similar to how that was done in previous gcc releases, which is very useful.
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Move mptx=3.1 ...
(MULTILIB_OPTIONS): ... here.
|
|
With commit 07667c911b1 ("[nvptx] Build libraries with misa=sm_30") the
intention was that the sm_xx for all libraries was switched back to sm_30
using MULTILIB_EXTRA_OPTS, without changing the default sm_35.
Testing on an sm_30 board revealed that still some libs were build with sm_35,
so fix this by switching back to default sm_30.
Tested on nvptx.
gcc/ChangeLog:
2022-03-07 Tom de Vries <tdevries@suse.de>
PR target/104758
* config/nvptx/nvptx.opt (misa): Set default to sm_30.
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Remove misa=sm_30.
|
|
As mentioned in the PR, right now on powerpc* __SIZEOF_{FLOAT,IBM}128__
macros are predefined unconditionally, because {ieee,ibm}128_float_type_node
is always non-NULL, doesn't reflect whether __ieee128 or __ibm128 are
actually supported or not.
Based on patch review discussions, the following patch:
1) allows __ibm128 to be used in the sources even when !TARGET_FLOAT128_TYPE,
as long as long double is double double
2) ensures ibm128_float_type_node is non-NULL only if __ibm128 is supported
3) ensures ieee128_float_type_node is non-NULL only if __ieee128 is supported
(aka when TARGET_FLOAT128_TYPE)
4) predefines __SIZEOF_IBM128__ only when ibm128_float_type_node != NULL
5) newly predefines __SIZEOF_IEEE128__ if ieee128_float_type_node != NULL
6) predefines __SIZEOF_FLOAT128__ whenever ieee128_float_type_node != NULL
and __float128 macro is predefined to __ieee128
7) removes ptr_*128_float_type_node which nothing uses
8) in order not to ICE during builtin initialization when
ibm128_float_type_node == NULL, uses long_double_type_node as fallback
for the __builtin_{,un}pack_ibm128 builtins
9) errors when those builtins are called used when
ibm128_float_type_node == NULL (during their expansion)
10) moves the {,un}packif -> {,un}packtf remapping for these builtins in
expansion earlier, so that we don't ICE on them if not -mabi=ieeelongdouble
2022-03-10 Jakub Jelinek <jakub@redhat.com>
PR target/99708
* config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Remove
RS6000_BTI_ptr_ieee128_float and RS6000_BTI_ptr_ibm128_float.
(ptr_ieee128_float_type_node, ptr_ibm128_float_type_node): Remove.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Return
"**NULL**" if type_node is NULL first. Handle
ieee128_float_type_node.
(rs6000_init_builtins): Don't initialize ptr_ieee128_float_type_node
and ptr_ibm128_float_type_node. Set ibm128_float_type_node and
ieee128_float_type_node to NULL rather than long_double_type_node if
they aren't supported. Do support __ibm128 even if
!TARGET_FLOAT128_TYPE when long double is double double.
(rs6000_expand_builtin): Error if bif_is_ibm128 and
!ibm128_float_type_node. Remap RS6000_BIF_{,UN}PACK_IF to
RS6000_BIF_{,UN}PACK_TF much earlier and only use bif_is_ibm128 check
for it.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
__SIZEOF_FLOAT128__ here and only iff __float128 macro is defined.
(rs6000_cpu_cpp_builtins): Don't define __SIZEOF_FLOAT128__ here.
Define __SIZEOF_IBM128__=16 if ieee128_float_type_node is non-NULL.
Formatting fix.
* config/rs6000/rs6000-gen-builtins.cc: Document ibm128 attribute.
(struct attrinfo): Add isibm128 member.
(TYPE_MAP_SIZE): Remove.
(type_map): Use [] instead of [TYPE_MAP_SIZE]. For "if" use
ibm128_float_type_node only if it is non-NULL, otherwise fall back
to long_double_type_node. Remove "pif" entry.
(parse_bif_attrs): Handle ibm128 attribute and print it for debugging.
(write_decls): Output bif_ibm128_bit and bif_is_ibm128.
(write_type_node): Use sizeof type_map / sizeof type_map[0]
instead of TYPE_MAP_SIZE.
(write_bif_static_init): Handle isibm128.
* config/rs6000/rs6000-builtins.def: Document ibm128 attribute.
(__builtin_pack_ibm128, __builtin_unpack_ibm128): Add ibm128
attribute.
* gcc.dg/pr99708.c: New test.
* gcc.target/powerpc/pr99708-2.c: New test.
* gcc.target/powerpc/convert-fp-128.c (mode_kf): Define only if
__FLOAT128_TYPE__ is defined.
|
|
On Mon, Mar 07, 2022 at 07:06:28AM -0800, H.J. Lu wrote:
> Since eh_return doesn't work with stack realignment, disable SSE on
> unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
> incoming stack to avoid SSE usage which is caused by
The following change does that using LIBGCC2_UNWIND_ATTRIBUTE macro instead,
for ia32 only by forcing -mgeneral-regs-only on routines that call
__builtin_eh_return in libgcc.
2022-03-09 Jakub Jelinek <jakub@redhat.com>
PR target/104781
* config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Define for ia32.
|
|
gcc/
PR target/104842
* config/mips/mips.h (LUI_OPERAND): Cast the input to an unsigned
value before adding an offset.
|
|
Commits r12-7342 and r12-7344 made some cleanup, leaving
arm_binop_none_none_unone_qualifiers unused.
This is causing build failures with -Werror (eg bootstrap).
This patch fixes the problem by removing the definition of
arm_binop_none_none_unone_qualifiers and
BINOP_NONE_NONE_UNONE_QUALIFIERS which are now unused.
Tested by bootstraping on arm-linux-gnueaibhf.
2022-03-04 Christophe Lyon <christophe.lyon@arm.com>
gcc/
* config/arm/arm-builtins.cc
(arm_binop_none_none_unone_qualifiers): Delete.
(BINOP_NONE_NONE_UNONE_QUALIFIERS): Delete.
|
|
This amends an error message to correct punctuation and a little
better wording.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR translation/104552
gcc/ChangeLog:
* config/host-darwin.cc (darwin_gt_pch_get_address): Amend
the PCH out of memory error message punctuation and wording.
|
|
Disallow stack realignment and regparm nested function with EH return
since they don't work together.
gcc/
PR target/104781
* config/i386/i386.cc (ix86_expand_epilogue): Sorry if there is
stack realignment or regparm nested function with EH return.
gcc/testsuite/
PR target/104781
* gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4.
* gcc.target/i386/eh_return-2.c: Likewise.
|
|
This patch relaxes the addressing modes for the mve full load and stores (by
full loads and stores I mean non-widening or narrowing loads and stores resp).
The code before was requiring a LO_REGNUM for these, where this is only a
requirement if the load is widening or the store narrowing.
gcc/ChangeLog:
PR target/104790
* config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO.
* config/arm/arm.cc (mve_vector_mem_operand): Relax constraint on base
register for non widening loads or narrowing stores.
|
|
This will enable below
- vbroadcastss .LC1(%rip), %xmm0
+ movl $-45, %edx
+ vmovd %edx, %xmm0
+ vpshufd $0, %xmm0, %xmm0
According to microbenchmark, it's faster than broadcast from memory
for TARGET_INTER_UNIT_MOVES_TO_VEC.
gcc/ChangeLog:
* config/i386/sse.md (*vec_dupv4si): Disable memory operand
for !TARGET_INTER_UNIT_MOVES_TO_VEC when prefer_for_speed.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr100865-8a.c: Adjust testcase.
* gcc.target/i386/pr100865-8c.c: Ditto.
* gcc.target/i386/pr100865-9c.c: Ditto.
|
|
Like in r10-7215-g700d4cb08c88aec37c13e21e63dd61fd698baabc 2 years ago,
I've run
grep -v 'long long\|optab optab\|template template\|double double' *.{[chS],cc} */*.{[chS],cc} *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 '
and for the cases that looked clearly wrong changed them, mostly by removing
one of the duplicated words but in some cases with other changes.
2022-03-07 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-ssa-propagate.cc: Fix up duplicated word issue in a comment.
* config/riscv/riscv.cc: Likewise.
* config/darwin.h: Likewise.
* config/i386/i386.cc: Likewise.
* config/aarch64/thunderx3t110.md: Likewise.
* config/aarch64/fractional-cost.h: Likewise.
* config/vax/vax.cc: Likewise.
* config/rs6000/pcrel-opt.md: Likewise.
* config/rs6000/predicates.md: Likewise.
* ctfc.h: Likewise.
* tree-ssa-uninit.cc: Likewise.
* value-relation.h: Likewise.
* gimple-range-gori.cc: Likewise.
* ipa-polymorphic-call.cc: Likewise.
* pointer-query.cc: Likewise.
* ipa-sra.cc: Likewise.
* internal-fn.cc: Likewise.
* varasm.cc: Likewise.
* gimple-ssa-warn-access.cc: Likewise.
gcc/analyzer/
* store.cc: Fix up duplicated word issue in a comment.
* analyzer.cc: Likewise.
* engine.cc: Likewise.
* sm-taint.cc: Likewise.
gcc/c-family/
* c-attribs.cc: Fix up duplicated word issue in a comment.
gcc/cp/
* cvt.cc: Fix up duplicated word issue in a comment.
* pt.cc: Likewise.
* module.cc: Likewise.
* coroutines.cc: Likewise.
gcc/fortran/
* trans-expr.cc: Fix up duplicated word issue in a comment.
* gfortran.h: Likewise.
* scanner.cc: Likewise.
gcc/jit/
* libgccjit.h: Fix up duplicated word issue in a comment.
|
|
PR target/104794
gcc/ChangeLog:
* config/arm/arm.cc (arm_option_override_internal): Add missing
space.
|
|
PR target/104797
gcc/ChangeLog:
* config/msp430/msp430.cc (msp430_expand_delay_cycles): Remove
parenthesis from built-in name.
|
|
PR target/104794
gcc/ChangeLog:
* config/arm/arm.cc (arm_option_override_internal): Fix quoting
of options in error messages.
(arm_option_reconfigure_globals): Likewise.
|
|
PR target/104794
gcc/ChangeLog:
* config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error
message. Fix ARM_BUILTIN_WRORHI and ARM_BUILTIN_WRORH that can
have only range [0,32].
|
|
The following testcase fails to assemble due to clgte %r6,0(%r1,%r10)
insn not being accepted by assembler.
My rough understanding is that in the RSY-b insn format the spot
in other formats used for index registers is used instead for M3 what
kind of comparison it is, so this patch follows what other similar
instructions use for constraint (i.e. one without index register).
2022-03-07 Jakub Jelinek <jakub@redhat.com>
PR target/104775
* config/s390/s390.md (*cmp_and_trap_unsigned_int<mode>): Use
S constraint instead of T in the last alternative.
* gcc.target/s390/pr104775.c: New test.
|
|
PR translation/90148
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_linux64_override_options): Put
quote to a proper place.
* plugin.cc (default_plugin_dir_name): Likewise.
gcc/fortran/ChangeLog:
* intrinsic.cc (gfc_is_intrinsic): Put
quote to a proper place.
|
|
PR target/99297
gcc/ChangeLog:
* config/rx/rx.cc (rx_expand_builtin_mvtc): Fix translation
string.
|
|
The following testcase ICEs, because the cond_andv* expander
has vector_operand predicates in both of the commutative inputs
and calls gen_andv*_mask which calls ix86_binary_operator_ok
in its condition, but nothing calls ix86_fixup_binary_operands_no_copy
during the expansion, which means cond_* accepts even operands
like 2 MEMs which then can't be matched.
The following patch handles it like most other insns that the other
cond_* patterns use - by having a separate define_expand that calls
ix86_fixup_binary_operands_no_copy and define_ins with
ix86_binary_operator_ok.
2022-03-07 Jakub Jelinek <jakub@redhat.com>
PR target/104779
* config/i386/sse.md (avx512dq_mul<mode>3<mask_name>): New
define_expand pattern. Rename define_insn to ...
(*avx512dq_mul<mode>3<mask_name>): ... this.
(<code><mode>3_mask): New any_logic define_expand pattern.
(<mask_codefor><code><mode>3<mask_name>): Rename to ...
(*<code><mode>3<mask_name>): ... this.
* gcc.target/i386/pr104779.c: New test.
|
|
This clean-up patch resolves PR testsuite/104732, the failure of the recent
test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86. Rather than just
tweak the testcase, the proposed approach is to fix the underlying problem
by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode
logical operation expanders and pre-reload splitters in i386.md, which as
I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT
whenever -mno-stv (such as Solaris) or -msse (but not -msse2).
First a little bit of history. In the beginning, DImode operations on
i386 weren't defined by the machine description, and lowered during RTL
expansion to SI mode operations. The with PR 65105 in 2015, -mstv was
added, together with a SWIM1248x mode iterator (later renamed to SWIM1248x)
together with several *<code>di3_doubleword post-reload splitters that
made use of register allocation to perform some double word operations
in 64-but XMM registers. A short while later in 2016, PR 70322 added
similar support for one_cmpldi2. All of this logic was dependent upon
"!TARGET_64BIT && TARGET_STV && TARGET_SSE2". With the passing of time,
these conditions became irrelevant when in 2019, it was decided to split
these double-word patterns before reload.
https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html
Hence the current situation, where on most modern CPU architectures
(where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI
mode operations, that are then split into two SI mode instructions
before reload, except on Solaris and other odd cases, where the splitting
is to two SI mode instructions is done during RTL expansion. By the
time compilation reaches register allocation both paths in theory
produce identical or similar code, so the vestigial legacy/logic would
appear to be harmless.
Unfortunately, there is one place where this arbitrary choice of how
to lower DI mode doubleword operations is visible to the middle-end,
it controls whether the backend appears to have a suitable optab, and
the presence (or not) of DImode optabs can influence vectorization
cost models and veclower decisions.
The issue (and code quality regression) can be seen in this test case:
typedef long long v2di __attribute__((vector_size (16)));
v2di x;
void foo (long long a)
{
v2di t = {a, a};
x = ~t;
}
which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces:
foo: subl $28, %esp
movl %ebx, 16(%esp)
movl 32(%esp), %eax
movl %esi, 20(%esp)
movl 36(%esp), %edx
movl %edi, 24(%esp)
movl %eax, %esi
movl %eax, %edi
movl %edx, %ebx
movl %edx, %ecx
notl %esi
notl %ebx
movl %esi, (%esp)
notl %edi
notl %ecx
movl %ebx, 4(%esp)
movl 20(%esp), %esi
movl %edi, 8(%esp)
movl 16(%esp), %ebx
movl %ecx, 12(%esp)
movl 24(%esp), %edi
movss 8(%esp), %xmm1
movss 12(%esp), %xmm2
movss (%esp), %xmm0
movss 4(%esp), %xmm3
unpcklps %xmm2, %xmm1
unpcklps %xmm3, %xmm0
movlhps %xmm1, %xmm0
movaps %xmm0, x
addl $28, %esp
ret
Importantly notice the four "notl" instructions. With this patch:
foo: subl $28, %esp
movl 32(%esp), %edx
movl 36(%esp), %eax
notl %edx
movl %edx, (%esp)
notl %eax
movl %eax, 4(%esp)
movl %edx, 8(%esp)
movl %eax, 12(%esp)
movaps (%esp), %xmm1
movaps %xmm1, x
addl $28, %esp
ret
Notice only two "notl" instructions. Checking with godbolt.org, GCC
generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x,
and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies
this clean-up as suitable for stage 4].
Most significantly, this patch allows pr100711-1.c to pass with
-mno-stv, allowing pandn to be used with V2DImode on Solaris/x86.
Fingers-crossed this should reduce the number of discrepancies
encountered supporting Solaris/x86.
2022-03-05 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
PR testsuite/104732
* config/i386/i386.md (SWIM1248x): Renamed from SWIM1248s.
Include DI mode unconditionally.
(*anddi3_doubleword): Remove && TARGET_STV && TARGET_SSE2 condition,
i.e. always split on !TARGET_64BIT.
(*<any_or>di3_doubleword): Likewise.
(*one_cmpldi2_doubleword): Likewise.
(and<mode>3 expander): Update to use SWIM1248x from SWIM1248s.
(<any_or><mode>3 expander): Likewise.
(one_cmpl<mode>2 expander): Likewise.
gcc/testsuite/ChangeLog
PR testsuite/104732
* gcc.target/i386/pr104732.c: New test case.
|
|
On power10, GCC tries to optimize the signed conversion from DImode to
TImode by using the vextsd2q instruction. However to generate this
instruction, it would have to generate 3 direct moves (1 from the GPR
registers to the altivec registers, and 2 from the altivec registers to
the GPR register).
This patch generates the shift right immediate instruction to do the
conversion if the target/source registers ares GPR registers like it does
on earlier systems. If the target/source registers are Altivec registers,
it will generate the vextsd2q instruction.
2022-03-05 Michael Meissner <meissner@linux.ibm.com>
gcc/
PR target/104698
* config/rs6000/vsx.md (UNSPEC_MTVSRD_DITI_W1): Delete.
(mtvsrdd_diti_w1): Delete.
(extendditi2): Convert from define_expand to
define_insn_and_split. Replace with code to deal with both GPR
registers and with altivec registers.
gcc/testsuite/
PR target/104698
* gcc.target/powerpc/pr104698-1.c: New test.
* gcc.target/powerpc/pr104698-2.c: New test.
|
|
This adds more correct .machine for most older CPUs. It should be
conservative in the sense that everything we handled before we handle at
least as well now. This does not yet revamp the server CPU handling, it
is too risky at this point in time.
Tested on powerpc64-linux {-m32,-m64}. Also manually tested with all
-mcpu=, and the output of that passed through the GNU assembler.
2022-03-04 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Restructure a
bit. Handle most older CPUs.
|
|
DECL_MD_FUNCTION_CODE() returns an int, on one particular compiler the
code in darwin_fold_builtin() triggers a warning.
Fixed thus.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
gcc/ChangeLog:
* config/darwin.cc (darwin_fold_builtin): Make fcode an int to
avoid a mismatch with DECL_MD_FUNCTION_CODE().
|
|
Follow up discussion to the initial patch for this PR identified that it is
preferable to avoid the LRA change, and arrange for the target to reject the
hi and lo_sum selections when presented with an invalid address.
We split the Darwin high/low selectors into two:
1. One that handles non-PIC addresses (kernel mode, mdynamic-no-pic).
2. One that handles PIC addresses and rejects SYMBOL_REFs unless they are
suitably wrapped in the MACHOPIC_OFFSET unspec.
The second case is handled by providing a new predicate (macho_pic_address)
that checks the requirements.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
PR target/104117
gcc/ChangeLog:
* config/rs6000/darwin.md (@machopic_high_<mode>): New.
(@machopic_low_<mode>): New.
* config/rs6000/predicates.md (macho_pic_address): New.
* config/rs6000/rs6000.cc (rs6000_legitimize_address): Do not
apply the TLS processing to Darwin.
* lra-constraints.cc (process_address_1): Revert the changes
in r12-7209.
|
|
PR87496]
The glibc build is showing a build error due to extra "error" checking from my
PR87496 fix. That checking was overeager, disallowing setting the long double
size to 64-bits if the 128-bit long double ABI had already been specified.
Now we only emit an error if we specify a 128-bit long double ABI if our
long double size is not 128 bits. This also fixes an erroneous error when
-mabi=ieeelongdouble is used and ISA 2.06 is not enabled, but the long double
size has been changed to 64 bits.
2022-03-04 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/87496
PR target/104208
* config/rs6000/rs6000.cc (rs6000_option_override_internal): Make the
ISA 2.06 requirement for -mabi=ieeelongdouble conditional on
-mlong-double-128.
Move the -mabi=ieeelongdouble and -mabi=ibmlongdouble error checking
from here...
* common/config/rs6000/rs6000-common.cc (rs6000_handle_option):
... to here.
gcc/testsuite/
PR target/87496
PR target/104208
* gcc.target/powerpc/pr104208-1.c: New test.
* gcc.target/powerpc/pr104208-2.c: Likewise.
* gcc.target/powerpc/pr87496-2.c: Swap long double options to trigger
the expected error.
* gcc.target/powerpc/pr87496-3.c: Likewise.
|
|
ix86_gen_scratch_sse_rtx returns XMM7/XMM15/XMM31 as a scratch vector
register to prevent RTL optimizers from removing vector register. It
introduces a conflict with explicit XMM7/XMM15/XMM31 usage and when it
is called by RTL optimizers, it may introduce conflicting usages of
XMM7/XMM15/XMM31.
Change ix86_gen_scratch_sse_rtx to always return a pseudo register and
xfail x86 tests which are optimized with a hard scratch register.
gcc/
PR target/104704
* config/i386/i386.cc (ix86_gen_scratch_sse_rtx): Always return
a pseudo register.
gcc/testsuite/
PR target/104704
* gcc.target/i386/incoming-11.c: Xfail.
* gcc.target/i386/pieces-memset-3.c: Likewise.
* gcc.target/i386/pieces-memset-37.c: Likewise.
* gcc.target/i386/pieces-memset-39.c: Likewise.
* gcc.target/i386/pieces-memset-46.c: Likewise.
* gcc.target/i386/pieces-memset-47.c: Likewise.
* gcc.target/i386/pieces-memset-48.c: Likewise.
* gcc.target/i386/pr90773-5.c: Likewise.
* gcc.target/i386/pr90773-14.c: Likewise.
* gcc.target/i386/pr90773-17.c: Likewise.
* gcc.target/i386/pr100865-8a.c: Likewise.
* gcc.target/i386/pr100865-8c.c: Likewise.
* gcc.target/i386/pr100865-9c.c: Likewise.
* gcc.target/i386/pieces-memset-21.c: Always expect vzeroupper.
* gcc.target/i386/pr82941-1.c: Likewise.
* gcc.target/i386/pr82942-1.c: Likewise.
* gcc.target/i386/pr82990-1.c: Likewise.
* gcc.target/i386/pr82990-3.c: Likewise.
* gcc.target/i386/pr82990-5.c: Likewise.
* gcc.target/i386/pr100865-11b.c: Expect vmovdqa instead of
vmovdqa64.
* gcc.target/i386/pr100865-12b.c: Likewise.
* gcc.target/i386/pr100865-8b.c: Likewise.
* gcc.target/i386/pr100865-9b.c: Likewise.
* gcc.target/i386/pr104704-1.c: New test.
* gcc.target/i386/pr104704-2.c: Likewise.
* gcc.target/i386/pr104704-3.c: Likewise.
* gcc.target/i386/pr104704-4.c: Likewise.
* gcc.target/i386/pr104704-5.c: Likewise.
* gcc.target/i386/pr104704-6.c: Likewise.
|
|
In gcc-5 to gcc-11, the ptx isa version was 3.1.
On trunk, the default is now 6.0, which is also what will be the value in
the libraries.
Consequently, there may be setups with an older driver that worked with
gcc-11, but will become unsupported with gcc-12.
Fix this by building the libraries with mptx=3.1.
After this, setups with an older driver still won't work out of the box
with gcc-12, because the default ptx isa version has changed, but should work
after specifying mptx=3.1.
gcc/ChangeLog:
2022-03-03 Tom de Vries <tdevries@suse.de>
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add mptx=3.1.
|
|
In gcc-11, when specifying -misa=sm_30, an executable may still contain sm_35
code (due to libraries being built with the default -misa=sm_35), so it won't
run on an sm_30 board.
Fix this by building libraries with sm_30, as was the case in gcc-5 to gcc-10.
gcc/ChangeLog:
2022-03-03 Tom de Vries <tdevries@suse.de>
PR target/104758
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add misa=sm_30.
|
|
In PR97348, we ran into the problem that recent CUDA dropped support for
sm_30, which inhibited the build when building with CUDA bin in the path,
because the nvptx-tools assembler uses CUDA's ptxas to do ptx verification.
To fix this, in gcc-11 the default sm_xx was moved from sm_30 to sm_35.
This however broke support for sm_30 boards: an executable build for sm_30
might contain sm_35 code from the libraries, which are build with the default
sm_xx (PR104758).
We want to fix this by going back to having the libraries build with sm_30, as
was the case for gcc-5 to gcc-10. That however reintroduces the problem from
PR97348.
Deal with PR97348 in the simplest way possible: when calling the assembler for
sm_30, specify --no-verify.
This has the unfortunate effect that after fixing PR104758 by building
libraries with sm_30, the libraries are no longer verified. This can be
improved upon by:
- adding a configure test in gcc that tests if CUDA supports sm_30, and
if so disabling this patch
- dealing with this in nvptx-tools somehow, either:
- detect at ptxas execution time that it doesn't support sm_30, or
- detect this at nvptx-tool configure time.
gcc/ChangeLog:
2022-03-03 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.h (ASM_SPEC): Add %{misa=sm_30:--no-verify}.
|
|
For a test-case doing an openmp target simd reduction on a complex double:
...
DOUBLE COMPLEX :: counter_N0
...
!$OMP TARGET SIMD reduction(+: counter_N0)
...
we run into:
...
during RTL pass: expand
b.f90: In function ‘MAIN__._omp_fn.0’:
b.f90:23:32: internal compiler error: in expand_insn, at optabs.cc:8029
23 | counter_N0 = counter_N0 + 1.
| ^
0x10f1cd3 expand_insn(insn_code, unsigned int, expand_operand*)
gcc/optabs.cc:8029
0xeac435 expand_GOMP_SIMT_XCHG_BFLY
gcc/internal-fn.cc:375
...
Fix this by handling DCmode and CDImode in define_expand
"omp_simt_xchg_{bfly,idx}".
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
PR target/102429
* config/nvptx/nvptx.cc (nvptx_gen_shuffle): Handle DCmode and CDImode.
* config/nvptx/nvptx.md
(define_predicate "nvptx_register_or_complex_di_df_register_operand"):
New predicate.
(define_expand "omp_simt_xchg_bfly", define_expand "omp_simt_xchg_idx"):
Use nvptx_register_or_complex_di_df_register_operand.
|
|
Use nvptx-sm.def to generate new files nvptx-gen.h and nvptx-gen.opt, and:
- include nvptx-gen.h in nvptx.h, and
- add nvptx-gen.opt to extra_options (before nvptx.opt, in case that matters).
Tested on nvptx.
gcc/ChangeLog:
2022-02-25 Tom de Vries <tdevries@suse.de>
* config.gcc (nvptx*-*-*): Add nvptx/nvptx-gen.opt to extra_options.
* config/nvptx/gen-copyright.sh: New file.
* config/nvptx/gen-h.sh: New file.
* config/nvptx/gen-opt.sh: New file.
* config/nvptx/nvptx.h (TARGET_SM35, TARGET_SM53, TARGET_SM70)
(TARGET_SM75, TARGET_SM80): Move ...
* config/nvptx/nvptx-gen.h: ... here. New file, generate.
* config/nvptx/nvptx.opt (Enum ptx_isa): Move ...
* config/nvptx/nvptx-gen.opt: ... here. New file, generate.
* config/nvptx/t-nvptx ($(srcdir)/config/nvptx/nvptx-gen.h)
($(srcdir)/config/nvptx/nvptx-gen.opt): New make target.
|
|
Add a script gen-omp-device-properties.sh that uses nvptx-sm.def to generate
omp-device-properties-nvptx.
Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog:
2022-02-25 Tom de Vries <tdevries@suse.de>
* config/nvptx/gen-omp-device-properties.sh: New file.
* config/nvptx/t-omp-device: Use gen-omp-device-properties.sh.
|
|
Add a file gcc/config/nvptx/nvptx-sm.def that lists all sm_xx versions used in
the port, like so:
...
NVPTX_SM(30, NVPTX_SM_SEP)
NVPTX_SM(35, NVPTX_SM_SEP)
NVPTX_SM(53, NVPTX_SM_SEP)
NVPTX_SM(70, NVPTX_SM_SEP)
NVPTX_SM(75, NVPTX_SM_SEP)
NVPTX_SM(80,)
...
and use it in various places using a pattern:
...
#define NVPTX_SM(XX, SEP) { ... }
#include "nvptx-sm.def"
#undef NVPTX_SM
...
Tested on nvptx.
gcc/ChangeLog:
2022-02-25 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-sm.def: New file.
* config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Use nvptx-sm.def.
* config/nvptx/nvptx-opts.h (enum ptx_isa): Same.
* config/nvptx/nvptx.cc (sm_version_to_string)
(nvptx_omp_device_kind_arch_isa): Same.
|
|
ifcvt now passes a CC-mode "comparison" to backends. This patch
simply returns from gen_compare_reg () in that case since nothing
needs to be prepared anymore.
gcc/ChangeLog:
PR rtl-optimization/104154
* config/arc/arc.cc (gen_compare_reg): Return the CC-mode
comparison ifcvt passed us.
|
|
For V8HFmode vector init with HFmode, do not directly emits V8HF move
with subreg, which may cause reload to assign general register to move
src.
gcc/ChangeLog:
PR target/104664
* config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate):
Use vec_setv8hf_0 for HF to V8HFmode move instead of subreg.
gcc/testsuite/ChangeLog:
PR target/104664
* gcc.target/i386/pr104664.c: New test.
|
|
This patch is my proposed solution to PR tree-optimization/91384 which is
a missed-optimization/code quality regression on x86_64. The problematic
idiom is "if (r = -a)" which is equivalent to both "r = -a; if (r != 0)"
and alternatively "r = -a; if (a != 0)". In this particular case, on
x86_64, we prefer to use the condition codes from the negation, rather
than require an explicit testl instruction.
Unfortunately, combine can't help, as it doesn't attempt to merge pairs
of instructions that share the same operand(s), only pairs/triples of
instructions where the result of each instruction feeds the next. But
I doubt there's sufficient benefit to attempt this kind of "combination"
(that wouldn't already be caught by the tree-ssa passes).
Fortunately, it's relatively easy to fix this up (addressing the
regression) during peephole2 to eliminate the unnecessary testl in:
movl %edi, %ebx
negl %ebx
testl %edi, %edi
je .L2
2022-02-28 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR tree-optimization/91384
* config/i386/i386.md (peephole2): Eliminate final testl insn
from the sequence *movsi_internal, *negsi_1, *cmpsi_ccno_1 by
transforming using *negsi_2 for the negation.
gcc/testsuite/ChangeLog
PR tree-optimization/91384
* gcc.target/i386/pr91384.c: New test case.
|
|
Add an -mptx=_ value, that indicates the default ptx version.
It can be used to undo an explicit -mptx setting, so this:
...
$ gcc test.c -mptx=3.1 -mptx=_
...
has the same effect as:
...
$ gcc test.c
...
Tested on nvptx.
gcc/ChangeLog:
2022-02-28 Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx-opts.h (enum ptx_version): Add
PTX_VERSION_default.
* config/nvptx/nvptx.cc (handle_ptx_version_option): Handle
PTX_VERSION_default.
* config/nvptx/nvptx.opt: Add EnumValue "_" / PTX_VERSION_default.
|
|
Sync with llvm change in https://reviews.llvm.org/D120307 to
add enumeration and truncate imm to unsigned char, so users could
use ~ on immediates.
gcc/ChangeLog:
* config/i386/avx512fintrin.h (_MM_TERNLOG_ENUM): New enum.
(_mm512_ternarylogic_epi64): Truncate imm to unsigned
char to avoid error when using ~enum as parameter.
(_mm512_mask_ternarylogic_epi64): Likewise.
(_mm512_maskz_ternarylogic_epi64): Likewise.
(_mm512_ternarylogic_epi32): Likewise.
(_mm512_mask_ternarylogic_epi32): Likewise.
(_mm512_maskz_ternarylogic_epi32): Likewise.
* config/i386/avx512vlintrin.h (_mm256_ternarylogic_epi64):
Adjust imm param type to unsigned char.
(_mm256_mask_ternarylogic_epi64): Likewise.
(_mm256_maskz_ternarylogic_epi64): Likewise.
(_mm256_ternarylogic_epi32): Likewise.
(_mm256_mask_ternarylogic_epi32): Likewise.
(_mm256_maskz_ternarylogic_epi32): Likewise.
(_mm_ternarylogic_epi64): Likewise.
(_mm_mask_ternarylogic_epi64): Likewise.
(_mm_maskz_ternarylogic_epi64): Likewise.
(_mm_ternarylogic_epi32): Likewise.
(_mm_mask_ternarylogic_epi32): Likewise.
(_mm_maskz_ternarylogic_epi32): Likewise.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512f-vpternlogd-1.c: Use new enum.
* gcc.target/i386/avx512f-vpternlogq-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise.
* gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise.
* gcc.target/i386/testimm-10.c: Remove imm check for vpternlog
insns since the imm has been truncated in intrinsic.
|
|
The following testcase ICEs, because for some strange reason it decides to use
movmisaligntf during expansion where the destination is MEM and source is
CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses
rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG
and a few other things, but for movmisalign<mode> nothing enforced this.
The middle-end documents that movmisalign<mode> shouldn't fail, so we can't
force that through predicates or condition on the expander.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104681
* config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move.
* g++.dg/opt/pr104681.C: New test.
|
|
If the movcc comparison is not valid it triggers an assert in the
current implementation. This behavior is not needed as we can FAIL
the movcc expand pattern.
gcc/
* config/arc/arc.cc (gen_compare_reg): Return NULL_RTX if the
comparison is not valid.
* config/arc/arc.md (movsicc): Fail if comparison is not valid.
(movdicc): Likewise.
(movsfcc): Likewise.
(movdfcc): Likewise.
Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
|
|
[PR104674]
As mentioned in the PR, the following testcase is miscompiled for similar
reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various
places during expansion and during expansion we can guarantee that the
lifetime of those temporary slot doesn't overlap. But the following
splitter uses SLOT_TEMP too and in between expansion and split1 there is
a possibility that something extends the lifetime of SLOT_TEMP created
slots across an instruction that will be split by this splitter.
The following patch fixes it by using a new temp slot kind to make sure
it doesn't reuse a SLOT_TEMP that could be live across the instruction.
2022-02-25 Jakub Jelinek <jakub@redhat.com>
PR target/104674
* config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387.
* config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use
SLOT_FLOATxFDI_387 rather than SLOT_TEMP.
* gcc.target/i386/pr104674.c: New test.
|