path: root/gcc/config
Age | Commit message | Author | Files | Lines
2022-03-12rs6000: Do not use rs6000_cpu for .machine ppc and ppc64 (PR104829)Segher Boessenkool1-2/+10
Fixes: 77eccbf39ed5

rs6000.h has

    #define PROCESSOR_POWERPC   PROCESSOR_PPC604
    #define PROCESSOR_POWERPC64 PROCESSOR_RS64A

which means that if you use things like -mcpu=powerpc -mvsx it will no longer work after my latest .machine patch. This causes GCC build errors in some cases, not a good idea (even if the errors are actually pre-existing: using -mvsx with a machine that does not have VSX cannot work properly).

2022-03-11 Segher Boessenkool <segher@kernel.crashing.org>

PR target/104829
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): Don't output "ppc" and "ppc64" based on rs6000_cpu.
2022-03-11Fix DImode to TImode sign extend issueMichael Meissner1-1/+1
PR target/104868 had an issue where my code that updated the DImode to TImode sign extension for power10 failed. Looking at the failure message, the reason is that when extendditi2 tries to split the insn, it generates an insn that does not satisfy its constraints:

    (set (reg:V2DI 65 1) (vec_duplicate:V2DI (reg:DI 0)))

The reason is that vsx_splat_v2di does not allow GPR register 0 when it will be generating a mtvsrdd instruction. In the definition of the mtvsrdd instruction, if the RA register is 0, it means clear the upper 64 bits of the vector instead of moving register GPR 0 to those bits.

When I wrote the extendditi2 pattern, I forgot that mtvsrdd had that behavior so I used a 'r' constraint instead of 'b'. In the rare case where the value is in GPR register 0, this split will fail.

This patch uses the right constraint for extendditi2.

2022-03-11 Michael Meissner <meissner@linux.ibm.com>

gcc/
PR target/104868
* config/rs6000/vsx.md (extendditi2): Use a 'b' constraint when moving from a GPR register to an Altivec register.
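The RA special case described above can be illustrated with a small C model. This is only a sketch under stated assumptions: `struct vec128`, `model_mtvsrdd` and the `gprs` array are hypothetical names invented for illustration, not GCC or ISA identifiers, and the model ignores everything about the instruction except the "RA == 0 means zero" rule from the commit message.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit vector register as two 64-bit halves. */
struct vec128 {
    uint64_t hi;  /* upper 64 bits */
    uint64_t lo;  /* lower 64 bits */
};

/* Simplified model of mtvsrdd: gprs[] stands in for the GPR file.
   An RA field of 0 selects the constant 0 for the upper half,
   not the contents of GPR 0. */
struct vec128 model_mtvsrdd(const uint64_t gprs[32], unsigned ra, unsigned rb)
{
    struct vec128 v;
    v.hi = (ra == 0) ? 0 : gprs[ra];
    v.lo = gprs[rb];
    return v;
}
```

In this model, a value that happens to live in GPR 0 is silently replaced by zero when it is used as RA, which is why a constraint that excludes register 0 ('b') is the safe choice for that operand.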
2022-03-11PR tree-optimization/98335: New peephole2 xorl;movb -> movzblRoger Sayle1-0/+50
This patch is the backend piece of my proposed fix to PR tree-opt/98335, to allow C++ partial struct initialization to be as efficient/optimized as full struct initialization.

With the middle-end patch just posted to gcc-patches, the test case in the PR compiles on x86_64-pc-linux-gnu with -O2 to:

    xorl %eax, %eax
    movb c(%rip), %al
    ret

with this additional peephole2 (actually four peephole2s):

    movzbl c(%rip), %eax
    ret

2022-03-11 Roger Sayle <roger@nextmovesoftware.com>

gcc/ChangeLog
PR tree-optimization/98335
* config/i386/i386.md (peephole2): Eliminate redundant insv. Combine movl followed by movb. Transform xorl followed by a suitable movb or movw into the equivalent movz[bw]l.

gcc/testsuite/ChangeLog
PR tree-optimization/98335
* g++.target/i386/pr98335.C: New test case.
* gcc.target/i386/pr98335.c: New test case.
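Why the two instruction sequences above are interchangeable can be sketched in C. The function names below are hypothetical and only model the register effects; they are not taken from the patch or its test cases.

```c
#include <stdint.h>

/* Models "xorl %eax, %eax" followed by "movb c, %al":
   clear the 32-bit register, then overwrite only its low byte. */
uint32_t load_via_xor_then_movb(const uint8_t *c)
{
    uint32_t eax = 0;                    /* xorl %eax, %eax */
    eax = (eax & 0xffffff00u) | *c;      /* movb c, %al     */
    return eax;
}

/* Models "movzbl c, %eax": load one byte and zero-extend it. */
uint32_t load_via_movzbl(const uint8_t *c)
{
    return (uint32_t)*c;
}
```

Both functions return the same value for every possible byte, which is the property the peephole2 relies on when it replaces the two-instruction form with a single zero-extending load.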
2022-03-11target/104762 - vectorization costs of CONSTRUCTORsRichard Biener1-6/+11
After accounting for GPR -> XMM move cost for vec_construct the base cost needs adjustments to not double-cost those. This also lowers the cost when such move is not necessary. 2022-03-11 Richard Biener <rguenther@suse.de> PR target/104762 * config/i386/i386.cc (ix86_builtin_vectorization_cost): Do not cost the first lane of SSE pieces as inserts for vec_construct.
2022-03-10[nvptx] Use no,yes for attribute predicableTom de Vries1-20/+20
The documentation states about the predicable instruction attribute:
...
This attribute must be a boolean (i.e. have exactly two elements in its list-of-values), with the possible values being no and yes.
...

The nvptx port has instead:
...
(define_attr "predicable" "false,true" (const_string "true"))
...

Fix this by updating to:
...
(define_attr "predicable" "no,yes" (const_string "yes"))
...

Tested on nvptx.

gcc/ChangeLog:

2022-03-08 Tom de Vries <tdevries@suse.de>

PR target/104840
* config/nvptx/nvptx.md (define_attr "predicable"): Use no,yes instead of false,true.
2022-03-10[nvptx] Disable warp sync in simt regionTom de Vries3-17/+58
I ran into a hang for this code:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
    {
  #pragma omp atomic update
      counter_N0 = counter_N0 + 1 ;
    }
...

This has to do with the nature of -muniform-simt. It has two modes of operation: inside and outside an SIMT region.

Outside an SIMT region, a warp pretends to execute a single thread, but actually executes in all threads, to keep the local registers in all threads consistent. This approach works unless the insn that is executed is a syscall or an atomic insn. In that case, the insn is predicated, such that it executes in only one thread. If the predicated insn writes a result to a register, then that register is propagated to the other threads, after which the local registers in all threads are consistent again.

Inside an SIMT region, a warp executes in all threads. However, the predication and propagation for syscalls and atomic insns is also present here, because nvptx_reorg_uniform_simt works on all code. Care has been taken though to ensure that the predication and propagation is a nop. That is, inside an SIMT region:
- the predicate evaluates to true for each thread, and
- the propagation insn copies a register from each thread to the same thread.

That works fine, until we use -mptx=6.0, and instead of using the deprecated warp propagation insn shfl, we start using shfl.sync:
...
  @%r33 atom.add.u32 _, [%r29], 1;
        shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...

The shfl.sync specifies a member mask indicating all threads, but given that the loop only has a single iteration, only thread 0 will execute the insn, where it will hang waiting for the other threads.

Fix this by predicating the shfl.sync (and likewise, bar.warp.sync and the uniform warp check) such that it only executes outside the SIMT region.

Tested on x86_64 with nvptx accelerator.
gcc/ChangeLog: 2022-03-08 Tom de Vries <tdevries@suse.de> PR target/104783 * config/nvptx/nvptx.cc (nvptx_init_unisimt_predicate) (nvptx_output_unisimt_switch): Handle unisimt_outside_simt_predicate. (nvptx_get_unisimt_outside_simt_predicate): New function. (predicate_insn): New function, factored out of ... (nvptx_reorg_uniform_simt): ... here. Predicate all emitted insns. * config/nvptx/nvptx.h (struct machine_function): Add unisimt_outside_simt_predicate field. * config/nvptx/nvptx.md (define_insn "nvptx_warpsync") (define_insn "nvptx_uniform_warp_check"): Make predicable. libgomp/ChangeLog: 2022-03-10 Tom de Vries <tdevries@suse.de> * testsuite/libgomp.c/pr104783.c: New test.
2022-03-10[nvptx] Handle unused result in nvptx_unisimt_handle_setTom de Vries1-1/+3
For an example:
...
  #pragma omp target map(tofrom: counter_N0)
  #pragma omp simd
  for (int i = 0 ; i < 1 ; i++ )
    {
  #pragma omp atomic update
      counter_N0 = counter_N0 + 1 ;
    }
...
I noticed that the result of the atomic update (%r30) is propagated:
...
  @%r33 atom.add.u32 _, [%r29], 1;
        shfl.sync.idx.b32 %r30, %r30, %r32, 31, 0xffffffff;
...
even though it is unused (which is why the bit bucket operand _ is used).

Fix this by not emitting the shuffle in this case, such that we have instead:
...
  @%r33 atom.add.u32 _, [%r29], 1;
        bar.warp.sync 0xffffffff;
...

Tested on nvptx.

gcc/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.cc (nvptx_unisimt_handle_set): Handle unused result.

gcc/testsuite/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/uniform-simt-4.c: New test.
2022-03-10[nvptx] Use bit-bucket operand for atom insnsTom de Vries2-6/+15
For an atomic fetch operation that doesn't use the result:
...
  __atomic_fetch_add (p64, v64, MEMMODEL_RELAXED);
...
we currently emit:
...
  atom.add.u64 %r26, [%r25], %r27;
...

Detect the REG_UNUSED reg-note for %r26, and emit instead:
...
  atom.add.u64 _, [%r25], %r27;
...

Likewise for all atom insns.

Tested on nvptx.

gcc/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

PR target/104815
* config/nvptx/nvptx.cc (nvptx_print_operand): Handle 'x' operand modifier.
* config/nvptx/nvptx.md: Use %x0 destination operand in atom insns.

gcc/testsuite/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

PR target/104815
* gcc.target/nvptx/atomic-bit-bucket-dest.c: New test.
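The distinction between a used and an unused fetch result can be shown at the source level. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it uses the user-visible `__ATOMIC_RELAXED` constant where the commit message writes MEMMODEL_RELAXED. Which PTX a given compiler emits for each function is not something this sketch can guarantee.

```c
#include <stdint.h>

/* The fetched value is discarded, so the destination register of the
   atomic would be unused; this is the case the commit message targets. */
void bump_counter(uint64_t *p, uint64_t v)
{
    __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}

/* The fetched value is returned, so a real destination register
   is needed and the bit-bucket operand cannot be used. */
uint64_t bump_and_get_old(uint64_t *p, uint64_t v)
{
    return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}
```

Both functions update memory identically; they differ only in whether the old value is consumed.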
2022-03-10[nvptx] Use atom.and.b64 instead of atom.b64.andTom de Vries1-1/+1
The ptx manual prescribes the instruction format atom{.space}.op.type but the compiler currently emits:
...
  atom.b64.and %r31, [%r30], %r32;
...
which uses the instruction format atom{.space}.type.op.

Fix this by emitting instead:
...
  atom.and.b64 %r31, [%r30], %r32;
...

Tested on nvptx.

gcc/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

* config/nvptx/nvptx.md (define_insn "atomic_fetch_<logic><mode>"): Emit atom.and.b64 instead of atom.b64.and.

gcc/testsuite/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

* gcc.target/nvptx/atomic_fetch-1.c: Update.
* gcc.target/nvptx/atomic_fetch-2.c: Update.
2022-03-10[nvptx] Add multilib mptx=3.1Tom de Vries1-3/+1
With commit 5b5e456f018 ("[nvptx] Build libraries with mptx=3.1") the intention was that the ptx isa version for all libraries was switched back to 3.1 using MULTILIB_EXTRA_OPTS, without changing the default 6.0.

Further testing revealed that this is not the case, and some libs were still built with 6.0.

Fix this by introducing an mptx=3.1 multilib.

Adding a multilib should be avoided if possible, because it adds build time. But I think it's a reasonable trade-off. With --disable-multilib, the default lib with misa=sm_30 and mptx=6.0 should be usable in most scenarios. With --enable-multilib, we can enable older drivers, as well as generate code similar to how that was done in previous gcc releases, which is very useful.

Tested on nvptx.

gcc/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Move mptx=3.1 ...
(MULTILIB_OPTIONS): ... here.
2022-03-10[nvptx] Restore default to sm_30Tom de Vries2-2/+2
With commit 07667c911b1 ("[nvptx] Build libraries with misa=sm_30") the intention was that the sm_xx for all libraries was switched back to sm_30 using MULTILIB_EXTRA_OPTS, without changing the default sm_35.

Testing on an sm_30 board revealed that some libs were still built with sm_35, so fix this by switching back to default sm_30.

Tested on nvptx.

gcc/ChangeLog:

2022-03-07 Tom de Vries <tdevries@suse.de>

PR target/104758
* config/nvptx/nvptx.opt (misa): Set default to sm_30.
* config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Remove misa=sm_30.
2022-03-10rs6000: Fix up __SIZEOF_{FLOAT,IBM}128__ defines [PR99708]Jakub Jelinek5-40/+65
As mentioned in the PR, right now on powerpc* the __SIZEOF_{FLOAT,IBM}128__ macros are predefined unconditionally, because {ieee,ibm}128_float_type_node is always non-NULL, which doesn't reflect whether __ieee128 or __ibm128 are actually supported or not.

Based on patch review discussions, the following patch:
1) allows __ibm128 to be used in the sources even when !TARGET_FLOAT128_TYPE, as long as long double is double double
2) ensures ibm128_float_type_node is non-NULL only if __ibm128 is supported
3) ensures ieee128_float_type_node is non-NULL only if __ieee128 is supported (aka when TARGET_FLOAT128_TYPE)
4) predefines __SIZEOF_IBM128__ only when ibm128_float_type_node != NULL
5) newly predefines __SIZEOF_IEEE128__ if ieee128_float_type_node != NULL
6) predefines __SIZEOF_FLOAT128__ whenever ieee128_float_type_node != NULL and __float128 macro is predefined to __ieee128
7) removes ptr_*128_float_type_node which nothing uses
8) in order not to ICE during builtin initialization when ibm128_float_type_node == NULL, uses long_double_type_node as fallback for the __builtin_{,un}pack_ibm128 builtins
9) errors when those builtins are used when ibm128_float_type_node == NULL (during their expansion)
10) moves the {,un}packif -> {,un}packtf remapping for these builtins in expansion earlier, so that we don't ICE on them if not -mabi=ieeelongdouble

2022-03-10 Jakub Jelinek <jakub@redhat.com>

PR target/99708
* config/rs6000/rs6000.h (enum rs6000_builtin_type_index): Remove RS6000_BTI_ptr_ieee128_float and RS6000_BTI_ptr_ibm128_float. (ptr_ieee128_float_type_node, ptr_ibm128_float_type_node): Remove.
* config/rs6000/rs6000-builtin.cc (rs6000_type_string): Return "**NULL**" if type_node is NULL first. Handle ieee128_float_type_node. (rs6000_init_builtins): Don't initialize ptr_ieee128_float_type_node and ptr_ibm128_float_type_node. Set ibm128_float_type_node and ieee128_float_type_node to NULL rather than long_double_type_node if they aren't supported.
Do support __ibm128 even if !TARGET_FLOAT128_TYPE when long double is double double. (rs6000_expand_builtin): Error if bif_is_ibm128 and !ibm128_float_type_node. Remap RS6000_BIF_{,UN}PACK_IF to RS6000_BIF_{,UN}PACK_TF much earlier and only use bif_is_ibm128 check for it. * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define __SIZEOF_FLOAT128__ here and only iff __float128 macro is defined. (rs6000_cpu_cpp_builtins): Don't define __SIZEOF_FLOAT128__ here. Define __SIZEOF_IBM128__=16 if ieee128_float_type_node is non-NULL. Formatting fix. * config/rs6000/rs6000-gen-builtins.cc: Document ibm128 attribute. (struct attrinfo): Add isibm128 member. (TYPE_MAP_SIZE): Remove. (type_map): Use [] instead of [TYPE_MAP_SIZE]. For "if" use ibm128_float_type_node only if it is non-NULL, otherwise fall back to long_double_type_node. Remove "pif" entry. (parse_bif_attrs): Handle ibm128 attribute and print it for debugging. (write_decls): Output bif_ibm128_bit and bif_is_ibm128. (write_type_node): Use sizeof type_map / sizeof type_map[0] instead of TYPE_MAP_SIZE. (write_bif_static_init): Handle isibm128. * config/rs6000/rs6000-builtins.def: Document ibm128 attribute. (__builtin_pack_ibm128, __builtin_unpack_ibm128): Add ibm128 attribute. * gcc.dg/pr99708.c: New test. * gcc.target/powerpc/pr99708-2.c: New test. * gcc.target/powerpc/convert-fp-128.c (mode_kf): Define only if __FLOAT128_TYPE__ is defined.
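Once the macros are predefined only when the corresponding type is supported, user code can use them as feature tests. The following is a minimal sketch of that usage; the helper names are hypothetical, and the sketch only checks whether each macro is defined, without assuming any particular target.

```c
#include <stddef.h>

/* Returns the size reported by __SIZEOF_FLOAT128__, or 0 when the
   compiler does not predefine it for the current target and options. */
size_t float128_size_or_zero(void)
{
#ifdef __SIZEOF_FLOAT128__
    return __SIZEOF_FLOAT128__;
#else
    return 0;
#endif
}

/* Same idea for __SIZEOF_IBM128__, which per the commit message should
   now be predefined only when __ibm128 is actually usable. */
size_t ibm128_size_or_zero(void)
{
#ifdef __SIZEOF_IBM128__
    return __SIZEOF_IBM128__;
#else
    return 0;
#endif
}
```

A caller can then guard any __float128 or __ibm128 code path on a nonzero result, or more directly on the `#ifdef` itself.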
2022-03-09x86: Define LIBGCC2_UNWIND_ATTRIBUTE on ia32 [PR104781]Jakub Jelinek1-0/+6
On Mon, Mar 07, 2022 at 07:06:28AM -0800, H.J. Lu wrote:
> Since eh_return doesn't work with stack realignment, disable SSE on
> unwind-c.c and unwind-dw2.c to avoid stack realignment with the 4-byte
> incoming stack to avoid SSE usage which is caused by

The following change does that using LIBGCC2_UNWIND_ATTRIBUTE macro instead, for ia32 only by forcing -mgeneral-regs-only on routines that call __builtin_eh_return in libgcc.

2022-03-09 Jakub Jelinek <jakub@redhat.com>

PR target/104781
* config/i386/i386.h (LIBGCC2_UNWIND_ATTRIBUTE): Define for ia32.
2022-03-09mips: avoid signed overflow in LUI_OPERAND [PR104842]Xi Ruoyao1-1/+1
gcc/ PR target/104842 * config/mips/mips.h (LUI_OPERAND): Cast the input to an unsigned value before adding an offset.
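The general technique named in the ChangeLog entry, casting to an unsigned type before adding an offset so that wraparound is well defined, can be sketched as follows. This is an illustration under assumptions, not the actual macro: `is_lui_operand` is a hypothetical function, and the specific test it performs (low 16 bits zero, value representable as a sign-extended 32-bit quantity) is assumed for the sake of the example.

```c
#include <stdint.h>

/* Hypothetical check for a value that a load-upper-immediate could
   produce.  All arithmetic is done on uint64_t, so the addition below
   wraps modulo 2^64 instead of overflowing a signed type, which would
   be undefined behavior in C. */
int is_lui_operand(int64_t value)
{
    uint64_t u = (uint64_t)value | UINT64_C(0x7fff0000);

    /* Non-negative case: no bits set outside 0x7fff0000. */
    if (u == UINT64_C(0x7fff0000))
        return 1;

    /* Negative case: all bits from bit 16 upward are set, so adding
       0x10000 wraps to exactly zero. */
    return u + UINT64_C(0x10000) == 0;
}
```

With a signed addition, an input near the top of the signed range would overflow during the `+ 0x10000` step; the unsigned form gives a defined (and here, correctly rejecting) result.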
2022-03-08arm: Remove unused variable arm_binop_none_none_unone_qualifiersChristophe Lyon1-6/+0
Commits r12-7342 and r12-7344 made some cleanup, leaving arm_binop_none_none_unone_qualifiers unused. This is causing build failures with -Werror (e.g. bootstrap).

This patch fixes the problem by removing the definitions of arm_binop_none_none_unone_qualifiers and BINOP_NONE_NONE_UNONE_QUALIFIERS, which are now unused.

Tested by bootstrapping on arm-linux-gnueabihf.

2022-03-04 Christophe Lyon <christophe.lyon@arm.com>

gcc/
* config/arm/arm-builtins.cc (arm_binop_none_none_unone_qualifiers): Delete. (BINOP_NONE_NONE_UNONE_QUALIFIERS): Delete.
2022-03-08Darwin: Address a translation comment [PR104552].Iain Sandoe1-1/+1
This amends an error message to correct its punctuation and improve its wording.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

PR translation/104552

gcc/ChangeLog:

* config/host-darwin.cc (darwin_gt_pch_get_address): Amend the PCH out of memory error message punctuation and wording.
2022-03-08x86: Disallow unsupported EH returnH.J. Lu1-4/+7
Disallow stack realignment and regparm nested function with EH return since they don't work together. gcc/ PR target/104781 * config/i386/i386.cc (ix86_expand_epilogue): Sorry if there is stack realignment or regparm nested function with EH return. gcc/testsuite/ PR target/104781 * gcc.target/i386/eh_return-1.c: Add -mincoming-stack-boundary=4. * gcc.target/i386/eh_return-2.c: Likewise.
2022-03-08arm: MVE: Relax addressing modes for full loads and storesAndre Vieira2-12/+17
This patch relaxes the addressing modes for the mve full load and stores (by full loads and stores I mean non-widening or narrowing loads and stores resp). The code before was requiring a LO_REGNUM for these, where this is only a requirement if the load is widening or the store narrowing. gcc/ChangeLog: PR target/104790 * config/arm/arm.h (MVE_STN_LDW_MODE): New MACRO. * config/arm/arm.cc (mve_vector_mem_operand): Relax constraint on base register for non widening loads or narrowing stores.
2022-03-08Optimize v4si broadcast for noavx512vl.liuhongt1-1/+6
This will enable the change below:

    - vbroadcastss .LC1(%rip), %xmm0
    + movl $-45, %edx
    + vmovd %edx, %xmm0
    + vpshufd $0, %xmm0, %xmm0

According to a microbenchmark, it's faster than broadcast from memory for TARGET_INTER_UNIT_MOVES_TO_VEC.

gcc/ChangeLog:

* config/i386/sse.md (*vec_dupv4si): Disable memory operand for !TARGET_INTER_UNIT_MOVES_TO_VEC when prefer_for_speed.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr100865-8a.c: Adjust testcase.
* gcc.target/i386/pr100865-8c.c: Ditto.
* gcc.target/i386/pr100865-9c.c: Ditto.
2022-03-07Fix up duplicated duplicated words in commentsJakub Jelinek8-9/+9
Like in r10-7215-g700d4cb08c88aec37c13e21e63dd61fd698baabc 2 years ago, I've run grep -v 'long long\|optab optab\|template template\|double double' *.{[chS],cc} */*.{[chS],cc} *.def config/*/* 2>/dev/null | grep ' \([a-zA-Z]\+\) \1 ' and for the cases that looked clearly wrong changed them, mostly by removing one of the duplicated words but in some cases with other changes. 2022-03-07 Jakub Jelinek <jakub@redhat.com> gcc/ * tree-ssa-propagate.cc: Fix up duplicated word issue in a comment. * config/riscv/riscv.cc: Likewise. * config/darwin.h: Likewise. * config/i386/i386.cc: Likewise. * config/aarch64/thunderx3t110.md: Likewise. * config/aarch64/fractional-cost.h: Likewise. * config/vax/vax.cc: Likewise. * config/rs6000/pcrel-opt.md: Likewise. * config/rs6000/predicates.md: Likewise. * ctfc.h: Likewise. * tree-ssa-uninit.cc: Likewise. * value-relation.h: Likewise. * gimple-range-gori.cc: Likewise. * ipa-polymorphic-call.cc: Likewise. * pointer-query.cc: Likewise. * ipa-sra.cc: Likewise. * internal-fn.cc: Likewise. * varasm.cc: Likewise. * gimple-ssa-warn-access.cc: Likewise. gcc/analyzer/ * store.cc: Fix up duplicated word issue in a comment. * analyzer.cc: Likewise. * engine.cc: Likewise. * sm-taint.cc: Likewise. gcc/c-family/ * c-attribs.cc: Fix up duplicated word issue in a comment. gcc/cp/ * cvt.cc: Fix up duplicated word issue in a comment. * pt.cc: Likewise. * module.cc: Likewise. * coroutines.cc: Likewise. gcc/fortran/ * trans-expr.cc: Fix up duplicated word issue in a comment. * gfortran.h: Likewise. * scanner.cc: Likewise. gcc/jit/ * libgccjit.h: Fix up duplicated word issue in a comment.
2022-03-07arm: add missing space to error.Martin Liska1-1/+1
PR target/104794 gcc/ChangeLog: * config/arm/arm.cc (arm_option_override_internal): Add missing space.
2022-03-07MSP430: fix error message.Martin Liska1-1/+1
PR target/104797 gcc/ChangeLog: * config/msp430/msp430.cc (msp430_expand_delay_cycles): Remove parenthesis from built-in name.
2022-03-07arm: fix option quoting in error messages.Martin Liska1-3/+3
PR target/104794 gcc/ChangeLog: * config/arm/arm.cc (arm_option_override_internal): Fix quoting of options in error messages. (arm_option_reconfigure_globals): Likewise.
2022-03-07translation: reuse string and use switch for codesMartin Liska1-50/+77
PR target/104794 gcc/ChangeLog: * config/arm/arm-builtins.cc (arm_expand_builtin): Reuse error message. Fix ARM_BUILTIN_WRORHI and ARM_BUILTIN_WRORH that can have only range [0,32].
2022-03-07s390: Fix up *cmp_and_trap_unsigned_int<mode> constraints [PR104775]Jakub Jelinek1-1/+1
The following testcase fails to assemble due to the clgte %r6,0(%r1,%r10) insn not being accepted by the assembler. My rough understanding is that in the RSY-b insn format the field that other formats use for the index register is instead used for M3, which encodes the kind of comparison, so this patch follows what other similar instructions use for the constraint (i.e. one without an index register).

2022-03-07 Jakub Jelinek <jakub@redhat.com>

PR target/104775
* config/s390/s390.md (*cmp_and_trap_unsigned_int<mode>): Use S constraint instead of T in the last alternative.
* gcc.target/s390/pr104775.c: New test.
2022-03-07Fix translation strings.Martin Liska1-1/+1
PR translation/90148 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_linux64_override_options): Put quote to a proper place. * plugin.cc (default_plugin_dir_name): Likewise. gcc/fortran/ChangeLog: * intrinsic.cc (gfc_is_intrinsic): Put quote to a proper place.
2022-03-07rx: Fix translation string.Martin Liska1-1/+1
PR target/99297 gcc/ChangeLog: * config/rx/rx.cc (rx_expand_builtin_mvtc): Fix translation string.
2022-03-07i386: Fix up cond_{and,ior,xor,mul}* [PR104779]Jakub Jelinek1-2/+21
The following testcase ICEs, because the cond_andv* expander has vector_operand predicates in both of the commutative inputs and calls gen_andv*_mask which calls ix86_binary_operator_ok in its condition, but nothing calls ix86_fixup_binary_operands_no_copy during the expansion, which means cond_* accepts even operands like 2 MEMs which then can't be matched.

The following patch handles it like most other insns that the other cond_* patterns use - by having a separate define_expand that calls ix86_fixup_binary_operands_no_copy and define_insn with ix86_binary_operator_ok.

2022-03-07 Jakub Jelinek <jakub@redhat.com>

PR target/104779
* config/i386/sse.md (avx512dq_mul<mode>3<mask_name>): New define_expand pattern. Rename define_insn to ... (*avx512dq_mul<mode>3<mask_name>): ... this. (<code><mode>3_mask): New any_logic define_expand pattern. (<mask_codefor><code><mode>3<mask_name>): Rename to ... (*<code><mode>3<mask_name>): ... this.
* gcc.target/i386/pr104779.c: New test.
2022-03-05PR 104732: Simplify/fix DI mode logic expansion/splitting on -m32.Roger Sayle1-14/+14
This clean-up patch resolves PR testsuite/104732, the failure of the recent test gcc.target/i386/pr100711-1.c on 32-bit Solaris/x86. Rather than just tweak the testcase, the proposed approach is to fix the underlying problem by removing the "TARGET_STV && TARGET_SSE2" conditionals from the DI mode logical operation expanders and pre-reload splitters in i386.md, which as I'll show generate inferior code (even a GCC 12 regression) on !TARGET_64BIT whenever -mno-stv (such as Solaris) or -msse (but not -msse2).

First a little bit of history. In the beginning, DImode operations on i386 weren't defined by the machine description, and lowered during RTL expansion to SI mode operations. Then with PR 65105 in 2015, -mstv was added, together with a SWIM1248x mode iterator (later renamed to SWIM1248s) together with several *<code>di3_doubleword post-reload splitters that made use of register allocation to perform some double word operations in 64-bit XMM registers. A short while later in 2016, PR 70322 added similar support for one_cmpldi2. All of this logic was dependent upon "!TARGET_64BIT && TARGET_STV && TARGET_SSE2".

With the passing of time, these conditions became irrelevant when in 2019, it was decided to split these double-word patterns before reload.
https://gcc.gnu.org/pipermail/gcc-patches/2019-June/523877.html
https://gcc.gnu.org/pipermail/gcc-patches/2019-October/532236.html

Hence the current situation, where on most modern CPU architectures (where "TARGET_STV && TARGET_SSE2" is true), RTL is expanded with DI mode operations, that are then split into two SI mode instructions before reload, except on Solaris and other odd cases, where the splitting to two SI mode instructions is done during RTL expansion. By the time compilation reaches register allocation both paths in theory produce identical or similar code, so the vestigial legacy/logic would appear to be harmless.
Unfortunately, there is one place where this arbitrary choice of how to lower DI mode doubleword operations is visible to the middle-end, it controls whether the backend appears to have a suitable optab, and the presence (or not) of DImode optabs can influence vectorization cost models and veclower decisions.

The issue (and code quality regression) can be seen in this test case:

    typedef long long v2di __attribute__((vector_size (16)));
    v2di x;
    void foo (long long a)
    {
      v2di t = {a, a};
      x = ~t;
    }

which when compiled with "-O2 -m32 -msse -march=pentiumpro" produces:

    foo:    subl    $28, %esp
            movl    %ebx, 16(%esp)
            movl    32(%esp), %eax
            movl    %esi, 20(%esp)
            movl    36(%esp), %edx
            movl    %edi, 24(%esp)
            movl    %eax, %esi
            movl    %eax, %edi
            movl    %edx, %ebx
            movl    %edx, %ecx
            notl    %esi
            notl    %ebx
            movl    %esi, (%esp)
            notl    %edi
            notl    %ecx
            movl    %ebx, 4(%esp)
            movl    20(%esp), %esi
            movl    %edi, 8(%esp)
            movl    16(%esp), %ebx
            movl    %ecx, 12(%esp)
            movl    24(%esp), %edi
            movss   8(%esp), %xmm1
            movss   12(%esp), %xmm2
            movss   (%esp), %xmm0
            movss   4(%esp), %xmm3
            unpcklps        %xmm2, %xmm1
            unpcklps        %xmm3, %xmm0
            movlhps %xmm1, %xmm0
            movaps  %xmm0, x
            addl    $28, %esp
            ret

Importantly notice the four "notl" instructions. With this patch:

    foo:    subl    $28, %esp
            movl    32(%esp), %edx
            movl    36(%esp), %eax
            notl    %edx
            movl    %edx, (%esp)
            notl    %eax
            movl    %eax, 4(%esp)
            movl    %edx, 8(%esp)
            movl    %eax, 12(%esp)
            movaps  (%esp), %xmm1
            movaps  %xmm1, x
            addl    $28, %esp
            ret

Notice only two "notl" instructions.

Checking with godbolt.org, GCC generated 4 NOTs in GCC 4.x and 5.x, 2 NOTs between GCC 6.x and 9.x, and regressed to 4 NOTs since GCC 10.x [which hopefully qualifies this clean-up as suitable for stage 4].

Most significantly, this patch allows pr100711-1.c to pass with -mno-stv, allowing pandn to be used with V2DImode on Solaris/x86. Fingers-crossed this should reduce the number of discrepancies encountered supporting Solaris/x86.
2022-03-05 Roger Sayle <roger@nextmovesoftware.com> Uroš Bizjak <ubizjak@gmail.com> gcc/ChangeLog PR testsuite/104732 * config/i386/i386.md (SWIM1248x): Renamed from SWIM1248s. Include DI mode unconditionally. (*anddi3_doubleword): Remove && TARGET_STV && TARGET_SSE2 condition, i.e. always split on !TARGET_64BIT. (*<any_or>di3_doubleword): Likewise. (*one_cmpldi2_doubleword): Likewise. (and<mode>3 expander): Update to use SWIM1248x from SWIM1248s. (<any_or><mode>3 expander): Likewise. (one_cmpl<mode>2 expander): Likewise. gcc/testsuite/ChangeLog PR testsuite/104732 * gcc.target/i386/pr104732.c: New test case.
2022-03-05Optimize signed DImode -> TImode on power10.Michael Meissner1-22/+61
On power10, GCC tries to optimize the signed conversion from DImode to TImode by using the vextsd2q instruction. However to generate this instruction, it would have to generate 3 direct moves (1 from the GPR registers to the altivec registers, and 2 from the altivec registers to the GPR register).

This patch generates the shift right immediate instruction to do the conversion if the target/source registers are GPR registers like it does on earlier systems. If the target/source registers are Altivec registers, it will generate the vextsd2q instruction.

2022-03-05 Michael Meissner <meissner@linux.ibm.com>

gcc/
PR target/104698
* config/rs6000/vsx.md (UNSPEC_MTVSRD_DITI_W1): Delete. (mtvsrdd_diti_w1): Delete. (extendditi2): Convert from define_expand to define_insn_and_split. Replace with code to deal with both GPR registers and with altivec registers.

gcc/testsuite/
PR target/104698
* gcc.target/powerpc/pr104698-1.c: New test.
* gcc.target/powerpc/pr104698-2.c: New test.
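The GPR path described above amounts to filling the upper 64 bits with copies of the sign bit. A minimal C sketch of that result follows; `struct ti_value` and `sign_extend_di_to_ti` are hypothetical names, and the upper half is computed with a comparison (rather than a signed right shift, whose result for negative values is implementation-defined in C) so that the sketch stays portable.

```c
#include <stdint.h>

/* Hypothetical 128-bit value held in a pair of 64-bit registers. */
struct ti_value {
    uint64_t hi;
    uint64_t lo;
};

/* Sign-extend a 64-bit value to 128 bits.  The upper half is what an
   arithmetic shift right by 63 of the input would produce: all ones
   for negative inputs, all zeros otherwise. */
struct ti_value sign_extend_di_to_ti(int64_t x)
{
    struct ti_value r;
    r.lo = (uint64_t)x;
    r.hi = (x < 0) ? UINT64_MAX : 0;
    return r;
}
```

Because only one extra register-to-register operation is needed, this form avoids the direct moves between register files that the vector instruction would require when the operands already live in GPRs.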
2022-03-04rs6000: Improve .machineSegher Boessenkool1-27/+54
This adds more correct .machine for most older CPUs. It should be conservative in the sense that everything we handled before we handle at least as well now. This does not yet revamp the server CPU handling, it is too risky at this point in time. Tested on powerpc64-linux {-m32,-m64}. Also manually tested with all -mcpu=, and the output of that passed through the GNU assembler. 2022-03-04 Segher Boessenkool <segher@kernel.crashing.org> * config/rs6000/rs6000.cc (rs6000_machine_from_flags): Restructure a bit. Handle most older CPUs.
2022-03-04Darwin: Fix a type mismatch warning for a non-GCC bootstrap compiler.Iain Sandoe1-1/+1
DECL_MD_FUNCTION_CODE() returns an int, on one particular compiler the code in darwin_fold_builtin() triggers a warning. Fixed thus. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk> gcc/ChangeLog: * config/darwin.cc (darwin_fold_builtin): Make fcode an int to avoid a mismatch with DECL_MD_FUNCTION_CODE().
2022-03-04LRA, rs6000, Darwin: Revise lo_sum use for forced constants [PR104117].Iain Sandoe3-5/+30
Follow up discussion to the initial patch for this PR identified that it is preferable to avoid the LRA change, and arrange for the target to reject the hi and lo_sum selections when presented with an invalid address.

We split the Darwin high/low selectors into two:
1. One that handles non-PIC addresses (kernel mode, mdynamic-no-pic).
2. One that handles PIC addresses and rejects SYMBOL_REFs unless they are suitably wrapped in the MACHOPIC_OFFSET unspec.

The second case is handled by providing a new predicate (macho_pic_address) that checks the requirements.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

PR target/104117

gcc/ChangeLog:

* config/rs6000/darwin.md (@machopic_high_<mode>): New. (@machopic_low_<mode>): New.
* config/rs6000/predicates.md (macho_pic_address): New.
* config/rs6000/rs6000.cc (rs6000_legitimize_address): Do not apply the TLS processing to Darwin.
* lra-constraints.cc (process_address_1): Revert the changes in r12-7209.
2022-03-04rs6000: Allow -mlong-double-64 after -mabi={ibm,ieee}longdouble [PR104208, PR87496]Peter Bergner1-9/+2
The glibc build is showing a build error due to extra "error" checking from my PR87496 fix. That checking was overeager, disallowing setting the long double size to 64-bits if the 128-bit long double ABI had already been specified. Now we only emit an error if we specify a 128-bit long double ABI if our long double size is not 128 bits. This also fixes an erroneous error when -mabi=ieeelongdouble is used and ISA 2.06 is not enabled, but the long double size has been changed to 64 bits. 2022-03-04 Peter Bergner <bergner@linux.ibm.com> gcc/ PR target/87496 PR target/104208 * config/rs6000/rs6000.cc (rs6000_option_override_internal): Make the ISA 2.06 requirement for -mabi=ieeelongdouble conditional on -mlong-double-128. Move the -mabi=ieeelongdouble and -mabi=ibmlongdouble error checking from here... * common/config/rs6000/rs6000-common.cc (rs6000_handle_option): ... to here. gcc/testsuite/ PR target/87496 PR target/104208 * gcc.target/powerpc/pr104208-1.c: New test. * gcc.target/powerpc/pr104208-2.c: Likewise. * gcc.target/powerpc/pr87496-2.c: Swap long double options to trigger the expected error. * gcc.target/powerpc/pr87496-3.c: Likewise.
2022-03-03x86: Always return pseudo register in ix86_gen_scratch_sse_rtxH.J. Lu1-18/+1
ix86_gen_scratch_sse_rtx returns XMM7/XMM15/XMM31 as a scratch vector register to prevent RTL optimizers from removing vector register. It introduces a conflict with explicit XMM7/XMM15/XMM31 usage and when it is called by RTL optimizers, it may introduce conflicting usages of XMM7/XMM15/XMM31. Change ix86_gen_scratch_sse_rtx to always return a pseudo register and xfail x86 tests which are optimized with a hard scratch register. gcc/ PR target/104704 * config/i386/i386.cc (ix86_gen_scratch_sse_rtx): Always return a pseudo register. gcc/testsuite/ PR target/104704 * gcc.target/i386/incoming-11.c: Xfail. * gcc.target/i386/pieces-memset-3.c: Likewise. * gcc.target/i386/pieces-memset-37.c: Likewise. * gcc.target/i386/pieces-memset-39.c: Likewise. * gcc.target/i386/pieces-memset-46.c: Likewise. * gcc.target/i386/pieces-memset-47.c: Likewise. * gcc.target/i386/pieces-memset-48.c: Likewise. * gcc.target/i386/pr90773-5.c: Likewise. * gcc.target/i386/pr90773-14.c: Likewise. * gcc.target/i386/pr90773-17.c: Likewise. * gcc.target/i386/pr100865-8a.c: Likewise. * gcc.target/i386/pr100865-8c.c: Likewise. * gcc.target/i386/pr100865-9c.c: Likewise. * gcc.target/i386/pieces-memset-21.c: Always expect vzeroupper. * gcc.target/i386/pr82941-1.c: Likewise. * gcc.target/i386/pr82942-1.c: Likewise. * gcc.target/i386/pr82990-1.c: Likewise. * gcc.target/i386/pr82990-3.c: Likewise. * gcc.target/i386/pr82990-5.c: Likewise. * gcc.target/i386/pr100865-11b.c: Expect vmovdqa instead of vmovdqa64. * gcc.target/i386/pr100865-12b.c: Likewise. * gcc.target/i386/pr100865-8b.c: Likewise. * gcc.target/i386/pr100865-9b.c: Likewise. * gcc.target/i386/pr104704-1.c: New test. * gcc.target/i386/pr104704-2.c: Likewise. * gcc.target/i386/pr104704-3.c: Likewise. * gcc.target/i386/pr104704-4.c: Likewise. * gcc.target/i386/pr104704-5.c: Likewise. * gcc.target/i386/pr104704-6.c: Likewise.
2022-03-03[nvptx] Build libraries with mptx=3.1Tom de Vries1-1/+1
In gcc-5 to gcc-11, the ptx isa version was 3.1. On trunk, the default is now 6.0, which is also what will be the value in the libraries. Consequently, there may be setups with an older driver that worked with gcc-11, but will become unsupported with gcc-12. Fix this by building the libraries with mptx=3.1. After this, setups with an older driver still won't work out of the box with gcc-12, because the default ptx isa version has changed, but should work after specifying mptx=3.1. gcc/ChangeLog: 2022-03-03 Tom de Vries <tdevries@suse.de> * config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add mptx=3.1.
2022-03-03[nvptx] Build libraries with misa=sm_30Tom de Vries1-0/+2
In gcc-11, when specifying -misa=sm_30, an executable may still contain sm_35 code (due to libraries being built with the default -misa=sm_35), so it won't run on an sm_30 board. Fix this by building libraries with sm_30, as was the case in gcc-5 to gcc-10. gcc/ChangeLog: 2022-03-03 Tom de Vries <tdevries@suse.de> PR target/104758 * config/nvptx/t-nvptx (MULTILIB_EXTRA_OPTS): Add misa=sm_30.
2022-03-03[nvptx] Use --no-verify for sm_30Tom de Vries1-1/+1
In PR97348, we ran into the problem that recent CUDA dropped support for sm_30, which inhibited the build when building with CUDA bin in the path, because the nvptx-tools assembler uses CUDA's ptxas to do ptx verification. To fix this, in gcc-11 the default sm_xx was moved from sm_30 to sm_35. This however broke support for sm_30 boards: an executable built for sm_30 might contain sm_35 code from the libraries, which are built with the default sm_xx (PR104758). We want to fix this by going back to having the libraries built with sm_30, as was the case for gcc-5 to gcc-10. That however reintroduces the problem from PR97348. Deal with PR97348 in the simplest way possible: when calling the assembler for sm_30, specify --no-verify. This has the unfortunate effect that after fixing PR104758 by building libraries with sm_30, the libraries are no longer verified. This can be improved upon by: - adding a configure test in gcc that tests if CUDA supports sm_30, and if so disabling this patch - dealing with this in nvptx-tools somehow, either: - detect at ptxas execution time that it doesn't support sm_30, or - detect this at nvptx-tool configure time. gcc/ChangeLog: 2022-03-03 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx.h (ASM_SPEC): Add %{misa=sm_30:--no-verify}.
2022-03-01[nvptx] Handle DCmode in define_expand "omp_simt_xchg_{bfly,idx}"Tom de Vries2-4/+33
For a test-case doing an openmp target simd reduction on a complex double: ... DOUBLE COMPLEX :: counter_N0 ... !$OMP TARGET SIMD reduction(+: counter_N0) ... we run into: ... during RTL pass: expand b.f90: In function ‘MAIN__._omp_fn.0’: b.f90:23:32: internal compiler error: in expand_insn, at optabs.cc:8029 23 | counter_N0 = counter_N0 + 1. | ^ 0x10f1cd3 expand_insn(insn_code, unsigned int, expand_operand*) gcc/optabs.cc:8029 0xeac435 expand_GOMP_SIMT_XCHG_BFLY gcc/internal-fn.cc:375 ... Fix this by handling DCmode and CDImode in define_expand "omp_simt_xchg_{bfly,idx}". Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-02-28 Tom de Vries <tdevries@suse.de> PR target/102429 * config/nvptx/nvptx.cc (nvptx_gen_shuffle): Handle DCmode and CDImode. * config/nvptx/nvptx.md (define_predicate "nvptx_register_or_complex_di_df_register_operand"): New predicate. (define_expand "omp_simt_xchg_bfly", define_expand "omp_simt_xchg_idx"): Use nvptx_register_or_complex_di_df_register_operand.
2022-03-01[nvptx] Add nvptx-gen.h and nvptx-gen.optTom de Vries8-27/+281
Use nvptx-sm.def to generate new files nvptx-gen.h and nvptx-gen.opt, and: - include nvptx-gen.h in nvptx.h, and - add nvptx-gen.opt to extra_options (before nvptx.opt, in case that matters). Tested on nvptx. gcc/ChangeLog: 2022-02-25 Tom de Vries <tdevries@suse.de> * config.gcc (nvptx*-*-*): Add nvptx/nvptx-gen.opt to extra_options. * config/nvptx/gen-copyright.sh: New file. * config/nvptx/gen-h.sh: New file. * config/nvptx/gen-opt.sh: New file. * config/nvptx/nvptx.h (TARGET_SM35, TARGET_SM53, TARGET_SM70) (TARGET_SM75, TARGET_SM80): Move ... * config/nvptx/nvptx-gen.h: ... here. New file, generate. * config/nvptx/nvptx.opt (Enum ptx_isa): Move ... * config/nvptx/nvptx-gen.opt: ... here. New file, generate. * config/nvptx/t-nvptx ($(srcdir)/config/nvptx/nvptx-gen.h) ($(srcdir)/config/nvptx/nvptx-gen.opt): New make target.
2022-03-01[nvptx] Use nvptx-sm.def for t-omp-deviceTom de Vries2-4/+36
Add a script gen-omp-device-properties.sh that uses nvptx-sm.def to generate omp-device-properties-nvptx. Tested on x86_64 with nvptx accelerator. gcc/ChangeLog: 2022-02-25 Tom de Vries <tdevries@suse.de> * config/nvptx/gen-omp-device-properties.sh: New file. * config/nvptx/t-omp-device: Use gen-omp-device-properties.sh.
2022-03-01[nvptx] Add nvptx-sm.defTom de Vries4-42/+57
Add a file gcc/config/nvptx/nvptx-sm.def that lists all sm_xx versions used in the port, like so: ... NVPTX_SM(30, NVPTX_SM_SEP) NVPTX_SM(35, NVPTX_SM_SEP) NVPTX_SM(53, NVPTX_SM_SEP) NVPTX_SM(70, NVPTX_SM_SEP) NVPTX_SM(75, NVPTX_SM_SEP) NVPTX_SM(80,) ... and use it in various places using a pattern: ... #define NVPTX_SM(XX, SEP) { ... } #include "nvptx-sm.def" #undef NVPTX_SM ... Tested on nvptx. gcc/ChangeLog: 2022-02-25 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx-sm.def: New file. * config/nvptx/nvptx-c.cc (nvptx_cpu_cpp_builtins): Use nvptx-sm.def. * config/nvptx/nvptx-opts.h (enum ptx_isa): Same. * config/nvptx/nvptx.cc (sm_version_to_string) (nvptx_omp_device_kind_arch_isa): Same.
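The define/include/undef pattern described above is commonly called an X-macro. The following is a minimal sketch of the technique, not the port's actual code: the list is kept in an inline macro (DEMO_SM_LIST) instead of a separate nvptx-sm.def file so that the example is self-contained, and every demo_/DEMO_ name is hypothetical. Only the NVPTX_SM (XX, SEP) entries and the NVPTX_SM_SEP separator are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the contents of nvptx-sm.def (assumption: inlined here
   so the example needs no second file).  */
#define DEMO_SM_LIST            \
  NVPTX_SM (30, NVPTX_SM_SEP)   \
  NVPTX_SM (35, NVPTX_SM_SEP)   \
  NVPTX_SM (53, NVPTX_SM_SEP)   \
  NVPTX_SM (70, NVPTX_SM_SEP)   \
  NVPTX_SM (75, NVPTX_SM_SEP)   \
  NVPTX_SM (80,)

/* First use: one enumerator per listed sm_xx version.  */
#define NVPTX_SM_SEP ,
#define NVPTX_SM(XX, SEP) DEMO_SM_ ## XX SEP
enum demo_sm { DEMO_SM_LIST };
#undef NVPTX_SM

/* Second use: a name table generated from the same list, so the two
   can never get out of sync.  */
#define NVPTX_SM(XX, SEP) "sm_" #XX SEP
static const char *const demo_sm_names[] = { DEMO_SM_LIST };
#undef NVPTX_SM
#undef NVPTX_SM_SEP

/* Number of entries in the list.  */
size_t
demo_sm_count (void)
{
  return sizeof demo_sm_names / sizeof demo_sm_names[0];
}

/* Map an enumerator to its "sm_xx" spelling.  */
const char *
demo_sm_name (enum demo_sm sm)
{
  assert ((size_t) sm < demo_sm_count ());
  return demo_sm_names[sm];
}

/* Map an "sm_xx" spelling back to its index, or -1 if it is not listed.  */
int
demo_sm_lookup (const char *name)
{
  for (size_t i = 0; i < demo_sm_count (); i++)
    if (strcmp (demo_sm_names[i], name) == 0)
      return (int) i;
  return -1;
}
```

Adding a new version then means adding one line to the list; the enum and the name table both pick it up on the next build.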
2022-03-01arc: Fix for new ifcvt behavior [PR104154]Robin Dapp1-0/+6
ifcvt now passes a CC-mode "comparison" to backends. This patch simply returns from gen_compare_reg () in that case since nothing needs to be prepared anymore. gcc/ChangeLog: PR rtl-optimization/104154 * config/arc/arc.cc (gen_compare_reg): Return the CC-mode comparison ifcvt passed us.
2022-03-01i386: Fix V8HF vector init under -mno-avx [PR 104664]Hongyu Wang1-1/+6
For V8HFmode vector init with HFmode, do not directly emit a V8HF move with subreg, which may cause reload to assign a general register to the move src. gcc/ChangeLog: PR target/104664 * config/i386/i386-expand.cc (ix86_expand_vector_init_duplicate): Use vec_setv8hf_0 for HF to V8HFmode move instead of subreg. gcc/testsuite/ChangeLog: PR target/104664 * gcc.target/i386/pr104664.c: New test.
2022-02-28PR tree-optimization/91384: peephole2 to eliminate testl after negl.Roger Sayle1-0/+13
This patch is my proposed solution to PR tree-optimization/91384 which is a missed-optimization/code quality regression on x86_64. The problematic idiom is "if (r = -a)" which is equivalent to both "r = -a; if (r != 0)" and alternatively "r = -a; if (a != 0)". In this particular case, on x86_64, we prefer to use the condition codes from the negation, rather than require an explicit testl instruction. Unfortunately, combine can't help, as it doesn't attempt to merge pairs of instructions that share the same operand(s), only pairs/triples of instructions where the result of each instruction feeds the next. But I doubt there's sufficient benefit to attempt this kind of "combination" (that wouldn't already be caught by the tree-ssa passes). Fortunately, it's relatively easy to fix this up (addressing the regression) during peephole2 to eliminate the unnecessary testl in: movl %edi, %ebx negl %ebx testl %edi, %edi je .L2 2022-02-28 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog PR tree-optimization/91384 * config/i386/i386.md (peephole2): Eliminate final testl insn from the sequence *movsi_internal, *negsi_1, *cmpsi_ccno_1 by transforming using *negsi_2 for the negation. gcc/testsuite/ChangeLog PR tree-optimization/91384 * gcc.target/i386/pr91384.c: New test case.
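The equivalence the peephole relies on can be written out in C. This is an illustrative sketch, not the testcase from the PR: the function names are hypothetical, and unsigned arithmetic is used (an assumption made here) so that negating the most negative value stays well defined. Both functions compute r = -a; the first branches on r and the second on a, and they always agree because -a is zero exactly when a is zero.

```c
#include <assert.h>

/* "r = -a; if (r != 0)": the condition comes from the negation result,
   which is what the flags set by a negation instruction describe.  */
int
demo_branch_on_result (unsigned int a, unsigned int *r)
{
  *r = 0u - a;
  return *r != 0;
}

/* "r = -a; if (a != 0)": the condition comes from the original input,
   which is the form that needs a separate test instruction.  */
int
demo_branch_on_input (unsigned int a, unsigned int *r)
{
  *r = 0u - a;
  return a != 0;
}

/* Check that both forms give the same value and the same branch.  */
void
demo_check_forms_agree (unsigned int a)
{
  unsigned int r1, r2;
  int t1 = demo_branch_on_result (a, &r1);
  int t2 = demo_branch_on_input (a, &r2);
  assert (r1 == r2);
  assert (t1 == t2);
}
```

Whether a given compiler version emits the test instruction for either form depends on its optimization passes; the sketch only demonstrates that dropping it is semantically safe.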
2022-02-28[nvptx] Add -mptx=_Tom de Vries3-1/+6
Add an -mptx=_ value, that indicates the default ptx version. It can be used to undo an explicit -mptx setting, so this: ... $ gcc test.c -mptx=3.1 -mptx=_ ... has the same effect as: ... $ gcc test.c ... Tested on nvptx. gcc/ChangeLog: 2022-02-28 Tom de Vries <tdevries@suse.de> * config/nvptx/nvptx-opts.h (enum ptx_version): Add PTX_VERSION_default. * config/nvptx/nvptx.cc (handle_ptx_version_option): Handle PTX_VERSION_default. * config/nvptx/nvptx.opt: Add EnumValue "_" / PTX_VERSION_default.
2022-02-28AVX512F: Add helper enumeration for ternary logic intrinsics.Hongyu Wang2-148/+262
Sync with llvm change in https://reviews.llvm.org/D120307 to add enumeration and truncate imm to unsigned char, so users could use ~ on immediates. gcc/ChangeLog: * config/i386/avx512fintrin.h (_MM_TERNLOG_ENUM): New enum. (_mm512_ternarylogic_epi64): Truncate imm to unsigned char to avoid error when using ~enum as parameter. (_mm512_mask_ternarylogic_epi64): Likewise. (_mm512_maskz_ternarylogic_epi64): Likewise. (_mm512_ternarylogic_epi32): Likewise. (_mm512_mask_ternarylogic_epi32): Likewise. (_mm512_maskz_ternarylogic_epi32): Likewise. * config/i386/avx512vlintrin.h (_mm256_ternarylogic_epi64): Adjust imm param type to unsigned char. (_mm256_mask_ternarylogic_epi64): Likewise. (_mm256_maskz_ternarylogic_epi64): Likewise. (_mm256_ternarylogic_epi32): Likewise. (_mm256_mask_ternarylogic_epi32): Likewise. (_mm256_maskz_ternarylogic_epi32): Likewise. (_mm_ternarylogic_epi64): Likewise. (_mm_mask_ternarylogic_epi64): Likewise. (_mm_maskz_ternarylogic_epi64): Likewise. (_mm_ternarylogic_epi32): Likewise. (_mm_mask_ternarylogic_epi32): Likewise. (_mm_maskz_ternarylogic_epi32): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-vpternlogd-1.c: Use new enum. * gcc.target/i386/avx512f-vpternlogq-1.c: Likewise. * gcc.target/i386/avx512vl-vpternlogd-1.c: Likewise. * gcc.target/i386/avx512vl-vpternlogq-1.c: Likewise. * gcc.target/i386/testimm-10.c: Remove imm check for vpternlog insns since the imm has been truncated in intrinsic.
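To see why the truncation to unsigned char matters, here is a small software model of a ternary logic operation on one 64-bit value. It is a sketch under stated assumptions, not the intrinsic itself: the DEMO_ enumerator names are hypothetical, their values follow the usual vpternlog truth-table convention (bit index of the immediate is a*4 + b*2 + c), and demo_ternlog64 is a plain C loop that needs no AVX512 hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Truth-table masks for the three inputs (hypothetical names).  */
enum demo_ternlog
{
  DEMO_TERNLOG_A = 0xF0,
  DEMO_TERNLOG_B = 0xCC,
  DEMO_TERNLOG_C = 0xAA
};

/* For each bit position, the three input bits form an index 0..7 that
   selects one bit of the 8-bit immediate.  */
uint64_t
demo_ternlog64 (uint64_t a, uint64_t b, uint64_t c, int imm)
{
  /* ~DEMO_TERNLOG_A is a negative int; only its low 8 bits are the
     truth table, so truncate before use.  */
  unsigned char imm8 = (unsigned char) imm;
  uint64_t r = 0;

  for (int i = 0; i < 64; i++)
    {
      unsigned int idx = (unsigned int) ((((a >> i) & 1) << 2)
                                         | (((b >> i) & 1) << 1)
                                         | ((c >> i) & 1));
      r |= (uint64_t) ((imm8 >> idx) & 1) << i;
    }
  return r;
}
```

With this model, demo_ternlog64 (a, b, c, ~DEMO_TERNLOG_A) computes ~a and demo_ternlog64 (a, b, c, DEMO_TERNLOG_A & DEMO_TERNLOG_B) computes a & b, which is the style of expression the enumeration is meant to enable.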
2022-02-25rs6000: Use rs6000_emit_move in movmisalign<mode> expander [PR104681]Jakub Jelinek1-1/+4
The following testcase ICEs, because for some strange reason it decides to use movmisaligntf during expansion where the destination is MEM and source is CONST_DOUBLE. For normal mov<mode> expanders the rs6000 backend uses rs6000_emit_move to ensure that if one operand is a MEM, the other is a REG and a few other things, but for movmisalign<mode> nothing enforced this. The middle-end documents that movmisalign<mode> shouldn't fail, so we can't force that through predicates or condition on the expander. 2022-02-25 Jakub Jelinek <jakub@redhat.com> PR target/104681 * config/rs6000/vector.md (movmisalign<mode>): Use rs6000_emit_move. * g++.dg/opt/pr104681.C: New test.
2022-02-25arc: Fail conditional move expand patternsClaudiu Zissulescu2-6/+22
If the movcc comparison is not valid it triggers an assert in the current implementation. This behavior is not needed as we can FAIL the movcc expand pattern. gcc/ * config/arc/arc.cc (gen_compare_reg): Return NULL_RTX if the comparison is not valid. * config/arc/arc.md (movsicc): Fail if comparison is not valid. (movdicc): Likewise. (movsfcc): Likewise. (movdfcc): Likewise. Signed-off-by: Claudiu Zissulescu <claziss@synopsys.com>
2022-02-25i386: Use a new temp slot kind for splitter to floatdi<mode>2_i387_with_xmm [PR104674]Jakub Jelinek2-3/+3
As mentioned in the PR, the following testcase is miscompiled for similar reasons as the already fixed PR78791 - we use SLOT_TEMP slots in various places during expansion and during expansion we can guarantee that the lifetimes of those temporary slots don't overlap. But the following splitter uses SLOT_TEMP too and in between expansion and split1 there is a possibility that something extends the lifetime of SLOT_TEMP created slots across an instruction that will be split by this splitter. The following patch fixes it by using a new temp slot kind to make sure it doesn't reuse a SLOT_TEMP that could be live across the instruction. 2022-02-25 Jakub Jelinek <jakub@redhat.com> PR target/104674 * config/i386/i386.h (enum ix86_stack_slot): Add SLOT_FLOATxFDI_387. * config/i386/i386.md (splitter to floatdi<mode>2_i387_with_xmm): Use SLOT_FLOATxFDI_387 rather than SLOT_TEMP. * gcc.target/i386/pr104674.c: New test.