Age  Commit message  Author  Files  Lines
2024-07-02ada: Fix generic renaming table low bound on resetRonan Desplanques1-1/+1
gcc/ada/ * sem_ch12.adb (Save_And_Reset): Fix value of low bound used to reset table.
2024-07-02ada: Compiler accepts an illegal Unchecked_Access attribute referenceSteve Baird1-0/+7
The compiler incorrectly accepts Some_Object'Unchecked_Access'Image. gcc/ada/ * sem_attr.adb (Analyze_Image_Attribute.Check_Image_Type): Check for E_Access_Attribute_Type prefix type.
2024-07-02ada: Use clause (or use type clause) in a protected operation sometimes ignored.Steve Baird1-0/+41
In some cases, a use clause (or a use type clause) occurring within a protected operation is incorrectly ignored. gcc/ada/ * exp_ch9.adb (Expand_N_Protected_Body): Declare new procedure Unanalyze_Use_Clauses and call it before analyzing the newly constructed subprogram body.
2024-07-02ada: Put_Image aspect spec ignored for null extension.Steve Baird1-1/+16
If type T1 is a tagged null record with a Put_Image aspect specification and type T2 is a null extension of T1 (with no aspect specifications), then evaluation of a T2'Image call should include a call to the specified procedure (as opposed to yielding "(NULL RECORD)"). gcc/ada/ * exp_put_image.adb (Build_Record_Put_Image_Procedure): Declare new Boolean-valued function Null_Record_Default_Implementation_OK; call it as part of deciding whether to generate "(NULL RECORD)" text.
2024-07-02ada: Allow mutably tagged types to work with qualified expressionsJustin Squirek1-0/+14
This patch modifies the experimental 'Size'Class feature such that objects of mutably tagged types can be assigned qualified expressions featuring a definite type (e.g. Mutable_Obj := Root_Child_T'(Root_T with others => <>)). gcc/ada/ * sem_ch5.adb: (Analyze_Assignment): Add special expansion for qualified expressions in certain cases dealing with mutably tagged types.
2024-07-02ada: Bug box for expression function with list comprehensionBob Duff1-0/+1
GNAT crashes on an iterator with a filter inside an expression function that is the completion of an earlier spec. gcc/ada/ * freeze.adb (Freeze_Type_Refs): If Node is in N_Has_Etype, check that it has had its Etype set, because this can be called early for expression functions that are completions.
2024-07-02ada: Call memcmp instead of Compare_Array_Unsigned_8 and...Eric Botcazou3-36/+138
... implement support for ordering comparisons of discrete array types. This extends the Support_Composite_Compare_On_Target feature to ordering comparisons of discrete array types as specified by RM 4.5.2(26/3), when the component type is a byte (unsigned). Implement support for ordering comparisons of discrete array types with a two-pronged approach: for types with a size known at compile time, this lets the gimplifier generate the call to memcmp (or else an optimized version of it); otherwise, this directly generates the call to memcmp. gcc/ada/ * exp_ch4.adb (Expand_Array_Comparison): Remove the obsolete byte addressability test. If Support_Composite_Compare_On_Target is true, immediately return for a component size of 8, an unsigned component type and aligned operands. Disable when Unnest_Subprogram_Mode is true (for LLVM). (Expand_N_Op_Eq): Adjust comment. * targparm.ads (Support_Composite_Compare_On_Target): Replace bit by byte in description and document support for ordering comparisons. * gcc-interface/utils2.cc (compare_arrays): Rename into... (compare_arrays_for_equality): ...this. Remove redundant lines. (compare_arrays_for_ordering): New function. (build_binary_op) <comparisons>: Call compare_arrays_for_ordering to implement ordering comparisons for arrays.
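The ordering comparison that this commit lowers to memcmp can be pictured in C. The sketch below is only an illustration, not the code GNAT generates: the helper name compare_byte_arrays is hypothetical, and it assumes lexicographic ordering of unsigned byte arrays in which a proper prefix sorts before the longer array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lexicographic ordering of two unsigned byte arrays.
   Returns -1, 0 or 1 when a sorts before, equal to, or after b.  */
int compare_byte_arrays(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    size_t common = alen < blen ? alen : blen;

    /* memcmp compares bytes as unsigned char, which matches an unsigned
       8-bit component type.  Skip the call for an empty common prefix.  */
    int r = common != 0 ? memcmp(a, b, common) : 0;
    if (r != 0)
        return r < 0 ? -1 : 1;

    /* Equal on the common prefix: the shorter array sorts first.  */
    return (alen > blen) - (alen < blen);
}
```

The single memcmp call is what makes the byte-sized, unsigned component case attractive; wider or signed components would need a different comparison.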
2024-07-02ada: Fix analysis of Extensions_VisibleYannick Moy3-19/+38
Pragma/aspect Extensions_Visible should be analyzed before any pre/post contracts on a subprogram, as the legality of conversions of formal parameters to classwide type depends on the value of Extensions_Visible. Now fixed. gcc/ada/ * contracts.adb (Analyze_Pragmas_In_Declarations): Analyze pragmas in two iterations over the list of declarations in order to analyze some pragmas before others. * einfo-utils.ads (Get_Pragma): Fix comment. * sem_prag.ads (Pragma_Significant_To_Subprograms): Fix. (Pragma_Significant_To_Subprograms_Analyzed_First): Add new global array to identify these pragmas which should be analyzed first, which concerns only Extensions_Visible for now.
2024-07-02ada: Fix bogus error on allocator in instantiation with private derived typesEric Botcazou1-30/+21
The problem is that the call to Convert_View made from Make_Init_Call does nothing because the Etype is not set on the second argument. gcc/ada/ * exp_ch7.adb (Convert_View): Add third parameter Typ and use it if the second parameter does not have an Etype. (Make_Adjust_Call): Remove obsolete setting of Etype and pass Typ in call to Convert_View. (Make_Final_Call): Likewise. (Make_Init_Call): Pass Typ in call to Convert_View.
2024-07-02ada: Miscomputed bounds for inner null array aggregatesJavier Miranda1-31/+384
When an array has several dimensions, and inner dimensions are initialized using Ada 2022 null array aggregates, the compiler crashes or reports spurious errors computing the bounds of the null array aggregates. This patch fixes the problem and adds new warnings reported when the index of null array aggregates is an enumeration type or a modular type and it is known at compile time that the program will raise Constraint_Error computing the bounds of the aggregate. gcc/ada/ * sem_aggr.adb (Cannot_Compute_High_Bound): New subprogram. (Report_Null_Array_Constraint_Error): New subprogram. (Collect_Aggr_Bounds): For null aggregates, build the bounds of the inner dimensions. (Has_Null_Aggregate_Raising_Constraint_Error): New subprogram. (Subtract): New subprogram. (Resolve_Array_Aggregate): Report a warning when the index of null array aggregates is an enumeration type or a modular type and we can statically determine that the program will raise CE at runtime computing its high bound. (Resolve_Null_Array_Aggregate): ditto.
2024-07-02ada: Fix crash on box-initialized component with No_Default_InitializationEric Botcazou3-22/+29
The problem is that the implementation of the No_Default_Initialization restriction assumes that no type initialization routines are needed and, therefore, builds a dummy version of them, which goes against their use for box-initialized components in aggregates. Therefore this use needs to be flagged as violating the restriction too. gcc/ada/ * doc/gnat_rm/standard_and_implementation_defined_restrictions.rst (No_Default_Initialization): Mention components alongside variables. * exp_aggr.adb (Build_Array_Aggr_Code.Gen_Assign): Check that the restriction No_Default_Initialization is not in effect for default initialized component. (Build_Record_Aggr_Code): Likewise. * gnat_rm.texi: Regenerate.
2024-07-02ada: Document that -gnatdJ is unusedEric Botcazou1-0/+1
gcc/ada/ * debug.adb (dJ): Add back as unused.
2024-07-02amdgcn: invent target feature flagsAndrew Stubbs4-87/+155
This is a first step towards having a device table so we can add new devices more easily. It'll also make it easier to remove the deprecated GCN3 bits. The patch should not change the behaviour of anything. gcc/ChangeLog: * config/gcn/gcn-opts.h (TARGET_GLOBAL_ADDRSPACE): New. (TARGET_AVGPRS): New. (TARGET_AVGPR_MEMOPS): New. (TARGET_AVGPR_COMBINED): New. (TARGET_FLAT_OFFSETS): New. (TARGET_11BIT_GLOBAL_OFFSET): New. (TARGET_CDNA2_MEM_COSTS): New. (TARGET_WAVE64_COMPAT): New. (TARGET_DPP_FULL): New. (TARGET_DPP16): New. (TARGET_DPP8): New. (TARGET_AVGPR_CDNA1_NOPS): New. (TARGET_VGPR_GRANULARITY): New. (TARGET_ARCHITECTED_FLAT_SCRATCH): New. (TARGET_EXPLICIT_CARRY): New. (TARGET_MULTIPLY_IMMEDIATE): New. (TARGET_SDWA): New. (TARGET_WBINVL1_CACHE): New. (TARGET_GLn_CACHE): New. * config/gcn/gcn-valu.md (throughout): Change TARGET_GCN*, TARGET_CDNA* and TARGET_RDNA* to use TARGET_<feature> instead. * config/gcn/gcn.cc (throughout): Likewise. * config/gcn/gcn.md (throughout): Likewise.
2024-07-02c++: Relax too strict assert in stabilize_expr [PR111160]Simon Martin3-1/+22
The case in the ticket is an ICE on invalid due to an assert in stabilize_expr, but the underlying issue can actually trigger on this *valid* code: === cut here === struct TheClass { TheClass() {} TheClass(volatile TheClass& t) {} TheClass operator=(volatile TheClass& t) volatile { return t; } }; void the_func() { volatile TheClass x, y, z; (false ? x : y) = z; } === cut here === The problem is that stabilize_expr asserts that it returns an expression without TREE_SIDE_EFFECTS, which can't be if the involved type is volatile. This patch relaxes the assert to accept having TREE_THIS_VOLATILE on the returned expression. Successfully tested on x86_64-pc-linux-gnu. PR c++/111160 gcc/cp/ChangeLog: * tree.cc (stabilize_expr): Stabilized expressions can have TREE_SIDE_EFFECTS if they're volatile. gcc/testsuite/ChangeLog: * g++.dg/overload/error8.C: New test. * g++.dg/overload/volatile2.C: New test.
2024-07-02i386: Support APX NF and NDD for imul/mulLingling Kong2-45/+61
gcc/ChangeLog: * config/i386/i386.md (*imulhi<mode>zu): Added APX NF support. (*imulhi<mode>zu<nf_name>): New define_insn. (*mulsi3_1_zext<nf_name>): Ditto. (*mul<mode><dwi>3_1<nf_name>): Ditto. (*<u>mulqihi3_1<nf_name>): Ditto. (*mul<mode>3_1<nf_name>): Added APX NDD support. (*mulv<mode>4): Ditto. (*mulvhi4): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/apx-ndd.c: Add test for imul ndd.
2024-07-02sparc: define SPARC_LONG_DOUBLE_TYPE_SIZE for vxworks [PR115739]Kewen Lin1-0/+4
Commit r15-1594 removed the define of LONG_DOUBLE_TYPE_SIZE in sparc.cc, based on the assumption that each OS has its own define (see the comments in sparc.h), but this exposes an issue on vxworks, which lacks the define. We can bring back the default SPARC_LONG_DOUBLE_TYPE_SIZE to sparc.cc, but according to the comments in sparc.h, I think it's better to define this in vxworks.h. I also went through all the sparc supported triples; vxworks is the only one that misses this define. PR target/115739 gcc/ChangeLog: * config/sparc/vxworks.h (SPARC_LONG_DOUBLE_TYPE_SIZE): New define.
2024-07-02LoongArch: Define loongarch_insn_cost and set the cost of movcf2gr and movgr2cf.Lulu Cheng1-0/+29
The following two FAIL items have been fixed: FAIL: gcc.target/loongarch/movcf2gr-via-fr.c scan-assembler movcf2fr\\t\\\\\$f[0-9]+,\\\\\$fcc FAIL: gcc.target/loongarch/movcf2gr-via-fr.c scan-assembler movfr2gr\\\\.s\\t\\\\\$r4 gcc/ChangeLog: * config/loongarch/loongarch.cc (loongarch_insn_cost): New function. (TARGET_INSN_COST): New macro.
2024-07-02LoongArch: Fix explicit-relocs-{extreme-,}tls-desc.c tests.Lulu Cheng2-2/+2
After r15-1579, ADD and LD/ST pairs will be merged into LDX/STX, which causes these two tests to fail. To guarantee that these two tests pass, add the compilation option '-fno-late-combine-instructions'. gcc/testsuite/ChangeLog: * gcc.target/loongarch/explicit-relocs-extreme-tls-desc.c: Add compilation options '-fno-late-combine-instructions'. * gcc.target/loongarch/explicit-relocs-tls-desc.c: Likewise.
2024-07-02isel: Fold more in gimple_expand_vec_cond_expr [PR115659]Kewen Lin1-7/+41
As PR115659 shows, assuming c = x CMP y, there are some folding chances for patterns r = c ? -1/z : z/0. For r = c ? -1 : z, it can be folded into: - r = c | z (with ior_optab supported) - or r = c ? c : z while for r = c ? z : 0, it can be folded into: - r = c & z (with and_optab supported) - or r = c ? z : c This patch teaches ISEL to take care of them and also removes the redundant gsi_replace, as the caller of function gimple_expand_vec_cond_expr will handle it. PR tree-optimization/115659 gcc/ChangeLog: * gimple-isel.cc (gimple_expand_vec_cond_expr): Add more foldings for patterns x CMP y ? -1 : z and x CMP y ? z : 0.
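The folds in this commit work because a vector comparison yields, per lane, either all ones (-1) or all zeros. The scalar sketch below models a single lane with int32_t to show why the select and the bitwise form agree; the function names are hypothetical and the code is an illustration of the identity, not of the ISEL implementation.

```c
#include <stdint.h>

/* Lane mask as a vector comparison would produce it: -1 (all ones) when
   x < y, otherwise 0.  Hypothetical helper for illustration.  */
int32_t lane_mask_lt(int32_t x, int32_t y)
{
    return x < y ? -1 : 0;
}

/* r = c ? -1 : z, written as a select.  */
int32_t select_m1_or_z(int32_t c, int32_t z)
{
    return c ? -1 : z;
}

/* The same value computed as c | z; valid only because c is 0 or -1.  */
int32_t fold_ior(int32_t c, int32_t z)
{
    return c | z;
}

/* r = c ? z : 0, written as a select.  */
int32_t select_z_or_0(int32_t c, int32_t z)
{
    return c ? z : 0;
}

/* The same value computed as c & z; again relies on c being 0 or -1.  */
int32_t fold_and(int32_t c, int32_t z)
{
    return c & z;
}
```

If c could take any other value than 0 or -1, neither bitwise form would be equivalent to the select, which is why the fold is tied to comparison results.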
2024-07-02Daily bump.GCC Administrator8-1/+366
2024-07-01c++: ICE with computed gotos [PR115469]Marek Polacek2-4/+34
This is a low-prio crash on invalid code where we ICE on a VAR_DECL with erroneous type. I thought I'd try to avoid putting such decls into ->names and ->names_in_scope but that sounds riskier than the following cleanup. PR c++/115469 gcc/cp/ChangeLog: * decl.cc (automatic_var_with_nontrivial_dtor_p): New. (poplevel_named_label_1): Use it. (check_goto_1): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/label17.C: New test.
2024-07-01testsuite: fix spaceship-narrowing1.CMarek Polacek1-1/+1
I made sure that Wnarrowing22.C works fine on ILP32, but apparently I didn't verify that spaceship-narrowing1.C works there as well. :( gcc/testsuite/ChangeLog: * g++.dg/cpp2a/spaceship-narrowing1.C: Use __INT64_TYPE__.
2024-07-01c++: unresolved overload with comma op [PR115430]Marek Polacek3-2/+28
This works: template<typename T> int Func(T); typedef int (*funcptrtype)(int); funcptrtype fp0 = &Func<int>; but this doesn't: funcptrtype fp2 = (0, &Func<int>); because we only call resolve_nondeduced_context on the LHS (via convert_to_void) but not on the RHS, so cp_build_compound_expr's type_unknown_p check issues an error. PR c++/115430 gcc/cp/ChangeLog: * typeck.cc (cp_build_compound_expr): Call resolve_nondeduced_context on RHS. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/noexcept41.C: Remove dg-error. * g++.dg/overload/addr3.C: New test.
2024-07-01c++: DR2627, Bit-fields and narrowing conversions [PR94058]Marek Polacek5-0/+134
This DR (https://cplusplus.github.io/CWG/issues/2627.html) says that even if we are converting from an integer type or unscoped enumeration type to an integer type that cannot represent all the values of the original type, it's not narrowing if "the source is a bit-field whose width w is less than that of its type (or, for an enumeration type, its underlying type) and the target type can represent all the values of a hypothetical extended integer type with width w and with the same signedness as the original type". DR 2627 PR c++/94058 PR c++/104392 gcc/cp/ChangeLog: * typeck2.cc (check_narrowing): Don't warn if the conversion isn't narrowing as per DR 2627. gcc/testsuite/ChangeLog: * g++.dg/DRs/dr2627.C: New test. * g++.dg/cpp0x/Wnarrowing22.C: New test. * g++.dg/cpp2a/spaceship-narrowing1.C: New test. * g++.dg/cpp2a/spaceship-narrowing2.C: New test.
2024-07-01Preserve SSA info for more propagated copyRichard Biener2-0/+12
Besides VN and copy-prop also CCP and VRP as well as forwprop propagate out copies and thus it's worthwhile to try to preserve range and points-to info there when possible. Note that this also fixes the testcase from PR115701 but that's because we do not actually intersect info but only copy info when there was no info present. * tree-ssa-forwprop.cc (fwprop_set_lattice_val): Preserve SSA info. * tree-ssa-propagate.cc (substitute_and_fold_dom_walker::before_dom_children): Likewise.
2024-07-01RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 4Pan Li9-0/+270
This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 4. Aka: Form 4: #define DEF_SAT_U_ADD_IMM_FMT_4(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_4 (T x) \ { \ T ret; \ return __builtin_add_overflow (x, 9, &ret) == 0 ? ret : -1; \ } DEF_SAT_U_ADD_IMM_FMT_4(uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add helper test macro. * gcc.target/riscv/sat_u_add_imm-13.c: New test. * gcc.target/riscv/sat_u_add_imm-14.c: New test. * gcc.target/riscv/sat_u_add_imm-15.c: New test. * gcc.target/riscv/sat_u_add_imm-16.c: New test. * gcc.target/riscv/sat_u_add_imm-run-13.c: New test. * gcc.target/riscv/sat_u_add_imm-run-14.c: New test. * gcc.target/riscv/sat_u_add_imm-run-15.c: New test. * gcc.target/riscv/sat_u_add_imm-run-16.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-01RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 3Pan Li9-0/+270
This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 3. Aka: Form 3: #define DEF_SAT_U_ADD_IMM_FMT_3(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_3 (T x) \ { \ T ret; \ return __builtin_add_overflow (x, 8, &ret) ? -1 : ret; \ } DEF_SAT_U_ADD_IMM_FMT_3(uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add helper test macro. * gcc.target/riscv/sat_u_add_imm-10.c: New test. * gcc.target/riscv/sat_u_add_imm-11.c: New test. * gcc.target/riscv/sat_u_add_imm-12.c: New test. * gcc.target/riscv/sat_u_add_imm-9.c: New test. * gcc.target/riscv/sat_u_add_imm-run-10.c: New test. * gcc.target/riscv/sat_u_add_imm-run-11.c: New test. * gcc.target/riscv/sat_u_add_imm-run-12.c: New test. * gcc.target/riscv/sat_u_add_imm-run-9.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-01RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 2Pan Li9-0/+269
This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 2. Aka: Form 2: #define DEF_SAT_U_ADD_IMM_FMT_2(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_1 (T x) \ { \ return (T)(x + 9) < x ? -1 : (x + 9); \ } DEF_SAT_U_ADD_IMM_FMT_2(uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add helper test macro. * gcc.target/riscv/sat_u_add_imm-5.c: New test. * gcc.target/riscv/sat_u_add_imm-6.c: New test. * gcc.target/riscv/sat_u_add_imm-7.c: New test. * gcc.target/riscv/sat_u_add_imm-8.c: New test. * gcc.target/riscv/sat_u_add_imm-run-5.c: New test. * gcc.target/riscv/sat_u_add_imm-run-6.c: New test. * gcc.target/riscv/sat_u_add_imm-run-7.c: New test. * gcc.target/riscv/sat_u_add_imm-run-8.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-07-01RISC-V: Add testcases for unsigned scalar .SAT_ADD IMM form 1Pan Li9-0/+269
This patch would like to add test cases for the unsigned scalar .SAT_ADD IMM form 1. Aka: Form 1: #define DEF_SAT_U_ADD_IMM_FMT_1(T) \ T __attribute__((noinline)) \ sat_u_add_imm_##T##_fmt_1 (T x) \ { \ return (T)(x + 9) >= x ? (x + 9) : -1; \ } DEF_SAT_U_ADD_IMM_FMT_1(uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_arith.h: Add helper test macro. * gcc.target/riscv/sat_u_add_imm-1.c: New test. * gcc.target/riscv/sat_u_add_imm-2.c: New test. * gcc.target/riscv/sat_u_add_imm-3.c: New test. * gcc.target/riscv/sat_u_add_imm-4.c: New test. * gcc.target/riscv/sat_u_add_imm-run-1.c: New test. * gcc.target/riscv/sat_u_add_imm-run-2.c: New test. * gcc.target/riscv/sat_u_add_imm-run-3.c: New test. * gcc.target/riscv/sat_u_add_imm-run-4.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
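The forms exercised by these test commits all express the same unsigned saturating add of an immediate. As a rough sketch, two of them are written out below as plain functions for uint32_t rather than through the DEF_SAT_U_ADD_IMM_FMT_* macros; the function names are hypothetical, and __builtin_add_overflow is assumed to be available (it is a GCC and Clang builtin).

```c
#include <stdint.h>

/* Form 1 style: detect wraparound by comparing the sum with the input.
   On overflow the result saturates to the maximum value, which is what
   converting -1 to an unsigned type yields.  */
uint32_t sat_add9_compare(uint32_t x)
{
    return (uint32_t)(x + 9) >= x ? (x + 9) : (uint32_t)-1;
}

/* Form 3 style: let __builtin_add_overflow report the overflow.  */
uint32_t sat_add9_builtin(uint32_t x)
{
    uint32_t ret;
    return __builtin_add_overflow(x, 9, &ret) ? (uint32_t)-1 : ret;
}
```

Forms 2 and 4 only invert the condition and swap the arms of the conditional, so they compute the same values as the two functions above.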
2024-07-01testsuite: Fix -m32 gcc.target/i386/pr102464-vrndscaleph.c on RedHat.Roger Sayle1-0/+3
This patch fixes the 4 FAILs of gcc.target/i386/pr102464-vrndscaleph.c with --target_board='unix{-m32}' on RedHat 7.x. The issue is that this AVX512 test includes the system math.h, and on older systems this provides inline versions of floor, ceil and rint (for the 387). The workaround is to define __NO_MATH_INLINES before #include <math.h> (or alternatively use __builtin_floor, __builtin_ceil, etc.). 2024-07-01 Roger Sayle <roger@nextmovesoftware.com> gcc/testsuite/ChangeLog PR middle-end/102464 * gcc.target/i386/pr102464-vrndscaleph.c: Define __NO_MATH_INLINES to resolve FAILs with -m32 on older RedHat systems.
2024-07-01i386: Additional peephole2 to use lea in round-up integer division.Roger Sayle2-0/+28
A common idiom for implementing an integer division that rounds upwards is to write (x + y - 1) / y. Conveniently on x86, the two additions to form the numerator can be performed by a single lea instruction, and indeed gcc currently generates a lea when x and y are both registers. int foo(int x, int y) { return (x+y-1)/y; } generates with -O2: foo: leal -1(%rsi,%rdi), %eax // 4 bytes cltd idivl %esi ret Oddly, however, if x is in memory, gcc currently uses two instructions: int m; int bar(int y) { return (m+y-1)/y; } generates: bar: movl m(%rip), %eax addl %edi, %eax // 2 bytes subl $1, %eax // 3 bytes cltd idivl %edi ret This discrepancy is caused by the late decision (in peephole2) to split an addition with a memory operand, into a load followed by a reg-reg addition. This patch improves this situation by adding a peephole2 to recognize consecutive additions and transform them into lea if profitable. My first attempt at fixing this was to use a define_insn_and_split: (define_insn_and_split "*lea<mode>3_reg_mem_imm" [(set (match_operand:SWI48 0 "register_operand") (plus:SWI48 (plus:SWI48 (match_operand:SWI48 1 "register_operand") (match_operand:SWI48 2 "memory_operand")) (match_operand:SWI48 3 "x86_64_immediate_operand")))] "ix86_pre_reload_split ()" "#" "&& 1" [(set (match_dup 4) (match_dup 2)) (set (match_dup 0) (plus:SWI48 (plus:SWI48 (match_dup 1) (match_dup 4)) (match_dup 3)))] "operands[4] = gen_reg_rtx (<MODE>mode);") using combine to combine instructions. Unfortunately, this approach interferes with (reload's) subtle balance of deciding when to use/avoid lea, which can be observed as a code size regression in CSiBE. The peephole2 approach (proposed here) uniformly improves CSiBE results. 2024-07-01 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386.md (peephole2): Transform two consecutive additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR. gcc/testsuite/ChangeLog * gcc.target/i386/lea-3.c: New test case.
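For reference, the round-up division idiom discussed in this commit can be written as a small helper. This is a minimal sketch: the name div_round_up is hypothetical, and the result is only a ceiling when x is non-negative, y is positive, and x + y - 1 does not overflow.

```c
/* Round-up integer division via the (x + y - 1) / y idiom.
   Assumes x >= 0, y > 0 and that x + y - 1 fits in an int.  */
int div_round_up(int x, int y)
{
    return (x + y - 1) / y;
}
```

For example, div_round_up(7, 2) evaluates (7 + 2 - 1) / 2 = 8 / 2 = 4, whereas plain 7 / 2 truncates to 3.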
2024-07-01AVR: target/88236, target/115726 - Fix __memx code generation.Georg-Johann Lay3-3/+138
PR target/88236 PR target/115726 gcc/ * config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a way that the destination does not overlap with any hard register clobbered / used by xload8qi_A resp. xload<mode>_A. * config/avr/avr.cc (avr_out_xload): Avoid early-clobber situation for Z by executing just one load when the output register overlaps with Z. gcc/testsuite/ * gcc.target/avr/torture/pr88236-pr115726.c: New test.
2024-07-01testsuite/52641 - Adjust some test cases to less capable platforms.Georg-Johann Lay5-4/+8
PR testsuite/52641 gcc/testsuite/ * gcc.dg/analyzer/pr109577.c: Use __SIZE_TYPE__ instead of "unsigned long". * gcc.dg/analyzer/pr93032-mztools-signed-char.c: Requires int32plus. * gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Requires int32plus. * gcc.dg/analyzer/putenv-1.c: Skip on avr. * gcc.dg/torture/type-generic-1.c: Skip on avr.
2024-07-01libgomp, openmp: Add ompx_gnu_pinned_mem_allocAndrew Stubbs11-37/+336
This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP. This is not in the OpenMP standard so it uses the "ompx" namespace and an independent enum baseline of 200 (selected to not clash with other known implementations). The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait. One motivation for having this feature is for use by the (planned) -foffload-memory=pinned feature. gcc/fortran/ChangeLog: * openmp.cc (is_predefined_allocator): Update valid ranges to incorporate ompx_gnu_pinned_mem_alloc. libgomp/ChangeLog: * allocator.c (ompx_gnu_min_predefined_alloc): New. (ompx_gnu_max_predefined_alloc): New. (predefined_alloc_mapping): Rename to ... (predefined_omp_alloc_mapping): ... this. (predefined_ompx_gnu_alloc_mapping): New. (_Static_assert): Adjust for the new name, and add a new assert for the new table. (predefined_allocator_p): New. (predefined_alloc_mapping): New. (omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc. Use predefined_allocator_p and predefined_alloc_mapping. (omp_free): Likewise. (omp_aligned_calloc): Likewise. (omp_realloc): Likewise. * env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc. * libgomp.texi: Document ompx_gnu_pinned_mem_alloc. * omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc. * omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc. * omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc. * testsuite/libgomp.c/alloc-pinned-5.c: New test. * testsuite/libgomp.c/alloc-pinned-6.c: New test. * testsuite/libgomp.fortran/alloc-pinned-1.f90: New test. gcc/testsuite/ChangeLog: * gfortran.dg/gomp/allocate-pinned-1.f90: New test. Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2024-07-01libgomp: change alloc-pinned tests failure modeAndrew Stubbs2-28/+12
The feature doesn't work on non-Linux hosts, at present, so skip the tests entirely. On Linux systems that have insufficient lockable memory configured we still need to fail or else the feature won't be getting tested when we think it is, but now there's a message to explain why. libgomp/ChangeLog: * testsuite/libgomp.c/alloc-pinned-1.c: Change dg-xfail-run-if to dg-skip-if. Correct spelling mistake. Abort on insufficient lockable memory. Use #error on non-linux hosts. * testsuite/libgomp.c/alloc-pinned-2.c: Likewise.
2024-07-01libffi: Fix 32-bit SPARC structure passing [PR115681]Rainer Orth1-0/+16
The libffi.closures/single_entry_structs2.c test FAILs on 32-bit SPARC: FAIL: libffi.closures/single_entry_structs2.c -W -Wall -Wno-psabi -O0 execution test The issue has been reported, analyzed and fixed upstream: Several tests FAIL on 32-bit Solaris/SPARC https://github.com/libffi/libffi/issues/841 Therefore this patch imports the fix into the GCC tree. Tested on sparc-sun-solaris2.11. 2024-07-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> libffi: PR libffi/115681 * src/sparc/ffi.c (ffi_call_int): Copy structure arguments to maintain call-by-value semantics.
2024-07-01tree-optimization/115723 - ICE with .COND_ADD reductionRichard Biener2-4/+33
The following fixes an ICE with a .COND_ADD discovered as reduction even though its else value isn't the reduction chain link but a constant. This would be wrong-code with --disable-checking I think. PR tree-optimization/115723 * tree-vect-loop.cc (check_reduction_path): For a .COND_ADD verify the else value also refers to the reduction chain op. * gcc.dg/vect/pr115723.c: New testcase.
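To make the condition checked by this fix concrete, here is a scalar model of a conditional add and of a reduction built on it. It is a sketch under the assumption that .COND_ADD (cond, a, b, else) yields cond ? a + b : else; the function names are hypothetical and nothing here reflects the vectorizer's actual data structures.

```c
/* Scalar model of a conditional add: cond ? a + b : else_value.  */
int cond_add(int cond, int a, int b, int else_value)
{
    return cond ? a + b : else_value;
}

/* A well-formed reduction: the else value is the running accumulator,
   so lanes with a false condition leave the sum unchanged.  */
int masked_sum(const int *vals, const unsigned char *mask, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc = cond_add(mask[i], acc, vals[i], acc);
    return acc;
}

/* Not a reduction: with a constant else value, a false condition
   discards everything accumulated so far.  */
int masked_sum_const_else(const int *vals, const unsigned char *mask, int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc = cond_add(mask[i], acc, vals[i], 0);
    return acc;
}
```

The second loop shows why treating a constant else value as part of the reduction chain would give wrong results: the two functions diverge as soon as one mask element is false after a true one.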
2024-07-01[MAINTAINERS] Update my email addressClaudiu Zissulescu1-2/+1
Update my email address. ChangeLog: * MAINTAINERS: Update claziss email address. Signed-off-by: Claudiu Zissulescu <claziss@gmail.com>
2024-07-01tree-optimization/115694 - ICE with complex store rewriteRichard Biener2-0/+15
The following adds a missed check when forwprop attempts to rewrite a complex store. PR tree-optimization/115694 * tree-ssa-forwprop.cc (pass_forwprop::execute): Check the store is complex before rewriting it. * g++.dg/torture/pr115694.C: New testcase.
2024-07-01Remove vcond{,u,eq}<mode> expanders since they will be obsolete.liuhongt2-310/+0
gcc/ChangeLog: PR target/115517 * config/i386/mmx.md (vcond<mode>v2sf): Removed. (vcond<MMXMODE124:mode><MMXMODEI:mode>): Ditto. (vcond<mode><mode>): Ditto. (vcondu<MMXMODE124:mode><MMXMODEI:mode>): Ditto. (vcondu<mode><mode>): Ditto. * config/i386/sse.md (vcond<V_512:mode><VF_512:mode>): Ditto. (vcond<V_256:mode><VF_256:mode>): Ditto. (vcond<V_128:mode><VF_128:mode>): Ditto. (vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): Ditto. (vcond<V_512:mode><VI_AVX512BW:mode>): Ditto. (vcond<V_256:mode><VI_256:mode>): Ditto. (vcond<V_128:mode><VI124_128:mode>): Ditto. (vcond<VI8F_128:mode>v2di): Ditto. (vcondu<V_512:mode><VI_AVX512BW:mode>): Ditto. (vcondu<V_256:mode><VI_256:mode>): Ditto. (vcondu<V_128:mode><VI124_128:mode>): Ditto. (vcondu<VI8F_128:mode>v2di): Ditto. (vcondeq<VI8F_128:mode>v2di): Ditto.
2024-07-01Optimize a < 0 ? -1 : 0 to (signed)a >> 31.liuhongt4-3/+138
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31 and x < 0 ? 1 : 0 into (unsigned) x >> 31. Add define_insn_and_split for the optimization done in ix86_expand_int_vcond. gcc/ChangeLog: PR target/115517 * config/i386/sse.md ("*ashr<mode>3_1"): New define_insn_and_split. (*avx512_ashr<mode>3_1): Ditto. (*avx2_lshr<mode>3_1): Ditto. (*avx2_lshr<mode>3_2): Ditto and add 2 combine splitters after it. * config/i386/mmx.md (mmxscalarsize): New mode attribute. (*mmw_ashr<mode>3_1): New define_insn_and_split. ("mmx_<insn><mode>3"): Add a combine splitter after it. (*mmx_ashrv2hi3_1): New define_insn_and_split, also add a combine splitter after it. gcc/testsuite/ChangeLog: * gcc.target/i386/pr111023-2.c: Adjust testcase. * gcc.target/i386/vect-div-1.c: Ditto.
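The identity behind this optimization is easy to check on scalars. The sketch below uses 32-bit values and hypothetical function names; it assumes that right-shifting a negative signed integer is an arithmetic shift, which ISO C leaves implementation-defined but GCC documents as sign-extending.

```c
#include <stdint.h>

/* x < 0 ? -1 : 0 written as a select.  */
int32_t sign_mask_select(int32_t x)
{
    return x < 0 ? -1 : 0;
}

/* The same value from an arithmetic shift that replicates the sign bit.
   Assumes >> on a negative int32_t sign-extends.  */
int32_t sign_mask_shift(int32_t x)
{
    return x >> 31;
}

/* x < 0 ? 1 : 0 written as a select.  */
uint32_t sign_bit_select(int32_t x)
{
    return x < 0 ? 1u : 0u;
}

/* The same value from a logical shift that isolates the sign bit.  */
uint32_t sign_bit_shift(int32_t x)
{
    return (uint32_t)x >> 31;
}
```

In the vector patterns the shift count is the element width minus one; 31 is simply the 32-bit case.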
2024-07-01Adjust testcase for the regressed testcases after obsolete of vcond{,u,eq}.liuhongt9-9/+70
> Richard suggests that we implement the "obvious" transforms like > inversion in the middle-end but if for example unsigned compares > are not supported the us_minus + eq + negative trick isn't on > that list. > > The main reason to restrict vec_cmp would be to avoid > a <= b ? c : d going with an unsupported vec_cmp but instead > do a > b ? d : c - the alternative is trying to fix this > on the RTL side via combine. I understand the non-native Yes, I have a patch which can fix most regressions via pattern match in combine. Still there is a situation that is difficult to deal with, mainly the optimization w/o sse4.1. Because pblendvb/blendvps/blendvpd only exists under sse4.1, w/o sse4.1, it takes 3 instructions (pand,pandn,por) to simulate the vcond_mask, and the combine matches up to 4 instructions, which makes it currently impossible to use the combine to recover those optimizations in the vcond{,u,eq}, i.e. min/max. In the case of sse 4.1 and above, there is basically no regression anymore.
the regression testcases w/o sse4.1 FAIL: g++.target/i386/pr100637-1b.C -std=gnu++14 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr100637-1b.C -std=gnu++17 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr100637-1b.C -std=gnu++20 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr100637-1b.C -std=gnu++98 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr100637-1w.C -std=gnu++14 scan-assembler-times pcmpeqw 2 FAIL: g++.target/i386/pr100637-1w.C -std=gnu++17 scan-assembler-times pcmpeqw 2 FAIL: g++.target/i386/pr100637-1w.C -std=gnu++20 scan-assembler-times pcmpeqw 2 FAIL: g++.target/i386/pr100637-1w.C -std=gnu++98 scan-assembler-times pcmpeqw 2 FAIL: g++.target/i386/pr103861-1.C -std=gnu++14 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr103861-1.C -std=gnu++17 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr103861-1.C -std=gnu++20 scan-assembler-times pcmpeqb 2 FAIL: g++.target/i386/pr103861-1.C -std=gnu++98 scan-assembler-times pcmpeqb 2 FAIL: gcc.target/i386/pr88540.c scan-assembler minpd gcc/testsuite/ChangeLog: PR target/115517 * g++.target/i386/pr100637-1b.C: Add xfail and -mno-sse4.1. * g++.target/i386/pr100637-1w.C: Ditto. * g++.target/i386/pr103861-1.C: Ditto. * gcc.target/i386/pr88540.c: Ditto. * gcc.target/i386/pr103941-2.c: Add -mno-avx512f. * g++.target/i386/sse4_1-pr100637-1b.C: New test. * g++.target/i386/sse4_1-pr100637-1w.C: New test. * g++.target/i386/sse4_1-pr103861-1.C: New test. * gcc.target/i386/sse4_1-pr88540.c: New test.
2024-07-01Add more splitter for mskmov with avx512 comparison.liuhongt1-23/+209
gcc/ChangeLog: PR target/115517 * config/i386/sse.md (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt_avx512): New define_insn_and_split. (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt_avx512): Ditto. (*<sse2_avx2>_pmovmskb_lt_avx512): Ditto. (*<sse2_avx2>_pmovmskb_zext_lt_avx512): Ditto. (*sse2_pmovmskb_ext_lt_avx512): Ditto. (*pmovsk_kmask_v16qi_avx512): Ditto. (*pmovsk_mask_v32qi_avx512): Ditto. (*pmovsk_mask_cmp_<mode>_avx512): Ditto. (*pmovsk_ptest_<mode>_avx512): Ditto.
2024-07-01Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.liuhongt1-0/+63
These versions of the min/max patterns implement exactly the operations min = (op1 < op2 ? op1 : op2) max = (!(op1 < op2) ? op1 : op2) gcc/ChangeLog: PR target/115517 * config/i386/sse.md (*minmax<mode>3_1): New pre_reload define_insn_and_split. (*minmax<mode>3_2): Ditto.
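The two formulas quoted in this commit are not symmetric in their operands once NaN is involved, which is why they cannot be treated as ordinary commutative min/max. A minimal sketch, transcribing the formulas literally into C with hypothetical function names:

```c
/* min = (op1 < op2 ? op1 : op2), transcribed literally.
   If either operand is NaN the comparison is false and op2 is returned.  */
double ieee_min(double op1, double op2)
{
    return op1 < op2 ? op1 : op2;
}

/* max = (!(op1 < op2) ? op1 : op2), transcribed literally.
   If either operand is NaN the comparison is false and op1 is returned.  */
double ieee_max(double op1, double op2)
{
    return !(op1 < op2) ? op1 : op2;
}
```

With these definitions, swapping the operands changes the result whenever exactly one of them is NaN, so the operand order has to be preserved by any pattern that matches them.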
2024-07-01Lower AVX512 kmask comparison back to AVX2 comparison when op_{true,false} is vector -1/0.liuhongt1-0/+97
gcc/ChangeLog PR target/115517 * config/i386/sse.md (*<avx512>_cvtmask2<ssemodesuffix><mode>_not): New pre_reload splitter. (*<avx512>_cvtmask2<ssemodesuffix><mode>_not): Ditto. (*avx2_pcmp<mode>3_6): Ditto. (*avx2_pcmp<mode>3_7): Ditto.
2024-07-01Add more splitters to match (unspec [op1 op2 (gt op3 constm1_operand)] UNSPEC_BLENDV)liuhongt1-0/+130
These define_insn_and_split are needed after vcond{,u,eq} is obsolete. gcc/ChangeLog: PR target/115517 * config/i386/sse.md (*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_gt): New define_insn_and_split. (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_gtint): Ditto. (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_gtint): Ditto. (*<sse4_1_avx2>_pblendvb_gt): Ditto. (*<sse4_1_avx2>_pblendvb_gt_subreg_not): Ditto.
2024-07-01Enable flate-combine.liuhongt15-24/+42
Move pass_stv2 and pass_rpad after pre_reload pass_late_combine, also define target_insn_cost to prevent post_reload pass_late_combine from reverting the optimization done in pass_rpad. Adjust testcases since pass_late_combine generates better code but breaks scan assembly, i.e. under a 32-bit target, gcc used to generate broadcast from stack and then do the real operation. After flate_combine, they're combined into embedded broadcast operations. gcc/ChangeLog: * config/i386/i386-features.cc (ix86_rpad_gate): New function. * config/i386/i386-options.cc (ix86_override_options_after_change): Don't disable flate_combine. * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad after pre_reload pass_late_combine. * config/i386/i386-protos.h (ix86_rpad_gate): New declare. * config/i386/i386.cc (ix86_insn_cost): New function. (TARGET_INSN_COST): Define. gcc/testsuite/ChangeLog: * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjust testcase. * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto. * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto. * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto. * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto. * gcc.target/i386/pr91333.c: Ditto. * gcc.target/i386/vect-strided-4.c: Ditto.
2024-07-01Extend lshifrtsi3_1_zext to ?k alternative.liuhongt2-6/+41
late_combine will combine lshift + zero into *lshifrtsi3_1_zext, which causes an extra mov between gpr and kmask; add ?k to the pattern. gcc/ChangeLog: PR target/115610 * config/i386/i386.md (<*insnsi3_zext): Add alternative ?k, enable it only for lshiftrt and under avx512bw. * config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and add corresponding define_split after it.
2024-07-01Define mask as extern instead of uninitialized local variables.liuhongt6-10/+10
The testcases are supposed to scan for vpopcnt{b,w,d,q} operations with k mask, but mask is defined as uninitialized local variable which will be set as 0 at rtl expand phase. And it's further simplified off by late_combine which caused scan assembly failure. Move the definition of mask outside to make the testcases more stable. gcc/testsuite/ChangeLog: PR target/115610 * gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as extern instead of uninitialized local variables. * gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto. * gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto. * gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto. * gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto. * gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.
2024-07-01Daily bump.GCC Administrator3-1/+49