path: root/gcc
Age | Commit message | Author | Files | Lines (+/-)
2024-07-01  i386: Additional peephole2 to use lea in round-up integer division.  (Roger Sayle; 2 files changed, +28/-0)

A common idiom for implementing an integer division that rounds upwards is to write (x + y - 1) / y.  Conveniently on x86, the two additions to form the numerator can be performed by a single lea instruction, and indeed gcc currently generates a lea when x and y are both registers.

int foo(int x, int y)
{
  return (x + y - 1) / y;
}

generates with -O2:

foo:    leal    -1(%rsi,%rdi), %eax     // 4 bytes
        cltd
        idivl   %esi
        ret

Oddly, however, if x is in memory, gcc currently uses two instructions:

int m;
int bar(int y)
{
  return (m + y - 1) / y;
}

generates:

bar:    movl    m(%rip), %eax
        addl    %edi, %eax              // 2 bytes
        subl    $1, %eax                // 3 bytes
        cltd
        idivl   %edi
        ret

This discrepancy is caused by the late decision (in peephole2) to split an addition with a memory operand into a load followed by a reg-reg addition.  This patch improves the situation by adding a peephole2 that recognizes consecutive additions and transforms them into a lea if profitable.

My first attempt at fixing this was to use a define_insn_and_split:

(define_insn_and_split "*lea<mode>3_reg_mem_imm"
  [(set (match_operand:SWI48 0 "register_operand")
        (plus:SWI48
          (plus:SWI48 (match_operand:SWI48 1 "register_operand")
                      (match_operand:SWI48 2 "memory_operand"))
          (match_operand:SWI48 3 "x86_64_immediate_operand")))]
  "ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (match_dup 4) (match_dup 2))
   (set (match_dup 0)
        (plus:SWI48 (plus:SWI48 (match_dup 1) (match_dup 4))
                    (match_dup 3)))]
  "operands[4] = gen_reg_rtx (<MODE>mode);")

using combine to combine the instructions.  Unfortunately, this approach interferes with (reload's) subtle balance of deciding when to use/avoid lea, which can be observed as a code size regression in CSiBE.  The peephole2 approach (proposed here) uniformly improves CSiBE results.

2024-07-01  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (peephole2): Transform two consecutive additions into a 3-component lea if !TARGET_AVOID_LEA_FOR_ADDR.

gcc/testsuite/ChangeLog
        * gcc.target/i386/lea-3.c: New test case.
2024-07-01  AVR: target/88236, target/115726 - Fix __memx code generation.  (Georg-Johann Lay; 3 files changed, +138/-3)

        PR target/88236
        PR target/115726
gcc/
        * config/avr/avr.md (mov<mode>) [avr_mem_memx_p]: Expand in such a way that the destination does not overlap with any hard register clobbered / used by xload8qi_A resp. xload<mode>_A.
        * config/avr/avr.cc (avr_out_xload): Avoid early-clobber situation for Z by executing just one load when the output register overlaps with Z.
gcc/testsuite/
        * gcc.target/avr/torture/pr88236-pr115726.c: New test.
2024-07-01  testsuite/52641 - Adjust some test cases to less capable platforms.  (Georg-Johann Lay; 5 files changed, +8/-4)

        PR testsuite/52641
gcc/testsuite/
        * gcc.dg/analyzer/pr109577.c: Use __SIZE_TYPE__ instead of "unsigned long".
        * gcc.dg/analyzer/pr93032-mztools-signed-char.c: Requires int32plus.
        * gcc.dg/analyzer/pr93032-mztools-unsigned-char.c: Requires int32plus.
        * gcc.dg/analyzer/putenv-1.c: Skip on avr.
        * gcc.dg/torture/type-generic-1.c: Skip on avr.
2024-07-01  libgomp, openmp: Add ompx_gnu_pinned_mem_alloc  (Andrew Stubbs; 2 files changed, +23/-4)

This creates a new predefined allocator as a shortcut for using pinned memory with OpenMP.  This is not in the OpenMP standard, so it uses the "ompx" namespace and an independent enum baseline of 200 (selected not to clash with other known implementations).

The allocator is equivalent to using a custom allocator with the pinned trait and the null fallback trait.  One motivation for having this feature is for use by the (planned) -foffload-memory=pinned feature.

gcc/fortran/ChangeLog:
        * openmp.cc (is_predefined_allocator): Update valid ranges to incorporate ompx_gnu_pinned_mem_alloc.

libgomp/ChangeLog:
        * allocator.c (ompx_gnu_min_predefined_alloc): New.
        (ompx_gnu_max_predefined_alloc): New.
        (predefined_alloc_mapping): Rename to ...
        (predefined_omp_alloc_mapping): ... this.
        (predefined_ompx_gnu_alloc_mapping): New.
        (_Static_assert): Adjust for the new name, and add a new assert for the new table.
        (predefined_allocator_p): New.
        (predefined_alloc_mapping): New.
        (omp_aligned_alloc): Support ompx_gnu_pinned_mem_alloc.  Use predefined_allocator_p and predefined_alloc_mapping.
        (omp_free): Likewise.
        (omp_aligned_calloc): Likewise.
        (omp_realloc): Likewise.
        * env.c (parse_allocator): Add ompx_gnu_pinned_mem_alloc.
        * libgomp.texi: Document ompx_gnu_pinned_mem_alloc.
        * omp.h.in (omp_allocator_handle_t): Add ompx_gnu_pinned_mem_alloc.
        * omp_lib.f90.in: Add ompx_gnu_pinned_mem_alloc.
        * omp_lib.h.in: Add ompx_gnu_pinned_mem_alloc.
        * testsuite/libgomp.c/alloc-pinned-5.c: New test.
        * testsuite/libgomp.c/alloc-pinned-6.c: New test.
        * testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.

gcc/testsuite/ChangeLog:
        * gfortran.dg/gomp/allocate-pinned-1.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
2024-07-01  tree-optimization/115723 - ICE with .COND_ADD reduction  (Richard Biener; 2 files changed, +33/-4)

The following fixes an ICE with a .COND_ADD discovered as a reduction even though its else value isn't the reduction chain link but a constant.  This would be wrong-code with --disable-checking, I think.

        PR tree-optimization/115723
        * tree-vect-loop.cc (check_reduction_path): For a .COND_ADD verify the else value also refers to the reduction chain op.
        * gcc.dg/vect/pr115723.c: New testcase.
2024-07-01  tree-optimization/115694 - ICE with complex store rewrite  (Richard Biener; 2 files changed, +15/-0)

The following adds a missed check when forwprop attempts to rewrite a complex store.

        PR tree-optimization/115694
        * tree-ssa-forwprop.cc (pass_forwprop::execute): Check the store is complex before rewriting it.
        * g++.dg/torture/pr115694.C: New testcase.
2024-07-01  Remove vcond{,u,eq}<mode> expanders since they will be obsolete.  (liuhongt; 2 files changed, +0/-310)

gcc/ChangeLog:
        PR target/115517
        * config/i386/mmx.md (vcond<mode>v2sf): Removed.
        (vcond<MMXMODE124:mode><MMXMODEI:mode>): Ditto.
        (vcond<mode><mode>): Ditto.
        (vcondu<MMXMODE124:mode><MMXMODEI:mode>): Ditto.
        (vcondu<mode><mode>): Ditto.
        * config/i386/sse.md (vcond<V_512:mode><VF_512:mode>): Ditto.
        (vcond<V_256:mode><VF_256:mode>): Ditto.
        (vcond<V_128:mode><VF_128:mode>): Ditto.
        (vcond<VI2HFBF_AVX512VL:mode><VHF_AVX512VL:mode>): Ditto.
        (vcond<V_512:mode><VI_AVX512BW:mode>): Ditto.
        (vcond<V_256:mode><VI_256:mode>): Ditto.
        (vcond<V_128:mode><VI124_128:mode>): Ditto.
        (vcond<VI8F_128:mode>v2di): Ditto.
        (vcondu<V_512:mode><VI_AVX512BW:mode>): Ditto.
        (vcondu<V_256:mode><VI_256:mode>): Ditto.
        (vcondu<V_128:mode><VI124_128:mode>): Ditto.
        (vcondu<VI8F_128:mode>v2di): Ditto.
        (vcondeq<VI8F_128:mode>v2di): Ditto.
2024-07-01  Optimize a < 0 ? -1 : 0 to (signed) a >> 31.  (liuhongt; 4 files changed, +138/-3)

Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31 and x < 0 ? 1 : 0 into (unsigned) x >> 31.  Add define_insn_and_split for the optimization previously done in ix86_expand_int_vcond.

gcc/ChangeLog:
        PR target/115517
        * config/i386/sse.md ("*ashr<mode>3_1"): New define_insn_and_split.
        (*avx512_ashr<mode>3_1): Ditto.
        (*avx2_lshr<mode>3_1): Ditto.
        (*avx2_lshr<mode>3_2): Ditto, and add 2 combine splitters after it.
        * config/i386/mmx.md (mmxscalarsize): New mode attribute.
        (*mmx_ashr<mode>3_1): New define_insn_and_split.
        ("mmx_<insn><mode>3"): Add a combine splitter after it.
        (*mmx_ashrv2hi3_1): New define_insn_and_split, also add a combine splitter after it.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/pr111023-2.c: Adjust testcase.
        * gcc.target/i386/vect-div-1.c: Ditto.
2024-07-01  Adjust the testcases that regressed after making vcond{,u,eq} obsolete.  (liuhongt; 9 files changed, +70/-9)

> Richard suggests that we implement the "obvious" transforms like
> inversion in the middle-end but if for example unsigned compares
> are not supported the us_minus + eq + negative trick isn't on
> that list.
>
> The main reason to restrict vec_cmp would be to avoid
> a <= b ? c : d going with an unsupported vec_cmp but instead
> do a > b ? d : c - the alternative is trying to fix this
> on the RTL side via combine.  I understand the non-native

Yes, I have a patch which can fix most regressions via pattern match in combine.  There is still one situation that is difficult to deal with, mainly the optimization w/o sse4.1.  Because pblendvb/blendvps/blendvpd only exist under sse4.1, w/o sse4.1 it takes 3 instructions (pand, pandn, por) to simulate the vcond_mask, and combine matches at most 4 instructions, which makes it currently impossible to use combine to recover those optimizations from the vcond{,u,eq} expanders, i.e. min/max.  With sse4.1 and above, there is basically no regression anymore.

The regression testcases w/o sse4.1:

FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1b.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++14  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++17  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++20  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr100637-1w.C  -std=gnu++98  scan-assembler-times pcmpeqw 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++14  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++17  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++20  scan-assembler-times pcmpeqb 2
FAIL: g++.target/i386/pr103861-1.C  -std=gnu++98  scan-assembler-times pcmpeqb 2
FAIL: gcc.target/i386/pr88540.c scan-assembler minpd

gcc/testsuite/ChangeLog:
        PR target/115517
        * g++.target/i386/pr100637-1b.C: Add xfail and -mno-sse4.1.
        * g++.target/i386/pr100637-1w.C: Ditto.
        * g++.target/i386/pr103861-1.C: Ditto.
        * gcc.target/i386/pr88540.c: Ditto.
        * gcc.target/i386/pr103941-2.c: Add -mno-avx512f.
        * g++.target/i386/sse4_1-pr100637-1b.C: New test.
        * g++.target/i386/sse4_1-pr100637-1w.C: New test.
        * g++.target/i386/sse4_1-pr103861-1.C: New test.
        * gcc.target/i386/sse4_1-pr88540.c: New test.
2024-07-01  Add more splitters for mskmov with avx512 comparison.  (liuhongt; 1 file changed, +209/-23)

gcc/ChangeLog:
        PR target/115517
        * config/i386/sse.md (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_lt_avx512): New define_insn_and_split.
        (*<sse>_movmsk<ssemodesuffix><avxsizesuffix>_<u>ext_lt_avx512): Ditto.
        (*<sse2_avx2>_pmovmskb_lt_avx512): Ditto.
        (*<sse2_avx2>_pmovmskb_zext_lt_avx512): Ditto.
        (*sse2_pmovmskb_ext_lt_avx512): Ditto.
        (*pmovsk_kmask_v16qi_avx512): Ditto.
        (*pmovsk_mask_v32qi_avx512): Ditto.
        (*pmovsk_mask_cmp_<mode>_avx512): Ditto.
        (*pmovsk_ptest_<mode>_avx512): Ditto.
2024-07-01  Match IEEE min/max with UNSPEC_IEEE_{MIN,MAX}.  (liuhongt; 1 file changed, +63/-0)

These versions of the min/max patterns implement exactly the operations

  min = (op1 < op2 ? op1 : op2)
  max = (!(op1 < op2) ? op1 : op2)

gcc/ChangeLog:
        PR target/115517
        * config/i386/sse.md (*minmax<mode>3_1): New pre_reload define_insn_and_split.
        (*minmax<mode>3_2): Ditto.
2024-07-01  Lower AVX512 kmask comparison back to AVX2 comparison when op_{true,false} is vector -1/0.  (liuhongt; 1 file changed, +97/-0)

gcc/ChangeLog:
        PR target/115517
        * config/i386/sse.md (*<avx512>_cvtmask2<ssemodesuffix><mode>_not): New pre_reload splitter.
        (*<avx512>_cvtmask2<ssemodesuffix><mode>_not): Ditto.
        (*avx2_pcmp<mode>3_6): Ditto.
        (*avx2_pcmp<mode>3_7): Ditto.
2024-07-01  Add more splitters to match (unspec [op1 op2 (gt op3 constm1_operand)] UNSPEC_BLENDV)  (liuhongt; 1 file changed, +130/-0)

These define_insn_and_split are needed after vcond{,u,eq} is obsolete.

gcc/ChangeLog:
        PR target/115517
        * config/i386/sse.md (*<sse4_1>_blendv<ssemodesuffix><avxsizesuffix>_gt): New define_insn_and_split.
        (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_gtint): Ditto.
        (*<sse4_1>_blendv<ssefltmodesuffix><avxsizesuffix>_not_gtint): Ditto.
        (*<sse4_1_avx2>_pblendvb_gt): Ditto.
        (*<sse4_1_avx2>_pblendvb_gt_subreg_not): Ditto.
2024-07-01  Enable late-combine.  (liuhongt; 15 files changed, +42/-24)

Move pass_stv2 and pass_rpad after the pre_reload pass_late_combine, and also define TARGET_INSN_COST to prevent the post_reload pass_late_combine from reverting the optimization done in pass_rpad.

Adjust testcases, since pass_late_combine generates better code but breaks the scan-assembler patterns.  I.e. under a 32-bit target, gcc used to generate a broadcast from the stack and then do the real operation; after late_combine, they're combined into embedded broadcast operations.

gcc/ChangeLog:
        * config/i386/i386-features.cc (ix86_rpad_gate): New function.
        * config/i386/i386-options.cc (ix86_override_options_after_change): Don't disable late_combine.
        * config/i386/i386-passes.def: Move pass_stv2 and pass_rpad after the pre_reload pass_late_combine.
        * config/i386/i386-protos.h (ix86_rpad_gate): New declaration.
        * config/i386/i386.cc (ix86_insn_cost): New function.
        (TARGET_INSN_COST): Define.

gcc/testsuite/ChangeLog:
        * gcc.target/i386/avx512f-broadcast-pr87767-1.c: Adjust testcase.
        * gcc.target/i386/avx512f-broadcast-pr87767-5.c: Ditto.
        * gcc.target/i386/avx512f-fmadd-sf-zmm-7.c: Ditto.
        * gcc.target/i386/avx512f-fmsub-sf-zmm-7.c: Ditto.
        * gcc.target/i386/avx512f-fnmadd-sf-zmm-7.c: Ditto.
        * gcc.target/i386/avx512f-fnmsub-sf-zmm-7.c: Ditto.
        * gcc.target/i386/avx512vl-broadcast-pr87767-1.c: Ditto.
        * gcc.target/i386/avx512vl-broadcast-pr87767-5.c: Ditto.
        * gcc.target/i386/pr91333.c: Ditto.
        * gcc.target/i386/vect-strided-4.c: Ditto.
2024-07-01  Extend lshifrtsi3_1_zext to ?k alternative.  (liuhongt; 2 files changed, +41/-6)

late_combine will combine lshiftrt + zero_extend into *lshifrtsi3_1_zext, which causes an extra mov between gpr and kmask; add ?k to the pattern.

gcc/ChangeLog:
        PR target/115610
        * config/i386/i386.md (<*insnsi3_zext): Add alternative ?k, enable it only for lshiftrt and under avx512bw.
        * config/i386/sse.md (*klshrsi3_1_zext): New define_insn, and add a corresponding define_split after it.
2024-07-01  Define mask as extern instead of uninitialized local variables.  (liuhongt; 6 files changed, +10/-10)

The testcases are supposed to scan for vpopcnt{b,w,d,q} operations with a k mask, but mask is defined as an uninitialized local variable, which is set to 0 at the RTL expand phase and then further simplified away by late_combine, causing the scan-assembly failures.  Move the definition of mask outside to make the testcases more stable.

gcc/testsuite/ChangeLog:
        PR target/115610
        * gcc.target/i386/avx512bitalg-vpopcntb.c: Define mask as extern instead of an uninitialized local variable.
        * gcc.target/i386/avx512bitalg-vpopcntbvl.c: Ditto.
        * gcc.target/i386/avx512bitalg-vpopcntw.c: Ditto.
        * gcc.target/i386/avx512bitalg-vpopcntwvl.c: Ditto.
        * gcc.target/i386/avx512vpopcntdq-vpopcntd.c: Ditto.
        * gcc.target/i386/avx512vpopcntdq-vpopcntq.c: Ditto.
2024-07-01  Daily bump.  (GCC Administrator; 3 files changed, +49/-1)
2024-06-30  hppa: Fix ICE caused by mismatched predicate and constraint in xmpyu patterns  (John David Anglin; 1 file changed, +0/-18)

2024-06-30  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:
        PR target/115691
        * config/pa/pa.md: Remove incorrect xmpyu patterns.
2024-06-30  tree-optimization/115701 - fix maybe_duplicate_ssa_info_at_copy  (Richard Biener; 2 files changed, +30/-14)

The following restricts copying of points-to info from defs that might be in regions invoking UB and are never executed.

        PR tree-optimization/115701
        * tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): Only copy info from within the same BB.
        * gcc.dg/torture/pr115701.c: New testcase.
2024-06-30  tree-optimization/115701 - factor out maybe_duplicate_ssa_info_at_copy  (Richard Biener; 4 files changed, +34/-50)

The following factors out the code that preserves SSA info of the LHS of an SSA copy LHS = RHS when LHS is about to be eliminated to RHS.

        PR tree-optimization/115701
        * tree-ssanames.h (maybe_duplicate_ssa_info_at_copy): Declare.
        * tree-ssanames.cc (maybe_duplicate_ssa_info_at_copy): New function, split out from ...
        * tree-ssa-copy.cc (fini_copy_prop): ... here.
        * tree-ssa-sccvn.cc (eliminate_dom_walker::eliminate_stmt): ... and here.
2024-06-30  Harden SLP reduction support wrt STMT_VINFO_REDUC_IDX  (Richard Biener; 1 file changed, +21/-2)

The following makes sure that for an SLP reduction all lanes have the same STMT_VINFO_REDUC_IDX.  Once we move that info and can adjust it we can implement swapping.  It also makes the existing protection against operand swapping trigger for all stmts participating in a reduction, not just the final one marked as reduction-def.

        * tree-vect-slp.cc (vect_build_slp_tree_1): Compare STMT_VINFO_REDUC_IDX.
        (vect_build_slp_tree_2): Prevent operand swapping for all stmts participating in a reduction.
2024-06-30  vect: Determine input vectype for multiple lane-reducing operations  (Feng Xue; 1 file changed, +56/-23)

The input vectype of a reduction PHI statement must be determined before vect cost computation for the reduction.  Since a lane-reducing operation has a different input vectype from a normal one, we need to traverse all reduction statements to find the input vectype with the least lanes, and set that on the PHI statement.

2024-06-16  Feng Xue  <fxue@os.amperecomputing.com>

gcc/
        * tree-vect-loop.cc (vectorizable_reduction): Determine input vectype during traversal of reduction statements.
2024-06-30  vect: Fix shift-by-induction for single-lane slp  (Feng Xue; 3 files changed, +122/-1)

Allow shift-by-induction for an SLP node when it is single-lane, which is aligned with the original loop-based handling.

2024-06-26  Feng Xue  <fxue@os.amperecomputing.com>

gcc/
        * tree-vect-stmts.cc (vectorizable_shift): Allow shift-by-induction for single-lane slp node.

gcc/testsuite/
        * gcc.dg/vect/vect-shift-6.c
        * gcc.dg/vect/vect-shift-7.c
2024-06-30  Daily bump.  (GCC Administrator; 5 files changed, +47/-1)
2024-06-29  [PR115565] cse: Don't use a valid regno for non-register in comparison_qty  (Maciej W. Rozycki; 1 file changed, +2/-2)

Use INT_MIN rather than -1 in `comparison_qty' where a comparison is not with a register, because the value of -1 is actually a valid reference to register 0 in the case where it has not been assigned a quantity.

Using -1 makes the `REG_QTY (REGNO (folded_arg1)) == ent->comparison_qty' comparison in `fold_rtx' incorrectly trigger in rare circumstances and return true for a memory reference, making CSE consider a comparison operation to evaluate to a constant expression, and consequently make the resulting code incorrectly execute or fail to execute conditional blocks.

This has caused a miscompilation of rwlock.c from LinuxThreads for the `alpha-linux-gnu' target, where the `rwlock->__rw_writer != thread_self ()' expression (where `thread_self' returns the thread pointer via a PALcode call) has been decided to be always true (with `ent->comparison_qty' using -1 for a reference to `rwlock->__rw_writer', while register 0 holds the thread pointer retrieved by `thread_self'), and code for the false case has been optimized away where it mustn't have been, causing program lockups.

The issue has been observed as a regression from commit 08a692679fb8 ("Undefined cse.c behaviour causes 3.4 regression on HPUX"), <https://gcc.gnu.org/ml/gcc-patches/2004-10/msg02027.html>, and up to commit 932ad4d9b550 ("Make CSE path following use the CFG"), <https://gcc.gnu.org/ml/gcc-patches/2006-12/msg00431.html>, where CSE has been restructured sufficiently for the issue not to trigger with the original reproducer anymore.  However the original bug remains and can trigger, because `comparison_qty' will still be assigned -1 for a memory reference and the `reg_qty' member of a `cse_reg_info_table' entry will still be assigned -1 for register 0 where the entry has not been assigned a quantity, e.g. at initialization.

Use INT_MIN then, as noted above, so that the value remains negative, for consistency with the REGNO_QTY_VALID_P macro (even though it is not used on `comparison_qty'), and so that it should not ever match a valid negated register number, fixing the regression with commit 08a692679fb8.

gcc/
        PR rtl-optimization/115565
        * cse.cc (record_jump_cond): Use INT_MIN rather than -1 for `comparison_qty' if !REG_P.
2024-06-29  [to-be-committed,RISC-V,V4] movmem for RISCV with V extension  (Sergei Lewis; 2 files changed, +82/-0)

I hadn't updated my repo on the host where I handle email, so it picked up the older version of this patch without the testsuite fix.  So, V4 with the testsuite option for lmul fixed.

--

And Sergei's movmem patch.  Just a trivial testsuite adjustment for an option name change and a whitespace fix from me.

I've spun this in my tester for rv32 and rv64.  I'll wait for pre-commit CI before taking further action.

Just a reminder, this patch is designed to handle the case where we can issue a single vector load/store, which avoids all the complexities of determining which direction to copy.

--

gcc/ChangeLog
        * config/riscv/riscv.md (movmem<mode>): New expander.

gcc/testsuite/ChangeLog
        PR target/112109
        * gcc.target/riscv/rvv/base/movmem-1.c: New test
2024-06-29  Fortran: fix ALLOCATE with SOURCE of deferred character length [PR114019]  (Harald Anlauf; 2 files changed, +73/-1)

gcc/fortran/ChangeLog:
        PR fortran/114019
        * trans-stmt.cc (gfc_trans_allocate): Fix handling of case of scalar character expression being used for SOURCE.

gcc/testsuite/ChangeLog:
        PR fortran/114019
        * gfortran.dg/allocate_with_source_33.f90: New test.
2024-06-29  Match: Support imm form for unsigned scalar .SAT_ADD  (Pan Li; 2 files changed, +26/-0)

This patch would like to support the form of unsigned scalar .SAT_ADD when one of the ops is an IMM.  For example, as below:

Form IMM:

#define DEF_SAT_U_ADD_IMM_FMT_1(T)       \
T __attribute__((noinline))              \
sat_u_add_imm_##T##_fmt_1 (T x)          \
{                                        \
  return (T)(x + 9) >= x ? (x + 9) : -1; \
}

DEF_SAT_U_ADD_IMM_FMT_1(uint64_t)

Before this patch:

__attribute__((noinline))
uint64_t sat_u_add_imm_uint64_t_fmt_1 (uint64_t x)
{
  long unsigned int _1;
  uint64_t _3;

  ;; basic block 2, loop depth 0
  ;;    pred:       ENTRY
  _1 = MIN_EXPR <x_2(D), 18446744073709551606>;
  _3 = _1 + 9;
  return _3;
  ;;    succ:       EXIT
}

After this patch:

__attribute__((noinline))
uint64_t sat_u_add_imm_uint64_t_fmt_1 (uint64_t x)
{
  uint64_t _3;

  ;; basic block 2, loop depth 0
  ;;    pred:       ENTRY
  _3 = .SAT_ADD (x_2(D), 9); [tail call]
  return _3;
  ;;    succ:       EXIT
}

The below test suites pass for this patch:
1. The rv64gcv full regression test with newlib.
2. The x86 bootstrap test.
3. The x86 full regression test.

gcc/ChangeLog:
        * match.pd: Add imm form for .SAT_ADD matching.
        * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Add .SAT_ADD matching under PLUS_EXPR.

Signed-off-by: Pan Li <pan2.li@intel.com>
2024-06-29  jit: Fix Darwin bootstrap after r15-1699.  (Iain Sandoe; 1 file changed, +2/-2)

r15-1699-g445c62ee492 contains changes that trigger two maybe-uninitialized warnings on Darwin, which result in a bootstrap failure.  Note that the warnings are false positives; in fact the variables should be initialized in the cases of a switch (all values of the switch condition are covered).

Fixed here by providing default initializations for the relevant variables.

gcc/jit/ChangeLog:
        * jit-recording.cc (recording::memento_of_typeinfo::make_debug_string): Default the value of ident.
        (recording::memento_of_typeinfo::write_reproducer): Default the value of type.

Signed-off-by: Iain Sandoe <iains@gcc.gnu.org>
2024-06-28  [committed] Fix mcore-elf regression after recent IRA change  (Jeff Law; 1 file changed, +10/-6)

So the recent IRA change exposed a bug in the mcore backend.

The mcore has a special instruction (xtrb3) which can zero extend a GPR into R1.  It's useful because zextb requires a matching source/destination.  Unfortunately xtrb3 modifies CC.

The IRA changes twiddle register allocation such that we want to use xtrb3.  Unfortunately CC is live at the point where we want to use xtrb3, and clobbering CC causes the test to fail.

Exposing the clobber in the expander and insn seems like the best path forward.  We could also drop the xtrb3 alternative, but that seems like it would hurt codegen more than exposing the clobber.

The bitfield extraction patterns using xtrb look problematic as well, but I didn't try to fix those.

This fixes the builtin-arith-overflow regressions and appears to fix 20010122-1.c as a side effect.

gcc/
        * config/mcore/mcore.md (zero_extendqihi2): Clobber CC in expander and matching insn.
        (zero_extendqisi2): Likewise.
2024-06-29  Daily bump.  (GCC Administrator; 5 files changed, +139/-1)
2024-06-28  c++: bad 'this' conversion for nullary memfn [PR106760]  (Patrick Palka; 2 files changed, +15/-1)

Here we notice the 'this' conversion for the call f<void>() is bad, so we correctly defer deduction for the template candidate, but we end up never adding it to 'bad_cands' since missing_conversion_p for it returns false (its only argument is 'this', which has already been determined to be bad).  This is not a huge deal, but it causes us to no longer accept the call with -fpermissive in release builds, and causes a tree check ICE in checking builds.

So if we have a non-strictly viable template candidate that has not been instantiated, then we need to add it to 'bad_cands' even if no argument conversion is missing.

        PR c++/106760

gcc/cp/ChangeLog:
        * call.cc (add_candidates): Relax test for adding a candidate to 'bad_cands' to also accept an uninstantiated template candidate that has no missing conversions.

gcc/testsuite/ChangeLog:
        * g++.dg/ext/conv3.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2024-06-28  ssa_lazy_cache takes an optional bitmap_obstack pointer.  (Andrew MacLeod; 4 files changed, +33/-7)

Allow ssa_lazy_cache to allocate bitmaps from a client-provided obstack if so desired.

        * gimple-range-cache.cc (ssa_lazy_cache::ssa_lazy_cache): Relocate here.  Check for provided obstack.
        (ssa_lazy_cache::~ssa_lazy_cache): Relocate here.  Free bitmap or obstack.
        * gimple-range-cache.h (ssa_lazy_cache::ssa_lazy_cache): Move.
        (ssa_lazy_cache::~ssa_lazy_cache): Move.
        (ssa_lazy_cache::m_ob): New.
        * gimple-range.cc (dom_ranger::dom_ranger): Initialize obstack.
        (dom_ranger::~dom_ranger): Release obstack.
        (dom_ranger::pre_bb): Create ssa_lazy_cache using obstack.
        * gimple-range.h (m_bitmaps): New.
2024-06-28  i386: Cleanup tmp variable usage in ix86_expand_move  (Uros Bizjak; 1 file changed, +10/-12)

Remove extra assignment, extra temp variable and variable shadowing.  No functional changes intended.

gcc/ChangeLog:
        * config/i386/i386-expand.cc (ix86_expand_move): Remove extra assignment to tmp variable, reuse tmp variable instead of declaring new temporary variable and remove tmp variable shadowing.
2024-06-28  Use move-aware auto_vec in map  (Jørgen Kvalsvik; 1 file changed, +1/-1)

Using auto_vec rather than vec means the vectors are released automatically upon return, stopping the leak.  The problem is that auto_vec<T, N> is not really move-aware; only the <T, 0> specialization is.

gcc/ChangeLog:
        * tree-profile.cc (find_conditions): Use auto_vec without embedded storage.
2024-06-28  tree-optimization/115652 - more fixing of the fix  (Richard Biener; 1 file changed, +9/-2)

The following addresses the corner case of an outer loop with an empty header where we end up asking for the BB of a NULL stmt by special-casing this case.

        PR tree-optimization/115652
        * tree-vect-slp.cc (vect_schedule_slp_node): Handle the case where the outer loop header block is empty.
2024-06-28  i386: Fix regression after refactoring legitimize_pe_coff_symbol, ix86_GOT_alias_set and PE_COFF_LEGITIMIZE_EXTERN_DECL [PR115635]  (Evgeny Karpov; 8 files changed, +23/-10)

This patch fixes 3 bugs reported after merging the "Add DLL import/export implementation to AArch64" series.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653955.html

The series refactors the i386 codebase to reuse it in AArch64, which triggers some bugs.

Bug 115661 - [15 Regression] wrong code at -O{2,3} on x86_64-linux-gnu since r15-1599-g63512c72df09b4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115661

Bug 115635 - [15 regression] Bootstrap fails with failed self-test with the rust fe (diagnostic-path.cc:1153: test_empty_path: FAIL: ASSERT_FALSE ((path.interprocedural_p ()))) since r15-1599-g63512c72df09b4
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115635

Issue 1.  In some code, i386 has been relying on the legitimize_pe_coff_symbol call on all platforms, which should return NULL_RTX if it is not supported.
Fix: NULL_RTX handling has been added when the target does not support PECOFF.

Issue 2.  ix86_GOT_alias_set is used on all platforms and cannot be extracted to mingw.
Fix: ix86_GOT_alias_set has been restored as it was and is used on all platforms for i386.

Bug 115643 - [15 regression] aarch64-w64-mingw32 support today breaks x86_64-w64-mingw32 build cannot represent relocation type BFD_RELOC_64 since r15-1602-ged20feebd9ea31
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115643

Issue 3.  PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been added and used with a negation operator on a complex expression without braces.
Fix: Braces have been added, and PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED has been renamed to PE_COFF_LEGITIMIZE_EXTERN_DECL.

2024-06-28  Evgeny Karpov  <Evgeny.Karpov@microsoft.com>

gcc/ChangeLog:
        PR bootstrap/115635
        PR target/115643
        PR target/115661
        * config/aarch64/cygming.h (PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Rename to PE_COFF_LEGITIMIZE_EXTERN_DECL.
        (PE_COFF_LEGITIMIZE_EXTERN_DECL): Likewise.
        * config/i386/cygming.h (GOT_ALIAS_SET): Remove the definition to reuse it from i386.h.
        (PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Rename to PE_COFF_LEGITIMIZE_EXTERN_DECL.
        (PE_COFF_LEGITIMIZE_EXTERN_DECL): Likewise.
        * config/i386/i386-expand.cc (ix86_expand_move): Return ix86_GOT_alias_set.
        * config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
        * config/i386/i386.cc (ix86_GOT_alias_set): Likewise.
        * config/i386/i386.h (GOT_ALIAS_SET): Likewise.
        * config/mingw/winnt-dll.cc (get_dllimport_decl): Use GOT_ALIAS_SET.
        (legitimize_pe_coff_symbol): Rename to PE_COFF_LEGITIMIZE_EXTERN_DECL.
        * config/mingw/winnt-dll.h (ix86_GOT_alias_set): Declare ix86_GOT_alias_set.
2024-06-28  Remove unused hybrid_* operators in range-ops.  (Aldy Hernandez; 1 file changed, +0/-156)

gcc/ChangeLog:
        * range-op-ptr.cc (class hybrid_and_operator): Remove.
        (class hybrid_or_operator): Same.
        (class hybrid_min_operator): Same.
        (class hybrid_max_operator): Same.
2024-06-28  tree-optimization/115640 - outer loop vect with inner SLP permute  (Richard Biener; 1 file changed, +8/-3)

The following fixes wrong-code when using outer loop vectorization and an inner loop SLP access with permutation.  A wrong adjustment to the IV increment is then applied on GCN.

        PR tree-optimization/115640
        * tree-vect-stmts.cc (vectorizable_load): With an inner loop SLP access to not apply a gap adjustment.
2024-06-28  amdgcn: Fix RDNA V32 permutations [PR115640]  (Andrew Stubbs; 1 file changed, +1/-1)

There was an off-by-one error in the RDNA validation check, plus I forgot to allow for two-to-one permute-and-merge operations.

        PR target/115640

gcc/ChangeLog:
        * config/gcn/gcn.cc (gcn_vectorize_vec_perm_const): Modify RDNA checks.
2024-06-28  Add gfc_class_set_vptr.  (Andre Vehreschild; 4 files changed, +111/-148)

First step towards adding a general routine to assign all of a class type's data members.  Having a general routine prevents forgetting to tackle the edge cases, e.g. setting _len.

gcc/fortran/ChangeLog:
        * trans-expr.cc (gfc_class_set_vptr): Add setting of _vptr member.
        * trans-intrinsic.cc (conv_intrinsic_move_alloc): First use of gfc_class_set_vptr and refactor very similar code.
        * trans.h (gfc_class_set_vptr): Declare the new function.

gcc/testsuite/ChangeLog:
        * gfortran.dg/unlimited_polymorphic_11.f90: Remove unnecessary casts in gd-final expression.
2024-06-28  Use gfc_reset_vptr more consistently.  (Andre Vehreschild; 4 files changed, +38/-82)

The vptr for a class type is set in various ways in different locations.  Refactor the use and simplify code.

gcc/fortran/ChangeLog:
        * trans-array.cc (structure_alloc_comps): Use reset_vptr.
        * trans-decl.cc (gfc_trans_deferred_vars): Same.
        (gfc_generate_function_code): Same.
        * trans-expr.cc (gfc_reset_vptr): Allow supplying the class type.
        (gfc_conv_procedure_call): Use reset_vptr.
        * trans-intrinsic.cc (gfc_conv_intrinsic_transfer): Same.
2024-06-28  i386: Handle sign_extend like zero_extend in *concatditi3_[346]  (Roger Sayle; 2 files changed, +13/-3)

This patch generalizes some of the patterns in i386.md that recognize double word concatenation, so they handle sign_extend the same way that they handle zero_extend in appropriate contexts.

As a motivating example consider the following function:

__int128 foo(long long x, unsigned long long y)
{
  return ((__int128)x << 64) | y;
}

when compiled with -O2, x86_64 currently generates:

foo:    movq    %rdi, %rdx
        xorl    %eax, %eax
        xorl    %edi, %edi
        orq     %rsi, %rax
        orq     %rdi, %rdx
        ret

with this patch we now generate (the same as if x is unsigned):

foo:    movq    %rsi, %rax
        movq    %rdi, %rdx
        ret

Treating both extensions the same way using any_extend is valid as the top (extended) bits are "unused" after the shift by 64 (or more).  In theory, the RTL optimizers might consider canonicalizing the form of extension used in these cases, but zero_extend is faster on some machines, whereas sign extension is supported via addressing modes on others, so handling both in the machine description is probably best.

2024-06-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.md (*concat<mode><dwi>3_3): Change zero_extend to any_extend in first operand to left shift by mode precision.
        (*concat<mode><dwi>3_4): Likewise.
        (*concat<mode><dwi>3_6): Likewise.

gcc/testsuite/ChangeLog
        * gcc.target/i386/concatditi-1.c: New test case.
2024-06-28i386: Some additional AVX512 ternlog refinements.Roger Sayle8-10/+39
This patch is another round of refinements to fine tune the new
ternlog infrastructure in i386's sse.md. This patch tweaks
ix86_ternlog_idx to allow multiple MEM/CONST_VECTOR/VEC_DUPLICATE
operands prior to splitting (before reload), when force_register is
called on all but one of these operands. Conceptually during the
dynamic programming, registers fill the args slots in the order 0, 1,
2, and mem-like operands fill the slots in the order 2, 0, 1
[preferring the memory operand to come last].

This patch allows us to remove some of the legacy ternlog patterns in
sse.md without regressions [which is left to the next and final patch
in this series]. An indication that these patterns are no longer
required is shown by the necessary testsuite tweaks below: the output
assembler for the legacy instructions used hexadecimal, but the new
ternlog infrastructure now consistently uses decimal.

2024-06-28  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog

	* config/i386/i386-expand.cc (ix86_ternlog_idx)
	<case VEC_DUPLICATE>: Add a "goto do_mem_operand" as this need
	not match memory_operand.
	<case CONST_VECTOR>: Only args[2] may be volatile memory
	operand.  Allow MEM/VEC_DUPLICATE/CONST_VECTOR as args[0] and
	args[1].

gcc/testsuite/ChangeLog

	* gcc.target/i386/avx512f-andn-di-zmm-2.c: Match decimal
	instead of hexadecimal immediate operand to ternlog.
	* gcc.target/i386/avx512f-andn-si-zmm-2.c: Likewise.
	* gcc.target/i386/avx512f-orn-si-zmm-1.c: Likewise.
	* gcc.target/i386/avx512f-orn-si-zmm-2.c: Likewise.
	* gcc.target/i386/pr100711-3.c: Likewise.
	* gcc.target/i386/pr100711-4.c: Likewise.
	* gcc.target/i386/pr100711-5.c: Likewise.
2024-06-28Daily bump.GCC Administrator6-1/+352
2024-06-27libgccjit: Add ability to get the alignment of a typeAntoni Boucher11-15/+221
gcc/jit/ChangeLog:

	* docs/topics/compatibility.rst (LIBGCCJIT_ABI_28): New ABI
	tag.
	* docs/topics/expressions.rst: Document
	gcc_jit_context_new_alignof.
	* jit-playback.cc (new_alignof): New method.
	* jit-playback.h: New method.
	* jit-recording.cc (recording::context::new_alignof): New
	method.
	(recording::memento_of_sizeof::replay_into,
	recording::memento_of_typeinfo::replay_into,
	recording::memento_of_sizeof::make_debug_string,
	recording::memento_of_typeinfo::make_debug_string,
	recording::memento_of_sizeof::write_reproducer,
	recording::memento_of_typeinfo::write_reproducer): Rename.
	* jit-recording.h (enum type_info_type): New enum.
	(class memento_of_sizeof, class memento_of_typeinfo): Rename.
	* libgccjit.cc (gcc_jit_context_new_alignof): New function.
	* libgccjit.h (gcc_jit_context_new_alignof): New function.
	* libgccjit.map: New function.

gcc/testsuite/ChangeLog:

	* jit.dg/all-non-failing-tests.h: New test.
	* jit.dg/test-alignof.c: New test.
2024-06-27c: Error message for incorrect use of static in array declarations.Martin Uecker2-33/+44
Add an explicit error message when C99's static is used without a
size expression in an array declarator.

gcc/c:

	* c-parser.cc (c_parser_direct_declarator_inner): Add error
	message.

gcc/testsuite:

	* gcc.dg/c99-arraydecl-4.c: New test.
2024-06-27Disable late-combine for -O0 [PR115677]Richard Sandiford1-1/+7
late-combine relies on df, which for -O0 is only initialised late
(pass_df_initialize_no_opt, after split1). Other df-based passes
cope with this by requiring optimize > 0, so this patch does the same
for late-combine.

gcc/
	PR rtl-optimization/115677
	* late-combine.cc (pass_late_combine::gate): New function.
2024-06-27s390: Check for ADDR_REGS in s390_decompose_addrstyle_without_indexStefan Schulze Frielinghaus1-1/+3
An explicit check for address registers was not required so far since
during register allocation the processing of address constraints was
sufficient. However, address constraints themselves do not check for
REGNO_OK_FOR_{BASE,INDEX}_P. Thus, with the newly introduced
late-combine pass in r15-1579-g792f97b44ffc5e we generate new insns
with invalid address registers which aren't fixed up afterwards.

Fixed by explicitly checking for address registers in
s390_decompose_addrstyle_without_index such that those new insns are
rejected.

gcc/ChangeLog:

	PR target/115634
	* config/s390/s390.cc (s390_decompose_addrstyle_without_index):
	Check for ADDR_REGS in s390_decompose_addrstyle_without_index.
2024-06-27tree-optimization/115669 - fix SLP reduction associationRichard Biener2-0/+25
The following avoids associating a reduction path as that might get
STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order. This is
a latent issue with SLP reductions but now easily exposed as we're
doing single-lane SLP reductions. Once we have achieved SLP only, we
can move and update this meta-data.

	PR tree-optimization/115669
	* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
	chains that participate in a reduction.
	* gcc.dg/vect/pr115669.c: New testcase.