path: root/gcc
2023-10-19  rtl-ssa: Support inferring uses of mem in change_insns  (Alex Coplan, 2 files changed, -4/+29)

Currently, rtl_ssa::change_insns requires all new uses and defs to be specified explicitly.  This turns out to be rather inconvenient for forming load pairs in the new aarch64 load pair pass, as the pass has to determine which mem def the final load pair consumes, and then obtain or create a suitable use (i.e. significant bookkeeping, just to keep the RTL-SSA IR consistent).  It turns out to be much more convenient to allow change_insns to infer which def is consumed and create a suitable use of mem itself.  This patch does that.

gcc/ChangeLog:

* rtl-ssa/changes.cc (function_info::finalize_new_accesses): Add new parameter to give final insn position, infer use of mem if it isn't specified explicitly.
(function_info::change_insns): Pass down final insn position to finalize_new_accesses.
* rtl-ssa/functions.h: Add parameter to finalize_new_accesses.
2023-10-19  rtl-ssa: Add entry point to allow re-parenting uses  (Alex Coplan, 2 files changed, -0/+11)

This is needed by the upcoming aarch64 load pair pass, as it can re-order stores (when alias analysis determines this is safe) and thus change which mem def a given use consumes (in the RTL-SSA view, there is no alias disambiguation of memory).

gcc/ChangeLog:

* rtl-ssa/accesses.cc (function_info::reparent_use): New.
* rtl-ssa/functions.h (function_info): Declare new member function reparent_use.
2023-10-19  rtl-ssa: Add drop_memory_access helper  (Alex Coplan, 1 file changed, -0/+13)

Add a helper routine to access-utils.h which removes the memory access from an access_array, if it has one.

gcc/ChangeLog:

* rtl-ssa/access-utils.h (drop_memory_access): New.
2023-10-19  rtl-ssa: Fix bug in function_info::add_insn_after  (Alex Coplan, 1 file changed, -3/+11)

In the case that !insn->is_debug_insn () && next->is_debug_insn (), this function was missing an update of the prev pointer on the first nondebug insn following the sequence of debug insns starting at next.  This can lead to corruption of the insn chain, in that we end up with:

insn->next_any_insn ()->prev_any_insn () != insn

in this case.  This patch fixes that.

gcc/ChangeLog:

* rtl-ssa/insns.cc (function_info::add_insn_after): Ensure we update the prev pointer on the following nondebug insn in the case that !insn->is_debug_insn () && next->is_debug_insn ().
2023-10-19  x86: Correct ISA enabled for clients since Arrow Lake  (Haochen Jiang, 5 files changed, -46/+49)

gcc/ChangeLog:

* config/i386/i386.h: Correct the ISA enabled for Arrow Lake.  Also make Clearwater Forest depend on Sierra Forest.
* config/i386/i386-options.cc: Revise the order of the macro definitions to avoid confusion.
* doc/extend.texi: Revise documentation.
* doc/invoke.texi: Correct documentation.

gcc/testsuite/ChangeLog:

* gcc.target/i386/funcspec-56.inc: Group Clearwater Forest with Atom cores.
2023-10-19  amdgcn: deprecate Fiji device and multilib  (Andrew Stubbs, 4 files changed, -6/+39)

LLVM wants to remove it, which breaks our build.  This patch means that most users won't notice that change, when it comes, and those that do will have chosen to enable Fiji explicitly.

I'm selecting gfx900 as the new default as that's the least likely for users to want, which means most users will specify -march explicitly, which means we'll be free to change the default again, when we need to, without breaking anybody's makefiles.

gcc/ChangeLog:

* config.gcc (amdgcn): Switch default to --with-arch=gfx900.  Implement support for --with-multilib-list.
* config/gcn/t-gcn-hsa: Likewise.
* doc/install.texi: Likewise.
* doc/invoke.texi: Mark Fiji deprecated.
2023-10-19  LoongArch: Implement the new vector cost model framework.  (Jiahao Xu, 4 files changed, -22/+188)

This patch makes LoongArch use the new vector hooks and implements the costing function determine_suggested_unroll_factor, so that it can suggest the unroll factor for a given loop being vectorized based on the vec_ops analysis during vector costing and the available issue information, referring to the aarch64 and rs6000 ports.  The patch also reduces the cost of unaligned stores, making it equal to the cost of aligned ones in order to avoid odd alignment peeling.

gcc/ChangeLog:

* config/loongarch/loongarch.cc (loongarch_vector_costs): Inherit from vector_costs.  Add a constructor.
(loongarch_vector_costs::add_stmt_cost): Use adjust_cost_for_freq to adjust the cost for inner loops.
(loongarch_vector_costs::count_operations): New function.
(loongarch_vector_costs::determine_suggested_unroll_factor): Ditto.
(loongarch_vector_costs::finish_cost): Ditto.
(loongarch_builtin_vectorization_cost): Adjust.
* config/loongarch/loongarch.opt (loongarch-vect-unroll-limit): New parameter.
(loongarch-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* config/loongarch/genopts/loongarch.opt.in (loongarch-vect-unroll-limit): Ditto.
(loongarch-vect-issue-info): Ditto.
(mmemvec-cost): Delete.
* doc/invoke.texi (loongarch-vect-unroll-limit): Document new option.
2023-10-19  LoongArch: Implement vec_widen standard names.  (Jiahao Xu, 7 files changed, -17/+277)

Add support for vec_widen lo/hi patterns.  These do not directly match LoongArch LASX instructions but can be emulated with even/odd + vector merge.

gcc/ChangeLog:

* config/loongarch/lasx.md (vec_widen_<su>mult_even_v8si): New patterns.
(vec_widen_<su>add_hi_<mode>): Ditto.
(vec_widen_<su>add_lo_<mode>): Ditto.
(vec_widen_<su>sub_hi_<mode>): Ditto.
(vec_widen_<su>sub_lo_<mode>): Ditto.
(vec_widen_<su>mult_hi_<mode>): Ditto.
(vec_widen_<su>mult_lo_<mode>): Ditto.
* config/loongarch/loongarch.md (u_bool): New iterator.
* config/loongarch/loongarch-protos.h (loongarch_expand_vec_widen_hilo): New prototype.
* config/loongarch/loongarch.cc (loongarch_expand_vec_interleave): New function.
(loongarch_expand_vec_widen_hilo): New function.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-widen-add.c: New test.
* gcc.target/loongarch/vect-widen-mul.c: New test.
* gcc.target/loongarch/vect-widen-sub.c: New test.
2023-10-19  LoongArch: Implement avg and sad standard names.  (Jiahao Xu, 8 files changed, -0/+284)

gcc/ChangeLog:

* config/loongarch/lasx.md (avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv32qi): Ditto.
(ssadv32qi): Ditto.
* config/loongarch/lsx.md (avg<mode>3_ceil): New patterns.
(uavg<mode>3_ceil): Ditto.
(avg<mode>3_floor): Ditto.
(uavg<mode>3_floor): Ditto.
(usadv16qi): Ditto.
(ssadv16qi): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/avg-ceil-lasx.c: New test.
* gcc.target/loongarch/avg-ceil-lsx.c: New test.
* gcc.target/loongarch/avg-floor-lasx.c: New test.
* gcc.target/loongarch/avg-floor-lsx.c: New test.
* gcc.target/loongarch/sad-lasx.c: New test.
* gcc.target/loongarch/sad-lsx.c: New test.
2023-10-19  Daily bump.  (GCC Administrator, 6 files changed, -1/+351)
2023-10-18  Fix expansion of `(a & 2) != 1`  (Andrew Pinski, 2 files changed, -4/+21)

I had a thinko in r14-1600-ge60593f3881c72a96a3fa4844d73e8a2cd14f670 where we would remove the `& CST` part if we ended up not calling expand_single_bit_test.  This fixes the problem by introducing a new variable that will be used for calling expand_single_bit_test.  As far as I know this can only show up when disabling optimization passes, as the above form would have been optimized away.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

PR middle-end/111863

gcc/ChangeLog:

* expr.cc (do_store_flag): Don't overwrite arg0 when stripping off `& POW2`.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/pr111863-1.c: New test.
2023-10-18  [c] Fix PR 101364: ICE after error due to diagnose_arglist_conflict not checking for error  (Andrew Pinski, 2 files changed, -1/+10)

When checking whether a function declaration has a conflict due to promotions, there was no test to see if the type was an error mark before calling c_type_promotes_to.  c_type_promotes_to is not ready for error_mark and causes an ICE.  This adds a check for error before the call to c_type_promotes_to.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101364

gcc/c/ChangeLog:

* c-decl.cc (diagnose_arglist_conflict): Test for error mark before calling c_type_promotes_to.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101364-1.c: New test.
2023-10-18  Fix ICE due to c_safe_arg_type_equiv_p not checking for error_mark node  (Andrew Pinski, 2 files changed, -0/+13)

This is a simple error-recovery issue: when c_safe_arg_type_equiv_p was added in r8-5312-gc65e18d3331aa999, it was not prepared for the case where, after an error, an argument type (of a function type) turns into an error mark node.  So this just adds a check for error operands on its arguments before getting the main variant.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR c/101285

gcc/c/ChangeLog:

* c-typeck.cc (c_safe_arg_type_equiv_p): Return true for error operands early.

gcc/testsuite/ChangeLog:

* gcc.dg/pr101285-1.c: New test.
2023-10-19  PR111648: Fix wrong code-gen due to incorrect VEC_PERM_EXPR folding.  (Prathamesh Kulkarni, 1 file changed, -11/+103)

gcc/ChangeLog:

PR tree-optimization/111648
* fold-const.cc (valid_mask_for_fold_vec_perm_cst_p): If a1 chooses base element from arg, ensure that it's a natural stepped sequence.
(build_vec_cst_rand): New param natural_stepped and use it to construct a naturally stepped sequence.
(test_nunits_min_2): Add new unit tests Case 6 and Case 7.
2023-10-18  pru: Implement TARGET_INSN_COST  (Dimitar Dimitrov, 1 file changed, -0/+36)

This patch slightly improves the embench-iot benchmark score for PRU code size.  There is also a small improvement in a few real-world firmware programs.

Embench-iot size:

  Benchmark        before   after   delta
  ---------        ------   -----   -----
  aha-mont64         4.15    4.15    0
  crc32              6.04    6.04    0
  cubic             21.64   21.62   -0.02
  edn                6.37    6.37    0
  huffbench         18.63   18.55   -0.08
  matmult-int        5.44    5.44    0
  md5sum            25.56   25.43   -0.13
  minver            12.82   12.76   -0.06
  nbody             15.09   14.97   -0.12
  nettle-aes         4.75    4.75    0
  nettle-sha256      4.67    4.67    0
  nsichneu           3.77    3.77    0
  picojpeg           4.11    4.11    0
  primecount         7.90    7.90    0
  qrduino            7.18    7.16   -0.02
  sglib-combined    13.63   13.59   -0.04
  slre               5.19    5.19    0
  st                14.23   14.12   -0.11
  statemate          2.34    2.34    0
  tarfind           36.85   36.64   -0.21
  ud                10.51   10.46   -0.05
  wikisort           7.44    7.41   -0.03
  ---------------  ------   -----   -----
  Geometric mean     8.42    8.40   -0.02
  Geometric SD       2.00    2.00    0
  Geometric range   12.68   12.62   -0.06

gcc/ChangeLog:

* config/pru/pru.cc (pru_insn_cost): New function.
(TARGET_INSN_COST): Define for PRU.

Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
2023-10-18  aarch64: Replace duplicated selftests  (Andrew Carlotti, 1 file changed, -12/+12)

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_test_fractional_cost): Test <= instead of testing < twice.
2023-10-18  cse: Workaround GCC < 5 bug in cse_insn [PR111852]  (Jakub Jelinek, 1 file changed, -0/+7)

Before the r5-3834 commit for PR63362, GCC 4.8-4.9 refuses to compile cse.cc, which contains a variable with rtx_def type, because rtx_def contains a union with a poly_uint16 element.  The poly_int template has a defaulted default constructor and a variadic template constructor which could have an empty parameter pack.  GCC < 5 treated it as a non-trivially constructible class and deleted the rtunion and rtx_def default constructors.

For the cse_insn purposes, all we need is a variable with the size and alignment of rtx_def, not necessarily rtx_def itself, which we then memset to 0 and fill in like an rtx normally allocated from the heap, so this patch for GCC_VERSION < 5000 uses an unsigned char array of the right size/alignment.

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

PR bootstrap/111852
* cse.cc (cse_insn): Add workaround for GCC 4.8-4.9; instead of using the rtx_def type for memory_extend_buf, use an unsigned char array with the size of rtx_def and its alignment.
2023-10-18  diagnostic: add permerror variants with opt  (Jason Merrill, 16 files changed, -40/+117)

In the discussion of promoting some pedwarns to be errors by default, rather than move them all into -fpermissive it seems to me to make sense to support DK_PERMERROR with an option flag.  This way will also work with -fpermissive, but users can also still use -Wno-error=narrowing to downgrade that specific diagnostic rather than everything affected by -fpermissive.  So, for diagnostics that we want to make errors by default we can just change the pedwarn call to permerror.

The tests check desired behavior for such a permerror in a system header with various flags.  The patch preserves the existing permerror behavior of ignoring -w and system headers by default, but respecting them when downgraded to a warning by -fpermissive.

This seems similar to, but a bit better than, the approach of forcing -pedantic-errors that I previously used for -Wnarrowing: specifically, in that now -w by itself is not enough to silence the -Wnarrowing error (integer-pack2.C).

gcc/ChangeLog:

* doc/invoke.texi: Move -fpermissive to Warning Options.
* diagnostic.cc (update_effective_level_from_pragmas): Remove redundant system header check.
(diagnostic_report_diagnostic): Move down syshdr/-w check.
(diagnostic_impl): Handle DK_PERMERROR with an option number.
(permerror): Add new overloads.
* diagnostic-core.h (permerror): Declare them.

gcc/cp/ChangeLog:

* typeck2.cc (check_narrowing): Use permerror.

gcc/testsuite/ChangeLog:

* g++.dg/ext/integer-pack2.C: Add -fpermissive.
* g++.dg/diagnostic/sys-narrow.h: New test.
* g++.dg/diagnostic/sys-narrow1.C: New test.
* g++.dg/diagnostic/sys-narrow1a.C: New test.
* g++.dg/diagnostic/sys-narrow1b.C: New test.
* g++.dg/diagnostic/sys-narrow1c.C: New test.
* g++.dg/diagnostic/sys-narrow1d.C: New test.
* g++.dg/diagnostic/sys-narrow1e.C: New test.
* g++.dg/diagnostic/sys-narrow1f.C: New test.
* g++.dg/diagnostic/sys-narrow1g.C: New test.
* g++.dg/diagnostic/sys-narrow1h.C: New test.
* g++.dg/diagnostic/sys-narrow1i.C: New test.
2023-10-18  OpenMP: Avoid ICE with LTO and 'omp allocate'  (Tobias Burnus, 2 files changed, -7/+45)

gcc/ChangeLog:

* gimplify.cc (gimplify_bind_expr): Remove "omp allocate" attribute to avoid that the auxiliary statement list reaches LTO.

gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/allocate-13a.f90: New test.
2023-10-18  tree-ssa-math-opts: Fix up match_uaddc_usubc [PR111845]  (Jakub Jelinek, 3 files changed, -17/+94)

GCC ICEs on the first testcase.  A successful match_uaddc_usubc ends up with some dead stmts which DCE will hopefully remove later.  The ICE is because one of the dead stmts refers to a freed SSA_NAME.  The code already gsi_removes a couple of stmts in the

/* Remove some statements which can't be kept in the IL because they use SSA_NAME whose setter is going to be removed too.  */

section for the same reason (the reason for the freed SSA_NAMEs is that we don't really have a replacement for those cases - all we have after a match is the combined overflow from the addition/subtraction of 2 operands + a [0, 1] carry in, but not the individual overflows from the former 2 additions), but for the last (most significant) limb case, where we try to match

x = op1 + op2 + carry1 + carry2;
or
x = op1 - op2 - carry1 - carry2;

we just gsi_replace the final stmt, but left around the 2 temporary stmts as dead; if we were unlucky enough that those referenced the carry flag that went away, it ICEs.

So, the following patch remembers those temporary statements (rather than trying to rediscover them more expensively) and removes them before the final one is replaced.

While working on it, I've noticed we didn't support all the reassociated possibilities of writing the addition of 4 operands or subtracting 3 operands from one; we supported e.g.

x = ((op1 + op2) + op3) + op4;
x = op1 + ((op2 + op3) + op4);

but not

x = (op1 + (op2 + op3)) + op4;
x = op1 + (op2 + (op3 + op4));

Fixed by the change to inspect also rhs[2] when rhs[1] didn't yield what we were searching for (if non-NULL) - rhs[0] is inspected in the first loop and has different handling for the MINUS_EXPR case.

2023-10-18  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/111845
* tree-ssa-math-opts.cc (match_uaddc_usubc): Remember temporary statements for the 4 operand addition or subtraction of 3 operands from 1 operand cases and remove them when successful.  Look for nested additions even from rhs[2], not just rhs[1].

* gcc.dg/pr111845.c: New test.
* gcc.target/i386/pr111845.c: New test.
2023-10-18  nvptx: Use fatal_error when -march= is missing not an assert [PR111093]  (Tobias Burnus, 1 file changed, -2/+3)

gcc/ChangeLog:

PR target/111093
* config/nvptx/nvptx.cc (nvptx_option_override): Issue fatal error instead of an assert ICE when no -march= has been specified.
2023-10-18  Darwin: Check as for .build_version support and use it if available.  (Iain Sandoe, 4 files changed, -2/+79)

This adds support for the minimum OS version data in assembler files.  At present, we have no mechanism to detect the SDK version in use, and so that is omitted from build_versions.

We follow the implementation in clang; '.build_version' is only emitted (where supported) for target macOS versions >= 10.14.  For earlier macOS we fall back to using a '.macosx_version_min' directive.  This latter is also emitted when the assembler supports it, but not build_version.

gcc/ChangeLog:

* config.in: Regenerate.
* config/darwin.cc (darwin_file_start): Add assembler directives for the target OS version, where these are supported by the assembler.
(darwin_override_options): Check for building >= macOS 10.14.
* configure: Regenerate.
* configure.ac: Check for assembler support of .build_version directives.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2023-10-18  ifcvt: rewrite args handling to remove lookups  (Tamar Christina, 1 file changed, -42/+88)

This refactors the code to remove the args cache and index lookups in favor of a single structure.  It also, as previously requested, removes the use of std::sort, but avoids the new asserts in trunk.

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (INCLUDE_ALGORITHM): Remove.
(typedef struct ifcvt_arg_entry): New.
(cmp_arg_entry): New.
(gen_phi_arg_condition, gen_phi_nest_statement, predicate_scalar_phi): Use them.
2023-10-18  AArch64: Rewrite simd move immediate patterns to new syntax  (Tamar Christina, 1 file changed, -69/+47)

This rewrites the simd MOV patterns to use the new compact syntax.  No change in semantics is expected.  This will be needed in follow-on patches.

This also merges the splits into the define_insn, which will also be needed soon.

gcc/ChangeLog:

PR tree-optimization/109154
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<VDMOV:mode>): Rewrite to new syntax.
(*aarch64_simd_mov<VQMOV:mode>): Rewrite to new syntax and merge in splits.
2023-10-18  middle-end: ifcvt: Allow any const IFN in conditional blocks  (Tamar Christina, 1 file changed, -5/+10)

When ifcvt was initially added, masking was not a thing, and as such it was rather conservative in what it supported.  For builtins it only allowed C99 builtin functions which it knew it could fold away.

These days the vectorizer is able to deal with needing to mask IFNs itself.  vectorizable_call is able to vectorize the IFN by emitting a VEC_PERM_EXPR after the operation to emulate the masking.  This is then used by match.pd to convert the IFN into a masked variant if it's available.

For these reasons the restriction in ifconvert is no longer required, and we needlessly block vectorization when we can effectively handle the operations.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Note: This patch is part of a patch series; tests for it are added in the AArch64 patch that adds support for the optab.

gcc/ChangeLog:

PR tree-optimization/109154
* tree-if-conv.cc (if_convertible_stmt_p): Allow any const IFN.
2023-10-18  middle-end: Fold vec_cond into conditional ternary or binary operation when sharing operand [PR109154]  (Tamar Christina, 2 files changed, -0/+156)

When we have a vector conditional on a masked target which is doing a selection on the result of a conditional operation where one of the operands of the conditional operation is the other operand of the select, then we can fold the vector conditional into the operation.

Concretely this transforms

c = mask1 ? (masked_op mask2 a b) : b

into

c = masked_op (mask1 & mask2) a b

The mask is then propagated upwards by the compiler.  In the SVE case we don't end up needing a mask AND here since `mask2` will end up in the instruction creating `mask` which gives us a natural &.

Such transformations are more common now in GCC 13+ as PRE has now started unsharing of common code in case it can make one branch fully independent, e.g. in this case `b` becomes a loop invariant value after PRE.

This transformation removes the extra select for masked architectures but doesn't fix the general case.

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new cond_op rule.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.target/aarch64/sve/pre_cond_share_1.c: New test.
2023-10-18  LoongArch: Use fcmp.caf.s instead of movgr2cf for zeroing a fcc  (Xi Ruoyao, 1 file changed, -1/+1)

During the review of an LLVM change [1], we found that on LA464 zeroing an fcc with fcmp.caf.s is much faster than a movgr2cf from $r0.

[1]: https://github.com/llvm/llvm-project/pull/69300

gcc/ChangeLog:

* config/loongarch/loongarch.md (movfcc): Use fcmp.caf.s for zeroing a fcc.
2023-10-18  Re-instantiate integer mask to traditional vector mask support  (Richard Biener, 1 file changed, -1/+8)

The following allows integer mask data to be passed as a traditional vector mask for OMP SIMD clone calls, which is required due to the limited set of OMP SIMD clones in the x86 ABI when using AVX512 but a preferred vector size of 256 bits.

* tree-vect-stmts.cc (vectorizable_simd_clone_call): Relax check to again allow passing integer mode masks as traditional vectors.
2023-10-18  middle-end: maintain LCSSA throughout loop peeling  (Tamar Christina, 4 files changed, -192/+149)

This final patch updates peeling to maintain LCSSA all the way through.  It's significantly easier to maintain it during peeling, while we still know where all new edges connect, rather than touching it up later as is currently being done.  This allows us to remove many of the helper functions that touch up the loops at various parts.

The only complication is for loop distribution, where we should be able to use the same approach; however ldist, depending on whether redirect_lc_phi_defs is true or not, will either try to maintain a limited LCSSA form itself or remove all non-virtual PHIs.

The problem here is that if we maintain LCSSA then in some cases the blocks connecting the two loops get PHIs to keep the loop IV up to date.  However there is no loop; the guard condition is rewritten as 0 != 0, so the "loop" always exits.  However due to the PHI nodes the probabilities get completely wrong.  It seems to think that the impossible exit is the likely edge.  This causes incorrect warnings and the presence of the PHIs prevents the blocks from being simplified.

While it may be possible to make ldist work with LCSSA form, doing so seems more work than not.  For that reason the peeling code has an additional parameter, used only by ldist, to not connect the two loops during peeling.  This preserves the current behaviour from ldist until I can dive into the implementation more.  Hopefully that's ok for now.

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Request no LCSSA.
* tree-vect-loop-manip.cc (adjust_phi_and_debug_stmts): Add additional asserts.
(slpeel_tree_duplicate_loop_to_edge_cfg): Keep LCSSA during peeling.
(find_guard_arg): Look value up through explicit edge and original defs.
(vect_do_peeling): Use it.
(slpeel_update_phi_nodes_for_guard2): Take explicit exit edge.
(slpeel_update_phi_nodes_for_lcssa, slpeel_update_phi_nodes_for_loops): Remove.
* tree-vect-loop.cc (vect_create_epilog_for_reduction): Initialize phi.
* tree-vectorizer.h (slpeel_tree_duplicate_loop_to_edge_cfg): Add optional param to turn off LCSSA mode.
2023-10-18  middle-end: updated niters analysis to handle multiple exits.  (Tamar Christina, 4 files changed, -64/+152)

This second part updates niters analysis to be able to analyze any number of exits.  If we have multiple exits we determine the main exit by finding the first counting IV.

The change allows the vectorizer to pass analysis for multiple loops, but we later gracefully reject them.  It does however allow us to test if the exit handling is using the right exit everywhere.

Additionally since we analyze all exits, we now return all conditions for them and determine which condition belongs to the main exit.  The main condition is needed because the vectorizer needs to ignore the main IV condition during vectorization as it will replace it during codegen.

To track versioned loops we extend the contract between ifcvt and the vectorizer to store the exit number in aux so that we can match it up again during peeling.

gcc/ChangeLog:

* tree-if-conv.cc (tree_if_conversion): Record exits in aux.
* tree-vect-loop-manip.cc (slpeel_tree_duplicate_loop_to_edge_cfg): Use it.
* tree-vect-loop.cc (vect_get_loop_niters): Determine main exit.
(vec_init_loop_exit_info): Extend analysis when multiple exits.
(vect_analyze_loop_form): Record conds and determine main cond.
(vect_create_loop_vinfo): Extend bookkeeping of conds.
(vect_analyze_loop): Release conds.
* tree-vectorizer.h (LOOP_VINFO_LOOP_CONDS, LOOP_VINFO_LOOP_IV_COND): New.
(struct vect_loop_form_info): Add conds, alt_loop_conds.
(struct loop_vec_info): Add conds, loop_iv_cond.
2023-10-18  middle-end: Refactor vectorizer loop conditionals and separate out IV to new variables  (Tamar Christina, 8 files changed, -93/+190)

This is extracted out of the patch series to support early break vectorization in order to simplify the review of that patch series.  The goal of this one is to separate out the refactoring from the new functionality.

This first patch separates out the vectorizer's definition of an exit to its own values inside loop_vinfo.  During vectorization we can have three separate copies for each loop: scalar, vectorized, epilogue.  The scalar loop can also be the versioned loop before peeling.  Because of this we track 3 different exits inside loop_vinfo corresponding to each of these loops.  Additionally each function that uses an exit, when it is not obviously clear which exit is needed, will now take the exit explicitly as an argument.

This is because often times the callers switch the loops being passed around.  While the caller knows which loop it is, the callee does not.

For now the loop exits are simply initialized to the same value as before, determined by single_exit (..).

No change in functionality is expected throughout this patch series.

gcc/ChangeLog:

* tree-loop-distribution.cc (copy_loop_before): Pass exit explicitly.
(loop_distribution::distribute_loop): Bail out if not single exit.
* tree-scalar-evolution.cc (get_loop_exit_condition): New.
* tree-scalar-evolution.h (get_loop_exit_condition): New.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment): Pass exit explicitly.
* tree-vect-loop-manip.cc (vect_set_loop_condition_partial_vectors, vect_set_loop_condition_partial_vectors_avx512, vect_set_loop_condition_normal, vect_set_loop_condition): Explicitly take exit.
(slpeel_tree_duplicate_loop_to_edge_cfg): Explicitly take exit and return corresponding peeled exit.
(slpeel_can_duplicate_loop_p): Explicitly take exit.
(find_loop_location): Handle not knowing an explicit exit.
(vect_update_ivs_after_vectorizer, vect_gen_vector_loop_niters_mult_vf, find_guard_arg, slpeel_update_phi_nodes_for_loops, slpeel_update_phi_nodes_for_guard2): Use new exits.
(vect_do_peeling): Update bookkeeping to keep track of exits.
* tree-vect-loop.cc (vect_get_loop_niters): Explicitly take exit to analyze.
(vec_init_loop_exit_info): New.
(_loop_vec_info::_loop_vec_info): Initialize vec_loop_iv, vec_epilogue_loop_iv, scalar_loop_iv.
(vect_analyze_loop_form): Initialize exits.
(vect_create_loop_vinfo): Set main exit.
(vect_create_epilog_for_reduction, vectorizable_live_operation, vect_transform_loop): Use it.
(scale_profile_for_vect_loop): Explicitly take exit to scale.
* tree-vectorizer.cc (set_uid_loop_bbs): Initialize loop exit.
* tree-vectorizer.h (LOOP_VINFO_IV_EXIT, LOOP_VINFO_EPILOGUE_IV_EXIT, LOOP_VINFO_SCALAR_IV_EXIT): New.
(struct loop_vec_info): Add vec_loop_iv, vec_epilogue_loop_iv, scalar_loop_iv.
(vect_set_loop_condition, slpeel_can_duplicate_loop_p, slpeel_tree_duplicate_loop_to_edge_cfg): Take explicit exits.
(vec_init_loop_exit_info): New.
(struct vect_loop_form_info): Add loop_exit.
2023-10-18  middle-end: refactor vectorizable_comparison to make the main body re-usable.  (Tamar Christina, 1 file changed, -18/+46)

Vectorization of a gcond starts off essentially the same as vectorizing a comparison, with the only difference being how the operands are extracted.  This refactors vectorizable_comparison such that we now have a generic function that can be used from vectorizable_early_break.  The refactoring splits the gassign checks and actual validation/codegen off to a helper function.

No change in functionality expected.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_comparison): Refactor, splitting body to ...
(vectorizable_comparison_1): ... this.
2023-10-18  RISC-V: Optimize consecutive permutation index pattern by vrgather.vi/vx  (Juzhe-Zhong, 9 files changed, -0/+465)

This patch optimizes the following permutation with a consecutive-pattern index:

typedef char vnx16i __attribute__ ((vector_size (16)));
#define MASK_16 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15, 12, 13, 14, 15

vnx16i __attribute__ ((noinline, noclone))
test_1 (vnx16i x, vnx16i y)
{
  return __builtin_shufflevector (x, y, MASK_16);
}

Before this patch:

        lui     a5,%hi(.LC0)
        addi    a5,a5,%lo(.LC0)
        vsetivli        zero,16,e8,m1,ta,ma
        vle8.v  v3,0(a5)
        vle8.v  v2,0(a1)
        vrgather.vv     v1,v2,v3
        vse8.v  v1,0(a0)
        ret

After this patch:

        vsetivli        zero,16,e8,mf8,ta,ma
        vle8.v  v2,0(a1)
        vsetivli        zero,4,e32,mf2,ta,ma
        vrgather.vi     v1,v2,3
        vsetivli        zero,16,e8,mf8,ta,ma
        vse8.v  v1,0(a0)
        ret

Overall this removes one instruction, a vector load, which is much more expensive than VL toggling.  Also, with this patch we are using vrgather.vi, which reduces vector register consumption by one register.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (shuffle_consecutive_patterns): New function.
(expand_vec_perm_const_1): Add consecutive pattern recognition.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/def.h: Add new test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/consecutive_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls/consecutive-3.c: New test.
2023-10-18  fortran/intrinsic.texi: Add 'intrinsic' to SIGNAL example  (Tobias Burnus, 1 file changed, -0/+1)

gcc/fortran/ChangeLog:

* intrinsic.texi (signal): Add 'intrinsic :: signal, sleep' to the example to make it safer.
2023-10-18  Initial Panther Lake Support  (Haochen Jiang, 12 files changed, -45/+94)

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Add Panther Lake.
* common/config/i386/i386-common.cc (processor_name): Ditto.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_PANTHERLAKE.
* config.gcc: Add -march=pantherlake.
* config/i386/driver-i386.cc (host_detect_local_cpu): Refactor the if clause.  Handle pantherlake.
* config/i386/i386-c.cc (ix86_target_macros_internal): Handle pantherlake.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_PANTHERLAKE): New.
(m_CORE_HYBRID): Add pantherlake.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.
2023-10-18  x86: Add m_CORE_HYBRID for hybrid clients tuning  (Haochen Jiang, 2 files changed, -62/+52)

gcc/ChangeLog:

* config/i386/i386-options.cc (m_CORE_HYBRID): New.
* config/i386/x86-tune.def: Replace hybrid client tunes with m_CORE_HYBRID.
2023-10-18  Initial Clearwater Forest Support  (Haochen Jiang, 12 files changed, -3/+47)

gcc/ChangeLog:

* common/config/i386/cpuinfo.h (get_intel_cpu): Handle Clearwater Forest.
* common/config/i386/i386-common.cc (processor_name): Add Clearwater Forest.
(processor_alias_table): Ditto.
* common/config/i386/i386-cpuinfo.h (enum processor_types): Add INTEL_CLEARWATERFOREST.
* config.gcc: Add -march=clearwaterforest.
* config/i386/driver-i386.cc (host_detect_local_cpu): Handle clearwaterforest.
* config/i386/i386-c.cc (ix86_target_macros_internal): Ditto.
* config/i386/i386-options.cc (processor_cost_table): Ditto.
(m_CLEARWATERFOREST): New.
(m_CORE_ATOM): Add clearwaterforest.
* config/i386/i386.h (enum processor_type): Ditto.
* doc/extend.texi: Ditto.
* doc/invoke.texi: Ditto.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv16.C: Ditto.
* gcc.target/i386/funcspec-56.inc: Handle new march.
2023-10-18Support 32/64-bit vectorization for _Float16 fma related operations.liuhongt3-1/+231
gcc/ChangeLog: * config/i386/mmx.md (fma<mode>4): New expander. (fms<mode>4): Ditto. (fnma<mode>4): Ditto. (fnms<mode>4): Ditto. (vec_fmaddsubv4hf4): Ditto. (vec_fmsubaddv4hf4): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-fmaddsubhf-1.c: New test. * gcc.target/i386/part-vect-fmahf-1.c: New test.
2023-10-18RISC-V: Enable more tests for dynamic LMUL and bug fix[PR111832]Juzhe-Zhong2-8/+21
Last time, Robin mentioned that dynamic LMUL will cause an ICE in SPEC: https://gcc.gnu.org/pipermail/gcc-patches/2023-September/629992.html which is caused by an assertion failure. When we enable more tests in rvv.exp with dynamic LMUL, this issue can be reproduced and has a PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111832 Now, we enable more tests in rvv.exp in this patch and fix the bug. PR target/111832 gcc/ChangeLog: * config/riscv/riscv-vector-costs.cc (get_biggest_mode): New function. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/rvv.exp: Enable more dynamic tests.
2023-10-18Daily bump.GCC Administrator8-1/+478
2023-10-17aarch64: Put LR save slot first in more casesRichard Sandiford5-9/+9
Now that the prologue and epilogue code iterates over saved registers in offset order, we can put the LR save slot first without compromising LDP/STP formation. This isn't worthwhile when shadow call stacks are enabled, since the first two registers are also push/pop candidates, and LR cannot be popped when shadow call stacks are enabled. (LR is instead loaded first and compared against the shadow stack's value.) But otherwise, it seems better to put the LR save slot first, to reduce unnecessary variation with the layout for stack clash protection. gcc/ * config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make the position of the LR save slot dependent on stack clash protection unless shadow call stacks are enabled. gcc/testsuite/ * gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19. * gcc.target/aarch64/test_frame_4.c: Likewise. * gcc.target/aarch64/test_frame_7.c: Likewise. * gcc.target/aarch64/test_frame_10.c: Likewise.
2023-10-17aarch64: Use vecs to store register save orderRichard Sandiford8-120/+128
aarch64_save/restore_callee_saves looped over registers in register number order. This in turn meant that we could only use LDP and STP for registers that were consecutive both number-wise and offset-wise (after unsaved registers are excluded). This patch instead builds lists of the registers that we've decided to save, in offset order. We can then form LDP/STP pairs regardless of register number order, which in turn means that we can put the LR save slot first without losing LDP/STP opportunities. gcc/ * config/aarch64/aarch64.h (aarch64_frame): Add vectors that store the lists of saved GPRs, FPRs and predicate registers. * config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize the lists of saved registers. Use them to choose push candidates. Invalidate pop candidates if we're not going to do a pop. (aarch64_next_callee_save): Delete. (aarch64_save_callee_saves): Take a list of registers, rather than a range. Make !skip_wb select only write-back candidates. (aarch64_expand_prologue): Update calls accordingly. (aarch64_restore_callee_saves): Take a list of registers, rather than a range. Always skip pop candidates. Also skip LR if shadow call stacks are enabled. (aarch64_expand_epilogue): Update calls accordingly. gcc/testsuite/ * gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores to happen in offset order. * gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise. * gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
2023-10-17Handle epilogues that contain jumpsRichard Sandiford3-30/+70
The prologue/epilogue pass allows the prologue sequence to contain jumps. The sequence is then partitioned into basic blocks using find_many_sub_basic_blocks. This patch treats epilogues in a similar way. Since only one block might need to be split, the patch (re)introduces a find_sub_basic_blocks routine to handle a single block. The new routine hard-codes the assumption that split_block will chain the new block immediately after the original block. The routine doesn't try to replicate the fix for PR81030, since that was specific to gimple->rtl expansion. The patch is needed for follow-on aarch64 patches that add conditional code to the epilogue. The tests are part of those patches. gcc/ * cfgbuild.h (find_sub_basic_blocks): Declare. * cfgbuild.cc (update_profile_for_new_sub_basic_block): New function, split out from... (find_many_sub_basic_blocks): ...here. (find_sub_basic_blocks): New function. * function.cc (thread_prologue_and_epilogue_insns): Handle epilogues that contain jumps.
2023-10-17ssa_name_has_boolean_range vs signed-boolean:31 typesAndrew Pinski4-4/+43
This turns out to be a latent bug in ssa_name_has_boolean_range where it would return true for all boolean types, but all of the uses of ssa_name_has_boolean_range were expecting 0/1 as the range rather than [-1,0]. So when I fixed vector lowering to do all comparisons in boolean_type rather than still in the signed-boolean:31 type (to fix a different issue), the pattern in match for `-(type)!A -> (type)A - 1.` would assume A (which was signed-boolean:31) had a range of [0,1], which broke down and sometimes gave us -1/-2 as values rather than the -1/0 we were expecting. This was the simplest patch I found while testing. We have another way of matching a [0,1] range which we could use instead of ssa_name_has_boolean_range, except that it uses only the global ranges rather than the local range (during VRP). I tried to clean this up slightly by using gimple_match_zero_one_valuedp inside ssa_name_has_boolean_range but that failed because it uses only the global ranges. I then tried to change get_nonzero_bits to use the local ranges at optimization time but that failed also because we would remove branches to __builtin_unreachable during evrp and lose information, as we don't set the global ranges during evrp. OK? Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/110817 gcc/ChangeLog: * tree-ssanames.cc (ssa_name_has_boolean_range): Remove the check for boolean type as they don't have a "[0,1]" range. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr110817-1.c: New test. * gcc.c-torture/execute/pr110817-2.c: New test. * gcc.c-torture/execute/pr110817-3.c: New test.
2023-10-17c++: accepts-invalid with =delete("") [PR111840]Marek Polacek2-7/+17
r6-2367 added a DECL_INITIAL check to cp_parser_simple_declaration so that we don't emit multiple errors in g++.dg/parse/error57.C. But that means we don't diagnose int f1() = delete("george_crumb"); anymore, because fn decls often have error_mark_node in their DECL_INITIAL. (The code may be allowed one day via https://wg21.link/P2573R0.) I was hoping I could use cp_parser_error_occurred but that would regress error57.C. PR c++/111840 gcc/cp/ChangeLog: * parser.cc (cp_parser_simple_declaration): Do cp_parser_error for FUNCTION_DECLs. gcc/testsuite/ChangeLog: * g++.dg/parse/error65.C: New test.
2023-10-17c++: Fix compile-time-hog in cp_fold_immediate_r [PR111660]Marek Polacek3-23/+128
My recent patch introducing cp_fold_immediate_r caused exponential compile time with nested COND_EXPRs. The problem is that the COND_EXPR case recursively walks the arms of a COND_EXPR, but after processing both arms it doesn't end the walk; it proceeds to walk the sub-expressions of the outermost COND_EXPR, triggering again walking the arms of the nested COND_EXPR, and so on. This patch brings the compile time down to about 0m0.030s. The ff_fold_immediate flag is unused after this patch but since I'm using it in the P2564 patch, I'm not removing it now. Maybe at_eof can be used instead and then we can remove ff_fold_immediate. PR c++/111660 gcc/cp/ChangeLog: * cp-gimplify.cc (cp_fold_immediate_r) <case COND_EXPR>: Don't handle it here. (cp_fold_r): Handle COND_EXPR here. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/hog1.C: New test. * g++.dg/cpp2a/consteval36.C: New test.
2023-10-17c++: mangling tweaksJason Merrill2-57/+34
Most of this is introducing the abi_check function to reduce the verbosity of most places that check -fabi-version. The start_mangling change is to avoid needing to zero-initialize additional members of the mangling globals, though I'm not actually adding any. The comment documents existing semantics. gcc/cp/ChangeLog: * mangle.cc (abi_check): New. (write_prefix, write_unqualified_name, write_discriminator) (write_type, write_member_name, write_expression) (write_template_arg, write_template_param): Use it. (start_mangling): Assign from {}. * cp-tree.h: Update comment.
2023-10-17c++: Add missing auto_diagnostic_groups to constexpr.ccNathaniel Shead1-0/+9
gcc/cp/ChangeLog: * constexpr.cc (cxx_eval_dynamic_cast_fn): Add missing auto_diagnostic_group. (cxx_eval_call_expression): Likewise. (diag_array_subscript): Likewise. (outside_lifetime_error): Likewise. (potential_constant_expression_1): Likewise. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Marek Polacek <polacek@redhat.com>
2023-10-17RISC-V/testsuite/pr111466.c: update test and expected outputVineet Gupta1-2/+2
Update the test to potentially generate two SEXT.W instructions: one for the incoming function argument, the other for the function return. But after commit 8eb9cdd14218 ("expr: don't clear SUBREG_PROMOTED_VAR_P flag for a promoted subreg") the test is not supposed to generate either of them, so fix the expected assembler output which was erroneously introduced by the commit above. gcc/testsuite/ChangeLog: * gcc.target/riscv/pr111466.c (foo2): Change return to unsigned int as that will potentially generate two SEXT.W instructions. dg-final: Change to scan-assembler-not SEXT.W. Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-10-17c: error for function with external and internal linkage [PR111708]Martin Uecker3-0/+84
Declaring a function with both external and internal linkage in the same TU is translation-time UB. Add an error for this case as already done for objects. PR c/111708 gcc/c/ChangeLog: * c-decl.cc (grokdeclarator): Add error. gcc/testsuite/ChangeLog: * gcc.dg/pr111708-1.c: New test. * gcc.dg/pr111708-2.c: New test.