Age  Commit message  Author  Files  Lines
2023-04-25  Daily bump.  GCC Administrator  (7 files, +294/-1)
2023-04-25  [SVE] Fold svrev(svrev(v)) to v.  Prathamesh Kulkarni  (2 files, +33/-0)

gcc/ChangeLog:
    * tree-ssa-forwprop.cc (is_combined_permutation_identity): Try to
    simplify two successive VEC_PERM_EXPRs with same VLA mask, where
    mask chooses elements in reverse order.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve/acle/general/rev-1.c: New test.
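The fold can be pictured at the source level with a scalar model of vector reversal — a sketch only, not the SVE intrinsics or the VEC_PERM_EXPR machinery itself: reversing a sequence twice with the same mask is the identity, which is exactly what the new forwprop simplification recognises.

```c
#include <assert.h>
#include <string.h>

/* Scalar model of a vector reverse.  The element count n stands in
   for the runtime SVE vector length (the "VLA" part).  */
void rev_int(int *dst, const int *src, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = src[n - 1 - i];
}

/* rev(rev(v)) == v: the identity the patch teaches forwprop to fold
   when two successive VEC_PERM_EXPRs use the same reversing mask.  */
int rev_rev_is_identity(const int *v, int n)
{
    int tmp[64], out[64];   /* n is assumed <= 64 in this sketch */
    rev_int(tmp, v, n);
    rev_int(out, tmp, n);
    return memcmp(out, v, n * sizeof *v) == 0;
}
```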
2023-04-24  Update gcc hr.po, sv.po, zh_CN.po  Joseph Myers  (3 files, +587/-1146)

    * hr.po, sv.po, zh_CN.po: Update.
2023-04-24  libstdc++: Fix __max_diff_type::operator>>= for negative values  Patrick Palka  (2 files, +12/-3)

This patch fixes sign bit propagation when right-shifting a negative
__max_diff_type value by more than one, a bug that our existing test
coverage didn't expose until r14-159-g03cebd304955a6 fixed the front
end's 'signed typedef-name' handling that the test relies on (which is
a non-standard extension to the language grammar).

libstdc++-v3/ChangeLog:
    * include/bits/max_size_type.h (__max_diff_type::operator>>=):
    Fix propagation of sign bit.
    * testsuite/std/ranges/iota/max_size_type.cc: Avoid using the
    non-standard 'signed typedef-name'.  Add some compile-time tests
    for right-shifting a negative __max_diff_type value by more than
    one.
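The bug class can be illustrated with a toy two-limb signed integer — illustrative names only, not the libstdc++ internals: an arithmetic right shift by k must copy the sign bit into all k vacated high bits, not just the first one.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 128-bit signed value as two 64-bit limbs, modelling the shape
   of a multiword type like __max_diff_type.  */
struct i128 { uint64_t lo; int64_t hi; };

/* Arithmetic shift right by k, 0 < k < 64.  The high limb uses a
   signed >>, which GCC defines to be arithmetic, so the sign bit
   fills every vacated position.  */
struct i128 asr128(struct i128 v, unsigned k)
{
    struct i128 r;
    r.lo = (v.lo >> k) | ((uint64_t)v.hi << (64 - k));
    r.hi = v.hi >> k;   /* sign propagates into all k high bits */
    return r;
}
```

Shifting the value -8·2^64 right by 2 must yield -2·2^64; a buggy shift that only propagated one sign bit would produce a large positive high limb instead.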
2023-04-24  PHIOPT: Add support for diamond shaped bb to match_simplify_replacement  Andrew Pinski  (3 files, +35/-10)

This adds the diamond shaped form of basic blocks to
match_simplify_replacement.  This patch is the start of removing/moving
all of what minmax_replacement does to match.pd to reduce the code
duplication.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Note the phi-opt-{23,24}.c testcases had an incorrect xfail: there
should still have been 2 ifs, because f4/f5 would not be transformed,
as -ABS is not allowable during early phi-opt.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (match_simplify_replacement): Add new
    arguments and support diamond shaped basic block form.
    (tree_ssa_phiopt_worker): Update call to match_simplify_replacement.

gcc/testsuite/ChangeLog:
    * gcc.dg/tree-ssa/phi-opt-23.c: Update testcase.
    * gcc.dg/tree-ssa/phi-opt-24.c: Likewise.
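A minimal example of the diamond shape in source form — a sketch of the CFG pattern, not the pass itself: the condition block branches to two middle blocks, each of which feeds one argument of the phi at the join point. phiopt can collapse this particular diamond to an ABS_EXPR.

```c
#include <assert.h>

/* Both arms of the if/else assign the temporary that feeds the phi
   node at the join point: condition bb -> two middle bbs -> join bb,
   i.e. the diamond shaped basic block form.  */
int abs_diamond(int a)
{
    int r;
    if (a < 0)
        r = -a;   /* middle bb 1 */
    else
        r = a;    /* middle bb 2 */
    return r;     /* join bb: r = PHI <-a, a> */
}
```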
2023-04-24  PHIOPT: Ignore predicates for match-and-simplify phi-opt  Andrew Pinski  (2 files, +23/-3)

This fixes a missed optimization where early phi-opt would not work
when there were predicates.  The easiest fix is to change
empty_bb_or_one_feeding_into_p to ignore those statements while
checking for the only feeding statement.

Note phi-opt-23.c and phi-opt-24.c still fail, as we don't handle the
diamond form in the match-and-simplify phiopt yet.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): Instead of
    calling last_and_only_stmt, look for the last statement manually.

gcc/testsuite/ChangeLog:
    * gcc.dg/tree-ssa/ssa-ifcombine-13.c: Add -fno-ssa-phiopt.
2023-04-24  PHIOPT: Factor out some code from match_simplify_replacement  Andrew Pinski  (1 file, +62/-44)

This factors out the code that checks whether we have an empty bb, or
one statement that feeds into the phi, so it can be used when adding
the diamond shaped bb form to match_simplify_replacement in the next
patch.  It also allows for some improvements in the later patches.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (empty_bb_or_one_feeding_into_p): New
    function.
    (match_simplify_replacement): Call empty_bb_or_one_feeding_into_p
    instead of doing it inline.
2023-04-24  PHIOPT: Allow other diamond uses when do_hoist_loads is true  Andrew Pinski  (1 file, +1/-4)

While working on adding the diamond shaped form to match-and-simplify
phiopt, I noticed that we would not reach it if do_hoist_loads was
true.  In the original code, before the cleanups, it was not obvious
why, but after I finished the cleanups it was just a matter of removing
a continue, and that is what this patch does.  This also happens to fix
a bug report that I noticed.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
    PR tree-optimization/68894
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove the
    continue for the do_hoist_loads diamond case.
2023-04-24  PHIOPT: Cleanup tree_ssa_phiopt_worker code  Andrew Pinski  (1 file, +107/-105)

This patch cleans up tree_ssa_phiopt_worker by merging common code and
handling do_store_elim earlier.  Note this does not change any of the
overall logic; it just moves code around enough to make this and a few
other fixes I have easier.  Also, I think all of the do_store_elim code
really should move to its own function, as it is now obvious that not
much code is shared.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Rearrange code for
    better code readability.
2023-04-24  PHIOPT: Move check on diamond bb to tree_ssa_phiopt_worker from minmax_replacement  Andrew Pinski  (3 files, +34/-6)

This moves the check that makes sure, for the diamond shaped form, that
the two middle bbs are used only by that diamond, earlier into the
shared code.  It also removes the redundant check for single_succ_p,
since that was already done beforehand.  The next patch will simplify
the code even further and remove more redundant checks.

    PR tree-optimization/109604

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Move the diamond
    form check from ...
    (minmax_replacement): Here.

gcc/testsuite/ChangeLog:
    * gcc.c-torture/compile/pr109604-1.c: New test.
    * gcc.c-torture/compile/pr109604-2.c: New test.
2023-04-24  c++, tree: declare some basic functions inline  Patrick Palka  (4 files, +54/-57)

The functions strip_array_types, is_typedef_decl, typedef_variant_p and
cp_expr_location are used throughout the C++ front end, including in
some fairly hot parts (e.g. in the tsubst routines and cp_walk_subtree),
and they're small enough that the overhead of calling them out-of-line
is relatively significant.  So this patch moves their definitions into
the appropriate headers to enable inlining them.

gcc/cp/ChangeLog:
    * cp-tree.h (cp_expr_location): Define here.
    * tree.cc (cp_expr_location): Don't define here.

gcc/ChangeLog:
    * tree.cc (strip_array_types): Don't define here.
    (is_typedef_decl): Don't define here.
    (typedef_variant_p): Don't define here.
    * tree.h (strip_array_types): Define here.
    (is_typedef_decl): Define here.
    (typedef_variant_p): Define here.
2023-04-24  Docs, OpenMP: Small fixes to internal OMP_FOR doc.  Frederik Harwath  (2 files, +4/-4)

gcc/ChangeLog:
    * doc/generic.texi (OpenMP): Add != to allowed conditions and
    state that vars can be unsigned.
    * tree.def (OMP_FOR): Likewise.
2023-04-24  aarch64: Add mulv2di3 expander for TARGET_SVE  Kyrylo Tkachov  (3 files, +81/-0)

Motivated by a recent LLVM patch I saw, we can use SVE for 64-bit
vector integer MUL (plain Advanced SIMD doesn't support it).  Since the
Advanced SIMD regs are just the low 128-bit part of the SVE regs it all
works transparently.  It's a reasonably straightforward implementation
of the mulv2di3 optab that wires it up through the mulvnx2di3 expander
and subregs the results back to the Advanced SIMD modes.

There are more such tricks possible with other operations (and we could
do 64-bit multiply-add merged operations too), but for now this
self-contained patch improves the mul case: without it, for the
testcases in the patch, we'd have scalarised the arguments, moved them
to GP regs, performed two GP MULs and moved them back to SIMD regs.
Advertising a mulv2di3 optab from the backend should also allow for
more flexible vectorisation opportunities.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (mulv2di3): New expander.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sve-neon-modes_1.c: New test.
    * gcc.target/aarch64/sve-neon-modes_2.c: New test.
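The operation in question can be written portably with the GNU vector extension — a sketch of the kind of source the new expander serves, not the testcases from the patch: on aarch64 with SVE the mulv2di3 optab lets this stay in SIMD registers instead of bouncing each 64-bit lane through the GP registers.

```c
#include <assert.h>

/* Two-element vector of 64-bit integers, GNU vector-extension style
   (supported by GCC and Clang on any target).  */
typedef long long v2di __attribute__((vector_size(16)));

/* Elementwise 64-bit multiply: the operation mulv2di3 implements.  */
v2di mul_v2di(v2di a, v2di b)
{
    return a * b;
}
```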
2023-04-24  MAINTAINERS: fix sorting of names  Martin Liska  (1 file, +1/-1)

ChangeLog:
    * MAINTAINERS: Fix sorting.
2023-04-24  doc: Update install.texi for GCC 13  Rainer Orth  (1 file, +78/-109)

install.texi needs some updates for GCC 13 and trunk:

* We used a mixture of Solaris 2 and Solaris references.  Since
  Solaris 1/SunOS 4 is ancient history by now, consistently use
  Solaris everywhere.  Likewise, explicit references to Solaris 11 can
  go in many places, since Solaris 11.3 and 11.4 are all GCC supports.
* Some caveats apply to both Solaris/SPARC and x86, like the
  difference between as and gas.
* Some specifics are obsolete, like the /usr/ccs/bin path whose
  contents were merged into /usr/bin in Solaris 11.0 already.
  Likewise, /bin/sh is ksh93 since Solaris 11.0, so there's no need to
  explicitly use /bin/ksh.
* I've removed the reference to OpenCSW: there's barely a need for
  external sites to get additional packages.  OpenCSW is mostly
  unmaintained these days and has been found to be more harmful than
  helpful.
* The section on the assembler and linker to use was partially
  duplicated.  Better to keep the info in one place.
* GNAT is bundled in recent Solaris 11.4 updates, so recommend that.

Tested on i386-pc-solaris2.11 with make doc/gccinstall.{info,pdf} and
inspection of the latter.

2023-04-21  Rainer Orth  <ro@CeBiTec.Uni-Bielefeld.DE>

gcc:
    * doc/install.texi: Consistently use Solaris rather than
    Solaris 2.  Remove explicit Solaris 11 references.  Markup fixes.
    (Options specification, --with-gnu-as): as and gas always differ
    on Solaris.  Remove /usr/ccs/bin reference.
    (Installing GCC: Binaries, Solaris (SPARC, Intel)): Remove.
    (i?86-*-solaris2*): Merge assembler, linker recommendations ...
    (*-*-solaris2*): ... here.  Update bundled GCC versions.  Don't
    refer to pre-built binaries.  Remove /bin/sh warning.  Update
    assembler, linker recommendations.  Document GNAT bootstrap
    compiler.
    (sparc-sun-solaris2*): Remove non-UltraSPARC reference.
    (sparc64-*-solaris2*): Move content ...
    (sparcv9-*-solaris2*): ... here.  Add GDC for 64-bit bootstrap
    compilers.
2023-04-24  aarch64: PR target/109406 Add support for SVE2 unpredicated MUL  Kyrylo Tkachov  (4 files, +57/-4)

SVE2 supports an unpredicated vector integer MUL form that we can emit
from our SVE expanders without using up a predicate register.  This
patch does so.

As the SVE MUL expansion is currently templated away through a code
iterator, I did not split it off just for this case but instead
special-cased it in the define_expand.  It seemed somewhat less
invasive than the alternatives, but I could split it off more
explicitly if others want me to.

The div-by-bitmask_1.c testcase is adjusted to expect this new MUL
form.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    PR target/109406
    * config/aarch64/aarch64-sve.md (<optab><mode>3): Handle
    TARGET_SVE2 MUL case.
    * config/aarch64/aarch64-sve2.md
    (*aarch64_mul_unpredicated_<mode>): New pattern.

gcc/testsuite/ChangeLog:
    PR target/109406
    * gcc.target/aarch64/sve2/div-by-bitmask_1.c: Adjust for
    unpredicated SVE2 MUL.
    * gcc.target/aarch64/sve2/unpred_mul_1.c: New test.
2023-04-24  [4/4] aarch64: Convert UABAL2 and SABAL2 patterns to standard RTL codes  Kyrylo Tkachov  (5 files, +144/-15)

The final patch in the series tackles the most complex of this family
of patterns, UABAL2 and SABAL2.  These extract the high part of the
sources, perform an absdiff on them, widen the result and accumulate.

The motivating testcase for this patch (series) is included, and the
required simplification doesn't actually trigger with just the RTL
pattern change because rtx_costs block it.  So this patch also extends
rtx costs to recognise the (minus (smax (x, y)) (smin (x, y)))
expression we use to describe absdiff in the backend and avoid
recursing into its arms.  This allows us to generate the
single-instruction sequence expected here.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (aarch64_<sur>abal2<mode>):
    Rename to...
    (aarch64_<su>abal2<mode>_insn): ... This.  Use RTL codes instead
    of unspec.
    (aarch64_<su>abal2<mode>): New define_expand.
    * config/aarch64/aarch64.cc (aarch64_abd_rtx_p): New function.
    (aarch64_rtx_costs): Handle ABD rtxes.
    * config/aarch64/aarch64.md (UNSPEC_SABAL2, UNSPEC_UABAL2):
    Delete.
    * config/aarch64/iterators.md (ABAL2): Delete.
    (sur): Remove handling of UNSPEC_UABAL2 and UNSPEC_SABAL2.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/simd/vabal_combine.c: New test.
2023-04-24  [3/4] aarch64: Convert UABAL and SABAL patterns to standard RTL codes  Kyrylo Tkachov  (4 files, +26/-26)

With the SABDL and UABDL patterns converted, their accumulating forms
UABAL and SABAL are not much more complicated.  There's an accumulator
argument that we, err, accumulate into with a PLUS once all the
widening is done.  Some necessary renaming of patterns relating to the
removal of UNSPEC_SABAL and UNSPEC_UABAL is included.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (aarch64_<sur>abal<mode>):
    Rename to...
    (aarch64_<su>abal<mode>): ... This.  Use RTL codes instead of
    unspec.
    (<sur>sadv16qi): Rename to...
    (<su>sadv16qi): ... This.  Adjust for the above.
    * config/aarch64/aarch64-sve.md (<sur>sad<vsi2qi>): Rename to...
    (<su>sad<vsi2qi>): ... This.  Adjust for the above.
    * config/aarch64/aarch64.md (UNSPEC_SABAL, UNSPEC_UABAL): Delete.
    * config/aarch64/iterators.md (ABAL): Delete.
    (sur): Remove handling of UNSPEC_SABAL and UNSPEC_UABAL.
2023-04-24  [2/4] aarch64: Convert UABDL2 and SABDL2 patterns to standard RTL codes  Kyrylo Tkachov  (3 files, +33/-11)

Similar to the previous patch for UABDL and SABDL, this patch covers
the *2 versions, which vec_select the high half of their input to do
the absdiff and extend.  A define_expand is added for the intrinsic to
create the "select-high-half" RTX the pattern expects.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (aarch64_<sur>abdl2<mode>):
    Rename to...
    (aarch64_<su>abdl2<mode>_insn): ... This.  Use RTL codes instead
    of unspec.
    (aarch64_<su>abdl2<mode>): New define_expand.
    * config/aarch64/aarch64.md (UNSPEC_SABDL2, UNSPEC_UABDL2):
    Delete.
    * config/aarch64/iterators.md (ABDL2): Delete.
    (sur): Remove handling of UNSPEC_SABDL2 and UNSPEC_UABDL2.
2023-04-24  [1/4] aarch64: Convert UABDL and SABDL patterns to standard RTL codes  Kyrylo Tkachov  (3 files, +10/-12)

This is the first patch in a series to improve the RTL representation
of the sum-of-absolute-differences patterns in the backend.  We can use
standard RTL codes and remove some unspecs.  For UABDL and SABDL we
have a widening of the result, so we can represent uabdl (x, y) as
(zero_extend (minus (umax (x, y)) (umin (x, y)))) and sabdl (x, y) as
(zero_extend (minus (smax (x, y)) (smin (x, y)))).

It is important to use zero_extend rather than sign_extend for the
sabdl case, as the result of the absolute difference is still a
positive unsigned value (the signedness of the operation refers to the
values being diffed, not the absolute value of the difference) that
must be zero-extended.

Bootstrapped and tested on aarch64-none-linux-gnu (these intrinsics are
reasonably well covered by the advsimd-intrinsics.exp tests).

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (aarch64_<sur>abdl<mode>):
    Rename to...
    (aarch64_<su>abdl<mode>): ... This.  Use standard RTL ops instead
    of unspec.
    * config/aarch64/aarch64.md (UNSPEC_SABDL, UNSPEC_UABDL): Delete.
    * config/aarch64/iterators.md (ABDL): Delete.
    (sur): Remove handling of UNSPEC_SABDL and UNSPEC_UABDL.
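The max/min identity behind the new RTL representation is easy to check at the source level — a scalar sketch of the semantics, not the backend patterns: |x - y| == max(x, y) - min(x, y), and the difference is a non-negative value regardless of the signedness of the inputs, which is why even the signed variant widens with a zero extend.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned absolute difference with widening, via the max - min
   identity: the zero_extend (minus (umax ...) (umin ...)) shape.  */
uint16_t abd_u8(uint8_t x, uint8_t y)
{
    uint8_t mx = x > y ? x : y;
    uint8_t mn = x > y ? y : x;
    return (uint16_t)(mx - mn);          /* widening is a zero_extend */
}

/* Signed inputs, same identity: the difference max - min is still a
   non-negative value, so the widening is *also* a zero_extend.  */
uint16_t abd_s8(int8_t x, int8_t y)
{
    int8_t mx = x > y ? x : y;
    int8_t mn = x > y ? y : x;
    return (uint16_t)(uint8_t)(mx - mn); /* zero-extended, not sign- */
}
```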
2023-04-24  aarch64: Add pattern to match zero-extending scalar result of ADDLV  Kyrylo Tkachov  (2 files, +100/-0)

The vaddlv_u8 and vaddlv_u16 intrinsics produce a widened scalar result
(uint16_t and uint32_t).  The ADDLV instructions themselves zero the
rest of the V register, which gives us a free zero-extension to 32 and
64 bits, similar to how it works on the GP reg side.  Because we don't
model that zero-extension in the machine description, this can cause
GCC to move the results of these instructions to the GP regs just to do
a (superfluous) zero-extension.

This patch just adds a pattern to catch these cases.  For the testcases
we can now generate no zero-extends or GP<->FP reg moves, whereas
before we generated stuff like:

foo_8_32:
    uaddlv  h0, v0.8b
    umov    w1, v0.h[0]   // FP<->GP move with zero-extension!
    str     w1, [x0]
    ret

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md
    (*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): New pattern.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/simd/addlv_zext.c: New test.
2023-04-24  This replaces uses of last_stmt where we do not require debug skipping  Richard Biener  (19 files, +59/-100)

There are quite some cases which want to access the control stmt ending
a basic block.  Since there cannot be debug stmts after such a stmt,
there's no point in using last_stmt, which skips debug stmts and can be
a compile-time hog for larger testcases.

    * gimple-ssa-split-paths.cc (is_feasible_trace): Avoid last_stmt.
    * graphite-scop-detection.cc (single_pred_cond_non_loop_exit):
    Likewise.
    * ipa-fnsummary.cc (set_cond_stmt_execution_predicate): Likewise.
    (set_switch_stmt_execution_predicate): Likewise.
    (phi_result_unknown_predicate): Likewise.
    * ipa-prop.cc (compute_complex_ancestor_jump_func): Likewise.
    (ipa_analyze_indirect_call_uses): Likewise.
    * predict.cc (predict_iv_comparison): Likewise.
    (predict_extra_loop_exits): Likewise.
    (predict_loops): Likewise.
    (tree_predict_by_opcode): Likewise.
    * gimple-predicate-analysis.cc (predicate::init_from_control_deps):
    Likewise.
    * gimple-pretty-print.cc (dump_implicit_edges): Likewise.
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Likewise.
    (replace_phi_edge_with_variable): Likewise.
    (two_value_replacement): Likewise.
    (value_replacement): Likewise.
    (minmax_replacement): Likewise.
    (spaceship_replacement): Likewise.
    (cond_removal_in_builtin_zero_pattern): Likewise.
    * tree-ssa-reassoc.cc (maybe_optimize_range_tests): Likewise.
    * tree-ssa-sccvn.cc (vn_phi_eq): Likewise.
    (vn_phi_lookup): Likewise.
    (vn_phi_insert): Likewise.
    * tree-ssa-structalias.cc (compute_points_to_sets): Likewise.
    * tree-ssa-threadbackward.cc (back_threader::maybe_thread_block):
    Likewise.
    (back_threader_profitability::possibly_profitable_path_p):
    Likewise.
    * tree-ssa-threadedge.cc (jump_threader::thread_outgoing_edges):
    Likewise.
    * tree-switch-conversion.cc (pass_convert_switch::execute):
    Likewise.
    (pass_lower_switch<O0>::execute): Likewise.
    * tree-tailcall.cc (tree_optimize_tail_calls_1): Likewise.
    * tree-vect-loop-manip.cc (vect_loop_versioning): Likewise.
    * tree-vect-slp.cc (vect_slp_function): Likewise.
    * tree-vect-stmts.cc (cfun_returns): Likewise.
    * tree-vectorizer.cc (vect_loop_vectorized_call): Likewise.
    (vect_loop_dist_alias_call): Likewise.
2023-04-24  Avoid repeated forwarder_block_p calls in CFG cleanup  Richard Biener  (1 file, +2/-2)

CFG cleanup maintains BB_FORWARDER_BLOCK and uses FORWARDER_BLOCK_P to
check it, apart from two places in outgoing_edges_match which use
forwarder_block_p alongside many BB_FORWARDER_BLOCK uses.  The
following adjusts those.

    * cfgcleanup.cc (outgoing_edges_match): Use FORWARDER_BLOCK_P.
2023-04-24  RISC-V: Eliminate redundant vsetvli for duplicate AVL def  Juzhe-Zhong  (3 files, +58/-3)

This patch is the V2 of:
https://patchwork.sourceware.org/project/gcc/patch/20230328010124.235703-1-juzhe.zhong@rivai.ai/

It addresses comments from Jeff: add comments for
all_avail_in_compatible_p and refine the comments in the code.

gcc/ChangeLog:
    * config/riscv/riscv-vsetvl.cc
    (vector_infos_manager::all_avail_in_compatible_p): New function.
    (pass_vsetvl::refine_vsetvls): Optimize vsetvls.
    * config/riscv/riscv-vsetvl.h: New function.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/vsetvl/avl_single-102.c: New test.
2023-04-24  RISC-V: Add function comment for cleanup_insns.  Juzhe-Zhong  (1 file, +15/-0)

Add more comments for cleanup_insns.

gcc/ChangeLog:
    * config/riscv/riscv-vsetvl.cc (pass_vsetvl::pre_vsetvl): Add
    function comment for cleanup_insns.
2023-04-24  RISC-V: Optimize fault only first load  Juzhe-Zhong  (8 files, +177/-1)

V2 patch for:
https://patchwork.sourceware.org/project/gcc/patch/20230330012804.110539-1-juzhe.zhong@rivai.ai/
which has been reviewed.  This patch addresses Jeff's comment and
refines the ChangeLog to give clearer information.

gcc/ChangeLog:
    * config/riscv/vector-iterators.md: New unspec to refine fault
    first load pattern.
    * config/riscv/vector.md: Refine fault first load pattern to
    erase avl from instructions with the fault first load property.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/vsetvl/ffload-1.c: New test.
    * gcc.target/riscv/rvv/vsetvl/ffload-2.c: New test.
    * gcc.target/riscv/rvv/vsetvl/ffload-3.c: New test.
    * gcc.target/riscv/rvv/vsetvl/ffload-5.c: New test.
    * gcc.target/riscv/rvv/vsetvl/ffload-6.c: New test.
    * gcc.target/riscv/rvv/vsetvl/ffload-7.c: New test.
2023-04-24  Add testcases for ffs/ctz vectorization.  liuhongt  (10 files, +786/-0)

gcc/testsuite/ChangeLog:
    PR tree-optimization/109011
    * gcc.target/i386/pr109011-b1.c: New test.
    * gcc.target/i386/pr109011-b2.c: New test.
    * gcc.target/i386/pr109011-d1.c: New test.
    * gcc.target/i386/pr109011-d2.c: New test.
    * gcc.target/i386/pr109011-q1.c: New test.
    * gcc.target/i386/pr109011-q2.c: New test.
    * gcc.target/i386/pr109011-w1.c: New test.
    * gcc.target/i386/pr109011-w2.c: New test.
2023-04-24  Daily bump.  GCC Administrator  (3 files, +83/-1)
2023-04-23  modula2: Add -lnsl -lsocket libraries to gcc/testsuite/lib/gm2.exp  Gaius Mulley  (1 file, +4/-0)

Solaris requires -lnsl -lsocket (present in the driver), but they are
not passed when running the testsuite.  This patch tests the target for
*-*-solaris2 and conditionally appends the above libraries.

gcc/testsuite/ChangeLog:
    * lib/gm2.exp (gm2_target_compile_default): Conditionally append
    -lnsl -lsocket to ldflags.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-04-23  aarch64: Annotate fcvtn pattern for vec_concat with zeroes  Kyrylo Tkachov  (2 files, +33/-1)

Using the define_substs in aarch64-simd.md, this is a straightforward
annotation to remove a redundant fmov insn.  So the codegen goes from:

foo_d:
    fcvtn   v0.2s, v0.2d
    fmov    d0, d0
    ret

to the simple:

foo_d:
    fcvtn   v0.2s, v0.2d
    ret

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64-simd.md (aarch64_float_truncate_lo_):
    Rename to...
    (aarch64_float_truncate_lo_<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/float_truncate_zero.c: New test.
2023-04-23  aarch64: Add vec_concat with zeroes annotation to addp pattern  Kyrylo Tkachov  (2 files, +12/-7)

Similar to the others, the addp pattern can be safely annotated with
<vczle><vczbe> to create the implicit vec_concat-with-zero variants.

Bootstrapped and tested on aarch64-none-linux-gnu and
aarch64_be-none-elf.

gcc/ChangeLog:
    PR target/99195
    * config/aarch64/aarch64-simd.md (aarch64_addp<mode>): Rename
    to...
    (aarch64_addp<mode><vczle><vczbe>): ... This.

gcc/testsuite/ChangeLog:
    PR target/99195
    * gcc.target/aarch64/simd/pr99195_1.c: Add testing for vpadd
    intrinsics.
2023-04-23  [xstormy16] Update xstormy16_rtx_costs.  Roger Sayle  (2 files, +163/-11)

This patch provides an improved rtx_costs target hook on xstormy16.
The current implementation has the unfortunate property that it claims
that zero_extendhisi2 is very cheap, even though the machine
description doesn't provide that instruction/pattern.  Doh!  Rewriting
the xstormy16_rtx_costs function has additional benefits, including
making more use of the (short) "mul" instruction when optimizing for
size with -Os.

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.cc (xstormy16_rtx_costs): Rewrite to
    provide reasonable values for common arithmetic operations and
    immediate operands (in several machine modes).

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/mulhi.c: New test case.
2023-04-23  [xstormy16] Add extendhisi2 and zero_extendhisi2 patterns to stormy16.md  Roger Sayle  (4 files, +46/-7)

This patch adds a pair of define_insn patterns to the xstormy16 machine
description that provide extendhisi2 and zero_extendhisi2, i.e. 16-bit
to 32-bit sign- and zero-extension respectively.  This functionality is
already synthesized during RTL expansion, but providing patterns allows
the semantics to be exposed to the RTL optimizers.  To simplify things,
this patch introduces a new %h0 output format for emitting the
high-part register name of a double-word (SImode) register pair.  The
actual code generated is identical to before.

Whilst there, I also fixed the instruction lengths and formatting of
the zero_extendqihi2 pattern.  Then, mostly for documentation purposes
as the 'T' constraint isn't yet implemented, I've added an
"and Rx,#255" alternative to zero_extendqihi2 that takes advantage of
its efficient instruction encoding.

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.cc (xstormy16_print_operand): Add %h
    format specifier to output high_part register name of SImode reg.
    * config/stormy16/stormy16.md (extendhisi2): New define_insn.
    (zero_extendqihi2): Fix lengths, consistent formatting and add
    "and Rx,#255" alternative, for documentation purposes.
    (zero_extendhisi2): New define_insn.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/extendhisi2.c: New test case.
    * gcc.target/xstormy16/zextendhisi2.c: Likewise.
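The two named patterns correspond to very ordinary C conversions — a source-level sketch of what extendhisi2 and zero_extendhisi2 mean, not the machine-description code:

```c
#include <assert.h>
#include <stdint.h>

/* extendhisi2: 16-bit to 32-bit sign extension.  */
int32_t sext_hi_si(int16_t x)
{
    return x;   /* implicit sign-extending conversion */
}

/* zero_extendhisi2: 16-bit to 32-bit zero extension.  */
uint32_t zext_hi_si(uint16_t x)
{
    return x;   /* implicit zero-extending conversion */
}
```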
2023-04-23  [xstormy16] Improved SImode shifts by two bits.  Roger Sayle  (2 files, +35/-0)

Currently on xstormy16, SImode shifts by a single bit require two
instructions, and shifts by other non-zero integer immediate constants
require five instructions.  This patch implements the obvious
optimization that shifts by two bits can be done in four instructions,
by using two single-bit sequences.

Hence, ashift_2 was previously generated as:

    mov r7,r2 | shl r2,#2 | shl r3,#2 | shr r7,#14 | or r3,r7
    ret

and with this patch we now generate:

    shl r2,#1 | rlc r3,#1 | shl r2,#1 | rlc r3,#1
    ret

2023-04-23  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
    * config/stormy16/stormy16.cc (xstormy16_output_shift): Implement
    SImode shifts by two by performing a single bit SImode shift
    twice.

gcc/testsuite/ChangeLog
    * gcc.target/xstormy16/shiftsi.c: New test case.
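The shl/rlc sequence can be modelled in portable C — a sketch of the register-pair arithmetic, with illustrative names, not the backend code: a single-bit SImode shift on a 16-bit register pair is "shl" on the low word plus "rlc" (rotate the carry out of the low word into the high word), and doing that twice gives the four-instruction shift-by-two.

```c
#include <assert.h>
#include <stdint.h>

/* One SImode left shift by one on a 16-bit register pair:
   shl lo,#1 then rlc hi,#1.  Returns the combined 32-bit value.  */
uint32_t shl1_pair(uint16_t *lo, uint16_t *hi)
{
    unsigned carry = *lo >> 15;             /* bit shifted out of lo */
    *lo = (uint16_t)(*lo << 1);             /* shl r2,#1 */
    *hi = (uint16_t)((*hi << 1) | carry);   /* rlc r3,#1 */
    return ((uint32_t)*hi << 16) | *lo;
}

/* Shift by two as two single-bit sequences, as in the patch.  */
uint32_t shl2_si(uint32_t x)
{
    uint16_t lo = x & 0xFFFF, hi = x >> 16;
    shl1_pair(&lo, &hi);
    return shl1_pair(&lo, &hi);
}
```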
2023-04-23  Handle NANs in frange::operator== [PR109593]  Aldy Hernandez  (1 file, +10/-0)

The patch

  commit 10e481b154c5fc63e6ce4b449ce86cecb87a6015
  Return true from operator== for two identical ranges containing NAN.

removed the check for NANs, which caused us to read from m_min and
m_max, which are undefined for NANs.

gcc/ChangeLog:
    PR tree-optimization/109593
    * value-range.cc (frange::operator==): Handle NANs.
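The reason a NAN check is needed before touching range endpoints can be shown in two lines of C — a sketch of the IEEE 754 behaviour, not the frange code: a NAN compares unequal to everything, including itself, so a range known to contain only NAN has no meaningful min/max endpoints to compare.

```c
#include <assert.h>
#include <math.h>

/* x != x holds only for NAN; this is the self-inequality property
   that makes NAN ranges a special case for equality comparison.  */
int is_self_unequal(double x)
{
    return x != x;
}
```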
2023-04-23  Adjust testcases after better RA decision.  liuhongt  (5 files, +790/-204)

After the RA optimization, a memory op is no longer propagated into
more than one instruction, and that makes the testcases not generate
vxorps, since the memory is loaded into the dest and the dest is never
unused now.  So rewrite the testcases to make the codegen more stable.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/avx2-dest-false-dep-for-glc.c: Rewrite testcase
    to make the codegen more stable.
    * gcc.target/i386/avx512dq-dest-false-dep-for-glc.c: Ditto.
    * gcc.target/i386/avx512f-dest-false-dep-for-glc.c: Ditto.
    * gcc.target/i386/avx512fp16-dest-false-dep-for-glc.c: Ditto.
    * gcc.target/i386/avx512vl-dest-false-dep-for-glc.c: Ditto.
2023-04-23  Use NO_REGS in cost calculation when the preferred register class is not known yet.  liuhongt  (2 files, +20/-1)

gcc/ChangeLog:
    PR rtl-optimization/108707
    * ira-costs.cc (scan_one_insn): Use NO_REGS instead of
    GENERAL_REGS when the preferred reg_class is not known.

gcc/testsuite/ChangeLog:
    * gcc.target/i386/pr108707.c: New test.
2023-04-23  Daily bump.  GCC Administrator  (4 files, +100/-1)
2023-04-22  PHIOPT: Improve readability of tree_ssa_phiopt_worker  Andrew Pinski  (1 file, +21/-25)

This small patch just changes the code around slightly to make it
easier to understand which cases handle the diamond shaped BB for both
do_store_elim and do_hoist_loads.  There is no effect on code output at
all, since all of the checks are still the same.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Change the code
    around slightly to move diamond handling for
    do_store_elim/do_hoist_loads out of the big if/else.
2023-04-22  PHIOPT: Improve minmax diamond detection for phiopt1  Andrew Pinski  (2 files, +6/-7)

For diamond bb phi node detection, there is a check to make sure bb1 is
not empty.  But in the case where bb1 is empty except for a predicate,
empty_block_p will still return true, yet the minmax code handles that
case already, so there is no reason to check if the basic block is
empty.  This patch removes that check and removes some xfails.

OK? Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:
    * tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove check on
    empty_block_p.

gcc/testsuite/ChangeLog:
    * gcc.dg/tree-ssa/phi-opt-5.c: Remove some xfails.
2023-04-22  [Committed] Move new test case to gcc.target/avr/mmcu/pr54816.c  Roger Sayle  (1 file, +0/-0)

AVR test cases that specify a specific -mmcu option need to be placed
in the gcc.target/avr/mmcu subdirectory.  Moved thusly.

2023-04-22  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
    PR target/54816
    * gcc.target/avr/pr54816.c: Move to...
    * gcc.target/avr/mmcu/pr54816.c: ... here.
2023-04-22  Fortran: function results never have the ALLOCATABLE attribute [PR109500]  Harald Anlauf  (2 files, +48/-0)

Fortran 2018 8.5.3 (ALLOCATABLE attribute) explains in Note 1 that the
result of referencing a function whose result variable has the
ALLOCATABLE attribute is a value that does not itself have the
ALLOCATABLE attribute.

gcc/fortran/ChangeLog:
    PR fortran/109500
    * interface.cc (gfc_compare_actual_formal): Reject allocatable
    functions being used as an actual argument for an allocatable
    dummy.

gcc/testsuite/ChangeLog:
    PR fortran/109500
    * gfortran.dg/allocatable_function_11.f90: New test.

Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>
2023-04-22  testsuite: Fix up pr109011-*.c tests for powerpc [PR109572]  Jakub Jelinek  (5 files, +18/-18)

As reported, the pr109011-{4,5}.c tests fail on powerpc.  I thought
they should have the same counts as the corresponding -{2,3}.c tests;
the only difference is that -{2,3}.c use int while -{4,5}.c use long
long.  But there are 2 issues.  One is that in the foo function the
vectorization costs comparison triggered: while in -{2,3}.c we use
vectorization factor 4 and it was found beneficial, when using long
long it was just vf 2, and the scalar cost of doing
p[i] = __builtin_ctzll (q[i]) twice looked smaller than the vectorized
statements.  I could disable the cost model, but instead chose to add
some further arithmetics to those functions to make vectorization
beneficial even with vf 2.

After that change, pr109011-4.c still failed; I was expecting 4 .CTZ
calls there on power9, 3 vectorized and one in scalar code, but for
some reason the scalar one didn't trigger.  As I really want to count
just the vectorized calls, I've added the vect prefix on the variables
to ensure I'm only counting vectorized calls and decreased the 4
counts to 3.

2023-04-22  Jakub Jelinek  <jakub@redhat.com>

    PR testsuite/109572
    * gcc.dg/vect/pr109011-1.c: In scan-tree-dump-times regexps match
    also vect prefix to make sure we only count vectorized calls.
    * gcc.dg/vect/pr109011-2.c: Likewise.  On powerpc* expect just
    count 3 rather than 4.
    * gcc.dg/vect/pr109011-3.c: In scan-tree-dump-times regexps match
    also vect prefix to make sure we only count vectorized calls.
    * gcc.dg/vect/pr109011-4.c: Likewise.  On powerpc* expect just
    count 3 rather than 4.
    (foo): Add 2 further arithmetic ops to the loop to make it appear
    worthwhile for vectorization heuristics on powerpc.
    * gcc.dg/vect/pr109011-5.c: In scan-tree-dump-times regexps match
    also vect prefix to make sure we only count vectorized calls.
    (foo): Add 2 further arithmetic ops to the loop to make it appear
    worthwhile for vectorization heuristics on powerpc.
2023-04-22  Fix up bootstrap with GCC 4.[89] after RAII auto_mpfr and auto_mpz [PR109589]  Jakub Jelinek  (2 files, +6/-0)

On Tue, Apr 18, 2023 at 03:39:41PM +0200, Richard Biener via Gcc-patches wrote:
> The following adds two RAII classes, one for mpz_t and one for mpfr_t
> making object lifetime management easier.  Both formerly require
> explicit initialization with {mpz,mpfr}_init and release with
> {mpz,mpfr}_clear.

This unfortunately broke bootstrap when using GCC 4.8.x or 4.9.x, as it
uses deleted friends, which weren't supported until PR62101 fixed them
in 2014 for GCC 5.  The following patch adds a workaround, not deleting
those friends for those old versions.

While this means that if people add those mp*_{init{,2},clear} calls on
auto_mp* objects they won't notice when doing non-bootstrap builds
using very old system compilers, people should be bootstrapping their
changes, and it will be caught during bootstraps even when starting
with those old compilers; plus most people actually use much newer
compilers when developing.

2023-04-22  Jakub Jelinek  <jakub@redhat.com>

    PR bootstrap/109589
    * system.h (class auto_mpz): Workaround PR62101 bug in GCC 4.8
    and 4.9.
    * realmpfr.h (class auto_mpfr): Likewise.
2023-04-22Adjust rx movsicc testsJeff Law9-94/+142
The rx port has a target-specific test, movsicc, which is naturally meant to verify that if-conversion is happening on the expected cases. Unfortunately the test is poorly written. The core problem is there are 8 distinct tests and each of those tests is expected to generate a specific sequence. Unfortunately, various generic bits might turn an equality test into an inequality test or make other similar changes. The net result is the assembly matching patterns may find a particular sequence, but it may be for a different function than was originally intended. I.e., test1's output may match the expected assembly for test5. Ugh! This patch breaks the movsicc test down into 8 distinct tests and adjusts the patterns they match. The nice thing is all these tests are supposed to have branches that use a bCC 1f form. So we can make them a bit more robust by ignoring the actual condition code used. So if we change eq to ne, as long as we match the movsicc pattern, we're OK. And the 1f style is only used by the movsicc pattern. With the tests broken down it's a lot easier to diagnose why one test fails after the recent changes to if-conversion. movsicc-3 fails because of the profitability test. It's more expensive than the other cases because of its use of (const_int 10) rather than (const_int 0); (const_int 0) naturally has a smaller cost. It looks to me like in this context (const_int 10) should have the same cost as (const_int 0), but I'm nowhere near well versed in the cost model for the rx port. So I'm just leaving the test as xfailed. If someone cares enough, they can dig into it further. gcc/testsuite * gcc.target/rx/movsicc.c: Broken down into ... * gcc.target/rx/movsicc-1.c: Here. * gcc.target/rx/movsicc-2.c: Here. * gcc.target/rx/movsicc-3.c: Here. xfail one test. * gcc.target/rx/movsicc-4.c: Here. * gcc.target/rx/movsicc-5.c: Here. * gcc.target/rx/movsicc-6.c: Here. * gcc.target/rx/movsicc-7.c: Here. * gcc.target/rx/movsicc-8.c: Here.
2023-04-22match.pd: Fix fneg/fadd optimization [PR109583]Jakub Jelinek2-1/+27
The following testcase ICEs on x86: the foo function since my r14-22 improvement, but bar already since r13-4122. The problem is the same: in the if expression related_vector_mode is called, and that starts with gcc_assert (VECTOR_MODE_P (vector_mode)); but nothing in the fneg/fadd match.pd pattern actually checks whether the VEC_PERM type has VECTOR_MODE_P (vec_mode). In this case it has BLKmode and so it ICEs. The following patch makes sure we don't ICE on it. 2023-04-22 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109583 * match.pd (fneg/fadd simplify): Don't call related_vector_mode if vec_mode is not VECTOR_MODE_P. * gcc.dg/pr109583.c: New test.
2023-04-22Update loop estimate after header duplicationJan Hubicka6-38/+178
Loop header copying implements partial loop peeling. If all exits of the loop are peeled (which is the common case) the number of iterations decreases by 1. Without noting this, for loops iterating zero times, we end up re-peeling them later in the loop peeling pass, which is wasteful. This patch commonizes the code for estimate update and adds logic to detect when all (likely) exits were peeled by loop-ch. We are still wrong about the update of the estimate however: if the exits behave randomly with a given probability, loop peeling does not decrease expected iteration counts, it just decreases the probability that the loop will be executed. In this case we thus incorrectly decrease any_estimate. Doing so however at least helps us to not peel or heavily optimize the loop later. If the loop iterates precisely the estimated number of iterations, the estimate decreases, but we are wrong about decreasing the header frequency. We already have logic that tries to prove that a loop exit will not be taken in peeled-out iterations and it may make sense to special case this. I also fixed a problem where we had an off-by-one error in iteration count updating. It makes perfect sense to expect a loop to have 0 iterations; however if the bound drops to negative, we lose info about the loop behaviour (since we have no profile data reaching the loop body). Bootstrapped/regtested x86_64-linux, committed. Honza gcc/ChangeLog: 2023-04-22 Jan Hubicka <hubicka@ucw.cz> Ondrej Kubanek <kubanek0ondrej@gmail.com> * cfgloopmanip.h (adjust_loop_info_after_peeling): Declare. * tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix updating of loop profile and bounds after header duplication. * tree-ssa-loop-ivcanon.cc (adjust_loop_info_after_peeling): Break out from try_peel_loop; fix handling of 0 iterations. (try_peel_loop): Use adjust_loop_info_after_peeling. gcc/testsuite/ChangeLog: 2023-04-22 Jan Hubicka <hubicka@ucw.cz> Ondrej Kubanek <kubanek0ondrej@gmail.com> * gcc.dg/tree-ssa/peel1.c: Decrease number of peels by 1.
* gcc.dg/unroll-8.c: Decrease loop iteration estimate. * gcc.dg/tree-prof/peel-2.c: New test.
2023-04-22Daily bump.GCC Administrator6-1/+301
2023-04-21Do not fold ADDR_EXPR conditions leading to builtin_unreachable early.Andrew MacLeod2-1/+27
Ranges cannot represent &var globally yet, so we cannot fold these expressions early or we lose the __builtin_unreachable information. PR tree-optimization/109546 gcc/ * tree-vrp.cc (remove_unreachable::remove_and_update_globals): Do not fold conditions with ADDR_EXPR early. gcc/testsuite/ * gcc.dg/pr109546.c: New.
2023-04-21c++: fix 'unsigned typedef-name' extension [PR108099]Jason Merrill4-12/+60
In the comments for PR108099 Jakub provided some testcases that demonstrated that even before the regression noted in the patch we were getting the semantics of this extension wrong: in the unsigned case we weren't producing the corresponding standard unsigned type but another distinct one of the same size, and in the signed case we were just dropping it on the floor and not actually returning a signed type at all. The former issue is fixed by using c_common_signed_or_unsigned_type instead of unsigned_type_for, and the latter issue by adding a (signed_p && typedef_decl) case. This patch introduces a failure on std/ranges/iota/max_size_type.cc due to the latter issue, since the testcase expects 'signed rep_t' to do something sensible, and previously we didn't. Now that we do, it exposes a bug in the __max_diff_type::operator>>= handling of sign extension: when we evaluate -1000 >> 2 in __max_diff_type we keep the MSB set, but leave the second-most-significant bit cleared. PR c++/108099 gcc/cp/ChangeLog: * decl.cc (grokdeclarator): Don't clear typedef_decl after 'unsigned typedef' pedwarn. Use c_common_signed_or_unsigned_type. Also handle 'signed typedef'. gcc/testsuite/ChangeLog: * g++.dg/ext/int128-8.C: Remove xfailed dg-bogus markers. * g++.dg/ext/unsigned-typedef2.C: New test. * g++.dg/ext/unsigned-typedef3.C: New test.