Age | Commit message | Author | Files | Lines
2023-08-25 | Apply some TLC to vect_slp_analyze_instance_dependence | Richard Biener | 1 | -130/+108
This refactors things, separating load and store handling, adjusting comments to reflect reality and removing some dead code. * tree-vect-data-refs.cc (vect_slp_analyze_store_dependences): Split out from vect_slp_analyze_node_dependences, remove dead code. (vect_slp_analyze_load_dependences): Split out from vect_slp_analyze_node_dependences, adjust comments. Process queued stores before any disambiguation. (vect_slp_analyze_node_dependences): Remove. (vect_slp_analyze_instance_dependence): Adjust.
2023-08-25 | [frange] Relax floating point relational folding. | Aldy Hernandez | 2 | -28/+143
This patch implements a new frelop_early_resolve() that handles the NAN special cases instead of calling into the integer version, which can break for some combinations. Relaxing FP conditional folding in this manner allows ranger to do a better job, resulting in more threading opportunities, among other things.

In auditing ranger versus DOM scoped tables I've noticed we are too cautious when folding floating point conditionals involving relationals. We refuse to fold anything if there is the possibility of a NAN, but this is overly restrictive. For example:

    if (x_5 != y_8)
      if (x_5 != y_8)
        link_error ();

In range-ops, we fail to fold the second conditional because frelop_early_resolve bails on anything that may have a NAN, but in the above case the possibility of a NAN is inconsequential.

However, there are some cases where we must be careful, because a NAN can complicate matters:

    if (x_5 == x_5)
      ...

Here the operands to EQ_EXPR are the same so we get VREL_EQ as the relation. However, we can't fold the conditional unless we know x_5 cannot be a NAN. On the other hand, we can fold the second conditional here:

    if (x_5 == x_5)
      if (x_5 > x_5)

Because on the TRUE side of the first conditional we are guaranteed to be free of NANs.

This patch is basically an inline of the integer version of relop_early_resolve() with special casing for floats. The main thing to keep in mind is that the relation coming into a range-op entry may have a NAN, and for that one must look at the operands. This makes the relations akin to unordered comparisons, making VREL_LT behave like VREL_UNLT would.

The tricky corner cases are VREL_EQ and VREL_NE, as discussed above. Apart from these, which are special cased, the relation table for intersect should work fine for returning a FALSE, even with NANs. The union table, not so much; this is documented in the code.

This allows us to add some optimizations for the unordered operators. For example, a relation of VREL_LT on entry to an operator allows us to fold an UNLT_EXPR as true, even with NANs, because in this case VREL_LT is really VREL_UNLT, which maps perfectly.

BTW, we batted around some ideas on how to get this to work, and it seems this is the cleaner route, with the special cases nestled in the operators themselves. Another idea is to add unordered relations, but that would require bloating the various tables, adding spots for VREL_UNEQ, VREL_UNLT, etc, plus adding relations for VREL_UNORDERED so the intersects work correctly. I'm not wed to either one, and we can certainly revisit this if it becomes burdensome to maintain (or to get right).

gcc/ChangeLog:
* range-op-float.cc (frelop_early_resolve): Rewrite for better NAN handling.
(operator_not_equal::fold_range): Adjust for relations.
(operator_lt::fold_range): Same.
(operator_gt::fold_range): Same.
(foperator_unordered_equal::fold_range): Same.
(foperator_unordered_lt::fold_range): Same.
(foperator_unordered_le::fold_range): Same.
(foperator_unordered_gt::fold_range): Same.
(foperator_unordered_ge::fold_range): Same.

gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/vrp-float-12.c: New test.
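The NAN cases above come straight from IEEE comparison semantics, which a small C sketch can make concrete. The function names below are made up for illustration and are not part of GCC; the code only shows the behaviour the folding has to respect, not how range-ops implements it.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* x == x is false only for a NaN, so it may be folded to true only when
   the range of x is known to exclude NaN.  */
bool self_equal (double x)
{
  return x == x;
}

/* On the true side of x == x, x cannot be a NaN, so the inner x > x is
   always false and could be folded.  */
bool greater_after_self_equal (double x)
{
  if (x == x)
    return x > x;
  return false;
}

/* A repeated x != y gives the same answer as the first test, NaN or not,
   so this function returns true for every input.  */
bool repeated_not_equal_agrees (double x, double y)
{
  if (x != y)
    return x != y;
  return true;
}

/* Unordered less-than: true when x < y or when either operand is a NaN.  */
bool unordered_lt (double x, double y)
{
  return !(x >= y);
}
```

With these definitions, `self_equal (NAN)` is false while `unordered_lt (NAN, 1.0)` is true, which is why a plain "less than" relation that may involve a NAN has to be read as "unordered or less than".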
2023-08-25 | tree-optimization/111136 - STMT_VINFO_SLP_VECT_ONLY and stores | Richard Biener | 1 | -2/+7
vect_dissolve_slp_only_groups currently only expects loads; for stores we have to make sure to mark the dissolved "groups" strided. PR tree-optimization/111136 * tree-vect-loop.cc (vect_dissolve_slp_only_groups): For stores force STMT_VINFO_STRIDED_P and also duplicate that to all elements.
2023-08-25 | RISC-V: Add early continue for ENTRY and EXIT block | Juzhe-Zhong | 1 | -0/+2
Committed. gcc/ChangeLog: * config/riscv/riscv-vsetvl.cc (pass_vsetvl::compute_local_properties): Add early continue.
2023-08-25 | Refactor mode iterator V_128 and V_128H, V_256 and V_256H | liuhongt | 1 | -58/+7
Merge them together. gcc/ChangeLog: * config/i386/sse.md (vec_set<mode>): Removed. (V_128H): Merge into .. (V_128): .. this. (V_256H): Merge into .. (V_256): .. this. (V_512): Add V32HF, V32BF. (*ssse3_palignr<mode>_perm): Adjust mode iterator from V_128H to V_128. (vcond<mode><sseintvecmodelower>): Removed. (vcondu<mode><sseintvecmodelower>): Removed. (avx_vbroadcastf128_<mode>): Refactor from V_256H to V_256.
2023-08-24 | RISC-V: Move vector-abi testcases into rvv/base folder | Patrick O'Neill | 9 | -0/+0
Resolves failures like this on rv32gcv linux: compiler exited with status 1 output is: In file included from /tc-baseline/build-linux-gcv/sysroot/usr/include/features.h:515, from /tc-baseline/build-linux-gcv/sysroot/usr/include/bits/libc-header-start.h:33, from /tc-baseline/build-linux-gcv/sysroot/usr/include/stdint.h:26, from /tc-baseline/build-linux-gcv/lib/gcc/riscv32-unknown-linux-gnu/14.0.0/include/stdint.h:9, from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/stdint.h:9, from /tc-baseline/build-linux-gcv/build-gcc-linux-stage2/gcc/include/riscv_vector.h:28, from /tc-baseline/gcc/gcc/testsuite/gcc.target/riscv/vector-abi-1.c:4: /tc-baseline/build-linux-gcv/sysroot/usr/include/gnu/stubs.h:17:11: fatal error: gnu/stubs-lp64d.h: No such file or directory compilation terminated. Tested using: rv{32/64}{gc/gcv} newlib rv{32/64}gcv linux gcc/testsuite/ChangeLog: * gcc.target/riscv/vector-abi-1.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-1.c: ...here. * gcc.target/riscv/vector-abi-2.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-2.c: ...here. * gcc.target/riscv/vector-abi-3.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-3.c: ...here. * gcc.target/riscv/vector-abi-4.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-4.c: ...here. * gcc.target/riscv/vector-abi-5.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-5.c: ...here. * gcc.target/riscv/vector-abi-6.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-6.c: ...here. * gcc.target/riscv/vector-abi-7.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-7.c: ...here. * gcc.target/riscv/vector-abi-8.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-8.c: ...here. * gcc.target/riscv/vector-abi-9.c: Moved to... * gcc.target/riscv/rvv/base/vector-abi-9.c: ...here. Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
2023-08-25 | Fix avx512ne2ps2bf16 wrong code [PR 111127] | Hongyu Wang | 2 | -2/+26
Correct the parameter order for the avx512ne2ps2bf16_maskz expander. gcc/ChangeLog: PR target/111127 * config/i386/sse.md (avx512f_cvtne2ps2bf16_<mode>_maskz): Adjust parameter order. gcc/testsuite/ChangeLog: PR target/111127 * gcc.target/i386/pr111127.c: New test.
2023-08-25 | Daily bump. | GCC Administrator | 6 | -1/+892
2023-08-24 | i386: Optimize pinsrq of 0 with index 1 into movq [PR94866] | Uros Bizjak | 2 | -0/+25
Add new pattern involving vec_merge RTX that is produced by combine from the combination of sse4_1_pinsrq and *movdi_internal:

    7: r86:DI=0
    8: r85:V2DI=vec_merge(vec_duplicate(r86:DI),r87:V2DI,0x2)
      REG_DEAD r87:V2DI
      REG_DEAD r86:DI
    Successfully matched this instruction:
    (set (reg:V2DI 85 [ a ])
        (vec_merge:V2DI (reg:V2DI 87)
            (const_vector:V2DI [
                    (const_int 0 [0]) repeated x2
                ])
            (const_int 1 [0x1])))

PR target/94866 gcc/ChangeLog: * config/i386/sse.md (*sse2_movq128_<mode>_1): New insn pattern. gcc/testsuite/ChangeLog: * g++.target/i386/pr94866.C: New test.
2023-08-24 | Fix tests for PR 106537. | Jose E. Marchesi | 2 | -4/+8
This patch fixes the tests for PR 106537 (support for -W[no]-compare-distinct-pointer-types) which were expecting the warning when checking for equality/inequality of void pointers with non-function pointers. gcc/testsuite/ChangeLog: PR c/106537 * gcc.c-torture/compile/pr106537-1.c: Comparing void pointers to non-function pointers is legit. * gcc.c-torture/compile/pr106537-2.c: Likewise.
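A minimal sketch of the kind of comparison the corrected tests treat as legitimate: an equality test between a void pointer and a pointer to an object type. The function name is hypothetical and the snippet is not taken from the testsuite.

```c
#include <assert.h>
#include <stdbool.h>

/* Comparing a void pointer with a pointer to an object type for equality
   is valid C, so no distinct-pointer-types warning is expected here.  */
bool same_address (void *p, int *q)
{
  return p == q;
}
```

Calling `same_address (&x, &x)` for some `int x` yields true; the warning stays relevant for comparisons between unrelated pointer types such as `int *` and `char *`.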
2023-08-24 | analyzer: implement kf_strcat [PR105899] | David Malcolm | 7 | -20/+275
gcc/analyzer/ChangeLog: PR analyzer/105899 * call-details.cc (call_details::check_for_null_terminated_string_arg): Split into overloads, one taking just an arg_idx, the other a new "include_terminator" param. * call-details.h: Likewise. * kf.cc (class kf_strcat): New. (kf_strcpy::impl_call_pre): Update for change to check_for_null_terminated_string_arg. (register_known_functions): Register kf_strcat. * region-model.cc (region_model::check_for_null_terminated_string_arg): Split into overloads, one taking just an arg_idx, the other a new "include_terminator" param. When returning an svalue, handle "include_terminator" being false by subtracting one. * region-model.h (region_model::check_for_null_terminated_string_arg): Split into overloads, one taking just an arg_idx, the other a new "include_terminator" param. gcc/ChangeLog: PR analyzer/105899 * doc/invoke.texi (Static Analyzer Options): Add "strcat" to the list of functions known to the analyzer. gcc/testsuite/ChangeLog: PR analyzer/105899 * gcc.dg/analyzer/strcat-1.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: handle strlen(BITS_WITHIN) [PR105899] | David Malcolm | 1 | -1/+20
gcc/analyzer/ChangeLog: PR analyzer/105899 * region-model.cc (fragment::has_null_terminator): Handle SK_BITS_WITHIN. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: handle INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) [PR105899] | David Malcolm | 1 | -0/+19
gcc/analyzer/ChangeLog: PR analyzer/105899 * region-model-manager.cc (region_model_manager::get_or_create_initial_value): Simplify INIT_VAL(ELEMENT_REG(STRING_REG), CONSTANT_SVAL) to CONSTANT_SVAL(STRING[N]). Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: handle strlen(INIT_VAL(STRING_REG)) [PR105899] | David Malcolm | 2 | -21/+54
gcc/analyzer/ChangeLog: PR analyzer/105899 * region-model.cc (fragment::has_null_terminator): Move STRING_CST handling to fragment::string_cst_has_null_terminator; also use it to handle INIT_VAL(STRING_REG). (fragment::string_cst_has_null_terminator): New, from above. gcc/testsuite/ChangeLog: PR analyzer/105899 * gcc.dg/analyzer/strcpy-3.c (test_2): New. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: reimplement kf_memcpy_memmove | David Malcolm | 3 | -10/+48
gcc/analyzer/ChangeLog: * kf.cc (kf_memcpy_memmove::impl_call_pre): Reimplement using region_model::copy_bytes. * region-model.cc (region_model::read_bytes): New. (region_model::copy_bytes): New. * region-model.h (region_model::read_bytes): New decl. (region_model::copy_bytes): New decl. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: eliminate region_model::get_string_size [PR105899] | David Malcolm | 2 | -32/+0
gcc/analyzer/ChangeLog: PR analyzer/105899 * region-model.cc (region_model::get_string_size): Delete both. * region-model.h (region_model::get_string_size): Delete both decls. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: reimplement kf_strcpy [PR105899] | David Malcolm | 7 | -23/+150
This patch reimplements the analyzer's implementation of strcpy using the region_model::scan_for_null_terminator infrastructure, so that e.g. it can complain about out-of-bounds reads/writes, unterminated strings, etc. gcc/analyzer/ChangeLog: PR analyzer/105899 * kf.cc (kf_strcpy::impl_call_pre): Reimplement using check_for_null_terminated_string_arg. * region-model.cc (region_model::get_store_bytes): Shortcut reading all of a string_region. (region_model::scan_for_null_terminator): Use get_store_value for the bytes rather than "unknown" when returning an unknown length. (region_model::write_bytes): New. * region-model.h (region_model::write_bytes): New decl. gcc/testsuite/ChangeLog: PR analyzer/105899 * gcc.dg/analyzer/out-of-bounds-diagram-16.c: New test. * gcc.dg/analyzer/strcpy-1.c: Add test coverage. * gcc.dg/analyzer/strcpy-3.c: Likewise. * gcc.dg/analyzer/strcpy-4.c: New test. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
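As a runtime analogy for what the analyzer checks statically, the sketch below scans a bounded buffer for a terminator before copying. It is only an illustration under assumed names (`find_terminator`, `checked_strcpy`); it does not reflect how region_model::scan_for_null_terminator is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the index of the first NUL byte in buf[0..size), or -1 if the
   buffer holds no terminator.  Never reads past buf + size.  */
long find_terminator (const char *buf, size_t size)
{
  for (size_t i = 0; i < size; i++)
    if (buf[i] == '\0')
      return (long) i;
  return -1;
}

/* Copy src into dst only if src is terminated within src_size bytes and
   the string plus its terminator fits in dst_size bytes.
   Returns 0 on success, -1 otherwise.  */
int checked_strcpy (char *dst, size_t dst_size,
                    const char *src, size_t src_size)
{
  long len = find_terminator (src, src_size);
  if (len < 0 || (size_t) len + 1 > dst_size)
    return -1;
  memcpy (dst, src, (size_t) len + 1);
  return 0;
}
```

The two failure paths correspond to the diagnostics mentioned above: a source without a terminator (an unterminated string) and a destination too small for the copy (an out-of-bounds write).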
2023-08-24 | analyzer: handle symbolic bindings in scan_for_null_terminator [PR105899] | David Malcolm | 2 | -0/+26
gcc/analyzer/ChangeLog: PR analyzer/105899 * region-model.cc (iterable_cluster::iterable_cluster): Add symbolic binding keys to m_symbolic_bindings. (iterable_cluster::has_symbolic_bindings_p): New. (iterable_cluster::m_symbolic_bindings): New field. (region_model::scan_for_null_terminator): Treat clusters with symbolic bindings as having unknown strlen. gcc/testsuite/ChangeLog: PR analyzer/105899 * gcc.dg/analyzer/sprintf-1.c: Include "analyzer-decls.h". (test_strlen_1): New. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | analyzer: add logging to impl_path_context | David Malcolm | 1 | -2/+11
gcc/analyzer/ChangeLog: * engine.cc (impl_path_context::impl_path_context): Add logger param. (impl_path_context::bifurcate): Add log message. (impl_path_context::terminate_path): Likewise. (impl_path_context::m_logger): New field. (exploded_graph::process_node): Pass logger to path_ctxt ctor. Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2023-08-24 | tree-optimization/111123 - indirect clobbers thrown away too early | Richard Biener | 3 | -17/+52
The testcase in the PR shows that late uninit diagnostic relies on indirect clobbers in CTORs but we throw those away in the fab pass which is too early. The reasoning was they were supposed to keep SSA names live but that's no longer the case since DCE doesn't treat them as keeping SSA uses live. The following instead removes them before out-of-SSA coalescing which is the thing that's still affected by them. PR tree-optimization/111123 * tree-ssa-ccp.cc (pass_fold_builtins::execute): Do not remove indirect clobbers here ... * tree-outof-ssa.cc (rewrite_out_of_ssa): ... but here. (remove_indirect_clobbers): New function. * g++.dg/warn/Wuninitialized-pr111123-1.C: New testcase.
2023-08-24 | Check that passes do not forget to define profile | Jan Hubicka | 9 | -0/+51
This patch extends the verifier to check that all probabilities and counts are initialized if a profile is supposed to be present. This is complicated a bit by the possibility that we inline a !flag_guess_branch_probability function into a function with a profile defined, in which case we need to stop verification. For this reason I added a flag to the cfg structure tracking this. Bootstrapped/regtested x86_64-linux, committed. gcc/ChangeLog: * cfg.h (struct control_flow_graph): New field full_profile. * auto-profile.cc (afdo_annotate_cfg): Set full_profile to true. * cfg.cc (init_flow): Set full_profile to false. * graphite.cc (graphite_transform_loops): Set full_profile to false. * lto-streamer-in.cc (input_cfg): Initialize full_profile flag. * predict.cc (pass_profile::execute): Set full_profile to true. * symtab-thunks.cc (expand_thunk): Set full_profile to true. * tree-cfg.cc (gimple_verify_flow_info): Verify that profile is full if full_profile is set. * tree-inline.cc (initialize_cfun): Initialize full_profile. (expand_call_inline): Combine full_profile.
2023-08-24 | libstdc++: Add test for illegal pointer arithmetic in format [PR111102] | Paul Dreik | 1 | -0/+15
libstdc++-v3/ChangeLog: PR libstdc++/111102 * testsuite/std/format/string.cc: Check wide character format strings with out-of-range widths.
2023-08-24 | libstdc++: fix illegal pointer arithmetic in format [PR111102] | Paul Dreik | 1 | -1/+2
When parsing a format string, the width is parsed into an unsigned short but the result is not checked in the case the format string is not a char string (such as a wide string). In case the parse fails, a null pointer is returned which is used for pointer arithmetic which is undefined behaviour. Signed-off-by: Paul Dreik <gccpatches@pauldreik.se> libstdc++-v3/ChangeLog: PR libstdc++/111102 * include/std/format (__format::__parse_integer): Check for non-null pointer.
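The pattern behind this fix can be sketched in plain C: a parser that signals failure with a null pointer, and a caller that must test for null before doing any pointer arithmetic on the result. The names `parse_width` and `consumed_chars` are hypothetical; this is not the libstdc++ code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Parse decimal digits from [first, last) into *out.  Returns a pointer
   past the last digit consumed, or NULL if there was no digit or the
   value does not fit in an unsigned short.  */
const char *parse_width (const char *first, const char *last,
                         unsigned short *out)
{
  unsigned long value = 0;
  const char *p = first;
  while (p != last && *p >= '0' && *p <= '9')
    {
      value = value * 10 + (unsigned long) (*p - '0');
      if (value > USHRT_MAX)
        return NULL;
      p++;
    }
  if (p == first)
    return NULL;
  *out = (unsigned short) value;
  return p;
}

/* Number of characters consumed, or -1 on failure.  The NULL check comes
   before the subtraction: computing next - first with a null next would
   be undefined behaviour.  */
long consumed_chars (const char *first, const char *last,
                     unsigned short *out)
{
  const char *next = parse_width (first, last, out);
  if (next == NULL)
    return -1;
  return (long) (next - first);
}
```

An out-of-range width such as "99999999" takes the failure path and never reaches the subtraction.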
2023-08-24 | libstdc++: Fix -Wunused-but-set-variable in std::format_to test | Jonathan Wakely | 1 | -4/+4
libstdc++-v3/ChangeLog: * testsuite/std/format/functions/format_to.cc: Avoid warning for unused variables.
2023-08-24 | libstdc++: Tweak some preprocessor conditions for feature tests | Jonathan Wakely | 3 | -18/+18
Update a preprocessor condition using __cplusplus and _GLIBCXX_HOSTED to use the relevant feature test macro for <syncstream>. Also add comments to some conditions saying which C++ standard revision the check corresponds to. libstdc++-v3/ChangeLog: * include/std/atomic: Add comment to #ifdef and fix indentation. * include/std/ostream: Check __glibcxx_syncbuf instead of __cplusplus and _GLIBCXX_HOSTED. * include/std/thread: Add comment to #ifdef.
2023-08-24 | libstdc++: Implement new SI prefixes in <ratio> for C++23 (P2734R0) | Jonathan Wakely | 4 | -20/+61
This is a no-op for libstdc++, because our intmax_t is a 64-bit type and so is incapable of representing the largest and smallest ratios from C++11, let alone the new ones. I've added them to the file anyway (and defined the feature test macro) so that if somebody ports libstdc++ to a target with 128-bit intmax_t then they'll be present. libstdc++-v3/ChangeLog: * include/bits/version.def (__cpp_lib_ratio): Define. * include/bits/version.h: Regenerate. * include/std/ratio (quecto, ronto, yocto, zepto) (zetta, yotta, ronna, quetta): Define. * testsuite/20_util/ratio/operations/ops_overflow_neg.cc: Adjust dg-error line numbers.
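To see why a 64-bit intmax_t makes this a no-op, the sketch below checks whether a given power of ten is representable. The helper name is made up for illustration; the exponents used in the note are those of the SI prefixes (zetta is 10^21, yotta 10^24, ronna 10^27, quetta 10^30).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if 10^exp is representable in intmax_t.  The check before
   each multiplication avoids signed overflow.  */
bool pow10_fits_intmax (unsigned exp)
{
  intmax_t value = 1;
  for (unsigned i = 0; i < exp; i++)
    {
      if (value > INTMAX_MAX / 10)
        return false;
      value *= 10;
    }
  return true;
}
```

With a 64-bit intmax_t, `pow10_fits_intmax (18)` (exa) holds but `pow10_fits_intmax (21)` (zetta) already fails, so the larger prefixes cannot be defined on such a target.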
2023-08-24 | Fix confusion about load_p in vect_build_slp_tree_1 | Richard Biener | 1 | -18/+24
load_p is set and used as to whether the stmt is a memory operation, not whether it is only a load. The following renames it to ldst_p to avoid this confusion. It also replaces checking for a VUSE with checking STMT_VINFO_DATA_REF since VUSE checking doesn't work for pattern matched stores where no virtual operands are present. Where we want to distinguish between loads and stores we then check DR_IS_READ/WRITE. I've made a classification mistake with .MASK_STORE support and this hits other complications when dealing with single-lane SLP. * tree-vect-slp.cc (vect_build_slp_tree_1): Rename load_p to ldst_p, fix mistakes and rely on STMT_VINFO_DATA_REF.
2023-08-24 | libstdc++: Add pretty printer for std::locale | Jonathan Wakely | 2 | -0/+81
Print the locale's name, except when it uses the same named C locale for all categories except one, in which case print something like: std::locale = "en_GB.UTF-8" with "LC_CTYPE=en_US.UTF-8" libstdc++-v3/ChangeLog: * python/libstdcxx/v6/printers.py (StdLocalePrinter): New printer class. * testsuite/libstdc++-prettyprinters/locale.cc: New test.
2023-08-24 | libstdc++: Declutter std::optional and std::variant pretty printers [PR110944] | Jonathan Wakely | 4 | -23/+22
As the PR says, including the template arguments in the GDB output of these class templates can result in very long names, especially for std::variant. You can use 'whatis' or other GDB commands to get details of the type, we don't need to include it in the value. We could consider including the type if it's not too long, but I think consistency is better (and we already omit the template arguments for std::vector and other class templates). libstdc++-v3/ChangeLog: PR libstdc++/110944 * python/libstdcxx/v6/printers.py (StdExpOptionalPrinter): Do not show template arguments. (StdVariantPrinter): Likewise. * testsuite/libstdc++-prettyprinters/compat.cc: Adjust expected output. * testsuite/libstdc++-prettyprinters/cxx17.cc: Likewise. * testsuite/libstdc++-prettyprinters/libfundts.cc: Likewise.
2023-08-24 | Fix profile update in gimple-harden-conditionals.cc | Jan Hubicka | 1 | -0/+1
gcc/ChangeLog: * gimple-harden-conditionals.cc (insert_check_and_trap): Set count of newly build trap bb.
2023-08-24 | RISC-V: Add COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS testcases | Juzhe-Zhong | 27 | -43/+121
This patch depends on the middle-end patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627621.html We already had COND_LEN_FNMA/COND_LEN_FMS/COND_FNMS patterns. Remove TARGET_PREFERRED_ELSE_VALUE since it forbids the COND_LEN_FMS/COND_LEN_FNMS STMT fold. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_preferred_else_value): Remove it since it forbids the COND_LEN_FMS/COND_LEN_FNMS STMT fold. (TARGET_PREFERRED_ELSE_VALUE): Ditto. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Adapt test. * gcc.target/riscv/rvv/autovec/binop/vadd-rv64gcv-nofm.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto. * gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-1.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-3.c: Ditto. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-10.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-11.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-12.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-6.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-7.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-8.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm-9.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-10.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-11.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-12.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-4.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-5.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-6.c: New test. 
* gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-7.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-8.c: New test. * gcc.target/riscv/rvv/autovec/ternop/ternop_nofm_run-9.c: New test.
2023-08-24 | RISC-V: Enable pressure-aware scheduling by default. | Robin Dapp | 27 | -25/+32
This patch enables pressure-aware scheduling for riscv. There have been various requests for it so I figured I'd just go ahead and send the patch. There is some slight regression in code quality for a number of vector tests where we spill more due to a different instruction order. The ones I looked at were a mix of bad luck and/or brittle tests. Comparing the size of the generated assembly or the number of vsetvls for SPECint also didn't show any immediate benefit but that's obviously not a very fine-grained analysis. As cost and scheduling models mature I expect the situation to improve and for now I think it's generally favorable to enable pressure-aware scheduling so we can work with it rather than trying to find every possible problem in advance. gcc/ChangeLog: * common/config/riscv/riscv-common.cc: Add -fsched-pressure. * config/riscv/riscv.cc (riscv_option_override): Set sched pressure algorithm. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/narrow_constraint-1.c: Add -fno-sched-pressure. * gcc.target/riscv/rvv/base/narrow_constraint-17.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-18.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-19.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-20.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-21.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-22.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-23.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-24.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-25.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-26.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-27.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-28.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-29.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-30.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-31.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-4.c: Ditto. 
* gcc.target/riscv/rvv/base/narrow_constraint-5.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-8.c: Ditto. * gcc.target/riscv/rvv/base/narrow_constraint-9.c: Ditto. * gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-10.c: Ditto. * gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-11.c: Ditto. * gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-12.c: Ditto. * gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-3.c: Ditto. * gcc.target/riscv/rvv/vsetvl/vlmax_bb_prop-9.c: Ditto.
2023-08-24 | RISC-V: Allow const 17-31 for vector shift. | Robin Dapp | 2 | -1/+18
This patch adds a missing constraint in order to be able to print (and not ICE) vector immediates 17-31 for vector shifts. Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> gcc/ChangeLog: * config/riscv/riscv.cc (riscv_print_operand): Allow vk operand. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/shift-immediate.c: New test.
2023-08-24 | RISC-V: Add missing conversion tests. | Robin Dapp | 17 | -20/+302
This adds some missing tests for vf[nw]cvt. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-run.c: Add tests. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-ftoi-template.h: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-template.h: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfncvt-itof-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-template.h: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-ftoi-zvfh-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-run.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv32gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-rv64gcv.c: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-template.h: Ditto. * gcc.target/riscv/rvv/autovec/conversions/vfwcvt-itof-zvfh-run.c: Ditto.
2023-08-24 | RISC-V: Fix reduc_strict_run-1 test case. | Robin Dapp | 1 | -1/+2
This patch fixes the reduc_strict_run-1 testcase by introducing a variable that holds the reference result. This is necessary because in presence of _Float16 emulation an intermediate result used in a comparison is computed in higher precision. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/reduc/reduc_strict_run-1.c: Add variable to hold reference result.
2023-08-24 | tree-optimization/111125 - avoid BB vectorization in novector loops | Richard Biener | 1 | -12/+29
When a loop is marked with #pragma GCC novector the following makes sure to also skip BB vectorization for contained blocks. That avoids gcc.dg/vect/bb-slp-29.c failing on aarch64 because of extra BB vectorization therein. I'm not specifically dealing with sub-loops of novector loops, the desired semantics isn't documented. PR tree-optimization/111125 * tree-vect-slp.cc (vect_slp_function): Split at novector loop entry, do not push blocks in novector loops.
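A small sketch of the situation described here, assuming a GCC release that accepts `#pragma GCC novector` (other compilers are expected to ignore the unknown pragma, possibly with a warning). The function name is illustrative. The loop body contains four independent statements, the kind of straight-line code that basic-block vectorization would otherwise pick up.

```c
#include <assert.h>

/* Add one to every element of src, four elements per iteration.  The
   pragma asks that the loop not be vectorized; the computed result is
   the same whether or not the compiler honours it.  */
void add_one_groups (int *dst, const int *src, int groups)
{
  assert (groups >= 0);
#pragma GCC novector
  for (int g = 0; g < groups; g++)
    {
      dst[4 * g + 0] = src[4 * g + 0] + 1;
      dst[4 * g + 1] = src[4 * g + 1] + 1;
      dst[4 * g + 2] = src[4 * g + 2] + 1;
      dst[4 * g + 3] = src[4 * g + 3] + 1;
    }
}
```

Whether the block inside the loop is actually left scalar can be inspected with the vectorizer dump options of the compiler in use; the sketch only fixes the source-level shape.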
2023-08-24 | c: Add support for [[__extension__ ...]] | Richard Sandiford | 4 | -20/+193
[[]] attributes are a recent addition to C, but as a GNU extension, GCC allows them to be used in C11 and earlier. Normally this use would trigger a pedwarn (for -pedantic, -Wc11-c2x-compat, etc.). This patch allows the pedwarn to be suppressed by starting the attribute-list with __extension__. Also, :: is not a single lexing token prior to C2X, so it wasn't possible to use scoped attributes in C11, even as a GNU extension. The patch allows two colons to be used in place of :: when __extension__ is used. No attempt is made to check whether the two colons are immediately adjacent. gcc/ * doc/extend.texi: Document the C [[__extension__ ...]] construct. gcc/c/ * c-parser.cc (c_parser_std_attribute): Conditionally allow two colons to be used in place of ::. (c_parser_std_attribute_list): New function, split out from... (c_parser_std_attribute_specifier): ...here. Allow the attribute-list to start with __extension__. When it does, also allow two colons to be used in place of ::. gcc/testsuite/ * gcc.dg/c2x-attr-syntax-6.c: New test. * gcc.dg/c2x-attr-syntax-7.c: Likewise.
2023-08-24 | gimple_fold: Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold | Juzhe-Zhong | 4 | -10/+138
Hi, Richard and Richi.

Currently, GCC supports COND_LEN_FMA for floating point without -ffast-math. It's supported in tree-ssa-math-opts.cc. However, GCC fails to support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS.

Consider the following case:

    __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
                                                TYPE *__restrict a, \
                                                TYPE *__restrict b, int n) \
    { \
      for (int i = 0; i < n; i++) \
        dst[i] -= a[i] * b[i]; \
    }

    TEST_TYPE (float) \
    TEST_ALL ()

Gimple IR for RVV:

    ...
    _39 = -vect__8.14_26;
    vect__10.16_21 = .COND_LEN_FMA ({ -1, ... }, vect__6.11_30, _39, vect__4.8_34, vect__4.8_34, _46, 0);
    ...

This is because of the following piece of code in tree-ssa-math-opts.cc:

    if (len)
      fma_stmt = gimple_build_call_internal (IFN_COND_LEN_FMA, 7, cond, mulop1, op2,
                                             addop, else_value, len, bias);
    else if (cond)
      fma_stmt = gimple_build_call_internal (IFN_COND_FMA, 5, cond, mulop1,
                                             op2, addop, else_value);
    else
      fma_stmt = gimple_build_call_internal (IFN_FMA, 3, mulop1, op2, addop);
    gimple_set_lhs (fma_stmt, gimple_get_lhs (use_stmt));
    gimple_call_set_nothrow (fma_stmt, !stmt_can_throw_internal (cfun,
                                                                 use_stmt));
    gsi_replace (&gsi, fma_stmt, true);
    /* Follow all SSA edges so that we generate FMS, FNMA and FNMS
       regardless of where the negation occurs.  */
    gimple *orig_stmt = gsi_stmt (gsi);
    if (fold_stmt (&gsi, follow_all_ssa_edges))
      {
        if (maybe_clean_or_replace_eh_stmt (orig_stmt, gsi_stmt (gsi)))
          gcc_unreachable ();
        update_stmt (gsi_stmt (gsi));
      }

Here fold_stmt fails to fold NEGATE_EXPR + COND_LEN_FMA into COND_LEN_FNMA.

This patch supports folding the statement into:

    vect__10.16_21 = .COND_LEN_FNMA ({ -1, ... }, vect__8.14_26, vect__6.11_30, vect__4.8_34, { 0.0, ... }, _46, 0);

Note that COND_LEN_FNMA has 7 arguments and COND_LEN_ADD has 6 arguments. Extend the maximum number of ops:

    -  static const unsigned int MAX_NUM_OPS = 5;
    +  static const unsigned int MAX_NUM_OPS = 7;

Bootstrap and Regtest on X86 passed. Tested on aarch64 Qemu. Fully tested COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS on RISC-V backend.

gcc/ChangeLog:
* genmatch.cc (decision_tree::gen): Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
* gimple-match-exports.cc (gimple_simplify): Ditto.
(gimple_resimplify6): New function.
(gimple_resimplify7): New function.
(gimple_match_op::resimplify): Support COND_LEN_FNMA/COND_LEN_FMS/COND_LEN_FNMS gimple fold.
(convert_conditional_op): Ditto.
(build_call_internal): Ditto.
(try_conditional_simplification): Ditto.
(gimple_extract): Ditto.
* gimple-match.h (gimple_match_cond::gimple_match_cond): Ditto.
* internal-fn.cc (CASE): Ditto.
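As a rough scalar model of what the folded call computes, the sketch below pairs the loop from the message with a per-lane interpretation of a seven-argument conditional, length-limited FNMA. The function names and the exact lane rule (active when the lane index is below len + bias and the mask bit is set, else take the else value) are assumptions made for illustration, following the argument order visible in the IR above; a real fused operation also rounds only once, which this model does not capture.

```c
#include <assert.h>
#include <stdbool.h>

/* Scalar form of the loop from the commit message, for float.  */
void ternop_float (float *restrict dst, const float *restrict a,
                   const float *restrict b, int n)
{
  for (int i = 0; i < n; i++)
    dst[i] -= a[i] * b[i];
}

/* Illustrative model of a masked, length-limited FNMA over nunits lanes:
   active lanes get c - a * b, inactive lanes get the else value.  */
void cond_len_fnma_model (float *out, const bool *mask,
                          const float *a, const float *b, const float *c,
                          const float *else_vals,
                          int len, int bias, int nunits)
{
  assert (nunits >= 0);
  for (int i = 0; i < nunits; i++)
    out[i] = (i < len + bias && mask[i]) ? c[i] - a[i] * b[i]
                                         : else_vals[i];
}
```

With an all-true mask and len equal to the number of lanes, the model reduces to one vector iteration of `ternop_float`, with `c` playing the role of the old `dst` values.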
2023-08-24 | tree-optimization/111115 - SLP of masked stores | Richard Biener | 6 | -21/+94
The following adds the capability to do SLP on .MASK_STORE, I do not plan to add interleaving support. PR tree-optimization/111115 gcc/ * tree-vectorizer.h (vect_slp_child_index_for_operand): New. * tree-vect-data-refs.cc (can_group_stmts_p): Also group .MASK_STORE. * tree-vect-slp.cc (arg3_arg2_map): New. (vect_get_operand_map): Handle IFN_MASK_STORE. (vect_slp_child_index_for_operand): New function. (vect_build_slp_tree_1): Handle statements with no LHS, masked store ifns. (vect_remove_slp_scalar_calls): Likewise. * tree-vect-stmts.cc (vect_check_store_rhs): Lookup the SLP child corresponding to the ifn value index. (vectorizable_store): Likewise for the mask index. Support masked stores. (vectorizable_load): Lookup the SLP child corresponding to the ifn mask index. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_vect_masked_store): Supported with check_avx_available. * gcc.dg/vect/slp-mask-store-1.c: New testcase.
2023-08-24 | tree-optimization/111125 - properly cost BB reduction remain stmt handling | Richard Biener | 1 | -0/+5
We assume that all root stmts which compose the total reduction chain are vectorized but fail to account for the cost of adding back the scalar defs we are not vectorizing. The following rectifies this, fixing the gcc.dg/tree-ssa/slsr-11.c FAIL on aarch64.

PR tree-optimization/111125
* tree-vect-slp.cc (vectorizable_bb_reduc_epilogue): Account for the remain_defs processing.
2023-08-24 | aarch64: Account for different Advanced SIMD fusing options | Richard Sandiford | 3 | -6/+47
The scalar FNMADD/FNMSUB and SVE FNMLA/FNMLS instructions mean that either side of a subtraction can start an accumulator chain. However, Advanced SIMD doesn't have an equivalent instruction. This means that, for Advanced SIMD, a subtraction can only be fused if the second operand is a multiplication.

Also, if both sides of a subtraction are multiplications, and if the second operand is used multiple times, such as:

    c * d - a * b
    e * f - a * b

then the first rather than second multiplication operand will tend to be fused. On Advanced SIMD, this leads to:

    tmp1 = a * b
    tmp2 = -tmp1
    ... = tmp2 + c * d   // FMLA
    ... = tmp2 + e * f   // FMLA

where one of the FMLAs also requires a MOV.

This patch tries to account for this in the vector cost model. It improves roms performance by 2-3% on Neoverse V1. It's also needed to avoid a regression in fotonik for Neoverse N2 and Neoverse V2 with the patch for PR110625.

gcc/
* config/aarch64/aarch64.cc: Include ssa.h.
(aarch64_multiply_add_p): Require the second operand of an Advanced SIMD subtraction to be a multiplication. Assume that such an operation won't be fused if the second operand is used multiple times and if the first operand is also a multiplication.

gcc/testsuite/
* gcc.target/aarch64/neoverse_v1_2.c: New test.
* gcc.target/aarch64/neoverse_v1_3.c: Likewise.
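The two ways of evaluating c * d - a * b described above can be written out as a small scalar sketch. The helpers fmla_model and fmls_model are hypothetical, unfused stand-ins for the Advanced SIMD FMLA (acc + x * y) and FMLS (acc - x * y) forms; they ignore the single rounding of a real fused operation and only show which operand ends up as the accumulator.

```c
#include <assert.h>

/* Hypothetical, unfused stand-ins for the Advanced SIMD instructions.  */
static float fmla_model (float acc, float x, float y) { return acc + x * y; }
static float fmls_model (float acc, float x, float y) { return acc - x * y; }

/* c * d - a * b with the second multiplication fused:
   one plain multiply for the accumulator, then one FMLS.  */
static float
fuse_second (float a, float b, float c, float d)
{
  float acc = c * d;
  return fmls_model (acc, a, b);
}

/* The same value with the first multiplication fused: the product
   a * b has to be negated explicitly before it can act as the
   accumulator of an FMLA.  */
static float
fuse_first (float a, float b, float c, float d)
{
  float tmp1 = a * b;
  float tmp2 = -tmp1;
  return fmla_model (tmp2, c, d);
}

/* Both orders agree on exactly representable inputs: 4 * 5 - 2 * 3.  */
static void
check_fusing_models (void)
{
  assert (fuse_second (2.0f, 3.0f, 4.0f, 5.0f) == 14.0f);
  assert (fuse_first (2.0f, 3.0f, 4.0f, 5.0f) == 14.0f);
}
```

In fuse_first the negated product is the value that would be shared between several FMLAs when a * b is reused, which is the situation the cost model change is concerned with.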
2023-08-24 | VECT: Apply LEN_FOLD_EXTRACT_LAST into loop vectorizer | Juzhe-Zhong | 2 | -9/+50
Hi. This patch applies LEN_FOLD_EXTRACT_LAST in the loop vectorizer.

Consider the following case:

    /* Simple condition reduction.  */
    int __attribute__ ((noinline, noclone))
    condition_reduction (int *a, int min_v)
    {
      int last = 66; /* High start value.  */
      for (int i = 0; i < N; i++)
        if (a[i] < min_v)
          last = i;
      return last;
    }

With this patch, we can generate the following IR:

    _44 = .SELECT_VL (ivtmp_42, POLY_INT_CST [4, 4]);
    _34 = vect_vec_iv_.5_33 + { POLY_INT_CST [4, 4], ... };
    ivtmp_36 = _44 * 4;
    vect__4.8_39 = .MASK_LEN_LOAD (vectp_a.6_37, 32B, { -1, ... }, _44, 0);
    mask__11.9_41 = vect__4.8_39 < vect_cst__40;
    last_5 = .LEN_FOLD_EXTRACT_LAST (last_14, mask__11.9_41, vect_vec_iv_.5_33, _44, 0);
    ...

gcc/ChangeLog:

* tree-vect-loop.cc (vectorizable_reduction): Apply LEN_FOLD_EXTRACT_LAST.
* tree-vect-stmts.cc (vectorizable_condition): Ditto.
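The effect of the .LEN_FOLD_EXTRACT_LAST call in the IR above can be sketched with a scalar model. This assumes the documented behaviour of the fold_extract_last family (return the last vector element whose mask bit is set, or the incoming scalar if none is set), restricted to lanes below len + bias. The names len_fold_extract_last_ref and check_len_fold_extract_last_ref are hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of .LEN_FOLD_EXTRACT_LAST; not GCC code.
   Only lanes below len + bias are looked at.  The result is the last
   such lane of VEC whose mask bit is set, or INIT if there is none.  */
static int
len_fold_extract_last_ref (int init, const bool *mask, const int *vec,
                           size_t nlanes, size_t len, long bias)
{
  long active = (long) len + bias;
  assert (active >= 0 && (size_t) active <= nlanes);
  int result = init;
  for (size_t i = 0; i < (size_t) active; i++)
    if (mask[i])
      result = vec[i];
  return result;
}

/* Usage example mirroring the reduction: VEC plays the role of the
   vector induction variable, INIT the value carried into the iteration.  */
static void
check_len_fold_extract_last_ref (void)
{
  bool mask[4] = { false, true, false, true };
  int iv[4] = { 0, 1, 2, 3 };
  bool none[4] = { false, false, false, false };

  assert (len_fold_extract_last_ref (66, mask, iv, 4, 4, 0) == 3);
  assert (len_fold_extract_last_ref (66, mask, iv, 4, 3, 0) == 1);
  assert (len_fold_extract_last_ref (66, none, iv, 4, 4, 0) == 66);
}
```

The second assertion shows the length operand at work: lane 3 matches the condition but lies beyond len, so the last active match is lane 1.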
2023-08-24 | tree-optimization/111128 - fix shift pattern recog | Richard Biener | 2 | -1/+17
The following fixes placement of shift operand sanitization with MIN when the original shift operand was external but the actual one is not.

PR tree-optimization/111128
* tree-vect-patterns.cc (vect_recog_over_widening_pattern): Emit external shift operand inline if we promoted it with another pattern stmt.
* gcc.dg/torture/pr111128.c: New testcase.
2023-08-24 | testsuite/111125 - disable BB vectorization for the test | Richard Biener | 1 | -1/+4
The test is for loop vectorization producing non-canonical multiplications. We can now BB vectorize the whole function when the target supports .REDUC_PLUS for V2SImode but we don't have a dejagnu selector for that. Disable BB vectorization like we disabled epilogue vectorization.

PR testsuite/111125
* gcc.dg/vect/pr53773.c: Disable BB vectorization.
2023-08-24 | RISC-V: Fix one typo in autovec.md pattern comment | Pan Li | 1 | -3/+3
vfmsac => vfnmacc
vfmsub => vfnmadd

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

* config/riscv/autovec.md: Fix typo.
2023-08-24 | RISC-V: Refactor RVV class by frm_op_type template arg | Pan Li | 1 | -428/+143
As suggested by kito, we add a new frm_op_type template arg to the op classes, to avoid duplicating the expand function.

Signed-off-by: Pan Li <pan2.li@intel.com>

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc (class binop_frm): Removed.
(class reverse_binop_frm): Ditto.
(class widen_binop_frm): Ditto.
(class vfmacc_frm): Ditto.
(class vfnmacc_frm): Ditto.
(class vfmsac_frm): Ditto.
(class vfnmsac_frm): Ditto.
(class vfmadd_frm): Ditto.
(class vfnmadd_frm): Ditto.
(class vfmsub_frm): Ditto.
(class vfnmsub_frm): Ditto.
(class vfwmacc_frm): Ditto.
(class vfwnmacc_frm): Ditto.
(class vfwmsac_frm): Ditto.
(class vfwnmsac_frm): Ditto.
(class unop_frm): Ditto.
(class vfrec7_frm): Ditto.
(class binop): Add frm_op_type template arg.
(class unop): Ditto.
(class widen_binop): Ditto.
(class widen_binop_fp): Ditto.
(class reverse_binop): Ditto.
(class vfmacc): Ditto.
(class vfnmsac): Ditto.
(class vfmadd): Ditto.
(class vfnmsub): Ditto.
(class vfnmacc): Ditto.
(class vfmsac): Ditto.
(class vfnmadd): Ditto.
(class vfmsub): Ditto.
(class vfwmacc): Ditto.
(class vfwnmacc): Ditto.
(class vfwmsac): Ditto.
(class vfwnmsac): Ditto.
(class float_misc): Ditto.
2023-08-24 | MATCH: [PR111109] Fix bit_ior(cond,cond) when comparisons are fp | Andrew Pinski | 2 | -3/+86
The patterns that were added in r13-4620-g4d9db4bdd458 missed that (a > b) and (a <= b) are not inverses of each other for floating-point comparisons (if NaNs are supported). Even though there was a check for integral types, it was only for the result of the cond rather than for the type of what is being compared. The fix is to check whether cmp and icmp are inverses of each other by using the invert_tree_comparison function.

OK for trunk and GCC 13 branch? Bootstrapped and tested on x86_64-linux-gnu with no regressions. I added the testcase to execute/ieee as it requires support for NAN.

PR tree-optimization/111109

gcc/ChangeLog:

* match.pd (ior(cond,cond), ior(vec_cond,vec_cond)): Add check to make sure cmp and icmp are inverse.

gcc/testsuite/ChangeLog:

* gcc.c-torture/execute/ieee/fp-cmp-cond-1.c: New test.
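The underlying IEEE behaviour can be checked directly in C. The sketch below is not the new testcase from the patch; the helper names is_gt, is_le, is_not_gt and check_nan_comparisons are hypothetical. It assumes the code is compiled without -ffast-math or -ffinite-math-only, so that NaNs are honoured.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helpers; compile without -ffast-math so NaNs are honoured.  */
static int is_gt (double a, double b) { return a > b; }
static int is_le (double a, double b) { return a <= b; }
static int is_not_gt (double a, double b) { return !(a > b); }

static void
check_nan_comparisons (void)
{
  /* For ordered operands, a <= b is the logical inverse of a > b.  */
  assert (is_gt (2.0, 1.0) == 1 && is_le (2.0, 1.0) == 0);
  assert (is_gt (1.0, 2.0) == 0 && is_le (1.0, 2.0) == 1);
  assert (is_le (1.0, 2.0) == is_not_gt (1.0, 2.0));

  /* With a NaN operand both comparisons are false, so a <= b no
     longer equals !(a > b).  */
  assert (is_gt (NAN, 1.0) == 0);
  assert (is_le (NAN, 1.0) == 0);
  assert (is_not_gt (NAN, 1.0) == 1);
}
```

Because both comparisons are false for a NaN operand, treating the second condition as the inverse of the first selects a value in a case where neither original condition held.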
2023-08-23 | MATCH: remove negate for 1bit types | Andrew Pinski | 4 | -0/+82
For 1bit types, negate is either undefined or doesn't change the value. In either case we want to remove it. This patch adds a match pattern to do that. Also, when converting to a 1bit type we can remove the negate, just as we already do for `&1`, so this patch adds that too.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Notes on the testcases: This patch is the last part to fix PR 95929; cond-bool-2.c testcase. bit1neg-1.c is a 1bit-field testcase where we could remove the assignment all the way in one case (which happened on the RTL level for some targets but not all). cond-bool-2.c is the reduced testcase of PR 95929.

PR tree-optimization/95929

gcc/ChangeLog:

* match.pd (convert?(-a)): New pattern for 1bit integer types.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/bit1neg-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-1.c: New test.
* gcc.dg/tree-ssa/cond-bool-2.c: New test.
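The unsigned case of this observation is easy to reproduce at the source level. The following is a sketch and not one of the new testcases; struct one_bit and the helper names are hypothetical. It covers only an unsigned 1-bit field, where storing the negated value back reduces it modulo 2, and the related `&1` identity.

```c
#include <assert.h>

/* Hypothetical 1-bit unsigned bit-field used only for illustration.  */
struct one_bit
{
  unsigned int bit : 1;
};

/* Negate a 1-bit unsigned value and store it back into the field.
   The store reduces the result modulo 2, so the value is unchanged.  */
static unsigned int
negate_one_bit (unsigned int v)
{
  struct one_bit s;
  s.bit = v & 1u;
  s.bit = -s.bit;
  return s.bit;
}

/* The low bit of a negated unsigned value equals the low bit of the
   original value, which is why the negate can be dropped under & 1.  */
static unsigned int
low_bit_of_negation (unsigned int x)
{
  return (0u - x) & 1u;
}

static void
check_one_bit_negate (void)
{
  assert (negate_one_bit (0u) == 0u);
  assert (negate_one_bit (1u) == 1u);
  for (unsigned int x = 0; x < 16; x++)
    assert (low_bit_of_negation (x) == (x & 1u));
}
```

The signed 1-bit case (values 0 and -1) is the one the message calls undefined, since negating -1 is not representable in the type; it is left out of the sketch for that reason.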
2023-08-24 | Revert "Initial support for AVX10.1" | Haochen Jiang | 26 | -367/+13
This reverts commit 11ad44da01dd1c91c96e45802fd8b1c50e88703f.
2023-08-24 | Revert "Emit a warning when disabling AVX512 with AVX10 enabled or disabling AVX10 with AVX512 enabled" | Haochen Jiang | 6 | -91/+15
This reverts commit 0288ab14732a16b3787546cdd159941eb7306cf3.