aboutsummaryrefslogtreecommitdiff
path: root/gcc
AgeCommit message (Collapse)AuthorFilesLines
2022-10-19testsuite: Default make check-g++ vs. tests for newest C++ standardJakub Jelinek1-1/+10
When adding tests for upcoming C++ version, one always has a dilemma whether to use explicit // { dg-options "-std=c++2b" } or -std=gnu++2b and similar, then the test works in all modes, but it might be forgotten later on to be converted into // { dg-do whatever { target c++23 } } test so that when 23 is tested by default and say 26 or 29 appears too, we test it also in those modes, or just go with // { dg-do whatever { target c++23 } } which has the disadvantage that it is skipped when testing by default and one only tests it if he asks for the newer version. The following patch changes it, such that it is safe to add // { dg-do whatever { target c++23 } } style tests and make those tested even in the default testing mode (when GXX_TESTSUITE_STDS or check-c++-all etc. aren't used). This is by searching for such dg-do lines and if there is an effective target newer than the latest by default tested language version, it will just use that language version instead of the default list. Without this change, the test would be UNSUPPORTED in currently all of 98 14 17 20 versions, with the patch it will be tested with a single 23 version. 2022-10-19 Jakub Jelinek <jakub@redhat.com> * lib/g++-dg.exp (g++-dg-runtest): When using defaulted std_list, if test has { dg-do * { target c++23 } } directive, use { 23 } with which the test will run instead of { 98 14 17 20 } which would make it UNSUPPORTED in all cases.
2022-10-19testsuite: Fix up c2x-enum-1.c for 32-bit arches [PR107311]Jakub Jelinek1-1/+6
On Wed, Oct 19, 2022 at 02:57:59PM +0000, Joseph Myers wrote: > I think the type checked for e5a should be conditional on __LONG_MAX__ > > __INT_MAX__; everything else there should be OK regardless. This patch does that. 2022-10-19 Jakub Jelinek <jakub@redhat.com> PR c/107311 * gcc.dg/c2x-enum-1.c (enum e5): Expect e5a type inside of enum to be int rather than long if long isn't wider than int.
2022-10-19Use Value_Range when applying inferred ranges.Andrew MacLeod1-1/+1
Applying an inferred range is using int_range_ma as the temporary rather than the general purpose Value_Range. This causes it to trap if we have a non-integral inferred range. * gimple-range-cache.cc (ranger_cache::range_from_dom): Use Value_Range not int_range_max.
2022-10-19[PR tree-optimization/107312] Make range_true_and_false work with 1-bit ↵Aldy Hernandez3-0/+15
signed types. range_true_and_false() returns a range of [0,1], which for a 1-bit signed integer gets passed to the irange setter as [0, -1]. These endpoints are out of order and cause an ICE. Through some dumb luck, the legacy code swaps out of order endpoints, because old VRP would sometimes pass endpoints reversed, depending on the setter to fix them. This swapping does not happen for non-legacy, hence the ICE. The right thing to do (apart from killing legacy and 1-bit signed integers ;-)), is to avoid passing out of order endpoints for 1-bit signed integers. For that matter, a range of [-1, 0] (signed) or [0, 1] (unsigned) is just varying. PR tree-optimization/107312 gcc/ChangeLog: * range.h (range_true_and_false): Special case 1-bit signed types. * value-range.cc (range_tests_misc): New test. gcc/testsuite/ChangeLog: * gcc.target/i386/pr107312.c: New test.
2022-10-19gcc: Add 'mcf' thread model support from mcfgthreadLIU Hao6-7/+24
This patch adds the new thread model `mcf`, which implements mutexes and condition variables with the mcfgthread library. Source code for mcfgthread is available at <https://github.com/lhmouse/mcfgthread>. config/ChangeLog: * gthr.m4 (GCC_AC_THREAD_HEADER): Add new case for `mcf` thread model gcc/ChangeLog: * config/i386/mingw-mcfgthread.h: New file * config/i386/mingw32.h: Add builtin macro and default libraries for mcfgthread when thread model is `mcf` * config.gcc: Include 'i386/mingw-mcfgthread.h' when thread model is `mcf` * configure.ac: Recognize `mcf` as a valid thread model * config.in: Regenerate * configure: Regenerate libatomic/ChangeLog: * configure.tgt: Add new case for `mcf` thread model libgcc/ChangeLog: * config.host: Add new cases for `mcf` thread model * config/i386/gthr-mcf.h: New file * config/i386/t-mingw-mcfgthread: New file * config/i386/t-slibgcc-cygming: Add mcfgthread for libgcc DLL * configure: Regenerate libstdc++-v3/ChangeLog: * libsupc++/atexit_thread.cc (__cxa_thread_atexit): Use implementation from mcfgthread if available * libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_release, __cxa_guard_abort): Use implementations from mcfgthread if available * configure: Regenerate
2022-10-19pch: Fix streaming of strings with embedded null bytesLewis Hyatt7-8/+59
When a GTY'ed struct is streamed to PCH, any plain char* pointers it contains (whether they live in GC-controlled memory or not) will be marked for PCH output by the routine gt_pch_note_object in ggc-common.cc. This routine special-cases plain char* strings, and in particular it uses strlen() to get their length. Thus it does not handle strings with embedded null bytes, but it is possible for something PCH cares about (such as a string literal token in a macro definition) to contain such embedded nulls. To fix that up, add a new GTY option "string_length" so that gt_pch_note_object can be informed the actual length it ought to use, and use it in the relevant libcpp structs (cpp_string and ht_identifier) accordingly. gcc/ChangeLog: * gengtype.cc (output_escaped_param): Add missing const. (get_string_option): Add missing check for option type. (walk_type): Support new "string_length" GTY option. (write_types_process_field): Likewise. * ggc-common.cc (gt_pch_note_object): Add optional length argument. * ggc.h (gt_pch_note_object): Adjust prototype for new argument. (gt_pch_n_S2): Declare... * stringpool.cc (gt_pch_n_S2): ...new function. * doc/gty.texi: Document new GTY((string_length)) option. libcpp/ChangeLog: * include/cpplib.h (struct cpp_string): Use new "string_length" GTY. * include/symtab.h (struct ht_identifier): Likewise. gcc/testsuite/ChangeLog: * g++.dg/pch/pch-string-nulls.C: New test. * g++.dg/pch/pch-string-nulls.Hs: New test.
2022-10-19avr: remove useless @tie{} directivesMartin Liska1-3/+3
gcc/ChangeLog: * doc/extend.texi: Remove useless @tie{} directives.
2022-10-19SRA: Limit replacement creation for accesses propagated from LHSsMartin Jambor2-0/+34
PR 107206 is fallout from the fix to PR 92706 where we started propagating accesses across assignments also from LHS to RHS of assignments so that we would not do harmful total scalarization of the aggregates on the RHS. But this can lead to new scalarization of these aggregates and in the testcase of PR 107206 these can appear in superfluous uses of un-initialized values and spurious warnings. Fixed by making sure the the accesses created by propagation in this direction are only used as a basis for replacements when the structure would be totally scalarized anyway. gcc/ChangeLog: 2022-10-18 Martin Jambor <mjambor@suse.cz> PR tree-optimization/107206 * tree-sra.cc (struct access): New field grp_result_of_prop_from_lhs. (analyze_access_subtree): Do not create replacements for accesses with this flag when not toally scalarizing. (propagate_subaccesses_from_lhs): Set the new flag. gcc/testsuite/ChangeLog: 2022-10-18 Martin Jambor <mjambor@suse.cz> PR tree-optimization/107206 * g++.dg/tree-ssa/pr107206.C: New test.
2022-10-19IBM zSystems: Fix function_ok_for_sibcall [PR106355]Stefan Schulze Frielinghaus5-23/+100
For a parameter with BLKmode we cannot use REG_NREGS in order to determine the number of consecutive registers. Streamlined this with the implementation of s390_function_arg. Fix some indentation whitespace, too. gcc/ChangeLog: PR target/106355 * config/s390/s390.cc (s390_call_saved_register_used): For a parameter with BLKmode fix determining number of consecutive registers. gcc/testsuite/ChangeLog: * gcc.target/s390/pr106355.h: Common code for new tests. * gcc.target/s390/pr106355-1.c: New test. * gcc.target/s390/pr106355-2.c: New test. * gcc.target/s390/pr106355-3.c: New test.
2022-10-19xtensa: Prepare the transition from Reload to LRATakayuki 'January June' Suwa7-24/+99
This patch provides the first step in the transition from Reload to LRA in Xtensa. gcc/ChangeLog: * config/xtensa/xtensa-protos.h (xtensa_split1_finished_p, xtensa_split_DI_reg_imm): New prototypes. * config/xtensa/xtensa.cc (xtensa_split1_finished_p, xtensa_split_DI_reg_imm, xtensa_lra_p): New functions. (TARGET_LRA_P): Replace the dummy hook with xtensa_lra_p. (xt_true_regnum): Rework. * config/xtensa/xtensa.h (CALL_REALLY_USED_REGISTERS): Switch from CALL_USED_REGISTERS, and revise the comment. * config/xtensa/constraints.md (Y): Use !xtensa_split1_finished_p() instead of can_create_pseudo_p(). * config/xtensa/predicates.md (move_operand): Ditto. * config/xtensa/xtensa.md: Add two new split patterns: - splits DImode immediate load into two SImode ones - puts out-of-constraint SImode constants into the constant pool * config/xtensa/xtensa.opt (-mlra): New target-specific option for testing purpose.
2022-10-19s390: Fix bootstrap error with checking and -m31.Robin Dapp1-3/+4
For a while already we hit an ICE when bootstrapping with -m31 and --enable-checking=all. ../../../../libgfortran/ieee/ieee_helper.c: In function 'ieee_class_helper_16': ../../../../libgfortran/ieee/ieee_helper.c:77:3: internal compiler error: RTL check: expected code 'reg', have 'subreg' in rhs_regno, at rtl.h:1932 77 | } | ^ ../../../../libgfortran/ieee/ieee_helper.c:87:1: note: in expansion of macro 'CLASSMACRO' 87 | CLASSMACRO(16) | ^~~~~~~~~~ This patch fixes the problem by first checking for reload_completed and also ensuring that REGNO is only called on reg operands rather than subregs. gcc/ChangeLog: * config/s390/s390.md: Move reload_completed and check operands for REG_P.
2022-10-19expr: Fix ICE on BFmode -> SFmode conversion of constant [PR107262]Jakub Jelinek2-2/+22
I forgot to handle the case where lowpart_subreg returns a VOIDmode CONST_INT, in that case convert_mode_scalar obviously doesn't work. The following patch fixes that. 2022-10-19 Jakub Jelinek <jakub@redhat.com> PR middle-end/107262 * expr.cc (convert_mode_scalar): For BFmode -> SFmode conversions of constants, use simplify_unary_operation if fromi has VOIDmode instead of recursive convert_mode_scalar. * gcc.dg/pr107262.c: New test.
2022-10-19match.pd: Add 2 TYPE_OVERFLOW_SANITIZED checks [PR106990]Jakub Jelinek2-4/+35
As requested in the PR, this adds 2 TYPE_OVERFLOW_SANITIZED checks and corresponding testcase. 2022-10-19 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/106990 * match.pd ((~X - ~Y) -> Y - X, -x & 1 -> x & 1): Guard with !TYPE_OVERFLOW_SANITIZED (type). * c-c++-common/ubsan/pr106990.c: New test.
2022-10-19i386: Fix up __bf16 handling on ia32Jakub Jelinek2-10/+9
Last night's testing of the libstdc++ changes revealed a problem in the __bf16 backend registration (while _Float16 seems ok). The problem is that for both BFmode and HFmode we require TARGET_SSE2, the generic code creates {,b}float16_type_node only if that is true at the start of the TU and the builtins for the type are only created in that case (many __builtin_*f16 for HFmode and __builtin_nansf16b for BFmode). Now, for _Float16 I've kept what the code did previously, if float16_type_node from generic code is NULL, create ix86_float16_type_node and register _Float16 for it, but for __bf16 I've changed it so that if bfloat16_type_node from generic code is NULL, ix86_register_bf16_builtin_type makes bfloat16_type_node non-NULL. This has an unfortunate consequence though, __STDCPP_BFLOAT16_T__ is predefined for C++23, __BFLT16_*__ macros are predefined as well, but the type doesn't really work (errors whenever it is used) and the builtin isn't defined. The following patch fixes that by going with what we do for HFmode, bfloat16_type_node stays as initialized by generic code and we have a local type for backend use. On the other side, nothing used ix86_bf16_ptr_type_node so that is now dropped. 2022-10-19 Jakub Jelinek <jakub@redhat.com> * config/i386/i386-builtins.cc (ix86_bf16_ptr_type_node): Remove. (ix86_bf16_type_node): New variable. (ix86_register_bf16_builtin_type): If bfloat16_type_node is NULL from generic code, set only ix86_bf16_type_node to a new REAL_TYPE rather than bfloat16_type_node, otherwise set ix86_bf16_type_node to bfloat16_type_node. Register __bf16 on ix86_bf16_type_node rather than bfloat16_type_node. Don't initialize unused ix86_bf16_ptr_type_node. * config/i386/i386-builtin-types.def (BFLOAT16): Use ix86_bf16_type_node rather than bfloat16_type_node.
2022-10-19tree-optimization/106781 - adjust cgraph lhs removalRichard Biener2-8/+24
The following matches up the cgraph code removing LHS of a noreturn call with what fixup_noreturn_call does which gets along without inserting a definition, fixing the ICE resulting from having no place to actually insert that new def. PR tree-optimization/106781 * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee): Copy LHS removal from fixup_noreturn_call. * gcc.dg/pr106781.c: New testcase.
2022-10-19Canonicalize vec_perm index to make the first index come from the first vector.liuhongt2-0/+33
Fix unexpected non-canon form from gimple vector selector. gcc/ChangeLog: PR target/107271 * config/i386/i386-expand.cc (ix86_vec_perm_index_canon): New. (expand_vec_perm_shufps_shufps): Call ix86_vec_perm_index_canon gcc/testsuite/ChangeLog: * gcc.target/i386/pr107271.c: New test.
2022-10-19Daily bump.GCC Administrator6-1/+332
2022-10-18c: Diagnose "enum tag;" after definition [PR107164]Joseph Myers4-0/+39
As noted in bug 101764, a declaration "enum tag;" is invalid in standard C after a definition, as well as when no definition is visible; we had a pedwarn-if-pedantic for the forward declaration case, but were missing one for the other case. Add that missing diagnostic (if pedantic only). (These diagnostics will need to be appropriately conditioned when support is added for C2x enums with fixed underlying type, since "enum tag : type;" is OK both before and after a definition.) Bootstrapped with no regressions for x86_64-pc-linux-gnu. PR c/107164 gcc/c/ * c-decl.cc (shadow_tag_warned): If pedantic, diagnose "enum tag;" with previous declaration visible. gcc/testsuite/ * gcc.dg/c99-tag-4.c, gcc.dg/c99-tag-5.c, gcc.dg/c99-tag-6.c: New tests.
2022-10-18testsuite: Only run -fcf-protection test on i?86/x86_64 [PR107213]Marek Polacek1-0/+1
This test fails on non-i?86/x86_64 targets because on those targets we get error: '-fcf-protection=full' is not supported for this target so this patch limits where the test is run. PR testsuite/107213 gcc/testsuite/ChangeLog: * c-c++-common/pointer-to-fn1.c: Only run on i?86/x86_64.
2022-10-18c++ modules: stream non-trailing default targs [PR105045]Patrick Palka3-14/+19
This fixes the below testcase in which we neglect to stream the default argument for T only because the subsequent parameter U doesn't also have a default argument. PR c++/105045 gcc/cp/ChangeLog: * module.cc (trees_out::tpl_parms_fini): Don't assume default template arguments must be trailing. (trees_in::tpl_parms_fini): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/pr105045_a.C: New test. * g++.dg/modules/pr105045_b.C: New test.
2022-10-18c: C2x enums wider than int [PR36113]Joseph Myers12-46/+309
C2x has two enhancements to enumerations: allowing enumerations whose values do not all fit in int (similar to an existing extension), and allowing an underlying type to be specified (as in C++). This patch implements the first of those enhancements. Apart from adjusting diagnostics to reflect this being a standard feature, there are some semantics differences from the previous extension: * The standard feature gives all the values of such an enum the enumerated type (while keeping type int if that can represent all values of the enumeration), where previously GCC only gave those values outside the range of int the enumerated type. This change was previously requested in PR 36113; it seems unlikely that people are depending on the detail of the previous extension that some enumerators had different types to others depending on whether their values could be represented as int, and this patch makes the change to types of enumerators unconditionally (if that causes problems in practice we could always make it conditional on C2x mode instead). * The types *while the enumeration is being defined*, for enumerators that can't be represented as int, are now based more directly on the types of the expressions used, rather than a possibly different type with the right precision constructed using c_common_type_for_size. Again, this change is made unconditionally. * Where overflow (or wraparound to 0, in the case of an unsigned type) when 1 is implicitly added to determine the value of the next enumerator was previously an error, it now results in a wider type with the same signedness (as the while-being-defined type of the previous enumerator) being chosen, if available. When a type is chosen in such an overflow case, or when a type is chosen for the underlying integer type of the enumeration, it's possible that (unsigned) __int128 is chosen. Although C2x allows for such types wider than intmax_t to be considered extended integer types, we don't have various features required to do so (integer constant suffixes; sufficient library support would also be needed to define the associated macros for printf/scanf conversions, and <stdint.h> typedef names would need to be defined). Thus, there are also pedwarns for exceeding the range of intmax_t / uintmax_t, as while in principle exceeding that range is OK, it's only OK in a context where the relevant types meet the requirements for extended integer types, which does not currently apply here. Bootstrapped with no regressions for x86_64-pc-linux-gnu. Also manually checked diagnostics for c2x-enum-3.c with -m32 to confirm the diagnostics in that { target { ! int128 } } test are as expected. PR c/36113 gcc/c-family/ * c-common.cc (c_common_type_for_size): Add fallback to widest_unsigned_literal_type_node or widest_integer_literal_type_node for precision that may not exactly match the precision of those types. gcc/c/ * c-decl.cc (finish_enum): If any enumerators do not fit in int, convert all to the type of the enumeration. pedwarn if no integer type fits all enumerators and default to widest_integer_literal_type_node in that case. Otherwise pedwarn for type wider than intmax_t. (build_enumerator): pedwarn for enumerators outside the range of uintmax_t or intmax_t, and otherwise use pedwarn_c11 for enumerators outside the range of int. On overflow, attempt to find a wider type that can hold the value of the next enumerator. Do not convert value to type determined with c_common_type_for_size. gcc/testsuite/ * gcc.dg/c11-enum-1.c, gcc.dg/c11-enum-2.c, gcc.dg/c11-enum-3.c, gcc.dg/c2x-enum-1.c, gcc.dg/c2x-enum-2.c, gcc.dg/c2x-enum-3.c, gcc.dg/c2x-enum-4.c, gcc.dg/c2x-enum-5.c: New tests. * gcc.dg/pr30260.c: Explicitly use -std=gnu11. Update expected diagnostics. * gcc.dg/torture/pr25183.c: Update expected diagnostics.
2022-10-18ipa-cp: Better representation of aggregate values in call contextsMartin Jambor4-403/+218
This patch extends the previous one by using the same data structure to represent aggregate values in classes ipa_auto_call_arg_values and ipa_call_arg_values. This usually simplifies handling and makes allocations of memory much cheaper because only a single vectore is needed, as opposed to vectors with each element pointing at other vecs. The only functions which unfortunately are a bit more complec are estimate_local_effects in ipa-cp.cc and ipa_call_context::equal_to but I hope not too much - the latter could probably be shorteneed at the expense of readability. The patch removes types ipa_agg_value ipa_agg_value_set which is no longer used with it. This means that we could replace the "_argagg_" part of the types introduced by the previous patches with more reasonable "_agg_" - possibly as a follow-up patch. gcc/ChangeLog: 2022-08-26 Martin Jambor <mjambor@suse.cz> * ipa-prop.h (ipa_agg_value): Remove type. (ipa_agg_value_set): Likewise. (ipa_copy_agg_values): Remove function. (ipa_release_agg_values): Likewise. (ipa_auto_call_arg_values) Add a forward declaration. (ipa_call_arg_values): Likewise. (class ipa_argagg_value_list): New constructors, added member function value_for_index_p. (class ipa_auto_call_arg_values): Removed the destructor and member function safe_aggval_at. Use ipa_argagg_values for m_known_aggs. (class ipa_call_arg_values): Removed member function safe_aggval_at. Use ipa_argagg_values for m_known_aggs. (ipa_get_indirect_edge_target): Removed declaration. (ipa_find_agg_cst_for_param): Likewise. (ipa_find_agg_cst_from_init): New declaration. (ipa_agg_value_from_jfunc): Likewise. (ipa_agg_value_set_from_jfunc): Removed declaration. (ipa_push_agg_values_from_jfunc): New declaration. * ipa-cp.cc (ipa_agg_value_from_node): Renamed to ipa_agg_value_from_jfunc, made public. (ipa_agg_value_set_from_jfunc): Removed. (ipa_push_agg_values_from_jfunc): New function. (ipa_get_indirect_edge_target_1): Removed known_aggs parameter, use avs for this purpose too. (ipa_get_indirect_edge_target): Removed the overload working on ipa_auto_call_arg_values, use ipa_argagg_value_list in the remaining one. (devirtualization_time_bonus): Use ipa_argagg_value_list and ipa_get_indirect_edge_target_1 instead of ipa_get_indirect_edge_target. (context_independent_aggregate_values): Removed function. (gather_context_independent_values): Work on ipa_argagg_value_list. (estimate_local_effects): Likewise, define some iterator variables only in the construct where necessary. (ipcp_discover_new_direct_edges): Adjust the call to ipa_get_indirect_edge_target_1. (push_agg_values_for_index_from_edge): Adjust the call ipa_agg_value_from_node which has been renamed to ipa_agg_value_from_jfunc. * ipa-fnsummary.cc (evaluate_conditions_for_known_args): Work on ipa_argagg_value_list. (evaluate_properties_for_edge): Replace manual filling in aggregate values with call to ipa_push_agg_values_from_jfunc. (estimate_calls_size_and_time): Work on ipa_argagg_value_list. (ipa_cached_call_context::duplicate_from): Likewise. (ipa_cached_call_context::release): Likewise. (ipa_call_context::equal_to): Likewise. * ipa-prop.cc (ipa_find_agg_cst_from_init): Make public. (ipa_find_agg_cst_for_param): Removed function. (ipa_find_agg_cst_from_jfunc_items): New function. (try_make_edge_direct_simple_call): Replace calls to ipa_agg_value_set_from_jfunc and ipa_find_agg_cst_for_param with ipa_find_agg_cst_from_init and ipa_find_agg_cst_from_jfunc_items. (try_make_edge_direct_virtual_call): Replace calls to ipa_agg_value_set_from_jfunc and ipa_find_agg_cst_for_param with simple query of constant jump function and a call to ipa_find_agg_cst_from_jfunc_items. (ipa_auto_call_arg_values::~ipa_auto_call_arg_values): Removed.
2022-10-18ipa-cp: Better representation of aggregate values we clone forMartin Jambor5-666/+733
This patch replaces linked lists of ipa_agg_replacement_value with vectors of similar structures called ipa_argagg_value and simplifies how we compute them in the first place. Having a vector should also result in less overhead when allocating and because we keep it sorted, it leads to logarithmic searches. The slightly obnoxious "argagg" bit in the name can be changed into "agg" after the next patch removes our current ipa_agg_value type. The patch also introduces type ipa_argagg_value_list which serves as a common view into a vector of ipa_argagg_value structures regardless whether they are stored in GC memory (required for IPA-CP transformation summary because we store trees) or in an auto_vec which is hopefully usually only allocated on stack. The calculation of aggreagete costant values for a given subsert of callers is then rewritten to compute known constants for each edge (some pruning to skip obviously not needed is still employed and should not be really worse than what I am replacing) and these vectors are there intersected, which can be done linearly since they are sorted. The patch also removes a lot of heap allocations of small lists of aggregate values and replaces them with stack based auto_vecs. As Richard Sandiford suggested, I use std::lower_bound from <algorithm> rather than re-implementing bsearch for array_slice. The patch depends on the patch which adds the ability to construct array_slices from gc-allocated vectors. gcc/ChangeLog: 2022-10-17 Martin Jambor <mjambor@suse.cz> * ipa-prop.h (IPA_PROP_ARG_INDEX_LIMIT_BITS): New. (ipcp_transformation): Added forward declaration. (ipa_argagg_value): New type. (ipa_argagg_value_list): New type. (ipa_agg_replacement_value): Removed type. (ipcp_transformation): Switch from using ipa_agg_replacement_value to ipa_argagg_value_list. (ipa_get_agg_replacements_for_node): Removed. (ipa_dump_agg_replacement_values): Removed declaration. * ipa-cp.cc: Define INCLUDE_ALGORITHM. (values_equal_for_ipcp_p): Moved up in the file. (ipa_argagg_value_list::dump): New function. (ipa_argagg_value_list::debug): Likewise. (ipa_argagg_value_list::get_elt): Likewise. (ipa_argagg_value_list::get_elt_for_index): Likewise. (ipa_argagg_value_list::get_value): New overloaded functions. (ipa_argagg_value_list::superset_of_p): New function. (new ipa_argagg_value_list::push_adjusted_values): Likewise. (push_agg_values_from_plats): Likewise. (intersect_argaggs_with): Likewise. (get_clone_agg_value): Removed. (ipa_agg_value_from_node): Make last parameter const, use ipa_argagg_value_list to search values coming from clones. (ipa_get_indirect_edge_target_1): Use ipa_argagg_value_list to search values coming from clones. (ipcp_discover_new_direct_edges): Pass around a vector of ipa_argagg_values rather than a link list of replacement values. (cgraph_edge_brings_value_p): Use ipa_argagg_value_list to search values coming from clones. (create_specialized_node): Work with a vector of ipa_argagg_values rather than a link list of replacement values. (self_recursive_agg_pass_through_p): Make the pointer parameters const. (copy_plats_to_inter): Removed. (intersect_with_plats): Likewise. (agg_replacements_to_vector): Likewise. (intersect_with_agg_replacements): Likewise. (intersect_aggregates_with_edge): Likewise. (push_agg_values_for_index_from_edge): Likewise. (push_agg_values_from_edge): Likewise. (find_aggregate_values_for_callers_subset): Rewrite. (cgraph_edge_brings_all_agg_vals_for_node): Likewise. (ipcp_val_agg_replacement_ok_p): Use ipa_argagg_value_list to search aggregate values. (decide_about_value): Work with a vector of ipa_argagg_values rather than a link list of replacement values. (decide_whether_version_node): Likewise. (ipa_analyze_node): Check number of parameters, assert that there are no descriptors when bailing out. * ipa-prop.cc (ipa_set_node_agg_value_chain): Switch to a vector of ipa_argagg_value. (ipa_node_params_t::duplicate): Removed superfluous handling of ipa_agg_replacement_values. Name of src parameter removed because it is no longer used. (ipcp_transformation_t::duplicate): Replaced duplication of ipa_agg_replacement_values with copying vector m_agg_values. (ipa_dump_agg_replacement_values): Removed. (write_ipcp_transformation_info): Stream the new data-structure instead of the old. (read_ipcp_transformation_info): Likewise. (adjust_agg_replacement_values): Work with ipa_argagg_values instead of linked lists of ipa_agg_replacement_values, copy the items and truncate the vector as necessary to keep it sorted instead of marking items as invalid. Return one bool if CFG should be updated. (ipcp_modif_dom_walker): Store ipcp_transformation instead of linked list of ipa_agg_replacement_values. (ipcp_modif_dom_walker::before_dom_children): Use ipa_argagg_value_list instead of walking a list of ipa_agg_replacement_values. (ipcp_transform_function): Switch to the new data structure, adjust dumping. gcc/testsuite/ChangeLog: 2022-08-15 Martin Jambor <mjambor@suse.cz> * gcc.dg/ipa/ipcp-agg-11.c: Adjust dumps. * gcc.dg/ipa/ipcp-agg-8.c: Likewise.
2022-10-18tree-optimization/107302 - fix vec_perm placement for recurrence vectRichard Biener2-3/+22
The following fixes the VEC_PERM_EXPR placement when the latch definition is a PHI node. PR tree-optimization/107302 * tree-vect-loop.cc (vectorizable_recurrence): Fix vec_perm placement for a PHI latch def. * gcc.dg/vect/pr107302.c: New testcase.
2022-10-18ifcvt: Do not lower bitfields if we can't analyze dr's [PR107275]Andre Vieira2-15/+30
The ifcvt dead code elimination code was not built to deal with inline assembly, loops with such would never be if-converted in the past since we can't do data-reference analysis on them and vectorization would eventually fail. For this reason we now also do not lower bitfields if the data-reference analysis fails, as we would not end up vectorizing it. As a consequence this also fixes this PR as the dead code elimination will not run for such cases and wrongfully eliminate inline assembly statements. gcc/ChangeLog: PR tree-optimization/107275 * tree-if-conv.cc (if_convertible_loop_p_1): Move find_data_references_in_loop call from here... (if_convertible_loop_p): And move data-reference vector initialization from here... (tree_if_conversion):... to here. gcc/testsuite/ChangeLog: * gcc.dg/vect/pr107275.c: New test.
2022-10-18middle-end IFN_ASSUME support [PR106654]Jakub Jelinek31-16/+745
My earlier patches gimplify the simplest non-side-effects assumptions into if (cond) ; else __builtin_unreachable (); and throw the rest on the floor. The following patch attempts to do something with the rest too. For -O0, it throws the more complex assumptions on the floor, we don't expect optimizations and the assumptions are there to allow optimizations. Otherwise arranges for the assumptions to be visible in the IL as .ASSUME (_Z2f4i._assume.0, i_1(D)); call where there is an artificial function like: bool _Z2f4i._assume.0 (int i) { bool _2; <bb 2> [local count: 1073741824]: _2 = i_1(D) == 43; return _2; } with the semantics that there is UB unless the assumption function would return true. Aldy, could ranger handle this? If it sees .ASSUME call, walk the body of such function from the edge(s) to exit with the assumption that the function returns true, so above set _2 [true, true] and from there derive that i_1(D) [43, 43] and then map the argument in the assumption function to argument passed to IFN_ASSUME (note, args there are shifted by 1)? During gimplification it actually gimplifies it into [[assume (D.2591)]] { { i = i + 1; D.2591 = i == 44; } } which is a new GIMPLE_ASSUME statement wrapping a GIMPLE_BIND and specifying a boolean_type_node variable which contains the result. The GIMPLE_ASSUME then survives just a couple of passes and is lowered during gimple lowering into an outlined separate function and IFN_ASSUME call. Variables declared inside of the condition (both static and automatic) just change context, automatic variables from the caller are turned into parameters (note, as the code is never executed, I handle this way even non-POD types, we don't need to bother pretending there would be user copy constructors etc. involved). The assume_function artificial functions are then optimized until the new assumptions pass which doesn't do much right now but I'd like to see there the backwards ranger walk and filling up of SSA_NAME_RANGE_INFO for the parameters. There are a few further changes I'd like to do, like ignoring the .ASSUME calls in inlining size estimations (but haven't figured out where it is done), or for LTO arrange for the assume functions to be emitted in all partitions that reference those (usually there will be just one, unless code with the assumption got inlined, versioned etc.). 2022-10-18 Jakub Jelinek <jakub@redhat.com> PR c++/106654 gcc/ * gimple.def (GIMPLE_ASSUME): New statement kind. * gimple.h (struct gimple_statement_assume): New type. (is_a_helper <gimple_statement_assume *>::test, is_a_helper <const gimple_statement_assume *>::test): New. (gimple_build_assume): Declare. (gimple_has_substatements): Return true for GIMPLE_ASSUME. (gimple_assume_guard, gimple_assume_set_guard, gimple_assume_guard_ptr, gimple_assume_body_ptr, gimple_assume_body): New inline functions. * gsstruct.def (GSS_ASSUME): New. * gimple.cc (gimple_build_assume): New function. (gimple_copy): Handle GIMPLE_ASSUME. * gimple-pretty-print.cc (dump_gimple_assume): New function. (pp_gimple_stmt_1): Handle GIMPLE_ASSUME. * gimple-walk.cc (walk_gimple_op): Handle GIMPLE_ASSUME. * omp-low.cc (WALK_SUBSTMTS): Likewise. (lower_omp_1): Likewise. * omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn): Likewise. * tree-cfg.cc (verify_gimple_stmt, verify_gimple_in_seq_2): Likewise. * function.h (struct function): Add assume_function bitfield. * gimplify.cc (gimplify_call_expr): If the assumption isn't simple enough, expand it into GIMPLE_ASSUME wrapped block or for -O0 drop it. * gimple-low.cc: Include attribs.h. (create_assumption_fn): New function. (struct lower_assumption_data): New type. (find_assumption_locals_r, assumption_copy_decl, adjust_assumption_stmt_r, adjust_assumption_stmt_op, lower_assumption): New functions. (lower_stmt): Handle GIMPLE_ASSUME. * tree-ssa-ccp.cc (pass_fold_builtins::execute): Remove IFN_ASSUME calls. * lto-streamer-out.cc (output_struct_function_base): Pack assume_function bit. * lto-streamer-in.cc (input_struct_function_base): And unpack it. * cgraphunit.cc (cgraph_node::expand): Don't verify assume_function has TREE_ASM_WRITTEN set and don't release its body. (symbol_table::compile): Allow assume functions not to have released body. * internal-fn.cc (expand_ASSUME): Remove gcc_unreachable. * passes.cc (execute_one_pass): For TODO_discard_function don't release body of assume functions. * cgraph.cc (cgraph_node::verify_node): Don't verify cgraph nodes of PROP_assumptions_done functions. * tree-pass.h (PROP_assumptions_done): Define. (TODO_discard_function): Adjust comment. (make_pass_assumptions): Declare. * passes.def (pass_assumptions): Add. * timevar.def (TV_TREE_ASSUMPTIONS): New. * tree-inline.cc (remap_gimple_stmt): Handle GIMPLE_ASSUME. * tree-vrp.cc (pass_data_assumptions): New variable. (pass_assumptions): New class. (make_pass_assumptions): New function. gcc/cp/ * cp-tree.h (build_assume_call): Declare. * parser.cc (cp_parser_omp_assumption_clauses): Use build_assume_call. * cp-gimplify.cc (build_assume_call): New function. (process_stmt_assume_attribute): Use build_assume_call. * pt.cc (tsubst_copy_and_build): Likewise. gcc/testsuite/ * g++.dg/cpp23/attr-assume5.C: New test. * g++.dg/cpp23/attr-assume6.C: New test. * g++.dg/cpp23/attr-assume7.C: New test.
2022-10-18tree-optimization/107301 - check if we can duplicate block before doing soRichard Biener2-2/+19
Path isolation failed to do that. PR tree-optimization/107301 * gimple-ssa-isolate-paths.cc (handle_return_addr_local_phi_arg): Check whether we can duplicate the block. (find_implicit_erroneous_behavior): Likewise. * gcc.dg/torture/pr107301.c: New testcase.
2022-10-18Move scanning pass of forwprop-19.c to dse1 for r13-3212-gb88adba751da63Liwei Xu1-2/+2
gcc/testsuite/ChangeLog: PR testsuite/107220 * gcc.dg/tree-ssa/forwprop-19.c: Move scanning pass from forwprop1 to dse1, This fixs the test case fail.
2022-10-17Merge partial relation precisions properlyAndrew MacLeod3-1/+59
When merging 2 groups of PE's, one group was simply being set to the other instead of properly merging them. PR tree-optimization/107273 gcc/ * value-relation.cc (equiv_oracle::add_partial_equiv): Merge instead of copying precison of each member. gcc/testsuite/ * gcc.dg/tree-ssa/pr107273-1.c: New. * gcc.dg/tree-ssa/pr107273-2.c: New.
2022-10-18Daily bump.GCC Administrator5-1/+297
2022-10-17Fix bogus RTL on the H8.Jeff Law2-41/+69
This patch actually fixes the bogus RTL seen in PR101697. Basically we continue to use the insn condition to catch most of the problem cases related to autoinc addressing modes. This patch adds constraints which can guide reload (and hopefully LRA) away from doing blind replacements during register elimination that would ultimately result in bogus RTL. The idea is from Paul K. who has done something very similar on the pdp11. I guess it shouldn't be a big surprise that the H8 and pdp11 need the same kind of handling given some of the similarities in their architectures. gcc/ PR target/101697 * config/h8300/combiner.md: Replace '<' preincment constraint with ZA/Z1..ZH/Z7 combinations. * config/h8300/movepush.md: Similarly
2022-10-17More infrastructure to avoid bogus RTL on H8.Jeff Law3-0/+35
Continuing the work to add constraints to avoid invalid RTL with autoinc addressing modes. Specifically this patch adds the memory constraints similar to the pdp11. gcc/ * config/h8300/constraints.md (Za..Zh): New constraints for autoinc addresses using a specific register. * config/h8300/h8300.cc (pre_incdec_with_reg): New function. * config/h8300/h8300-protos.h (pre_incdec_with_reg): Add prototype.
2022-10-17Remove accidential commitsJeff Law12-31671/+0
gcc/ * config/i386/cet.c: Remove accidental commit. * config/i386/driver-mingw32.c: Likewise. * config/i386/i386-builtins.c: Likewise. * config/i386/i386-d.c: Likewise. * config/i386/i386-expand.c: Likewise. * config/i386/i386-features.c: Likewise. * config/i386/i386-options.c: Likewise. * config/i386/t-cet: Likewise. * config/i386/x86-tune-sched-atom.c: Likewise. * config/i386/x86-tune-sched-bd.c: Likewise. * config/i386/x86-tune-sched-core.c: Likewise. * config/i386/x86-tune-sched.c: Likewise.
2022-10-17Enable REE for H8Jeff Law13-0/+31673
I was looking at H8 assembly code recently and noticed we had unnecessary extensions. As it turns out we never enabled redundant extension elimination on the H8. This patch fixes that oversight (and was the trigger for the failure fixed my the prior patch). gcc/common * common/config/h8300/h8300-common.cc (h8300_option_optimization_table): Enable redundant extension elimination at -O2 and above.
2022-10-17Add missing splitter for H8Jeff Law1-0/+18
While testing a minor optimization on the H8 my builds failed due to failure to split a zero-extended memory load. That particular pattern is a bit special on the H8 in that it's split at assembly time primarily to get the length computations correct. Arguably that alternative should go away completely, but I haven't really looked into that. Anyway, with the final-asm split we obviously need to match a define_split somewhere. But none was ever written after adding CCZN optimizations. So if we had a zero extend of a memory operand and it was used to eliminate a compare, then we'd abort at final asm time. Regression tested (in conjunction with various other in-progress patches) on H8 without regressions. gcc/ * config/h8300/extensions.md (CCZN setting zero extended load): Add missing splitter.
2022-10-17Fortran: NULL pointer dereference in gfc_simplify_image_index [PR104330]Steve Kargl2-1/+21
gcc/fortran/ChangeLog: PR fortran/104330 * simplify.cc (gfc_simplify_image_index): Do not dereference NULL pointer. gcc/testsuite/ChangeLog: PR fortran/104330 * gfortran.dg/pr104330.f90: New test.
2022-10-17Make sure exported range for SSA post-dominates the DEF in ↵Aldy Hernandez2-1/+37
set_global_ranges_from_unreachable_edges. The problem here is that we're exporting a range for an SSA range that happens on the other side of a __builtin_unreachable, but the SSA does not post-dominate the definition point. This is causing ivcanon to unroll things incorrectly. This was a snafu when converting the code from evrp. PR tree-optimization/107293 gcc/ChangeLog: * tree-ssa-dom.cc (dom_opt_dom_walker::set_global_ranges_from_unreachable_edges): Check that condition post-dominates the definition point. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr107293.c: New test.
2022-10-17Fortran: handle bad array ctors with typespec [PR93483, PR107216, PR107219]Harald Anlauf4-16/+68
gcc/fortran/ChangeLog: PR fortran/93483 PR fortran/107216 PR fortran/107219 * arith.cc (reduce_unary): Handled expressions are EXP_CONSTANT and EXPR_ARRAY. Do not attempt to reduce otherwise. (reduce_binary_ac): Likewise. (reduce_binary_ca): Likewise. (reduce_binary_aa): Moved check for EXP_CONSTANT and EXPR_ARRAY from here ... (reduce_binary): ... to here. (eval_intrinsic): Catch failed reductions. * gfortran.h (GFC_INTRINSIC_OPS): New enum ARITH_NOT_REDUCED to keep track of expressions that were not reduced by the arithmetic evaluation code. gcc/testsuite/ChangeLog: PR fortran/93483 PR fortran/107216 PR fortran/107219 * gfortran.dg/array_constructor_56.f90: New test. * gfortran.dg/array_constructor_57.f90: New test. Co-authored-by: Mikael Morin <mikael@gcc.gnu.org>
2022-10-17Fortran: check type of operands of logical operations, comparisons [PR107272]Harald Anlauf2-0/+54
gcc/fortran/ChangeLog: PR fortran/107272 * arith.cc (gfc_arith_not): Operand must be of type BT_LOGICAL. (gfc_arith_and): Likewise. (gfc_arith_or): Likewise. (gfc_arith_eqv): Likewise. (gfc_arith_neqv): Likewise. (gfc_arith_eq): Compare consistency of types of operands. (gfc_arith_ne): Likewise. (gfc_arith_gt): Likewise. (gfc_arith_ge): Likewise. (gfc_arith_lt): Likewise. (gfc_arith_le): Likewise. gcc/testsuite/ChangeLog: PR fortran/107272 * gfortran.dg/pr107272.f90: New test.
2022-10-17Fortran: Fixes for kind=4 characters strings [PR107266]Tobias Burnus5-12/+131
PR fortran/107266 gcc/fortran/ * trans-expr.cc (gfc_conv_string_parameter): Use passed type to honor character kind. * trans-types.cc (gfc_sym_type): Honor character kind. * trans-decl.cc (gfc_conv_cfi_to_gfc): Fix handling kind=4 character strings. gcc/testsuite/ * gfortran.dg/char4_decl.f90: New test. * gfortran.dg/char4_decl-2.f90: New test.
2022-10-17c++ modules: streaming constexpr_fundef [PR101449]Patrick Palka3-51/+29
It looks like we currently avoid streaming the RESULT_DECL and PARM_DECLs of a constexpr_fundef entry under the assumption that they're just copies of the DECL_RESULT and DECL_ARGUMENTS of the FUNCTION_DECL. Thus we can just make new copies of DECL_RESULT and DECL_ARGUMENTS on stream in rather than separately streaming them. But the FUNCTION_DECL's DECL_RESULT and DECL_ARGUMENTS eventually get genericized, whereas the constexpr_fundef entry consists of a copy of the FUNCTION_DECL's pre-GENERIC trees. And notably during genericization we lower invisref parms (which entails changing their TREE_TYPE and setting DECL_BY_REFERENCE), the lowered form of which the constexpr evaluator doesn't expect to see, and so this copying approach causes us to ICE for the below testcase. This patch fixes this by faithfully streaming the RESULT_DECL and PARM_DECLs of a constexpr_fundef entry, which seems to just work. Nathan says[1]: Hm, the reason for the complexity was that I wanted to recreate the tree graph where the fndecl came from one TU and the defn came from another one -- we need the definition to refer to argument decls from the already-read decl. However, it seems that for constexpr fns here, that is not needed, resulting in a significant simplification. [1]: https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603662.html PR c++/101449 gcc/cp/ChangeLog: * module.cc (trees_out::write_function_def): Stream the parms and result of the constexpr_fundef entry. (trees_in::read_function_def): Likewise. gcc/testsuite/ChangeLog: * g++.dg/modules/cexpr-3_a.C: New test. * g++.dg/modules/cexpr-3_b.C: New test.
2022-10-17[PR tree-optimization/105820] Add test.Aldy Hernandez1-0/+26
PR tree-optimization/105820 gcc/testsuite/ChangeLog: * g++.dg/tree-ssa/pr105820.c: New test.
2022-10-17Do not test for -Inf when flag_finite_math_only.Aldy Hernandez1-4/+7
PR tree-optimization/107286 gcc/ChangeLog: * value-range.cc (range_tests_floats): Do not test for -Inf when flag_finite_math_only.
2022-10-17Add 3 floating NAN tests.Aldy Hernandez3-0/+58
x UNORD x should set NAN on the TRUE side. The false side of x == x should set NAN. The true side of x != x should set NAN. gcc/testsuite/ * gcc.dg/tree-ssa/vrp-float-3a.c: New. * gcc.dg/tree-ssa/vrp-float-4a.c: New. * gcc.dg/tree-ssa/vrp-float-5a.c: New.
2022-10-17Add relation_trio class for range-ops.Andrew MacLeod9-291/+405
There are 3 possible relations range-ops might care about, but only the one most likely to be needed is supplied. This patch provides a new class relation_trio which allows 3 relations to be passed in a single word. fold_range (), op1_range (), and op2_range () are adjusted to take a relation_trio class instead of a relation_kind, then the routine can extract which relation it wants to work with. * gimple-range-fold.cc (fold_using_range::range_of_range_op): Provide relation_trio class. * gimple-range-gori.cc (gori_compute::refine_using_relation): Provide relation_trio class. (gori_compute::refine_using_relation): Ditto. (gori_compute::compute_operand1_range): Provide lhs_op2 and op1_op2 relations via relation_trio class. (gori_compute::compute_operand2_range): Ditto. * gimple-range-op.cc (gimple_range_op_handler::calc_op1): Use relation_trio instead of relation_kind. (gimple_range_op_handler::calc_op2): Ditto. (*::fold_range): Ditto. * gimple-range-op.h (gimple_range_op::calc_op1): Adjust prototypes. (gimple_range_op::calc_op2): Adjust prototypes. * range-op-float.cc (*::fold_range): Use relation_trio instead of relation_kind. (*::op1_range): Ditto. (*::op2_range): Ditto. * range-op.cc (*::fold_range): Use relation_trio instead of relation_kind. (*::op1_range): Ditto. (*::op2_range): Ditto. * range-op.h (class range_operator): Adjust prototypes. (class range_operator_float): Ditto. (class range_op_handler): Adjust prototypes. (relop_early_resolve): Pickup op1_op2 relation from relation_trio. * value-relation.cc (VREL_LAST): Adjust use to be one past the end of the enum. (relation_oracle::validate_relation): Use relation_trio in call to fold_range. * value-relation.h (enum relation_kind_t): Add VREL_LAST as final element. (class relation_trio): New. (TRIO_VARYING, TRIO_SHIFT, TRIO_MASK): New.
2022-10-17Fix nan updating in range-ops.Andrew MacLeod1-13/+10
Calling clean_nan on an undefined type traps, set_varying first. Other tweaks for correctness. * range-op-float.cc (foperator_not_equal::op1_range): Check for VREL_EQ after singleton. (foperator_unordered::op1_range): Set VARYING before calling clear_nan(). (foperator_ordered::op1_range): Set rather than clear NAN if both operands are the same.
2022-10-17Don't set useless relations.Andrew MacLeod2-1/+8
The oracle will not register nonssense/useless relations, class value_relation shouldn't either. * value-relation.cc (value_relation::dump): Change message. * value-relation.h (value_relation::set_relation): If op1 is the same as op2 do not create a relation.
2022-10-17GCN: Restore build with GCC 4.8Thomas Schwinge1-7/+7
For example, for "g++-4.8 (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4", the recent commit r13-3220-g45381d6f9f4e7b5c7b062f5ad8cc9788091c2d07 "amdgcn: add multiple vector sizes" broke the build: In file included from [...]/source-gcc/gcc/coretypes.h:458:0, from [...]/source-gcc/gcc/config/gcn/gcn.cc:24: [...]/source-gcc/gcc/config/gcn/gcn.cc: In function ‘machine_mode VnMODE(int, machine_mode)’: ./insn-modes.h:42:71: error: temporary of non-literal type ‘scalar_int_mode’ in a constant expression #define QImode (scalar_int_mode ((scalar_int_mode::from_int) E_QImode)) ^ [...]/source-gcc/gcc/config/gcn/gcn.cc:405:10: note: in expansion of macro ‘QImode’ case QImode: ^ In file included from [...]/source-gcc/gcc/coretypes.h:478:0, from [...]/source-gcc/gcc/config/gcn/gcn.cc:24: [...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not literal because: class scalar_int_mode ^ [...]/source-gcc/gcc/machmode.h:410:7: note: ‘scalar_int_mode’ is not an aggregate, does not have a trivial default constructor, and has no constexpr constructor that is not a copy or move constructor [...] Addressing this like simiar issues have been addressed in the past. gcc/ * config/gcn/gcn.cc (VnMODE): Use 'case E_QImode:' instead of 'case QImode:', etc.
2022-10-17Tag 'gcc/gimple-expr.cc:mark_addressable_2' as 'static'Thomas Schwinge1-1/+1
Added in 2015 r229696 (commit 1b223a9f3489296c625bdb7cc764196d04fd9231) "defer mark_addressable calls during expand till the end of expand", it has never been used 'extern'ally. gcc/ * gimple-expr.cc (mark_addressable_2): Tag as 'static'.
2022-10-17Vectorization of first-order recurrencesRichard Biener13-28/+558
The following picks up the prototype by Ju-Zhe Zhong for vectorizing first order recurrences. That solves two TSVC missed optimization PRs. There's a new scalar cycle def kind, vect_first_order_recurrence and it's handling of the backedge value vectorization is complicated by the fact that the vectorized value isn't the PHI but instead a (series of) permute(s) shifting in the recurring value from the previous iteration. I've implemented this by creating both the single vectorized PHI and the series of permutes when vectorizing the scalar PHI but leave the backedge values in both unassigned. The backedge values are (for the testcases) computed by a load which is also the place after which the permutes are inserted. That placement also restricts the cases we can handle (without resorting to code motion). I added both costing and SLP handling though SLP handling is restricted to the case where a single vectorized PHI is enough. Missing is epilogue handling - while prologue peeling would be handled transparently by adjusting iv_phi_p the epilogue case doesn't work with just inserting a scalar LC PHI since that a) keeps the scalar load live and b) that loads is the wrong one, it has to be the last, much like when we'd vectorize the LC PHI as live operation. Unfortunately LIVE compute/analysis happens too early before we decide on peeling. When using fully masked loop vectorization the vect-recurr-6.c works as expected though. I have tested this on x86_64 for now, but since epilogue handling is missing there's probably no practical cases. My prototype WHILE_ULT AVX512 patch can handle vect-recurr-6.c just fine but I didn't feel like running SPEC within SDE nor is the WHILE_ULT patch complete enough. PR tree-optimization/99409 PR tree-optimization/99394 * tree-vectorizer.h (vect_def_type::vect_first_order_recurrence): Add. (stmt_vec_info_type::recurr_info_type): Likewise. (vectorizable_recurr): New function. * tree-vect-loop.cc (vect_phi_first_order_recurrence_p): New function. (vect_analyze_scalar_cycles_1): Look for first order recurrences. (vect_analyze_loop_operations): Handle them. (vect_transform_loop): Likewise. (vectorizable_recurr): New function. (maybe_set_vectorized_backedge_value): Handle the backedge value setting in the first order recurrence PHI and the permutes. * tree-vect-stmts.cc (vect_analyze_stmt): Handle first order recurrences. (vect_transform_stmt): Likewise. (vect_is_simple_use): Likewise. (vect_is_simple_use): Likewise. * tree-vect-slp.cc (vect_get_and_check_slp_defs): Likewise. (vect_build_slp_tree_2): Likewise. (vect_schedule_scc): Handle the backedge value setting in the first order recurrence PHI and the permutes. * gcc.dg/vect/vect-recurr-1.c: New testcase. * gcc.dg/vect/vect-recurr-2.c: Likewise. * gcc.dg/vect/vect-recurr-3.c: Likewise. * gcc.dg/vect/vect-recurr-4.c: Likewise. * gcc.dg/vect/vect-recurr-5.c: Likewise. * gcc.dg/vect/vect-recurr-6.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s252.c: Un-XFAIL. * gcc.dg/vect/tsvc/vect-tsvc-s254.c: Likewise. * gcc.dg/vect/tsvc/vect-tsvc-s291.c: Likewise. Co-authored-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>