aboutsummaryrefslogtreecommitdiff
path: root/gcc
AgeCommit message (Collapse)AuthorFilesLines
2024-09-27i386: Modernize AMD processor typesUros Bizjak2-41/+37
Use iterative PTA definitions for members of the same AMD processor family. Also, fix a couple of related M_CPU_TYPE/M_CPU_SUBTYPE inconsistencies. No functional changes intended. gcc/ChangeLog: * config/i386/i386.h: Add PTA_BDVER1, PTA_BDVER2, PTA_BDVER3, PTA_BDVER4, PTA_BTVER1 and PTA_BTVER2. * common/config/i386/i386-common.cc (processor_alias_table) <"bdver1">: Use PTA_BDVER1. <"bdver2">: Use PTA_BDVER2. <"bdver3">: Use PTA_BDVER3. <"bdver4">: Use PTA_BDVER4. <"btver1">: Use PTA_BTVER1. Use M_CPU_TYPE (AMD_BTVER1). <"btver2">: Use PTA_BTVER2. <"shanghai>: Use M_CPU_SUBTYPE (AMDFAM10H_SHANGHAI). <"istanbul>: Use M_CPU_SUBTYPE (AMDFAM10H_ISTANBUL).
2024-09-27Widening-Mul: Fix one ICE when iterate on phi nodePan Li2-2/+83
We iterate all phi node of bb to try to match the SAT_* pattern for scalar integer. We also remove the phi mode when the relevant pattern matched. Unfortunately the iterator may have no idea the phi node is removed and continue leverage the free data and then ICE similar as below. [0] psi ptr 0x75216340c000 [0] psi ptr 0x75216340c400 [1] psi ptr 0xa5a5a5a5a5a5a5a5 <=== GC freed pointer. during GIMPLE pass: widening_mul tmp.c: In function ‘f’: tmp.c:45:6: internal compiler error: Segmentation fault 45 | void f(int rows, int cols) { | ^ 0x36e2788 internal_error(char const*, ...) ../../gcc/diagnostic-global-context.cc:517 0x18005f0 crash_signal ../../gcc/toplev.cc:321 0x752163c4531f ??? ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0 0x103ae0e bool is_a_helper<gphi*>::test<gimple>(gimple*) ../../gcc/gimple.h:1256 0x103f9a5 bool is_a<gphi*, gimple>(gimple*) ../../gcc/is-a.h:232 0x103dc78 gphi* as_a<gphi*, gimple>(gimple*) ../../gcc/is-a.h:255 0x104f12e gphi_iterator::phi() const ../../gcc/gimple-iterator.h:47 0x1a57bef after_dom_children ../../gcc/tree-ssa-math-opts.cc:6140 0x3344482 dom_walker::walk(basic_block_def*) ../../gcc/domwalk.cc:354 0x1a58601 execute ../../gcc/tree-ssa-math-opts.cc:6312 This patch would like to fix the iterate on modified collection problem by backup the next phi in advance. The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. PR middle-end/116861 gcc/ChangeLog: * tree-ssa-math-opts.cc (math_opts_dom_walker::after_dom_children): Backup the next psi iterator before remove the phi node. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr116861-1.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-27Fix sorting in Contributors.htmlRichard Biener1-4/+4
The following moves my entry to where it belongs alphabetically (it wasn't moved when s/Guenther/Biener/). * doc/contrib.texi (Richard Biener): Move entry.
2024-09-27c++/coro: ignore cleanup_point_exprs while expanding awaits [PR116793]Arsen Arsenović2-1/+29
If we reach a CLEANUP_POINT_EXPR while trying to walk statements, we actually care about the statement or statement list contained within it. Indeed, such a construction started happening with r15-3513-g964577c31df206, after temporary promotion. In the test case presented in PR116793, the compiler generated: <<cleanup_point { struct _cleanup_task Aw0 [value-expr: frame_ptr->Aw0_2_3]; int T002 [value-expr: frame_ptr->T002_2_3]; int T002 [value-expr: frame_ptr->T002_2_3]; <<cleanup_point <<< Unknown tree: expr_stmt (void) (T002 = TARGET_EXPR <D.20994, 3>) >>>>>; struct _cleanup_task Aw0 [value-expr: frame_ptr->Aw0_2_3]; <<cleanup_point <<< Unknown tree: expr_stmt (void) (Aw0 = TARGET_EXPR <D.20995, func ((int &) &T002)>) >>>>>; <<cleanup_point <<< Unknown tree: expr_stmt (void) (D.22450 = <<< Unknown tree: co_await TARGET_EXPR <D.20995, func ((int &) &T002)> Aw0 {_cleanup_task::await_ready (&Aw0), _cleanup_task::await_suspend<_task1::promise_type> (&Aw0, TARGET_EXPR <D.21078, _Coro_self_handle>), <<< Unknown tree: aggr_init_expr 4 await_resume D.22443 &Aw0 >>>} 0 >>>) >>>>>; <<cleanup_point <<< Unknown tree: expr_stmt (void) (D.20991 = (struct tuple &) &D.22450) >>>>>; } D.22467 = 1; int & i [value-expr: frame_ptr->i_1_2]; <<cleanup_point <<< Unknown tree: expr_stmt (void) (i = std::get<0, int&> (NON_LVALUE_EXPR <D.20991>)) >>>>>;>>; ... i.e. a statement list within a cleanup point. In such a case, we don't actually care about the cleanup point, but we do care about the statement inside, so, we can just walk down into the CLEANUP_POINT_EXPR. PR c++/116793 gcc/cp/ChangeLog: * coroutines.cc (await_statement_expander): Just process subtrees if encountering a CLEANUP_POINT_EXPR. gcc/testsuite/ChangeLog: * g++.dg/coroutines/pr116793-1.C: New test.
2024-09-27c++: simplify handling implicit INDIRECT_REF and co_await in convert_to_voidArsen Arsenović4-53/+132
convert_to_void has, so far, when converting a co_await expression to void altered the await_resume expression of a co_await so that it is also converted to void. This meant that the type of the await_resume expression, which is also supposed to be the type of the whole co_await expression, was not the same as the type of the CO_AWAIT_EXPR tree. While this has not caused problems so far, it is unexpected, I think. Also, convert_to_void had a special case when an INDIRECT_REF wrapped a CALL_EXPR. In this case, we also diagnosed maybe_warn_nodiscard. This was a duplication of logic related to converting call expressions to void. Instead, we can generalize a bit, and rather discard the expression that was implicitly dereferenced instead. This patch changes the diagnostic of: void f(struct S* x) { static_cast<volatile S&>(*x); } ... from: warning: indirection will not access object of incomplete type 'volatile S' in statement ... to: warning: implicit dereference will not access object of type ‘volatile S’ in statement ... but should have no impact in other cases. gcc/cp/ChangeLog: * coroutines.cc (co_await_get_resume_call): Return a tree directly, rather than a tree pointer. * cp-tree.h (co_await_get_resume_call): Adjust signature accordingly. * cvt.cc (convert_to_void): Do not alter CO_AWAIT_EXPRs when discarding them. Simplify handling implicit INDIRECT_REFs. gcc/testsuite/ChangeLog: * g++.dg/coroutines/nodiscard-1.C: New test.
2024-09-27c++/coro: prevent ICV_STATEMENT diagnostics in temp promotion [PR116502]Arsen Arsenović3-3/+75
If such a diagnostic is necessary, it has already been emitted, otherwise, it is not correct and emitting it here is inactionable by the user, and bogus. PR c++/116502 gcc/cp/ChangeLog: * coroutines.cc (maybe_promote_temps): Convert temporary initializers to void without complaining. gcc/testsuite/ChangeLog: * g++.dg/coroutines/maybe-unused-1.C: New test. * g++.dg/coroutines/pr116502.C: New test.
2024-09-27tree-optimization/116818 - try VMAT_GATHER_SCATTER also for SLPRichard Biener1-14/+15
When not doing SLP and we end up with VMAT_ELEMENTWISE we consider using strided loads, aka VMAT_GATHER_SCATTER. The following moves this logic down to also apply to SLP where we now can end up using VMAT_ELEMENTWISE as well. PR tree-optimization/116818 * tree-vect-stmts.cc (get_group_load_store_type): Consider VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE also for SLP. (vectorizable_load): For single-lane VMAT_GATHER_SCATTER also ignore permutations.
2024-09-27Fix bogus SLP nvector compute in check_load_store_for_partial_vectorsRichard Biener1-7/+3
We have a new overload for vect_get_num_copies that handles both SLP and non-SLP. Use it and avoid the division by group_size for SLP when not using load-store lanes. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Use the new vect_get_num_copies overload. Only divide by group_size for SLP for load-store lanes.
2024-09-27unswitch: Replace manual ondemand maybe_undef with ↵Andrew Pinski1-59/+2
ssa_name_maybe_undef_p/mark_ssa_maybe_undefs [PR116848] The ondemand maybe_undef that follows phis was added in r7-6427-g8b670f93ab1136 but then later ssa_name_maybe_undef_p/mark_ssa_maybe_undefs was added in r13-972-gbe2861fe8c527a. This moves the ondemand one to use mark_ssa_maybe_undefs/ssa_name_maybe_undef_p instead. Which itself will be faster since the mark_ssa_maybe_undefs is a walk based on the uses of undefined names (and only once) rather than a walk based on the def of ones which are more likely defined (and on demand). Even though the ondemand maybe_undef had some extra special cases, those won't make a big difference in most code. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/116848 gcc/ChangeLog: * tree-ssa-loop-unswitch.cc (tree_ssa_unswitch_loops): Call mark_ssa_maybe_undefs. (is_maybe_undefined): Call ssa_name_maybe_undef_p instead of ondemand undef. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-27c++/modules: Allow imported references in constant expressionsNathaniel Shead3-1/+23
Currently the streaming code uses TREE_CONSTANT to determine whether an entity will have a definition that is interesting to stream out. This is not sufficient, however; we also need to write the definition of references, since although not TREE_CONSTANT they can still be usable in constant expressions. As such this patch uses the existing decl_maybe_constant_var function which correctly handles this case. gcc/cp/ChangeLog: * module.cc (has_definition): Use decl_maybe_constant_var instead of TREE_CONSTANT. gcc/testsuite/ChangeLog: * g++.dg/modules/cexpr-5_a.C: New test. * g++.dg/modules/cexpr-5_b.C: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-27c++/modules: Fix linkage checks for exported using-declsNathaniel Shead7-41/+154
This fixes some inconsistencies with what kinds of linkage various entities are assumed to have. This also fixes handling of exported using-decls binding to GM entities and type aliases to better align with the standard's requirements. gcc/cp/ChangeLog: * name-lookup.cc (check_can_export_using_decl): Handle internal linkage GM entities (but ignore in header units); use linkage of entity ultimately referred to by aliases. gcc/testsuite/ChangeLog: * g++.dg/modules/using-10.C: Add tests for no-linkage, fix expected linkage of aliases. * g++.dg/modules/using-12.C: Likewise. * g++.dg/modules/using-27.C: New test. * g++.dg/modules/using-28_a.C: New test. * g++.dg/modules/using-28_b.C: New test. * g++.dg/modules/using-29.H: New test. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-27c++/modules: Use decl_linkage in maybe_record_mergeable_declNathaniel Shead1-8/+1
This avoids any possible inconsistencies (current or future) about whether a declaration is internal or not. gcc/cp/ChangeLog: * name-lookup.cc (maybe_record_mergeable_decl): Use decl_linkage instead of ad-hoc checks. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-27c++: Update decl_linkage for C++11Nathaniel Shead3-37/+60
Currently modules code uses a variety of ad-hoc methods to attempt to determine whether an entity has internal linkage, which leads to inconsistencies and some correctness issues as different edge cases are neglected. While investigating this I discovered 'decl_linkage', but it doesn't seem to have been updated to account for the C++11 clarification that all entities declared in an anonymous namespace are internal. I'm not convinced that even in C++98 it was intended that e.g. types in anonymous namespaces should be external, but some tests in the testsuite rely on this, so for compatibility I restricted those modifications to C++11 and later. This should have relatively minimal impact as not much seems to actually rely on decl_linkage, but does change the mangling of symbols in anonymous namespaces slightly. Previously, we had namespace { int x; // mangled as '_ZN12_GLOBAL__N_11xE' static int y; // mangled as '_ZN12_GLOBAL__N_1L1yE' } but with this patch the x is now mangled like y (with the extra 'L'). For contrast, Clang currently mangles neither x nor y with the 'L'. Since this only affects internal-linkage entities I don't believe this should break ABI in any observable fashion. gcc/cp/ChangeLog: * name-lookup.cc (do_namespace_alias): Propagate TREE_PUBLIC for namespace aliases. * tree.cc (decl_linkage): Update rules for C++11. gcc/testsuite/ChangeLog: * g++.dg/modules/mod-sym-4.C: Update test to account for non-static internal-linkage variables new mangling. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com> Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-27testsuite/gfortran.dg/open_errors_2.f90: Remove now-redundant file deletionHans-Peter Nilsson1-1/+0
Now that fort.N files are removed by the testsuite framework, remove this single "manual" file deletion. (Also, it should have been "remote_file target delete", since it's the target that creates the file, not the build framework, which might matter to some setups.) * gfortran.dg/open_errors_2.f90: Remove now-redundant file deletion.
2024-09-27Daily bump.GCC Administrator6-1/+130
2024-09-26c++: tweak for -Wrange-loop-construct [PR116731]Marek Polacek2-3/+62
This PR reports that the warning would be better off using a check for trivially constructible rather than trivially copyable. LLVM accepted a similar fix: https://github.com/llvm/llvm-project/issues/47355 PR c++/116731 gcc/cp/ChangeLog: * parser.cc (warn_for_range_copy): Check if TYPE is trivially constructible, not copyable. gcc/testsuite/ChangeLog: * g++.dg/warn/Wrange-loop-construct3.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-26testsuite: XFAIL gfortran.dg/initialization_25.f90 properlySam James1-2/+2
The test was disabled/XFAIL'd informally in r0-100012-gcdc6637d7c78ec, but r15-3890-g34bf6aa41ba539 didn't realize this, causing a FAIL. Fix that by marking it as XFAIL per the original intent. gcc/testsuite/ChangeLog: PR fortran/35779 PR fortran/116858 * gfortran.dg/initialization_25.f90: Mark as XFAIL.
2024-09-26doc: Remove index reference to removed documentation in fortran manualMikael Morin1-1/+0
Fortran option -M used to be an alias for -J. After some deprecation time, it was reused for another purpose at revision r0-100725-gd8ddea4044ee8212d5fe305e8e2a547700cd7b8f. That revision removed the documentation parts of -J mentioning -M, but left a reference to -M in the index. This change removes the remaining reference. gcc/fortran/ChangeLog: * invoke.texi (-M): Remove index reference to removed documentation.
2024-09-26Add virtual destructor to AbstractExprOwen Avery1-0/+2
gcc/rust/ChangeLog: * checks/errors/borrowck/rust-bir.h (class AbstractExpr): Add virtual destructor. Signed-off-by: Owen Avery <powerboat9.gamer@gmail.com>
2024-09-26tree-optimization/114855 - speed up dom_oracle::register_transitivesRichard Biener3-18/+44
dom_oracle::register_transitives contains an unbound dominator walk which for the testcase in PR114855 dominates the profile. The following fixes the unbound work done by assigning a constant work budget to the loop, bounding the number of dominators visited but also the number of relations processed. This gets both dom_oracle::register_transitives and get_immediate_dominator off the profile. I'll note that we're still doing an unbound dominator walk via equiv_set in find_equiv_dom at the start of the function and when we register a relation that also looks up the same way. At least for the testcase at hand this isn't an issue. I've also amended the guard to register_transitives with the per-basic-block limit for the number of relations registered not being exhausted. PR tree-optimization/114855 * params.opt (--param transitive-relations-work-bound): New. * doc/invoke.texi (--param transitive-relations-work-bound): Document. * value-relation.cc (dom_oracle::register_transitives): Assing an overall work budget, bounding the dominator walk and the number of relations processed. (dom_oracle::record): Only register_transitives when the number of already registered relations does not yet exceed the per-BB limit.
2024-09-26Fortran/OpenMP: Middle-end support for mapping of DT with allocatable componentsTobias Burnus5-26/+265
gcc/ChangeLog: * langhooks-def.h (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New. (LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT, LANG_HOOKS_OMP_DEEP_MAPPING): Define. (LANG_HOOKS_DECLS): Use it. * langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New stubs. * langhooks.h (struct lang_hooks_for_decls): Add new hooks * omp-expand.cc (expand_omp_target): Handle dynamic-size addr/sizes/kinds arrays. * omp-low.cc (build_sender_ref, fixup_child_record_type, scan_sharing_clauses, lower_omp_target): Update to handle new hooks and dynamic-size addr/sizes/kinds arrays.
2024-09-26pretty-print: Fix up allocate_objectJakub Jelinek1-1/+1
On Thu, Aug 29, 2024 at 06:58:12PM -0400, David Malcolm wrote: > The following patch rewrites the internals of pp_format. > The tokens and token lists are allocated on the chunk_obstack, and so > there's no additional heap activity required, with the memory reclaimed > when the chunk_obstack is freed after phase 3 of formatting. > +static void * > +allocate_object (size_t sz, obstack &s) > +{ > + /* We must not be half-way through an object. */ > + gcc_assert (obstack_base (&s) == obstack_next_free (&s)); > + > + obstack_grow (&s, obstack_base (&s), sz); > + void *buf = obstack_finish (&s); > + return buf; > } I think this is wrong. I hoped it would be the reason of the unexpected libstdc++ warnings on certain architectures after seeing ==4027220== Source and destination overlap in memcpy(0x4627154, 0x4627154, 12) ==4027220== at 0x404B93E: memcpy (vg_replace_strmem.c:1123) ==4027220== by 0xAAD5618: allocate_object(unsigned int, obstack&) (pretty-print.cc:1183) ==4027220== by 0xAAD8C0E: operator new (pretty-print.cc:1210) ==4027220== by 0xAAD8C0E: make (pretty-print-format-impl.h:305) ==4027220== by 0xAAD8C0E: format_phase_1 (pretty-print.cc:1659) ==4027220== by 0xAAD8C0E: pretty_printer::format(text_info&) (pretty-print.cc:1618) ==4027220== by 0xAAA840E: pp_format (pretty-print.h:583) ==4027220== by 0xAAA840E: diagnostic_context::report_diagnostic(diagnostic_info*) (diagnostic.cc:1260) ==4027220== by 0xAAA8703: diagnostic_context::diagnostic_impl(rich_location*, diagnostic_metadata const*, diagnostic_option_id, char const*, char**, diagnostic_t) (diagnostic.cc:1404) ==4027220== by 0xAAB8682: warning(diagnostic_option_id, char const*, ...) (diagnostic-global-context.cc:166) ==4027220== by 0x97725F5: warn_deprecated_use(tree_node*, tree_node*) (tree.cc:12485) ==4027220== by 0x8B6694B: mark_used(tree_node*, int) (decl2.cc:6121) ==4027220== by 0x8C9E25E: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:21626) ==4027220== by 0x8C9E5E6: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:20935) ==4027220== by 0x8C9E1D7: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:20424) ==4027220== by 0x8C9DF2E: tsubst_expr(tree_node*, tree_node*, int, tree_node*) [clone .part.0] (pt.cc:20496) ==4027220== etc. valgrind warnings, unfortunately it is not, but still I think this is a bug. If the obstack has enough space in it, i.e. if obstack_room (&s) >= sz, then obstack_grow from obstack_base will copy uninitialized bytes through memcpy (obstack_base (&s), obstack_base (&s), sz); (which pedantically isn't valid due to the overlap, and so the reason why valgrind complains, but in reality I think most implementations can handle it fine, after all, we also use it for structure assignments which could have full or no overlap but never partial). If obstack_room (&s) < sz, then obstack_grow will first _obstack_newchunk (&s, sz); which will allocate new memory and copy the existing data of the object (but the above assertion guartantees it will copy 0 bytes) and then the memcpy copies sz bytes from the old base to the new (if unlucky, that could crash as there could be end of page and unmapped next page in between). I think we should use obstack_blank instead of obstack_grow, which does everything obstack_grow does, except for the memcpy of the uninitialized data. 2024-09-25 Jakub Jelinek <jakub@redhat.com> * pretty-print.cc (allocate_object): Use obstack_blank rather than obstack_grow.
2024-09-26testsuite: fix hyphen typosSam James2-3/+3
gcc/testsuite/ChangeLog: * g++.dg/modules/reparent-1_c.C: Fix whitespace around '-' in dg directive. * gfortran.dg/initialization_25.f90: Ditto.
2024-09-26testsuite: fix comment-only directive typosSam James4-4/+4
Doing this to avoid FPs from grepping but also to avoid the potential for people learning bad habits. gcc/testsuite/ChangeLog: * gfortran.dg/coarray/caf.exp: Fix 'dg-do-run' typo. * lib/gfortran-dg.exp: Ditto. * lib/gm2-dg.exp: Ditto. * lib/go-dg.exp: Ditto.
2024-09-26doc: Remove MinGW note on binutils 2.16Gerald Pfeifer1-4/+0
Binutils 2.16 is 13 years old; no need to specifically refer to it as a requirement. gcc: PR target/69374 * doc/install.texi (Specific) <*-*-mingw32>: Remove note regarding binutils 2.16.
2024-09-26[match.pd] Handle abs pattern with convertKugan Vivekanandarajah3-21/+61
gcc/ChangeLog: * match.pd: Extend A CMP 0 ? A : -A into (type)A CMP 0 ? A : -A. Extend A CMP 0 ? A : -A into (type) A CMP 0 ? A : -A. gcc/testsuite/ChangeLog: * g++.dg/absvect.C: New test. * gcc.dg/tree-ssa/absfloat16.c: New test. Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
2024-09-26x86: Extend AVX512 Vectorization for Popcount in Various ModesLevy Hsu2-0/+73
This patch enables vectorization of the popcount operation for V2QI, V4QI, V8QI, V2HI, V4HI, and V2SI modes. gcc/ChangeLog: * config/i386/mmx.md: (VQI_16_32_64): New mode iterator for 8-byte, 4-byte, and 2-byte QImode. (popcount<mode>2): New pattern for popcount of V2QI/V4QI/V8QI mode. (popcount<mode>2): New pattern for popcount of V2HI/V4HI mode. (popcountv2si2): New pattern for popcount of V2SI mode. gcc/testsuite/ChangeLog: * gcc.target/i386/part-vect-popcount-1.c: New test.
2024-09-26Define VECTOR_STORE_FLAG_VALUEliuhongt2-1/+30
gcc/ChangeLog: * config/i386/i386.h (VECTOR_STORE_FLAG_VALUE): New macro. gcc/testsuite/ChangeLog: * gcc.dg/rtl/x86_64/vector_eq.c: New test.
2024-09-26testsuite: Fix testcase g++.dg/modules/indirect-1_b.C [PR116846]Nathaniel Shead1-5/+2
r15-3878 exposed a mistake in the testcase, probably from an older version of the dumping logic. Apart from the slightly different syntax for the dump line, also check for importing the type_decl rather than the const_decl (we need the type anyway and importing the type also brings along the enumerators so it would be unnecessary to seed an import for them as well). PR c++/116846 gcc/testsuite/ChangeLog: * g++.dg/modules/indirect-1_b.C: Fix testcase. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
2024-09-26RISC-V: Add testcases for form 3 of signed vector SAT_ADDPan Li9-0/+126
Form 3: #define DEF_VEC_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T sum; \ bool overflow = __builtin_add_overflow (x, y, &sum); \ out[i] = overflow ? x < 0 ? MIN : MAX : sum; \ } \ } DEF_VEC_SAT_S_ADD_FMT_3 (int8_t, uint8_t, INT8_MIN, INT8_MAX) The below test are passed for this patch. * The rv64gcv fully regression test. It is test only patch and obvious up to a point, will commit it directly if no comments in next 48H. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-10.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-11.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-12.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-9.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-10.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-11.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-12.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_s_add-run-9.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-26Match: Support form 3 for vector signed integer .SAT_ADDPan Li1-1/+3
This patch would like to support the form 3 of the vector signed integer .SAT_ADD. Aka below example: Form 3: #define DEF_VEC_SAT_S_ADD_FMT_3(T, UT, MIN, MAX) \ void __attribute__((noinline)) \ vec_sat_s_add_##T##_fmt_3 (T *out, T *op_1, T *op_2, unsigned limit) \ { \ unsigned i; \ for (i = 0; i < limit; i++) \ { \ T x = op_1[i]; \ T y = op_2[i]; \ T sum; \ bool overflow = __builtin_add_overflow (x, y, &sum); \ out[i] = overflow ? x < 0 ? MIN : MAX : sum; \ } \ } DEF_VEC_SAT_S_ADD_FMT_3(int8_t, uint8_t, INT8_MIN, INT8_MAX) Before this patch: 40 │ # ivtmp.7_34 = PHI <0(3), ivtmp.7_30(7)> 41 │ _26 = op_1_12(D) + ivtmp.7_34; 42 │ x_29 = MEM[(int8_t *)_26]; 43 │ _1 = op_2_14(D) + ivtmp.7_34; 44 │ y_24 = MEM[(int8_t *)_1]; 45 │ _9 = .ADD_OVERFLOW (y_24, x_29); 46 │ _7 = IMAGPART_EXPR <_9>; 47 │ if (_7 != 0) 48 │ goto <bb 6>; [50.00%] 49 │ else 50 │ goto <bb 5>; [50.00%] 51 │ ;; succ: 6 52 │ ;; 5 53 │ 54 │ ;; basic block 5, loop depth 1 55 │ ;; pred: 4 56 │ _42 = REALPART_EXPR <_9>; 57 │ _2 = out_17(D) + ivtmp.7_34; 58 │ MEM[(int8_t *)_2] = _42; 59 │ ivtmp.7_27 = ivtmp.7_34 + 1; 60 │ if (_13 != ivtmp.7_27) 61 │ goto <bb 7>; [89.00%] 62 │ else 63 │ goto <bb 8>; [11.00%] 64 │ ;; succ: 7 65 │ ;; 8 66 │ 67 │ ;; basic block 6, loop depth 1 68 │ ;; pred: 4 69 │ _38 = x_29 < 0; 70 │ _39 = (signed char) _38; 71 │ _40 = -_39; 72 │ _41 = _40 ^ 127; 73 │ _33 = out_17(D) + ivtmp.7_34; 74 │ MEM[(int8_t *)_33] = _41; 75 │ ivtmp.7_25 = ivtmp.7_34 + 1; 76 │ if (_13 != ivtmp.7_25) After this patch: 77 │ _94 = .SELECT_VL (ivtmp_92, POLY_INT_CST [16, 16]); 78 │ vect_x_13.9_81 = .MASK_LEN_LOAD (vectp_op_1.7_79, 8B, { -1, ... }, _94, 0); 79 │ vect_y_15.12_85 = .MASK_LEN_LOAD (vectp_op_2.10_83, 8B, { -1, ... }, _94, 0); 80 │ vect_patt_49.13_86 = .SAT_ADD (vect_x_13.9_81, vect_y_15.12_85); 81 │ .MASK_LEN_STORE (vectp_out.14_88, 8B, { -1, ... }, _94, 0, vect_patt_49.13_86); 82 │ vectp_op_1.7_80 = vectp_op_1.7_79 + _94; 83 │ vectp_op_2.10_84 = vectp_op_2.10_83 + _94; 84 │ vectp_out.14_89 = vectp_out.14_88 + _94; 85 │ ivtmp_93 = ivtmp_92 - _94; The below test suites are passed for this patch. * The rv64gcv fully regression test. * The x86 bootstrap test. * The x86 fully regression test. gcc/ChangeLog: * match.pd: Add optional nop_convert for signed SAT_ADD case 4. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-26Daily bump.GCC Administrator12-1/+390
2024-09-26gfortran testsuite: Remove unit-files in files having open-statements, PR116701Hans-Peter Nilsson3-0/+24
PR testsuite/116701 shows that left-behind files from unnamed gfortran open statements (named unit.N, where N = unit number) can interfere with the result of a subsequent run. While that's unlikely to happen for a "real" fortran target or a test with a deleting close-statement, test-cases should not rely on previous test-cases passing and not execute along different execution paths depending on earlier runs, even if the difference is benevolent. Most but not all fortran test-cases go through gfortran-dg-runtest (gfortran.dg) or fortran-torture-execute (gfortran.fortran-torture). However, the exceptions, with more complex framework and call-chains, either don't run or don't have open-statements, so a more complex solution doesn't seem worthwhile. If test-cases with open-statements are added later to those parts of the test-suite, calls to fortran-delete-unit-files at the right spot may be added or worst case, "manual" cleanup-calls added, like: ! { dg-final { remote_file target delete "fort.10" } } Put the new proc in fortran-modules.exp since that's where other common fortran-testsuite dejagnu-library functions are located. PR testsuite/116701 * lib/fortran-modules.exp (fortran-delete-unit-files): New proc. * lib/gfortran-dg.exp (gfortran-dg-runtest): Call fortran-delete-unit-files after executing test. * lib/fortran-torture.exp (fortran-torture-execute): Ditto.
2024-09-25testsuite: XFAIL g++.dg/modules/indirect-1_b.CSam James1-4/+5
Mark the newly typo-fixed dg-final bits as XFAIL until investigated. gcc/testsuite/ChangeLog: PR c++/116846 * g++.dg/modules/indirect-1_b.C: Add XFAIL.
2024-09-25testsuite: fix dejagnu typos with underscoresSam James4-15/+15
Fix typos in dejagnu 'dg-*' directives with erroneous underscores like 'dg_'. gcc/testsuite/ChangeLog: PR debug/30161 PR c++/91826 PR c++/116846 * g++.dg/debug/dwarf2/template-func-params-7.C: Fix errant underscore. Cleanup whitespace in directives too. * g++.dg/lookup/pr91826.C: Fix errant underscore. * g++.dg/modules/indirect-1_b.C: Ditto. * gcc.target/powerpc/vsx-builtin-msum.c: Ditto.
2024-09-25doc: Remove @code wrapping of fortran option names [PR116801]Mikael Morin9-226/+556
The documentation of gfortran options uses @code wrappings for arguments to @opindex. This is superfluous, as 'op' index is a texinfo 'code' index, that is it already implicitly formats its arguments as if in a @code block. The superfluous wrapping has the effect of creating a nested <code class="..."> tag inside the regular automatic <code> tag, in the option index HTML page, preventing the recognition of the corresponding option by the option URL generation script. This change removes those superfluous @code wrappings. Additionally, variables appearing as separate argument in index are removed, permitting a few more URL recognition. Finally, the URL files are regenerated with the new URLs recognized on the updated HTML files. By the way, a spurious 'option' is removed from the label of the std= option in the index, without any effect on URL recognition. PR other/116801 gcc/fortran/ChangeLog: * invoke.texi: Remove @code wrapping in arguments to @opindex. (std=): Remove spurious 'option' in index. (idirafter, imultilib, iprefix, isysroot, iquote, isystem, fintrinsic-modules-path): Remove variable from index. * lang.opt.urls: Regenerate. gcc/ada/ChangeLog: * gcc-interface/lang.opt.urls: Regenerate. gcc/c-family/ChangeLog: * c.opt.urls: Regenerate. gcc/ChangeLog: * common.opt.urls: Regenerate. gcc/d/ChangeLog: * lang.opt.urls: Regenerate. gcc/go/ChangeLog: * lang.opt.urls: Regenerate. gcc/m2/ChangeLog: * lang.opt.urls: Regenerate. gcc/rust/ChangeLog: * lang.opt.urls: Regenerate.
2024-09-25i386: Add GENERIC and GIMPLE folders of __builtin_ia32_{min,max}* [PR116738]Jakub Jelinek3-0/+266
The following patch adds GENERIC and GIMPLE folders for various x86 min/max builtins. As discussed, these builtins have effectively x < y ? x : y (or x > y ? x : y) behavior. The GENERIC folding is done if all the (relevant) arguments are constants (such as VECTOR_CST for vectors) and is done because the GIMPLE folding can't easily handle masking, rounding and the ss/sd cases (in a way that it would be pattern recognized back to the corresponding instructions). The GIMPLE folding is also done just for TARGET_SSE4 or later when optimizing, otherwise it is apparently not matched back. 2024-09-25 Jakub Jelinek <jakub@redhat.com> PR target/116738 * config/i386/i386.cc (ix86_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}{S,P}{S,H,D}*. (ix86_gimple_fold_builtin): Handle IX86_BUILTIN_M{IN,AX}P{S,H,D}*. * gcc.target/i386/avx512f-pr116738-1.c: New test. * gcc.target/i386/avx512f-pr116738-2.c: New test.
2024-09-26x86: Don't use address override with segment regsiterH.J. Lu2-1/+56
Address override only applies to the (reg32) part in the thread address fs:(reg32). Don't rewrite thread address like (set (reg:CCZ 17 flags) (compare:CCZ (reg:SI 98 [ __gmpfr_emax.0_1 ]) (mem/c:SI (plus:SI (plus:SI (unspec:SI [ (const_int 0 [0]) ] UNSPEC_TP) (reg:SI 107)) (const:SI (unspec:SI [ (symbol_ref:SI ("previous_emax") [flags 0x1a] <var_decl 0x7fffe9a11cf0 previous_emax>) ] UNSPEC_DTPOFF))) [1 previous_emax+0 S4 A32]))) if address override is used to avoid the invalid memory operand like cmpl %fs:previous_emax@dtpoff(%eax), %r12d gcc/ PR target/116839 * config/i386/i386.cc (ix86_rewrite_tls_address_1): Make it static. Return if TLS address is thread register plus an integer register. gcc/testsuite/ PR target/116839 * gcc.target/i386/pr116839.c: New file. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-09-25Fix testsuite failure on 32-bit targets.Thomas Koenig1-3/+3
gcc/testsuite/ChangeLog: * gfortran.dg/unsigned_25.f90: Change KIND=16 to KIND=8.
2024-09-25Add an alternative testcase for PR 70740Andrew Pinski1-0/+41
While looking into improving phiprop, I noticed that the current pr70740.c testcase was being optimized almost all the way before phiprop because the addresses were considered the same; the arrays were all zero in size. This adds an alternative testcase which changes the array sizes to be 1 and phiprop can and will act on this testcase now and the fix which was being tested is actually tested now. Tested on x86_64-linux-gnu. PR tree-optimization/70740 gcc/testsuite/ChangeLog: * gcc.dg/torture/pr70740-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-25match: Fix `a != 0 ? a * b : 0` patterns for things that trap [PR116772]Andrew Pinski4-2/+54
For generic, `a != 0 ? a * b : 0` would match where `b` would be an expression which trap (in the case of the testcase, it was an integer division but it could be any). This adds a new helper function, expr_no_side_effects_p which tests if there is no side effects and the expression is not trapping which might be used in other locations. Changes since v1: * v2: Add move check to helper function instead of inlining it. PR middle-end/116772 gcc/ChangeLog: * generic-match-head.cc (expr_no_side_effects_p): New function * gimple-match-head.cc (expr_no_side_effects_p): New function * match.pd (`a != 0 ? a / b : 0`): Check expr_no_side_effects_p. (`a != 0 ? a * b : 0`, `a != 0 ? a & b : 0`): Likewise. gcc/testsuite/ChangeLog: * gcc.dg/torture/pr116772-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-25c++: Add testcase for DR 2874Jakub Jelinek1-0/+13
Seems we already allow the partial specializations the way the DR clarifies, so this patch just adds a testcase which verifies that. 2024-09-25 Jakub Jelinek <jakub@redhat.com> * g++.dg/DRs/dr2874.C: New test.
2024-09-25c++: Add testcase for DR 2836Jakub Jelinek1-0/+30
Seems we already handle it the way the DR clarifies, if double/long double and std::float64_t have the same mode, foo has long double type (while x + y would be _Float64 in C23), so this patch just adds a testcase which verifies that. 2024-09-25 Jakub Jelinek <jakub@redhat.com> * g++.dg/DRs/dr2836.C: New test.
2024-09-25c++: Add testcase for DR 2728Jakub Jelinek1-0/+20
Seems we already handle delete expressions the way the DR clarifies, so this patch just adds a testcase which verifies that. 2024-09-25 Jakub Jelinek <jakub@redhat.com> * g++.dg/DRs/dr2728.C: New test.
2024-09-25match: Fix A || B not optimized to true when !B implies A [PR114326]Konstantinos Eleftheriou3-0/+140
In expressions like (a != b || ((a ^ b) & c) == d) and (a != b || (a ^ b) == c), (a ^ b) is folded to false. In the equivalent expressions (((a ^ b) & c) == d || a != b) and ((a ^ b) == c || a != b) this is not happening. This patch adds the following simplifications in match.pd: ((a ^ b) & c) cmp d || a != b --> 0 cmp d || a != b (a ^ b) cmp c || a != b --> 0 cmp c || a != b PR tree-optimization/114326 gcc/ChangeLog: * match.pd: Add two patterns to fold a ^ b to 0, when a == b. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/fold-xor-and-or.c: New test. * gcc.dg/tree-ssa/fold-xor-or.c: New test. Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
2024-09-25Speed up get_bitmask_from_rangeRichard Biener1-4/+1
When min != max we know min ^ max != 0. * value-range.cc (get_bitmask_from_range): Remove redundant compare of xorv with zero.
2024-09-25Speed up wide_int_storage::operator=(wide_int_storage const&)Richard Biener1-3/+3
wide_int_storage shows up high in the profile for the testcase in PR114855 where the apparent issue is that the conditional jump on 'precision' after the (inlined) memcpy stalls the pipeline due to the data dependence and required store-to-load forwarding. We can add scheduling freedom by instead testing precision as from the source which speeds up the function by 30%. I've applied the same logic to the copy CTOR. * wide-int.h (wide_int_storage::wide_int_storage): Branch on source precision to avoid data dependence on memcpy destination. (wide_int_storage::operator=): Likewise.
2024-09-25c++: use TARGET_EXPR accessorsMarek Polacek6-36/+37
While futzing around with PR116416 I noticed that we can use the _SLOT and _INITIAL macros to make the code more readable. gcc/c-family/ChangeLog: * c-pretty-print.cc (c_pretty_printer::primary_expression): Use TARGET_EXPR accessors. (c_pretty_printer::expression): Likewise. gcc/cp/ChangeLog: * coroutines.cc (build_co_await): Use TARGET_EXPR accessors. (finish_co_yield_expr): Likewise. (register_awaits): Likewise. (tmp_target_expr_p): Likewise. (flatten_await_stmt): Likewise. * error.cc (dump_expr): Likewise. * semantics.cc (finish_omp_target_clauses): Likewise. * tree.cc (bot_manip): Likewise. (cp_tree_equal): Likewise. * typeck.cc (cxx_mark_addressable): Likewise. (cp_build_compound_expr): Likewise. (cp_build_modify_expr): Likewise. (check_return_expr): Likewise. Reviewed-by: Jason Merrill <jason@redhat.com>
2024-09-25match: Change (A * B) + (-C) to (B - C/A) * A, if C multiple of A [PR109393]Konstantinos Eleftheriou2-1/+43
The following function: int foo(int *a, int j) { int k = j - 1; return a[j - 1] == a[k]; } does not fold to `return 1;` using -O2 or higher. The cause of this is that the expression `4 * j + (-4)` for the index computation is not folded to `4 * (j - 1)`. Existing simplifications that handle similar cases are applied when A == C, which is not the case in this instance. A previous attempt to address this issue is https://gcc.gnu.org/pipermail/gcc-patches/2024-April/649896.html This patch adds the following simplification in match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A which also handles cases where the index is j - 2, j - 3, etc. Bootstrapped for all languages and regression tested on x86-64 and aarch64. PR tree-optimization/109393 gcc/ChangeLog: * match.pd: (A * B) + (-C) -> (B - C/A) * A, if C a multiple of A. gcc/testsuite/ChangeLog: * gcc.dg/pr109393.c: New test. Tested-by: Christoph Müllner <christoph.muellner@vrull.eu> Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu> Signed-off-by: Konstantinos Eleftheriou <konstantinos.eleftheriou@vrull.eu>
2024-09-25remove dominator recursion from reassocRichard Biener1-13/+32
The reassoc pass currently walks dominators in a recursive way where I ran into a stack overflow with. The following replaces it with worklists following patterns used elsewhere. * tree-ssa-reassoc.cc (break_up_subtract_bb): Remove recursion. (reassociate_bb): Likewise. (do_reassoc): Implement worklist based dominator walks for both break_up_subtract_bb and reassociate_bb.