2023-05-03  reg-stack: Fix a -fcompare-debug bug in reg-stack [PR107183]  (Jakub Jelinek; 2 files changed, -21/+77)
As the following testcase shows, the swap_rtx_condition function in reg-stack
can result in different code generation between -g and -g0.

The function is doing the changes as it goes, so it does analysis and changes
together, which makes it harder to deal with DEBUG_INSNs, where normally the
analysis phase ignores them and the later phase doesn't.

swap_rtx_condition walks instructions in two different ways: one is using the
next_flags_user function, which stops on non-call instructions that mention
the flags register, and the other is a loop on fnstsw where it stops on
instructions mentioning it and tries to find the sahf instruction that uses it
(in both cases calls stop it and so does the end of the basic block).  Now
both of these currently stop on DEBUG_INSNs that mention the flags register
resp. the fnstsw result register.  On success the function recurses on the
next flags user instruction if still live, and if the recursion failed, it
reverts the changes it did too and fails.

If it were just for the next_flags_user case, the fix could be just not doing
  INSN_CODE (insn) = -1;
  if (recog_memoized (insn) == -1)
    fail = 1;
on DEBUG_INSNs (assuming all changes to those are fine);
swap_rtx_condition_1 just changes one comparison to a different one.  But due
to the possibility of the fnstsw result being used in theory before sahf in
some DEBUG_INSNs, this patch takes a different approach.

swap_rtx_condition now has a new argument and two modes.  The first mode is
when debug_seen is >= 0; in this case both next_flags_user and the loop for
fnstsw -> sahf will ignore but note DEBUG_INSNs (that mention the flags
register or the fnstsw result).  If no such DEBUG_INSN is found during the
whole call including recursive invocations (so e.g. for -g0 but probably most
often for -g as well), it behaves as before: if it returns true, all the
changes are done and nothing further needs to be done later.  If any
DEBUG_INSNs are seen along the way, even when returning success all the
changes are reverted, so it just reports that the function would be successful
if DEBUG_INSNs were ignored.  In this case, compare_for_stack_reg needs to
call it again in debug_seen = -1 mode, which tells the function to update
everything including DEBUG_INSNs.  For the fnstsw -> sahf case, which I hope
will be very rare, I just reset the DEBUG_INSNs; I don't really know how to
express it easily otherwise.  For the rest, swap_rtx_condition_1 is done even
on the DEBUG_INSNs.

2022-11-20  Jakub Jelinek  <jakub@redhat.com>

        PR target/107183
        * reg-stack.c (next_flags_user): Add DEBUG_SEEN argument.  If >= 0
        and a DEBUG_INSN would be otherwise returned, set DEBUG_SEEN to 1
        and ignore it.
        (swap_rtx_condition): Add DEBUG_SEEN argument.  In >= 0 mode only
        set DEBUG_SEEN to 1 if problematic DEBUG_INSNs were seen and revert
        all changes on success in that case.  Don't try to recog_memoized
        DEBUG_INSNs.
        (compare_for_stack_reg): Adjust swap_rtx_condition caller.  If it
        returns true and debug_seen is 1, call swap_rtx_condition again
        with debug_seen -1.
        * gcc.dg/ubsan/pr107183.c: New test.

(cherry picked from commit 6b5c98c1c0003bd470a4428bede6c862637a94b8)
2023-05-03  c, c++: Fix up excess precision handling of scalar_to_vector conversion [PR107358]  (Jakub Jelinek; 2 files changed, -2/+32)
As mentioned earlier in the C++ excess precision support mail, the following
testcase is broken with excess precision both in C and C++ (though just in
C++ it was triggered in real-world code).

scalar_to_vector is called in both FEs after the excess precision promotions
(or stripping of EXCESS_PRECISION_EXPR), so we can then get invalid
diagnostics that say float vector + float involves truncation (on ia32 from
long double to float).

The following patch fixes that by calling scalar_to_vector on the operands
before the excess precision promotions, letting scalar_to_vector just do the
diagnostics (it does e.g. fold_for_warn, so it will fold
EXCESS_PRECISION_EXPR around REAL_CST to constants etc.), but then doing the
actual conversions using the excess precision promoted operands.  So, say, if
we have
  vector double + (float + float)
we don't actually do
  vector double + (float) ((long double) float + (long double) float)
but
  vector double + (double) ((long double) float + (long double) float)

2022-10-24  Jakub Jelinek  <jakub@redhat.com>

        PR c++/107358
gcc/c/
        * c-typeck.c (build_binary_op): Pass operands before excess
        precision promotions to scalar_to_vector call.
gcc/testsuite/
        * c-c++-common/pr107358.c: New test.

(cherry picked from commit 65e3274e363cb2c6bfe6b5e648916eb7696f7e2f)
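A minimal sketch of the affected kind of expression (hypothetical, not the
committed pr107358.c testcase); on ia32 with standard excess precision the
scalar subexpression is carried in long double, which is what used to confuse
the diagnostics:

    typedef double v2df __attribute__ ((vector_size (16)));

    v2df
    add (v2df v, float a, float b)
    {
      /* vector double + (float + float); the scalar part may be evaluated
         in long double excess precision before being converted to the
         vector element type.  */
      return v + (a + b);
    }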
2023-05-03  c++: Fix up constexpr handling of char/signed char/short pre/post inc/decrement [PR105774]  (Jakub Jelinek; 2 files changed, -0/+27)
signed char, char or short int pre/post inc/decrement are represented by
normal {PRE,POST}_{INC,DEC}REMENT_EXPRs in the FE and only gimplification
ensures that the {PLUS,MINUS}_EXPR is done in unsigned version of those
types:
    case PREINCREMENT_EXPR:
    case PREDECREMENT_EXPR:
    case POSTINCREMENT_EXPR:
    case POSTDECREMENT_EXPR:
      {
        tree type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
        if (INTEGRAL_TYPE_P (type) && c_promoting_integer_type_p (type))
          {
            if (!TYPE_OVERFLOW_WRAPS (type))
              type = unsigned_type_for (type);
            return gimplify_self_mod_expr (expr_p, pre_p, post_p, 1, type);
          }
        break;
      }
This means during constant evaluation we need to do it similarly (either
using unsigned_type_for or using widening to integer_type_node).  The
following patch does the latter.

2022-10-24  Jakub Jelinek  <jakub@redhat.com>

        PR c++/105774
        * constexpr.c (cxx_eval_increment_expression): For signed types
        that promote to int, evaluate PLUS_EXPR or MINUS_EXPR in int type.
        * g++.dg/cpp1y/constexpr-105774.C: New test.

(cherry picked from commit da8c362c4c18cff2f2dfd5c4706bdda7576899a4)
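A hypothetical constexpr reduction of the kind of code involved (not the
committed constexpr-105774.C testcase); after the fix the increment is
evaluated as if promoted, matching what the gimplifier does for non-constant
evaluation:

    constexpr short
    bump ()
    {
      short s = 32767;
      ++s;      /* evaluated as (int) s + 1, then converted back to short */
      return s;
    }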
2023-05-03  libgomp: Fix up creation of artificial teams  (Jakub Jelinek; 6 files changed, -6/+117)
When not in an explicit parallel/target/teams construct, we in some cases
create an artificial parallel with a single thread (either to handle target
nowait or for task reduction purposes).  In those cases, it again handled an
artificially created implicit task (created by gomp_new_icv for cases where
we needed to write to some ICVs), but as the testcases show, it didn't take
into account the possibility of this being done from explicit task(s).

The code would destroy/free the previous task and replace it with the new
implicit task.  If task is an explicit task (when teams is NULL, all explicit
tasks behave like if (0)), it is a pointer to a local stack variable, so
freeing it doesn't work, and additionally we shouldn't lose the explicit
tasks - the new implicit task should instead replace the ancestor task which
is the first implicit one.

2022-10-12  Jakub Jelinek  <jakub@redhat.com>

        * task.c (gomp_create_artificial_team): Fix up handling of
        invocations from within explicit task.
        * target.c (GOMP_target_ext): Likewise.
        * testsuite/libgomp.c/task-7.c: New test.
        * testsuite/libgomp.c/task-8.c: New test.
        * testsuite/libgomp.c-c++-common/task-reduction-17.c: New test.
        * testsuite/libgomp.c-c++-common/task-reduction-18.c: New test.

(cherry picked from commit a58a965eb73253759f6a3e1c7380392557da89c8)
2023-05-03  openmp: Fix ICE with taskgroup at -O0 -fexceptions [PR107001]  (Jakub Jelinek; 3 files changed, -3/+29)
The following testcase ICEs because with -O0 -fexceptions the
GOMP_taskgroup_end call isn't directly followed by a GOMP_RETURN statement,
but there are some conditionals to handle exceptions and we fail to find the
correct GOMP_RETURN.

The fix is to treat taskgroup similarly to target data.  Both of these
constructs emit a try { body } finally { end_call } around the construct's
body during gimplification, and we need to see proper construct nesting
during gimplification and omp lowering (including nesting of regions checks),
but during omp expansion we don't really need their nesting anymore; all we
need is to emit something at the start of the region, and the end of the
region is the end API call we've already emitted during gimplification.  For
target data, we weren't adding a GOMP_RETURN statement during omp lowering,
so after that pass it is treated merely like stand-alone omp directives.
This patch does the same for taskgroup too.

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

        PR c/107001
        * omp-low.c (lower_omp_taskgroup): Don't add GOMP_RETURN statement
        at the end.
        * omp-expand.c (build_omp_regions_1): Clarify GF_OMP_TARGET_KIND_DATA
        is not stand-alone directive.  For GIMPLE_OMP_TASKGROUP, also don't
        update parent.
        (omp_make_gimple_edges) <case GIMPLE_OMP_TASKGROUP>: Reset
        cur_region back after new_omp_region.
        * c-c++-common/gomp/pr107001.c: New test.

(cherry picked from commit ad2aab5c816a6fd56b46210c0a4a4c6243da1de9)
2023-05-03  openmp, c: Tighten up c_tree_equal [PR106981]  (Jakub Jelinek; 3 files changed, -6/+38)
This patch changes c_tree_equal to work more like cp_tree_equal, i.e. be
more strict in what it accepts.

The ICE on the first testcase was due to an INTEGER_CST
wi::wide (t1) == wi::wide (t2) comparison, which ICEs if the two constants
have different precision; but as the second testcase shows, being too
lenient in it can also lead to miscompilation of valid OpenMP programs where
we think a certain expression is the same even when it isn't and can be
guaranteed at runtime to represent a different memory location.

So, the patch looks through only NON_LVALUE_EXPRs and, for constants as well
as casts, requires that the types match before actually comparing the
constant values or recursing on the cast operands.

2022-09-24  Jakub Jelinek  <jakub@redhat.com>

        PR c/106981
gcc/c/
        * c-typeck.c (c_tree_equal): Only strip NON_LVALUE_EXPRs at the
        start.  For CONSTANT_CLASS_P or CASE_CONVERT: return false if t1
        and t2 have different types.
gcc/testsuite/
        * c-c++-common/gomp/pr106981.c: New test.
libgomp/
        * testsuite/libgomp.c-c++-common/pr106981.c: New test.

(cherry picked from commit 3c5bccb608c665ac3f62adb1817c42c845812428)
2023-05-03  c++: Implement P2327R1 - De-deprecating volatile compound operations  (Jakub Jelinek; 5 files changed, -14/+28)
From what I can see, this has been voted in as a DR and as it means we warn
less often than before in -std={gnu,c}++2{0,3} modes or with -Wvolatile,
I wonder if it shouldn't be backported to affected release branches as well.

2022-08-16  Jakub Jelinek  <jakub@redhat.com>

        * typeck.c (cp_build_modify_expr): Implement P2327R1
        - De-deprecating volatile compound operations.  Don't warn
        for |=, &= or ^= with volatile lhs.
        * expr.c (mark_use) <case MODIFY_EXPR>: Adjust warning wording,
        leave out simple.
        * g++.dg/cpp2a/volatile1.C: Adjust for de-deprecation of volatile
        compound |=, &= and ^= operations.
        * g++.dg/cpp2a/volatile3.C: Likewise.
        * g++.dg/cpp2a/volatile5.C: Likewise.

(cherry picked from commit 6e790ca4615443fa395ac5cdba1ab6c87810985c)
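A small illustration of the user-visible effect (hypothetical example, not
one of the adjusted testcases); only the bitwise compound assignments are
de-deprecated, arithmetic ones still warn:

    void
    poke (volatile int &reg)
    {
      reg |= 0x10;   // no longer deprecated / warned with -Wvolatile
      reg &= ~0x08;  // likewise
      reg += 1;      // still deprecated in C++20: arithmetic compound assignment
    }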
2023-05-03  cgraphunit: Don't emit asm thunks for -dx [PR106261]  (Jakub Jelinek; 2 files changed, -1/+37)
When the -dx option is used (didn't know we have it and no idea what it is
useful for), we just expand functions to RTL and then omit all further RTL
passes, so the normal functions aren't actually emitted into assembly, just
variables.

The following testcase ICEs, because we don't emit the methods, but do emit
thunks pointing to them, and those thunks have unwind info and rely on at
least some real functions to be emitted (which is normally the case, thunks
are only emitted for locally defined functions) because otherwise there are
no CIEs, only FDEs, and dwarf2out is upset about it.

The following patch fixes that by not emitting assembly thunks for -dx
either.

2022-07-27  Jakub Jelinek  <jakub@redhat.com>

        PR debug/106261
        * cgraphunit.c (cgraph_node::assemble_thunks_and_aliases): Don't
        output asm thunks for -dx.
        * g++.dg/debug/pr106261.C: New test.

(cherry picked from commit f9671b60f9395cb1dca128b92f5dd215f5aeaae1)
2023-05-03  wide-int: Fix up wi::shifted_mask [PR106144]  (Jakub Jelinek; 1 file changed, -1/+12)
As the following self-test testcase shows, wi::shifted_mask sometimes doesn't
create canonicalized wide_ints, which then fail to compare equal to
canonicalized wide_ints with the same value.

In particular, wi::mask (128, false, 128) gives { -1 } with len 1 and prec
128, while wi::shifted_mask (0, 128, false, 128) gives { -1, -1 } with len 2
and prec 128.

The problem is that the code is written with the assumption that there are
3 bit blocks (or 2 if start is 0), but doesn't consider the possibility where
there are 2 bit blocks (or 1 if start is 0) where the highest block isn't
present.  In that case, there is the optional block of negate ? 0 : -1 elts,
followed by just one elt (either one from the if (shift) or just
negate ? -1 : 0) and the rest is implicit sign-extension.

Only if end < prec there is 1 or more bits above it that have different bit
value and so we need to emit all the elts till end and then one more elt.
if (end == prec) would work too, because we have
  if (width > prec - start)
    width = prec - start;
  unsigned int end = start + width;
so end is guaranteed to be end <= prec, dunno what is preferred.

2022-07-01  Jakub Jelinek  <jakub@redhat.com>

        PR middle-end/106144
        * wide-int.cc (wi::shifted_mask): If end >= prec, return right
        after emitting element for shift or if shift is 0 first element
        after start.
        (wide_int_cc_tests): Add tests for equivalency of wi::mask and
        wi::shifted_mask with 0 start.

(cherry picked from commit e52592073f6df3d7a3acd9f0436dcc32a8b7493d)
2023-05-03  ifcvt: Don't introduce trapping or faulting reads in noce_try_sign_mask [PR106032]  (Jakub Jelinek; 2 files changed, -7/+29)
noce_try_sign_mask as documented will optimize
  if (c < 0)
    x = t;
  else
    x = 0;
into
  x = (c >> bitsm1) & t;
The optimization is done if either t is unconditional (e.g. for
  x = t;
  if (c >= 0)
    x = 0;
) or if it is cheap.  We already check that t doesn't have side-effects, but
if t is conditional, we need to punt also if it may trap or fault, as we make
it unconditional.

I've briefly skimmed other noce_try* optimizations and didn't find one that
would suffer from the same problem.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>

        PR rtl-optimization/106032
        * ifcvt.c (noce_try_sign_mask): Punt if !t_unconditional, and
        t may_trap_or_fault_p, even if it is cheap.
        * gcc.c-torture/execute/pr106032.c: New test.

(cherry picked from commit a0c30fe3b888f20215f3e040d21b62b603804ca9)
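A source-level sketch of why the extra check is needed (hypothetical example,
not the committed pr106032.c testcase); making the conditional load
unconditional could fault when the pointer is only dereferenceable on the
taken path:

    int
    sign_mask (int c, const int *p)
    {
      int x;
      if (c < 0)
        x = *p;   /* possibly-faulting read, executed only when c < 0 */
      else
        x = 0;
      /* Transforming this to x = (c >> 31) & *p; would read *p
         unconditionally, which the patch now avoids when the read may
         trap or fault.  */
      return x;
    }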
2023-05-03  expand: Fix up expand_cond_expr_using_cmove [PR106030]  (Jakub Jelinek; 2 files changed, -1/+18)
If expand_cond_expr_using_cmove can't find a cmove optab for a particular
mode, it tries to promote the mode and perform the cmove in the promoted
mode.

The testcase in the patch ICEs on arm because in that case we pass temp which
has the promoted mode (SImode) as target to expand_operands where the
operands have the non-promoted mode (QImode).  Later on the function uses
paradoxical subregs:
  if (GET_MODE (op1) != mode)
    op1 = gen_lowpart (mode, op1);

  if (GET_MODE (op2) != mode)
    op2 = gen_lowpart (mode, op2);
to change the operand modes.

The following patch fixes it by passing NULL_RTX as target if it has promoted
mode.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>

        PR middle-end/106030
        * expr.c (expand_cond_expr_using_cmove): Pass NULL_RTX instead of
        temp to expand_operands if mode has been promoted.
        * gcc.c-torture/compile/pr106030.c: New test.

(cherry picked from commit 2df1df945fac85d7b3d084001414a66a2709d8fe)
2023-05-03  libgomp: Fix up target-31.c test [PR106045]  (Jakub Jelinek; 1 file changed, -1/+1)
The i variable is used inside of the parallel in:
      #pragma omp simd safelen(32) private (v)
      for (i = 0; i < 64; i++)
        {
          v = 3 * i;
          ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
        }
where i is predetermined linear (so while inside of the body it is safe,
private per SIMD lane var) the final value is written to the shared variable,
and in:
  for (i = 0; i < 64; i++)
    if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
      #pragma omp atomic write
      err = 1;
which is a normal loop and so it isn't in any way privatized there.  So we
have a data race, fixed by adding private (i) clause to the parallel.

2022-06-21  Jakub Jelinek  <jakub@redhat.com>
            Paul Iannetta  <piannetta@kalrayinc.com>

        PR libgomp/106045
        * testsuite/libgomp.c/target-31.c: Add private (i) clause.

(cherry picked from commit 85d613da341b76308edea48359a5dbc7061937c4)
2023-05-03  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-05-02  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-05-01  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-30  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-29  Daily bump.  (GCC Administrator; 2 files changed, -1/+10)
2023-04-28  libstdc++: Throw instead of segfaulting in std::thread constructor [PR 67791]  (Jonathan Wakely; 1 file changed, -0/+10)
This turns a mysterious segfault into an exception with a more useful
message.  If the exception isn't caught, the user sees this instead of just
a segfault:

  terminate called after throwing an instance of 'std::system_error'
    what():  Enable multithreading to use std::thread: Operation not permitted
  Aborted (core dumped)

libstdc++-v3/ChangeLog:

        PR libstdc++/67791
        * src/c++11/thread.cc (thread::_M_start_thread(_State_ptr,
        void (*)())): Check that gthreads is available before calling
        __gthread_create.

(cherry picked from commit 4bbd5d0c5fb2b7527938ad44a6d8a2f2ef8bbe12)
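A small usage sketch (assumed example, not part of the patch) showing how the
new failure mode can be observed and handled when the runtime was built
without gthreads support:

    #include <thread>
    #include <system_error>
    #include <cstdio>

    int main()
    {
      try
        {
          std::thread t([] { });  // now throws std::system_error instead of
          t.join();               // crashing when multithreading is unavailable
        }
      catch (const std::system_error& e)
        {
          std::printf("%s\n", e.what());
          return 1;
        }
    }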
2023-04-28  Daily bump.  (GCC Administrator; 2 files changed, -1/+99)
2023-04-27  libstdc++: Fix outdated docs about demangling exception messages  (Jonathan Wakely; 2 files changed, -22/+4)
The string returned by std::bad_exception::what() hasn't been a mangled name
since PR libstdc++/14493 was fixed for GCC 4.2.0, so remove the docs showing
how to demangle it.

libstdc++-v3/ChangeLog:

        * doc/xml/manual/extensions.xml: Remove std::bad_exception from
        example program.
        * doc/html/manual/ext_demangling.html: Regenerate.

(cherry picked from commit 688d126b69215db29774c249b052e52d765782b3)
2023-04-27  libstdc++: Reduce Doxygen output for PDF  (Jonathan Wakely; 6 files changed, -1/+11)
Including the header source code in the doxygen-generated PDF file makes it
too large, and causes pdflatex to run out of memory.  If we only set
SOURCE_BROWSER=YES for the HTML docs then we won't include the sources in
the PDF file.

There are several macros defined for std::valarray that are only used to
generate repetitive code and then #undef'd.  Those aren't useful in the
doxygen docs, especially the ones that reuse the same name in different
files.  Omitting them avoids warnings about duplicate labels in the
refman.tex file.

libstdc++-v3/ChangeLog:

        * doc/doxygen/user.cfg.in (SOURCE_BROWSER): Only set to YES for
        HTML docs.
        * include/bits/gslice_array.h (_DEFINE_VALARRAY_OPERATOR): Omit
        from doxygen docs.
        * include/bits/indirect_array.h (_DEFINE_VALARRAY_OPERATOR):
        Likewise.
        * include/bits/mask_array.h (_DEFINE_VALARRAY_OPERATOR): Likewise.
        * include/bits/slice_array.h (_DEFINE_VALARRAY_OPERATOR): Likewise.
        * include/std/valarray (_DEFINE_VALARRAY_UNARY_OPERATOR)
        (_DEFINE_VALARRAY_AUGMENTED_ASSIGNMENT)
        (_DEFINE_VALARRAY_EXPR_AUGMENTED_ASSIGNMENT)
        (_DEFINE_BINARY_OPERATOR): Likewise.

(cherry picked from commit afa69618d1627435841c9164b019ef98000e0365)
2023-04-27  libstdc++: Fix dangling reference in filesystem::path::filename()  (Jonathan Wakely; 1 file changed, -3/+3)
The new -Wdangling-reference warning noticed this.

libstdc++-v3/ChangeLog:

        * include/bits/fs_path.h (path::filename()): Fix dangling
        reference.

(cherry picked from commit 49237fe6ef677a81eae701f937546210c90b5914)
2023-04-27  libstdc++: Fix GDB Xmethod for std::shared_ptr::use_count() [PR109064]  (Jonathan Wakely; 2 files changed, -1/+11)
libstdc++-v3/ChangeLog:

        PR libstdc++/109064
        * python/libstdcxx/v6/xmethods.py (SharedPtrUseCountWorker):
        Remove self-recursion in __init__.  Add missing _supports.
        * testsuite/libstdc++-xmethods/shared_ptr.cc: Check use_count()
        and unique().
2023-04-27  libstdc++: Fix uses_allocator_construction_args for pair<T&&, U&&> [PR108952]  (Jonathan Wakely; 4 files changed, -2/+118)
This implements LWG 3527 which fixes the handling of pair<T&&, U&&> in
std::uses_allocator_construction_args.

libstdc++-v3/ChangeLog:

        PR libstdc++/108952
        * include/std/memory (uses_allocator_construction_args):
        Implement LWG 3527.
        * testsuite/20_util/pair/astuple/get-2.cc: New test.
        * testsuite/20_util/scoped_allocator/108952.cc: New test.
        * testsuite/20_util/uses_allocator/lwg3527.cc: New test.

(cherry picked from commit 8e342c04550466ab088c33746091ce7f3498ee44)
2023-04-27  libstdc++: Fix name of <experimental/optional> in comment  (Jonathan Wakely; 1 file changed, -1/+1)
libstdc++-v3/ChangeLog:

        * include/experimental/optional: Fix header name in comment.

(cherry picked from commit 38f321793ae18d25399f0396ac1371caa7cc7043)
2023-04-27  libstdc++: Fix std::common_iterator assignment [PR100823]  (Jonathan Wakely; 2 files changed, -40/+129)
This fixes the following conformance problems reported in the PR:

- Move constructor and move assignment should be defined.
- Copy assignment from a valueless object should be allowed.

Assignment is completely rewritten by this patch, as the previous version
had a number of problems.  The converting assignment failed to handle the
case of assigning a new value to a valueless object, which should work.  It
only accepted lvalue arguments, so wasn't usable to implement the move
assignment operator.  Finally, it enforced the precondition that the
argument is not valueless, which is correct for the converting assignment
but not for the copy assignment.

A new _M_assign member is added to handle all cases of assignment (copying
from an lvalue, moving from an rvalue, and converting from a different
type).  The not valueless precondition is checked in the converting
assignment before calling _M_assign, so isn't enforced for copy and move
assignment.  The new function no longer uses a switch, so handles valueless
objects as the LHS or RHS of the assignment.

libstdc++-v3/ChangeLog:

        PR libstdc++/100823
        * include/bits/stl_iterator.h (common_iterator): Define move
        constructor and move assignment operator.
        (common_iterator::_M_assign): New function implementing
        assignment.
        (common_iterator::operator=): Use _M_assign.
        (common_iterator::_S_valueless): New constant.
        * testsuite/24_iterators/common_iterator/100823.cc: New test.

(cherry picked from commit 56c999860bbbb2fd5091ba0985e2e5eaa90c6478)
2023-04-27  libstdc++: Fix minor bugs in std::common_iterator  (Jonathan Wakely; 2 files changed, -6/+28)
The noexcept-specifier for some std::common_iterator constructors was
incorrectly using an rvalue as the first argument of
std::is_nothrow_assignable_v.  This gave the wrong answer for some types,
e.g. std::common_iterator<int*, S>, because an rvalue of scalar type cannot
be assigned to.

Also fix the friend declaration to use the same constraints as on the
definition of the class template.  G++ fails to diagnose this error, due to
PR c++/96830.

Finally, the copy constructor was using std::move for its argument in some
cases, which should be removed.

libstdc++-v3/ChangeLog:

        * include/bits/stl_iterator.h (common_iterator): Fix incorrect
        uses of is_nothrow_assignable_v.  Fix inconsistent constraints on
        friend declaration.  Do not move argument in copy constructor.
        * testsuite/24_iterators/common_iterator/1.cc: Check for noexcept
        constructible/assignable.

(cherry picked from commit 3b5567c3ec7e5759bdecc6a6fc0be2b65a93636e)
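A standalone illustration (not from the patch) of the underlying trait
behaviour: when the first template argument is a non-reference scalar type
it denotes an rvalue, which is never assignable, so querying the trait with
an rvalue there gives the wrong answer:

    #include <type_traits>

    // An rvalue of scalar type cannot be assigned to ...
    static_assert(!std::is_nothrow_assignable_v<int*, int*>);
    // ... but an lvalue of the same type can.
    static_assert(std::is_nothrow_assignable_v<int*&, int*>);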
2023-04-27  libstdc++: Fix unsafe use of dirent::d_name [PR107814]  (Jonathan Wakely; 1 file changed, -13/+22)
Copy the fix for PR 104731 to the equivalent experimental::filesystem test.

libstdc++-v3/ChangeLog:

        PR libstdc++/107814
        * testsuite/experimental/filesystem/iterators/error_reporting.cc:
        Use a static buffer with space after it.

(cherry picked from commit 1cac00d013856fea4cee0f13c4959c8e21afd2d9)
2023-04-27  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-26  Daily bump.  (GCC Administrator; 2 files changed, -1/+6)
2023-04-25  testsuite: remove stray ';' [PR109608]  (Jason Merrill; 1 file changed, -1/+1)
GCC 10 is still pedantic about empty declarations.

        PR testsuite/109608

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/constexpr-pmf3.C: Remove stray ';'.
2023-04-25  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-24  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-23  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-22  Daily bump.  (GCC Administrator; 4 files changed, -1/+55)
2023-04-21  c-family: -Wsequence-point and COMPONENT_REF [PR107163]  (Jason Merrill; 2 files changed, -1/+43)
The patch for PR91415 fixed -Wsequence-point to treat shifts and ARRAY_REF
as sequenced in C++17, and COMPONENT_REF as well.  But this is unnecessary
for COMPONENT_REF, since the RHS is just a FIELD_DECL with no actual
evaluation, and in this testcase handling COMPONENT_REF as sequenced blows
up fast in a deep inheritance tree.  Instead, look through it.

        PR c++/107163

gcc/c-family/ChangeLog:

        * c-common.c (verify_tree): Don't use sequenced handling for
        COMPONENT_REF.

gcc/testsuite/ChangeLog:

        * g++.dg/warn/Wsequence-point-5.C: New test.
2023-04-21  c++: constexpr PMF conversion [PR105996]  (Jason Merrill; 2 files changed, -8/+18)
Here, we were calling build_reinterpret_cast regardless of whether there was
actually a cast, and that now sets REINTERPRET_CAST_P.  But that optimization
seems dodgy anyway, as it involves NOP_EXPR from one RECORD_TYPE to another
and we try to reserve NOP_EXPR for fundamental types.  And the generated
code seems the same, so let's drop it.  And also strip location wrappers.

        PR c++/105996

gcc/cp/ChangeLog:

        * typeck.c (build_ptrmemfunc): Drop 0-offset optimization
        and location wrappers.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/constexpr-pmf3.C: New test.
2023-04-21  c++: constant, array, lambda, template [PR108975]  (Jason Merrill; 2 files changed, -0/+17)
When a lambda refers to a constant local variable in the enclosing scope, we
tentatively capture it, but if we end up pulling out its constant value, we
go back at the end of the lambda and prune any unneeded captures.  Here
while parsing the template we decided that the dim capture was unneeded,
because we folded it away, but then we brought back the use in the template
trees that try to preserve the source representation with added type info.
So then when we tried to instantiate that use, we couldn't find what it was
trying to use, and crashed.

Fixed by not trying to prune when parsing a template; we'll prune at
instantiation time.

        PR c++/108975

gcc/cp/ChangeLog:

        * lambda.c (prune_lambda_captures): Don't bother in a template.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/lambda/lambda-const11.C: New test.
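A hypothetical reduction of the kind of code involved (the committed
testcase is lambda-const11.C, not this snippet); the use of dim folds to a
constant while parsing the template, so the capture used to be pruned even
though the saved template trees still refer to it:

    template <typename T>
    void f ()
    {
      const int dim = 2;
      auto l = [=] { T a[dim]; (void) a; };  // dim only used as a constant
      l ();
    }

    void g () { f<int> (); }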
2023-04-21  c++: namespace-scoped friend in local class [PR69410]  (Jason Merrill; 3 files changed, -5/+28)
do_friend was only considering class-qualified identifiers for the
qualified-id case, but we also need to skip local scope when there's an
explicit namespace scope.

        PR c++/69410

gcc/cp/ChangeLog:

        * friend.c (do_friend): Handle namespace as scope argument.
        * decl.c (grokdeclarator): Pass down in_namespace.

gcc/testsuite/ChangeLog:

        * g++.dg/lookup/friend24.C: New test.
2023-04-21  c++: &enum::enumerator [PR101869]  (Jason Merrill; 2 files changed, -1/+13)
We don't want to call build_offset_ref with an enum.

        PR c++/101869

gcc/cp/ChangeLog:

        * semantics.c (finish_qualified_id_expr): Don't try to build a
        pointer-to-member if the scope is an enumeration.

gcc/testsuite/ChangeLog:

        * g++.dg/cpp0x/enum43.C: New test.
2023-04-21  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-20  Daily bump.  (GCC Administrator; 1 file changed, -1/+1)
2023-04-19  Daily bump.  (GCC Administrator; 3 files changed, -1/+97)
2023-04-18  PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1  (Philipp Tomsich; 2 files changed, -0/+16)
This adds a check for REG_P on SET_DEST for the new idiom recognizer for
AARCH64_FUSE_ADDSUB_2REG_CONST1.  The reported ICE is only observable with
checking=rtl.

Bootstrapped/regtested aarch64-linux, committed.

        PR target/108589

gcc/ChangeLog:

        * config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Check
        REG_P on SET_DEST.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/pr108589.c: New test.

(cherry picked from commit a39c6ec97906766ad65d15d4856fd41121ee7a45)
2023-04-18  aarch64: disable LDP via tuning structure for -mcpu=ampere1  (Philipp Tomsich; 3 files changed, -2/+22)
AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.  Given the
chance that this causes instructions to slip into the next decoding cycle
and the additional overheads when handling cacheline-crossing LDP
instructions, we disable the generation of LDP instructions through the
tuning structure from instruction combining (such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue expansion,
we allow LDPs there.

This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes (with a
net-overall improvement):

   503.bwaves_r      -0.88%
   507.cactuBSSN_r    0.35%
   508.namd_r         3.09%
   510.parest_r      -2.99%
   511.povray_r       5.54%
   519.lbm_r         15.83%
   521.wrf_r          0.56%
   526.blender_r      2.47%
   527.cam4_r         0.70%
   538.imagick_r      0.00%
   544.nab_r         -0.33%
   549.fotonik3d_r   -0.42%
   554.roms_r         0.00%
   -------------------------
   = total            1.79%

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>

gcc/ChangeLog:

        * config/aarch64/aarch64-tuning-flags.def
        (AARCH64_EXTRA_TUNING_OPTION): Add
        AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
        * config/aarch64/aarch64.c (aarch64_operands_ok_for_ldpstp):
        Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

(cherry picked from commit f200c56787f2c6f93ffb739d57d01a294ab72f68)
2023-04-18  aarch64: update ampere1 vectorization cost  (Philipp Tomsich; 1 file changed, -6/+6)
The original submission of AmpereOne (-mcpu=ampere1) costs occurred prior to
exhaustive testing of vectorizable workloads against hardware.

Adjust the vector costs to achieve the best results and more closely match
the underlying hardware.

gcc/ChangeLog:

        * config/aarch64/aarch64.c: Update vector costs for ampere1.

Co-Authored-By: Jiangning Liu <jiangning.liu@amperecomputing.com>
Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>

(cherry picked from commit ff1f2f2412bda118f7ddc10e69bd4284d9b24b9e)
2023-04-18  aarch64: Add support for Ampere-1A (-mcpu=ampere1a) CPU  (Philipp Tomsich; 6 files changed, -2/+165)
This patch adds support for Ampere-1A CPU:
- recognize the name of the core and provide detection for -mcpu=native,
- updated extra_costs,
- adds a new fusion pair for (A+B+1 and A-B-1).

Ampere-1A and Ampere-1 have more timing difference than the extra costs
indicate, but these don't propagate through to the headline items in our
extra costs (e.g. the change in latency for scalar sqrt doesn't have a
corresponding table entry).

gcc/ChangeLog:

        * config/aarch64/aarch64-cores.def (AARCH64_CORE): Add ampere1a.
        * config/aarch64/aarch64-cost-tables.h: Add ampere1a_extra_costs.
        * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR):
        Define a new fusion pair for A+B+1/A-B-1 (i.e., add/subtract two
        registers and then +1/-1).
        * config/aarch64/aarch64-tune.md: Regenerate.
        * config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Implement
        idiom-matcher for the new fusion pair.
        * doc/invoke.texi: Add ampere1a.

(cherry picked from commit 590a06afbf0e96813b5879742f38f3665512c854)
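A source-level illustration (hypothetical, not from the patch) of the shape
the new fusion pair targets: two registers added together and then
incremented or decremented by one, which typically expands to back-to-back
add/sub instructions on AArch64:

    long
    add3 (long a, long b)
    {
      return a + b + 1;   /* A+B+1: candidate for the new fusion pair */
    }

    long
    sub3 (long a, long b)
    {
      return a - b - 1;   /* A-B-1: likewise */
    }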
2023-04-18  aarch64: update Ampere-1 core definition  (Philipp Tomsich; 1 file changed, -1/+1)
This brings the extensions detected by -mcpu=native on Ampere-1 systems in
sync with the defaults generated for -mcpu=ampere1.  Note that some early
kernel versions on Ampere1 may misreport the presence of PAUTH and PREDRES
(i.e., -mcpu=native will add 'nopauth' and 'nopredres').

gcc/ChangeLog:

        * config/aarch64/aarch64-cores.def (AARCH64_CORE): Update
        Ampere-1 core entry.

(cherry picked from commit db2f5d661239737157cf131de7d4df1c17d8d88d)
2023-04-18  aarch64: fix off-by-one in reading cpuinfo  (Philipp Tomsich; 3 files changed, -2/+25)
Fixes: 341573406b39

Don't subtract one from the result of strnlen() when trying to point to the
first character after the current string.  This issue would cause individual
characters (where the 128 byte buffers are stitched together) to be lost.

gcc/ChangeLog:

        * config/aarch64/driver-aarch64.c (readline): Fix off-by-one.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/cpunative/info_18: New test.
        * gcc.target/aarch64/cpunative/native_cpu_18.c: New test.

(cherry picked from commit b1cfbccc41de6aec950c0f662e7e85ab34bfff8a)
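A simplified sketch of the off-by-one (hypothetical code, not the actual
readline() from driver-aarch64.c): subtracting one from the strnlen() result
lands one character short of the position after the measured string, so one
character per stitched 128-byte buffer was dropped:

    #include <string.h>

    const char *
    after_current_string (const char *buf, size_t bufsz)
    {
      /* buggy: return buf + strnlen (buf, bufsz) - 1;
         fixed: advance by the full measured length.  */
      return buf + strnlen (buf, bufsz);
    }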
2023-04-18  aarch64: enable Ampere-1 CPU  (Philipp Tomsich; 5 files changed, -4/+168)
This adds support and a basic tuning model for the Ampere Computing
"Ampere-1" CPU.

The Ampere-1 implements the ARMv8.6 architecture in A64 mode and is modelled
as a 4-wide issue (as with all modern micro-architectures, the chosen issue
rate is a compromise between the maximum dispatch rate and the maximum rate
of uops issued to the scheduler).

This adds the -mcpu=ampere1 command-line option and the relevant cost
information/tuning tables for the Ampere-1.

gcc/ChangeLog:

        * config/aarch64/aarch64-cores.def (AARCH64_CORE): New Ampere-1
        core.
        * config/aarch64/aarch64-tune.md: Regenerate.
        * config/aarch64/aarch64-cost-tables.h: Add extra costs for
        Ampere-1.
        * config/aarch64/aarch64.c: Add tuning structures for Ampere-1.
        * doc/invoke.texi: Add documentation for Ampere-1 core.

(cherry picked from commit 67b0d47e20e655c0dd53a76ea88aab60fafb2059)