Age | Commit message | Author | Files | Lines
2023-05-04 | tree-optimization/109724 - new testcase | Richard Biener | 1 | -0/+32
The following adds a testcase for PR109724 which was caused by backporting r13-2375-gbe1b42de9c151d and fixed by r11-199-g2b42509f8b7bdf. PR tree-optimization/109724 * g++.dg/torture/pr109724.C: New testcase. (cherry picked from commit ee99aaae4aeecd55f1d945a959652cf07e3b2e9e)
2023-05-04 | Revert "tree-optimization/106809 - compile time hog in VN" | Richard Biener | 2 | -58/+27
This reverts commit 051f78a5c1d6994c10ee7c35453ff0ccee94e5c6.
2023-05-04 | Daily bump. | GCC Administrator | 12 | -1/+964
2023-05-03 | libstdc++: Ensure constexpr std::lcm detects out-of-range result [PR105844] | Jonathan Wakely | 2 | -1/+6
On the gcc-10 branch, __glibcxx_assert does not unconditionally check the condition during constant evaluation. This means we need an explicit additional check for std::lcm results that cannot be represented in an unsigned result type. libstdc++-v3/ChangeLog: PR libstdc++/105844 * include/std/numeric (lcm): Ensure out-of-range result is detected in constant evaluation. * testsuite/26_numerics/lcm/105844.cc: Adjust dg-error string.
2023-05-03 | libstdc++: Make std::lcm and std::gcd detect overflow [PR105844] | Jonathan Wakely | 6 | -61/+131
When I fixed PR libstdc++/92978 I introduced a regression whereby std::lcm(INT_MIN, 1) and std::lcm(50000, 49999) would no longer produce errors during constant evaluation. Those calls are undefined, because they violate the preconditions that |m| and the result can be represented in the return type (which is int in both those cases). The regression occurred because __absu<unsigned>(INT_MIN) is well-formed, due to the explicit casts to unsigned in that new helper function, and the out-of-range multiplication is well-formed, because unsigned arithmetic wraps instead of overflowing. To fix 92978 I made std::gcd and std::lcm calculate |m| and |n| immediately, yielding a common unsigned type that was used to calculate the result. That was partly correct, but there's no need to use an unsigned type. Doing so only suppresses the overflow errors so the compiler can't detect them. This change replaces __absu with __abs_r that returns the common type (not its corresponding unsigned type). This way we can detect overflow in __abs_r when required, while still supporting the most-negative value when it can be represented in the result type. To detect LCM results that are out of range of the result type we still need explicit checks, because neither constant evaluation nor UBsan will complain about unsigned wrapping for cases such as std::lcm(500000u, 499999u). We can detect those overflows efficiently by using __builtin_mul_overflow and asserting. libstdc++-v3/ChangeLog: PR libstdc++/105844 * include/experimental/numeric (experimental::gcd): Simplify assertions. Use __abs_r instead of __absu. (experimental::lcm): Likewise. Remove use of __detail::__lcm so overflow can be detected. * include/std/numeric (__detail::__absu): Rename to __abs_r and change to allow signed result type, so overflow can be detected. (__detail::__lcm): Remove. (gcd): Simplify assertions. Use __abs_r instead of __absu. (lcm): Likewise. Remove use of __detail::__lcm so overflow can be detected.
* testsuite/26_numerics/gcd/gcd_neg.cc: Adjust dg-error lines. * testsuite/26_numerics/lcm/lcm_neg.cc: Likewise. * testsuite/26_numerics/gcd/105844.cc: New test. * testsuite/26_numerics/lcm/105844.cc: New test. (cherry picked from commit 671970a5621e18e7079b4ca113e56434c858db66)
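The overflow check described above can be sketched roughly as follows. This is only an illustration and not the libstdc++ code: the helper name checked_lcm and its bool/out-parameter interface are assumptions made for this example; only __builtin_mul_overflow and std::gcd are real.

```cpp
#include <cassert>
#include <numeric>

// Hypothetical helper, not the libstdc++ implementation: computes
// lcm(m, n) for unsigned operands and reports whether the result fits.
inline bool checked_lcm(unsigned m, unsigned n, unsigned &result)
{
  if (m == 0 || n == 0)
    {
      result = 0;
      return true;
    }
  // Divide first, so the multiplication is the only step that can wrap.
  // __builtin_mul_overflow returns true if the product does not fit.
  return !__builtin_mul_overflow(m / std::gcd(m, n), n, &result);
}
```

With a 32-bit unsigned, checked_lcm(500000u, 499999u, r) reports failure, because plain unsigned multiplication would silently wrap there.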
2023-05-03 | libstdc++: Strip absolute paths from files shown in Doxygen docs | Jonathan Wakely | 1 | -1/+2
This avoids showing absolute paths from the expansion of @srcdir@/libsupc++/ in the doxygen File List view. libstdc++-v3/ChangeLog: * doc/doxygen/user.cfg.in (STRIP_FROM_PATH): Remove prefixes from header paths. (cherry picked from commit 975e8e836ead0e9055a125a2a23463db5d847cb3)
2023-05-03 | call_summary: add missing template keyword | Anthony Sharp | 1 | -2/+2
Without the 'template', this function template compares 'traverse' to 'f', and then compares the result to 'a'. Evidently it hasn't been instantiated yet. gcc/ChangeLog: * symbol-summary.h: Added missing template keyword. (cherry picked from commit fccd5b48adf568f0aabe5d5f51206a9d42da095a)
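A minimal sketch of why the keyword matters, using hypothetical names (Walker, advance, advance_twice) that do not appear in symbol-summary.h:

```cpp
#include <cassert>

struct Walker
{
  // Member function template, similar in shape to a 'traverse<...>' call.
  template <int Step>
  int advance(int x) const { return x + Step; }
};

template <typename T>
int advance_twice(const T &w, int x)
{
  // 'w' has a dependent type, so 'template' is required here; without it
  // the '<' after 'advance' is parsed as a less-than operator.
  return w.template advance<2>(x);
}
```

Calling advance_twice(Walker{}, 1) instantiates the template, which is the point at which a missing 'template' would be diagnosed.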
2023-05-03 | reassoc: Fix up another ICE with returns_twice call [PR109410] | Jakub Jelinek | 2 | -0/+28
The following testcase ICEs in reassoc, unlike the last case I've fixed there here SSA_NAME_USED_IN_ABNORMAL_PHI is not the case anywhere. build_and_add_sum places new statements after the later appearing definition of an operand but if both operands are default defs or constants, we place statement at the start of the function. If the very first statement of a function is a call to returns_twice function, this doesn't work though, because that call has to be the first thing in its basic block, so the following patch splits the entry successor edge such that the new statements are added into a different block from the returns_twice call. I think we should in stage1 reconsider such placements, I think it unnecessarily enlarges the lifetime of the new lhs if its operand(s) are used more than once in the function. Unless something sinks those again. Would be nice to place it closer to the actual uses (or where they will be placed). 2023-04-12 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/109410 * tree-ssa-reassoc.c (build_and_add_sum): Split edge from entry block if first statement of the function is a call to returns_twice function. * gcc.dg/pr109410.c: New test. (cherry picked from commit 51856718a82ce60f067910d9037ca255645b37eb)
2023-05-03 | libiberty: Make strstr.c in libiberty ANSI compliant | Jakub Jelinek | 1 | -0/+3
On Fri, Nov 13, 2020 at 11:53:43AM -0700, Jeff Law via Gcc-patches wrote: > > On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote: > > The original code in libiberty says "FIXME" and then says it has not been > > validated to be ANSI compliant. However, this patch changes the function to > > match implementations that ARE compliant, and such code is in the public > > domain. > > > > I ran the test results, and there are no test failures. > > Thanks.  This seems to be the standard "simple" strstr implementation.  > There's significantly faster implementations available, but I doubt it's > worth the effort as the version in this file only gets used if there is > no system strstr.c. Except that PR109306 says the new version is non-compliant and is certainly slower than what we used to have. The only problem I see on the old version (sure, it is not very fast version) is that for strstr ("abcd", "") it returned "abcd"+4 rather than "abcd" because strchr in that case changed p to point to the last character and then strncmp returned 0. The question reported in PR109306 is whether memcmp is required not to access characters beyond the first difference or not. For all of memcmp/strcmp/strncmp, C17 says: "The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared." but then in memcmp description says: "The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2." rather than something similar to strncmp wording: "The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2." 
So, while for strncmp it seems clearly well defined when there is zero terminator before reaching the n, for memcmp it is unclear if say int memcmp (const void *s1, const void *s2, size_t n) { int ret = 0; size_t i; const unsigned char *p1 = (const unsigned char *) s1; const unsigned char *p2 = (const unsigned char *) s2; for (i = n; i; i--) if (p1[i - 1] != p2[i - 1]) ret = p1[i - 1] < p2[i - 1] ? -1 : 1; return ret; } wouldn't be valid implementation (one which always compares all characters and just returns non-zero from the first one that differs). So, shouldn't we just revert and handle the len == 0 case correctly? I think almost nothing really uses it, but still, the old version at least worked nicer with a fast strchr. Could as well strncmp (p + 1, s2 + 1, len - 1) if that is preferred because strchr already compared the first character. 2023-04-02 Jakub Jelinek <jakub@redhat.com> PR other/109306 * strstr.c (strstr): Return s1 if len is 0. (cherry picked from commit 1719fa40c4ee4def60a2ce2f27e17f8168cf28ba)
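For illustration, a strchr/strncmp based search with the len == 0 case handled up front might look like the sketch below. The name simple_strstr is made up for this example and the code is not the libiberty source.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the "simple" strstr discussed above.
inline const char *simple_strstr(const char *s1, const char *s2)
{
  const std::size_t len = std::strlen(s2);
  if (len == 0)
    return s1;  // An empty needle matches at the start of the haystack.

  // Let strchr find each candidate position, then compare at most len
  // characters; strncmp never reads past a terminating null character.
  for (const char *p = s1; (p = std::strchr(p, *s2)) != nullptr; ++p)
    if (std::strncmp(p, s2, len) == 0)
      return p;
  return nullptr;
}
```

Using strncmp rather than memcmp sidesteps the question raised in PR109306 of whether memcmp may read beyond the first difference.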
2023-05-03 | sanopt: Return TODO_cleanup_cfg if any .{UB,HWA,A}SAN_* calls were lowered [PR106190] | Jakub Jelinek | 2 | -1/+20
The following testcase ICEs, because without optimization eh lowering decides not to duplicate finally block of try/finally and so we end up with variable guarded cleanup. The sanopt pass creates a cfg that ought to be cleaned up (some IFN_UBSAN_* functions are lowered in this case with constant conditions in gcond and when not allowing recovery some bbs which end with noreturn calls actually have successor edges), but the cfg cleanup is actually (it is -O0) done only during the optimized pass. We notice there that the d[1][a] = 0; statement which has an EH edge is unreachable (because ubsan would always abort on the out of bounds d[1] access), remove the EH landing pad and block, but because that block just sets a variable and jumps to another one which tests that variable and that one is reachable from normal control flow, the __builtin_eh_pointer (1) later in there is kept in the IL and we ICE during expansion of that statement because the EH region has been removed. The following patch fixes it by doing the cfg cleanup already during sanopt pass if we create something that might need it, while the EH landing pad is then removed already during sanopt pass, there is ehcleanup later and we don't ICE anymore. 2023-03-28 Jakub Jelinek <jakub@redhat.com> PR middle-end/106190 * sanopt.c (pass_sanopt::execute): Return TODO_cleanup_cfg if any of the IFN_{UB,HWA,A}SAN_* internal fns are lowered. * gcc.dg/asan/pr106190.c: New test. (cherry picked from commit 39a43dc336561e0eba0de477b16c7355f19d84ee)
2023-05-03 | predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685] | Jakub Jelinek | 2 | -1/+22
In the following testcase, we predict baz to have cold entry regardless of the user supplied attribute (as it unconditionally calls a cold function), but still issue a -Wsuggest-attribute=cold warning despite it having that attribute already. The following patch avoids that. 2023-03-26 Jakub Jelinek <jakub@redhat.com> PR ipa/105685 * predict.c (compute_function_frequency): Don't call warn_function_cold if function already has cold attribute. * c-c++-common/cold-2.c: New test. (cherry picked from commit 7eca91d4781bb3df941f25c30b971dac66ba1b3d)
2023-05-03 | c++: Drop TREE_READONLY on vars (possibly) initialized by tls wrapper [PR109164] | Jakub Jelinek | 7 | -1/+115
The following two testcases are miscompiled, because we keep TREE_READONLY on the vars even when they are (possibly) dynamically initialized by a TLS wrapper function. Normally cp_finish_decl drops TREE_READONLY from vars which need dynamic initialization, but for TLS we do this kind of initialization upon every access to those variables. Keeping them TREE_READONLY means e.g. PRE can hoist loads from those before loops which contain the TLS wrapper calls, so we can access the TLS variables before they are initialized. 2023-03-20 Jakub Jelinek <jakub@redhat.com> PR c++/109164 * cp-tree.h (var_needs_tls_wrapper): Declare. * decl2.c (var_needs_tls_wrapper): No longer static. * decl.c (cp_finish_decl): Clear TREE_READONLY on TLS variables for which a TLS wrapper will be needed. * g++.dg/tls/thread_local13.C: New test. * g++.dg/tls/thread_local13-aux.cc: New file. * g++.dg/tls/thread_local14.C: New test. * g++.dg/tls/thread_local14-aux.cc: New file. (cherry picked from commit 0a846340b99675d57fc2f2923a0412134eed09d3)
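As a rough illustration of the kind of variable involved (not the PR testcase, which spans two translation units), consider a thread_local const with a non-constant initializer read inside a loop. The names compute_scale, scale and scaled_sum are hypothetical.

```cpp
#include <cassert>

// Hypothetical initializer; it is not a constant expression, so the
// variable below may need dynamic initialization.
inline int compute_scale() { return 21 * 2; }

// Each thread initializes its own copy no later than first use. A compiler
// that cannot prove the initialization has happened must keep that
// initialization ahead of any load.
thread_local const int scale = compute_scale();

inline int scaled_sum(int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += scale;  // Hoisting this load above the initialization would read 0.
  return sum;
}
```

Whether a TLS wrapper call is actually emitted for such an access depends on the compiler and on where the variable is defined; the sketch only shows the source-level pattern.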
2023-05-03 | tree-inline: Fix up multiversioning with vector arguments [PR105554] | Jakub Jelinek | 4 | -11/+16
The following testcase ICEs, because we call tree_function_versioning from old_decl which has target attributes not supporting V4DImode and so DECL_MODE of DECL_ARGUMENTS is BLKmode, while new_decl supports those. tree_function_versioning initially copies DECL_RESULT and DECL_ARGUMENTS from old_decl to new_decl, then calls initialize_cfun to create cfun and only when the cfun is created it can later actually remap_decl DECL_RESULT and DECL_ARGUMENTS etc. The problem is that initialize_cfun -> push_struct_function -> allocate_struct_function calls relayout_decl on DECL_RESULT and DECL_ARGUMENTS, which clobbers DECL_MODE of old_decl and we then ICE because of it. In particular, allocate_struct_function does: if (!abstract_p) { /* Now that we have activated any function-specific attributes that might affect layout, particularly vector modes, relayout each of the parameters and the result. */ relayout_decl (result); for (tree parm = DECL_ARGUMENTS (fndecl); parm; parm = DECL_CHAIN (parm)) relayout_decl (parm); /* Similarly relayout the function decl. */ targetm.target_option.relayout_function (fndecl); } if (!abstract_p && aggregate_value_p (result, fndecl)) { #ifdef PCC_STATIC_STRUCT_RETURN cfun->returns_pcc_struct = 1; #endif cfun->returns_struct = 1; } Now, in the case of tree_function_versioning, I believe all that we need from these is possibly the targetm.target_option.relayout_function (fndecl); call (arm only), we will remap DECL_RESULT and DECL_ARGUMENTS later on and copy_decl_for_dup_finish in that case will handle all we need: /* For vector typed decls make sure to update DECL_MODE according to the new function context. */ if (VECTOR_TYPE_P (TREE_TYPE (copy))) SET_DECL_MODE (copy, TYPE_MODE (TREE_TYPE (copy))); We don't need the cfun->returns_*struct either, because we override it in initialize_cfun a few lines later: /* Copy items we preserve during cloning. */ ... 
cfun->returns_struct = src_cfun->returns_struct; cfun->returns_pcc_struct = src_cfun->returns_pcc_struct; So, to avoid the clobbering of DECL_RESULT/DECL_ARGUMENTS of old_decl, the following patch arranges allocate_struct_function to be called with abstract_p true and calls targetm.target_option.relayout_function (fndecl); by hand. The DECL_RESULT/DECL_ARGUMENTS copying at the start of initialize_cfun is removed because the only caller, tree_function_versioning, does that unconditionally before. 2023-03-17 Jakub Jelinek <jakub@redhat.com> PR target/105554 * function.h (push_struct_function): Add ABSTRACT_P argument defaulted to false. * function.c (push_struct_function): Add ABSTRACT_P argument, pass it to allocate_struct_function instead of false. * tree-inline.c (initialize_cfun): Don't copy DECL_ARGUMENTS nor DECL_RESULT here. Pass true as ABSTRACT_P to push_struct_function. Call targetm.target_option.relayout_function after it. (tree_function_versioning): Formatting fix. * gcc.target/i386/pr105554.c: New test. (cherry picked from commit 24c06560a7fa39049911eeb8777325d112e0deb9)
2023-05-03 | c, ubsan: Instrument even shortened divisions [PR109151] | Jakub Jelinek | 2 | -2/+16
On the following testcase, the C FE decides to shorten the division because it has a guarantee that INT_MIN / -1 division won't be encountered, the first operand is widened from narrower unsigned and/or the second operand is a constant other than all ones (in this case both are true). The problem is that the narrower type in this case is _Bool and ubsan_instrument_division only instruments it if op0's type is INTEGER_TYPE or REAL_TYPE. Strangely this doesn't happen in C++ FE. Anyway, we only shorten divisions if the INT_MIN / -1 case is impossible, so I think we should be fine even with -fstrict-enums in C++ in case it shortened to ENUMERAL_TYPEs. The following patch just instruments those on the ubsan_instrument_division side. Perhaps only the first hunk and testcase might be needed because we shouldn't shorten if the other case could be triggered. 2023-03-17 Jakub Jelinek <jakub@redhat.com> PR c/109151 * c-ubsan.c (ubsan_instrument_division): Handle all scalar integral types rather than just INTEGER_TYPE. * c-c++-common/ubsan/div-by-zero-8.c: New test. (cherry picked from commit 103d423f6ce72ccb03d55b7b1dfa2dabd5854371)
2023-05-03 | openmp: Fix up handling of doacross loops with noreturn body in loops [PR108685] | Jakub Jelinek | 2 | -4/+27
The following patch fixes an ICE with doacross loops which have a single entry no exit body, at least one of the ordered > collapse loops isn't guaranteed to have at least one iteration and the whole doacross loop is inside some other loop. The OpenMP constructs aren't represented by struct loop until the omp expansions, so for a normal doacross loop which doesn't have a noreturn body the entry_bb with the GOMP_FOR statement and the first bb of the body typically have the same loop_father, and if the doacross loop isn't inside of some other loop and the body is noreturn as well, both are part of loop 0. The problematic case is when the entry_bb is inside of some deeper loop, but the body, because it falls through into EXIT, has loop 0 as loop_father. l0_bb is created by splitting the entry_bb fallthru edge into l1_bb, and because the two basic blocks have different loop_father, a common loop is found for those (which is loop 0). Now, if the doacross loop has collapse == ordered or all the ordered > collapse loops are guaranteed to iterate at least once, all is still fine, because all enter the l1_bb (body), which doesn't return and so doesn't loop further either. But, if one of those loops could loop 0 times, the user written body wouldn't be reached at all, so unlike the expectations the whole construct actually wouldn't be noreturn if entry_bb is encountered and decides to handle at least one iteration. In this case, we need to fix up, move the l0_bb into the same loop as entry_bb (initially) and for the extra added loops put them as children of that same loop, rather than of loop 0. 2023-03-17 Jakub Jelinek <jakub@redhat.com> PR middle-end/108685 * omp-expand.c (expand_omp_for_ordered_loops): Add L0_BB argument, use its loop_father rather than BODY_BB's loop_father. (expand_omp_for_generic): Adjust expand_omp_for_ordered_loops caller. 
If broken_loop with ordered > collapse and at least one of those extra loops aren't guaranteed to have at least one iteration, change l0_bb's loop_father to entry_bb's loop_father. Set cont_bb's loop_father to l0_bb's loop_father rather than l1_bb's. * c-c++-common/gomp/doacross-8.c: New test. (cherry picked from commit 713fa5db8ceb4ba8783a0d690ceb4c07f2ff03d0)
2023-05-03 | c++: Treat unnamed bitfields as padding for __has_unique_object_representations [PR109096] | Jakub Jelinek | 2 | -2/+12
As reported in the PR, for __has_unique_object_representations we were treating unnamed bitfields as named ones, which is wrong, they are actually padding. The following patch fixes that. 2023-03-14 Jakub Jelinek <jakub@redhat.com> PR c++/109096 * tree.c (record_has_unique_obj_representations): Ignore unnamed bitfields. * g++.dg/cpp1z/has-unique-obj-representations3.C: New test. (cherry picked from commit c35cf160a0ed81570cff6600dba465cf95fa80fa)
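The effect can be observed through the standard trait built on this intrinsic. The struct names below are invented for the example, and the Packed/Padded results assume a typical ABI with 4-byte aligned int.

```cpp
#include <cassert>
#include <type_traits>

struct Packed { unsigned a; unsigned b; };  // No padding on typical ABIs.
struct Padded { char c; int i; };           // Padding after 'c' on typical ABIs.

// The 24 unnamed bits carry no value, so with the fix they count as padding
// and this type is expected not to have unique object representations.
struct WithUnnamed { unsigned a : 8; unsigned : 24; };

template <typename T>
constexpr bool has_unique()
{
  return std::has_unique_object_representations_v<T>;
}
```

On a compiler without the fix, has_unique<WithUnnamed>() may still report true, which is the bug described above.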
2023-05-03 | c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates [PR107558] | Jakub Jelinek | 2 | -2/+18
The following testcase ICEs, because OpenMP lowering for shared clause on l variable with REFERENCE_TYPE creates POINTER_TYPE to REFERENCE_TYPE. The reason is that the automatic variable has non-trivial construction (reference to a lambda) and -fmerge-all-constants is on and so TREE_READONLY isn't set - omp-low will handle automatic TREE_READONLY vars in shared specially and only copy to the construct and not back, while !TREE_READONLY are assumed to be changeable. The PR91529 change rationale was that the gimplification can change some non-addressable automatic variables to TREE_STATIC with -fmerge-all-constants and therefore TREE_READONLY on them is undesirable. But, the gimplifier does that only for aggregate variables: switch (TREE_CODE (type)) { case RECORD_TYPE: case UNION_TYPE: case QUAL_UNION_TYPE: case ARRAY_TYPE: and not for anything else. So, I think clearing TREE_READONLY for automatic integral or reference or pointer etc. vars for -fmerge-all-constants only is unnecessary. 2023-03-10 Jakub Jelinek <jakub@redhat.com> PR c++/107558 * decl.c (cp_finish_decl): Don't clear TREE_READONLY on automatic non-aggregate variables just because of -fmerge-all-constants. * g++.dg/gomp/pr107558.C: New test. (cherry picked from commit 60b6f5c0a334db3f8f6dffaf0b9aab42fd5c54a2)
2023-05-03 | c-family: Incremental fix for -Wsign-compare BIT_NOT_EXPR handling [PR107465] | Jakub Jelinek | 2 | -28/+30
There can be too many extensions and it seems I didn't get everything right in the previously posted patch. The following incremental patch ought to fix that. The code can deal with quite a few sign/zero extensions at various spots and it is important to deal with all of them right. On the argument that contains BIT_NOT_EXPR we have: MSB bits#4 bits#3 BIT_NOT_EXPR bits#2 bits#1 LSB where bits#1 is one or more bits (TYPE_PRECISION (TREE_TYPE (arg0)) at the end of the function) we don't know anything about, for the purposes of this warning it is VARYING that is inverted with BIT_NOT_EXPR to some other VARYING bits; bits#2 is one or more bits (TYPE_PRECISION (TREE_TYPE (op0)) - TYPE_PRECISION (TREE_TYPE (arg0)) at the end of the function) which are known to be 0 before the BIT_NOT_EXPR and 1 after it. bits#3 is zero or more bits from the TYPE_PRECISION (TREE_TYPE (op0)) at the end of the function to TYPE_PRECISION (TREE_TYPE (op0)) at the start of the function, which are either zero extension or sign extension. And bits#4 is zero or more bits from the TYPE_PRECISION (TREE_TYPE (op0)) at the start of the function to TYPE_PRECISION (result_type), which again can be zero or sign extension. Now, vanilla trunk as well as the previously posted patch mishandles the case where bits#3 are sign extended (as bits#2 are known to be all set, that means bits#3 are all set too) but bits#4 are zero extended and are thus all 0. The patch fixes it by tracking the lowest bit which is known to be clear above the known to be set bits (if any, otherwise it is precision of result_type). 2023-03-04 Jakub Jelinek <jakub@redhat.com> PR c/107465 * c-warn.c (warn_for_sign_compare): Don't warn for unset bits above innermost zero extension of BIT_NOT_EXPR result. * c-c++-common/Wsign-compare-2.c (f18): New test. (cherry picked from commit 3ec9a8728086ad86a2d421e067329f305f40e005)
2023-05-03 | c-family: Fix up -Wsign-compare BIT_NOT_EXPR handling [PR107465] | Jakub Jelinek | 3 | -31/+184
The following patch fixes multiple bugs in warn_for_sign_compare related to the BIT_NOT_EXPR related warnings. My understanding is that what those 3 warnings are meant to warn (since 1995 apparently) is the case where we have BIT_NOT_EXPR of a zero-extended value, so in result_type the value is something like: 0b11111111XXXXXXXX (e.g. ~ of a 8->16 bit zero extension) 0b000000000000000011111111XXXXXXXX (e.g. ~ of a 8->16 bit zero extension then zero extended to 32 bits) 0b111111111111111111111111XXXXXXXX (e.g. ~ of a 8->16 bit zero extension then sign extended to 32 bits) and the intention of the warning is to warn when this is compared against something that has some 0 bits at the place where the above has guaranteed 1 bits, either ensured through comparison against constant where we know the bits exactly, or through zero extension from some narrower type where again we know at least some upper bits are zero extended. The bugs in the warning code are: 1) misunderstanding of the {,c_common_}get_narrower APIs - the unsignedp it sets is only meaningful if the function actually returns something narrower (in that case it says whether the narrower value is then sign (0) or zero (1) extended to the originally passed value. Though op0 or op1 at this point might be already narrower than result_type, and if the function doesn't return anything narrower, it all depends on whether the passed in op{0,1} had TYPE_UNSIGNED type or not 2) the code didn't check at all whether the BIT_NOT_EXPR operand was actually zero extended (i.e. that it was narrower and unsignedp was set to 1 for it), all it did is check that unsignedp from the call was 1. But that isn't well defined thing, if the argument is returned as is, the function sets unsignedp to 0, but if there is e.g. 
a useless cast to the same or compatible type in between, it can return 1 if the cast is unsigned; now, if BIT_NOT_EXPR operand is not zero extended, we know nothing at all about any bits in the operand containing BIT_NOT_EXPR, so there is nothing to warn about 3) the code was actually testing both operands after calling c_common_get_narrower on them and on the one with BIT_NOT_EXPR again for constants; I think that is just wrong in case the BIT_NOT_EXPR operand wouldn't be fully folded, the warning makes sense only if the other operand not having BIT_NOT_EXPR in it is constant 4) as can be seen from the above bit pattern examples, the upper bits above (in the patch arg0) aren't always all 1s, there could be some zero extension above it and from it one would have 0s, so that needs to be taken into account for the choice which constant bits to test for being always set otherwise warning is emitted, or for the zero extension guaranteed zero bits 5) the patch also simplifies the handling, we only do it if one but not both operands are BIT_NOT_EXPR after first {,c_common_}get_narrower, so we can just use std::swap to ensure it is the first one 6) the code compared bits against HOST_BITS_PER_LONG, which made sense back in 1995 when the values were stored into long, but now that they are HOST_WIDE_INT should test HOST_BITS_PER_WIDE_INT (or we could rewrite the stuff to wide_int, not done in the patch) 2023-03-04 Jakub Jelinek <jakub@redhat.com> PR c/107465 * c-warn.c (warn_for_sign_compare): If c_common_get_narrower doesn't return a narrower result, use TYPE_UNSIGNED to set unsignedp0 and unsignedp1. For the one BIT_NOT_EXPR case vs. one without, only check for constant in the non-BIT_NOT_EXPR operand, use std::swap to simplify the code, only warn if BIT_NOT_EXPR operand is extended from narrower unsigned, fix up computation of mask for the constant cases and for unsigned other operand case handle differently BIT_NOT_EXPR result being sign vs. zero extended. 
* c-c++-common/Wsign-compare-2.c: New test. * c-c++-common/pr107465.c: New test. (cherry picked from commit daaf74a714c41c8dbaf9954bcc58462c63062b4f)
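The three bit patterns listed in the message above can be reproduced with fixed-width types. This sketch only demonstrates the values; it does not model the warning logic, and the function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// ~ of an 8->16 bit zero extension: the upper 8 bits are guaranteed 1.
inline std::uint16_t not_of_zext16(std::uint8_t x)
{
  return static_cast<std::uint16_t>(~static_cast<std::uint16_t>(x));
}

// The same value zero extended to 32 bits: the top 16 bits are 0.
inline std::uint32_t then_zext32(std::uint8_t x)
{
  return static_cast<std::uint32_t>(not_of_zext16(x));
}

// The same value sign extended to 32 bits: the top 24 bits are 1.
// The uint16_t -> int16_t conversion is modular on the usual two's
// complement targets (and by definition since C++20).
inline std::uint32_t then_sext32(std::uint8_t x)
{
  const std::int16_t narrow = static_cast<std::int16_t>(not_of_zext16(x));
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(narrow));
}
```

Comparing then_zext32(x) against a constant with any of the upper 16 bits set can never succeed, whereas for then_sext32(x) those bits are all known to be 1; this is the distinction the mask computation has to respect.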
2023-05-03 | diagnostics: Fix up selftests with $COLUMNS < 42 [PR108973] | Jakub Jelinek | 1 | -0/+1
As mentioned in the PR, GCC's diagnostics self-tests fail if $COLUMNS < 42. Guarding each self-test with if (get_terminal_width () > 41) or similar would be a maintenance nightmare (PR has a patch to do so without reformatting to make it work for $COLUMNS in [30, 41] inclusive, but I'm afraid going down to $COLUMNS 1 would mean marking everything). Furthermore, the self-tests don't really emit stuff to the terminal, but into a buffer, so using get_terminal_width () for it seems inappropriate. The following patch makes sure test_diagnostic_context constructor uses exactly 80 columns wide caret max width, of course some tests override it already if they want to test for behavior in narrower cases. 2023-03-04 Jakub Jelinek <jakub@redhat.com> PR testsuite/108973 * selftest-diagnostic.c (test_diagnostic_context::test_diagnostic_context): Set caret_max_width to 80. (cherry picked from commit 739e7ebb3d378ece25d64b39baae47c584253498)
2023-05-03 | libquadmath: Assorted libquadmath strtoflt128 fixes [PR87204, PR94756] | Jakub Jelinek | 1 | -12/+26
This patch cherry-picks 8 commits from glibc which fix various strtod_l bugs. 2023-03-03 niXman <i.nixman@autistici.org> Jakub Jelinek <jakub@redhat.com> PR libquadmath/87204 PR libquadmath/94756 * strtod/strtod_l.c (round_and_return): Cherry-pick glibc 9310c284ae9 BZ #16151, 4406c41c1d6 BZ #16965 and fcd6b5ac36a BZ #23279 fixes. (____STRTOF_INTERNAL): Cherry-pick glibc b0debe14fcf BZ #23007, 5556d30caee BZ #18247, 09555b9721d and c6aac3bf366 BZ #26137 and d84f25c7d87 fixes. (cherry picked from commit df63f4162c78ef799d4ea9dec3443d5e9c51e5aa)
2023-05-03 | c++, debug: Fix up locus of DW_TAG_imported_module [PR108716] | Jakub Jelinek | 2 | -0/+16
Before IMPORTED_DECL has been introduced in PR37410, we used to emit correct DW_AT_decl_line on DW_TAG_imported_module on the testcase below, after that change we haven't emitted it at all for a while and after some time started emitting incorrect locus, in particular the location of } closing the function. The problem is that while we have correct EXPR_LOCATION on the USING_STMT, when genericizing that USING_STMT into IMPORTED_DECL we don't copy the location to DECL_SOURCE_LOCATION, so it gets whatever input_location happens to be when it is created. 2023-03-02 Jakub Jelinek <jakub@redhat.com> PR debug/108716 * cp-gimplify.c (cp_genericize_r) <case USING_STMT>: Set DECL_SOURCE_LOCATION on IMPORTED_DECL to expression location of USING_STMT or input_location. * g++.dg/debug/dwarf2/pr108716.C: New test. (cherry picked from commit 4d82022bfd15d36717bf60a11e75e9ea02204269)
2023-05-03 | cgraphclones: Don't share DECL_ARGUMENTS between thunk and its artificial thunk [PR108854] | Jakub Jelinek | 2 | -1/+48
The following testcase ICEs on x86_64-linux with -m32. The problem is we create an artificial thunk and because of -fPIC, ia32 and thunk destination which doesn't bind locally can't use a mi thunk. The ICE is because during expansion to RTL we see SSA_NAME for a PARM_DECL, but the PARM_DECL doesn't have DECL_CONTEXT of the current function. This is because duplicate_thunk_for_node creates a new DECL_ARGUMENTS chain only if some arguments need modification. The following patch fixes it by copying the DECL_ARGUMENTS list even if the arguments can stay as is, to update DECL_CONTEXT on them. While for mi thunks it doesn't really matter because we don't use those arguments in any way, for other thunks it is important. 2023-02-23 Jakub Jelinek <jakub@redhat.com> PR middle-end/108854 * cgraphclones.c (duplicate_thunk_for_node): If no parameter changes are needed, copy at least DECL_ARGUMENTS PARM_DECL nodes and adjust their DECL_CONTEXT. * g++.dg/opt/pr108854.C: New test. (cherry picked from commit 2f1691be517fcdcabae9cd671ab511eb0e08b1d5)
2023-05-03 | i386: Fix up builtins used in avx512bf16vlintrin.h [PR108881] | Jakub Jelinek | 2 | -18/+32
The builtins used in avx512bf16vlintrin.h implementation need both avx512bf16 and avx512vl ISAs, which the header ensures for them, but the builtins weren't actually requiring avx512vl, so when used by hand with just -mavx512bf16 -mno-avx512vl it resulted in ICEs. Fixed by adding OPTION_MASK_ISA_AVX512VL to their BDESC. 2023-02-24 Jakub Jelinek <jakub@redhat.com> PR target/108881 * config/i386/i386-builtin.def (__builtin_ia32_cvtne2ps2bf16_v16hi, __builtin_ia32_cvtne2ps2bf16_v16hi_mask, __builtin_ia32_cvtne2ps2bf16_v16hi_maskz, __builtin_ia32_cvtne2ps2bf16_v8hi, __builtin_ia32_cvtne2ps2bf16_v8hi_mask, __builtin_ia32_cvtne2ps2bf16_v8hi_maskz, __builtin_ia32_cvtneps2bf16_v8sf_mask, __builtin_ia32_cvtneps2bf16_v8sf_maskz, __builtin_ia32_cvtneps2bf16_v4sf_mask, __builtin_ia32_cvtneps2bf16_v4sf_maskz, __builtin_ia32_dpbf16ps_v8sf, __builtin_ia32_dpbf16ps_v8sf_mask, __builtin_ia32_dpbf16ps_v8sf_maskz, __builtin_ia32_dpbf16ps_v4sf, __builtin_ia32_dpbf16ps_v4sf_mask, __builtin_ia32_dpbf16ps_v4sf_maskz): Require also OPTION_MASK_ISA_AVX512VL. * gcc.target/i386/avx512bf16-pr108881.c: New test. (cherry picked from commit 0ccfa3884f638816af0f5a3f0ee2695e0771ef6d)
2023-05-03 | libgomp: Fix up some typos in libgomp.texi | Jakub Jelinek | 1 | -7/+7
I decided to check for repeated the the in libgomp and noticed there are several occurrences of a typo theads rather than threads in libgomp.texi. 2023-02-16 Jakub Jelinek <jakub@redhat.com> * libgomp.texi: Fix typos - theads -> threads. (cherry picked from commit 0b9bd33d69d0c30330a465e6bad262d90c94d4ea)
2023-05-03i386: Call get_available_features for all CPUs with max_level >= 1 [PR100758]Jakub Jelinek1-4/+3
get_available_features doesn't depend on cpu_model2->__cpu_{family,model} and just sets stuff up based on CPUID leaf 1, or some extended ones, so I wonder why are we calling it separately for Intel, AMD and Zhaoxin and not for all other CPUs too? I think various programs in the wild which aren't using __builtin_cpu_{is,supports} just check the various CPUID leafs and query bits in there, without blacklisting unknown CPU vendors, so I think even __builtin_cpu_supports ("sse2") etc. should be reliable if those VENDOR_{CENTAUR,CYRIX,NSC,OTHER} CPUs set those bits in CPUID leaf 1 or some extended ones. Calling it for all CPUs also means it can be inlined because there will be just a single caller. I have tested it on Intel and Martin tested it on AMD, but can't test it on non-Intel/AMD; for Intel/AMD/Zhaoxin it should be really no change in behavior. 2023-02-09 Jakub Jelinek <jakub@redhat.com> PR target/100758 * config/i386/cpuinfo.c (cpu_indicator_init): Call get_available_features for all CPUs with max_level >= 1, rather than just Intel or AMD. (cherry picked from commit b24e9c083093a9e1b1007933a184c02f7ff058db)
2023-05-03c++: Handle structured bindings like anon unions in initializers [PR108474]Jakub Jelinek3-2/+72
As reported by Andrew Pinski, structured bindings (with the exception of the ones using std::tuple_{size,element} and get which are really standalone variables in addition to the binding one) also use DECL_VALUE_EXPR and need the same treatment in static initializers. On Sun, Jan 22, 2023 at 07:19:07PM -0500, Jason Merrill wrote: > Though, actually, why not instead fix expand_expr_real_1 (and staticp) to > look through DECL_VALUE_EXPR? Doing it when emitting the initializers seems to be too late to me, we in various spots try to put parts of the static var DECL_INITIAL expressions into the IL, or e.g. for varpool purposes remember which vars are referenced there. This patch moves it to record_reference, which is called from varpool_node::analyze and so about the same time as gimplification of the bodies which also replaces DECL_VALUE_EXPRs. 2023-01-24 Jakub Jelinek <jakub@redhat.com> PR c++/108474 * cp-gimplify.c (cp_fold_r): Handle structured bindings vars like anon union artificial vars. * g++.dg/cpp1z/decomp57.C: New test. * g++.dg/cpp1z/decomp58.C: New test. (cherry picked from commit b84e21115700523b4d0ac44275443f7b9c670344)
2023-05-03c++: Avoid incorrect shortening of divisions [PR108365]Jakub Jelinek3-2/+28
The following testcase is miscompiled, because we shorten the division in a case where it should not be shortened. Divisions (and modulos) can be shortened if it is unsigned division/modulo, or if it is signed division/modulo where we can prove the dividend will not be the minimum signed value or divisor will not be -1, because e.g. on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets (-2147483647 - 1) / -1 is UB but (int) (-2147483648LL / -1LL) is not, it is -2147483648. The primary aim of both the C and C++ FE division/modulo shortening I assume was for the implicit integral promotions of {,signed,unsigned} {char,short} and because at this point we have no VRP information etc., the shortening is done if the integral promotion is from unsigned type for the divisor or if the dividend is an integer constant other than -1. This works fine for char/short -> int promotions when char/short have smaller precision than int - unsigned char -> int or unsigned short -> int will always be a positive int, so never the most negative. Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either the same as orig_op0 or that promoted to int, I think that works fine, if it isn't promoted, either the division/modulo common type will have the same precision as op0 but then the division/modulo is unsigned and so without UB, or it will be done in wider precision (e.g. because op1 has wider precision), but then op0 can't be minimum signed value. Or it has been promoted to int, but in that case it was again from narrower type and so never minimum signed int. But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED. First of all, not sure if the operand of NOP_EXPR couldn't be non-integral type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly, even if it is a cast from unsigned integral type, we only know it can't be minimum signed value if it is a widening cast, if it is same precision or narrowing cast, we know nothing. 
So, the following patch for the NOP_EXPR cases checks just in case that it is from integral type and more importantly checks it is a widening conversion. 2023-01-14 Jakub Jelinek <jakub@redhat.com> PR c++/108365 * typeck.c (cp_build_binary_op): For integral division or modulo, shorten if type0 is unsigned, or op0 is cast from narrower unsigned integral type or stripped_op1 is INTEGER_CST other than -1. * g++.dg/opt/pr108365.C: New test. * g++.dg/warn/pr108365.C: New test. (cherry picked from commit 5b3a88640f962d4ffca31ae651bed2d8672f1a8c)
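The arithmetic behind the shortening rule can be sketched in C as follows, assuming a 32-bit int and a 64-bit long long. The helper names wide_div and narrow_div_is_defined are hypothetical and are not part of GCC; they only illustrate why a division evaluated in the wider type cannot always be shortened to int.

```c
#include <limits.h>

/* Division carried out in the wider type: defined for every b != 0,
   because the result of INT_MIN / -1 fits in long long.  */
long long wide_div(int a, int b)
{
    return (long long)a / (long long)b;
}

/* Returns 1 if a / b evaluated directly in int has defined behavior.
   The only problematic pairs are b == 0 and INT_MIN / -1.  */
int narrow_div_is_defined(int a, int b)
{
    if (b == 0)
        return 0;
    return !(a == INT_MIN && b == -1);
}
```

A dividend that was widened from a narrower unsigned type (for example unsigned short) can never be INT_MIN, which is the property the front end relies on before shortening.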
2023-05-03match.pd: When simplifying BFR of an insert, require a mode precision ↵Andrew Pinski2-1/+18
integral type [PR108688] The same problem as PR 88739 has crept in but this time in match.pd when simplifying bit_field_ref of a bit_insert. That is, we are generating a BIT_FIELD_REF of a non-mode-precision integral type. PR tree-optimization/108688 * match.pd (bit_field_ref [bit_insert]): Avoid generating BIT_FIELD_REFs of non-mode-precision integral operands. * gcc.c-torture/compile/pr108688-1.c: New test. (cherry picked from commit 44f308e59bfa0f93ae05b17e257d8563c12399fd)
2023-05-03fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]Jakub Jelinek1-1/+5
The first testcase in the PR (which I haven't included in the patch because it is unclear to me if it is supposed to be valid or not) ICEs since extra hash table checking has been added recently. The problem is that gfc_trans_use_stmts does tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash, INSERT); if (*slot == NULL) and later on doesn't store anything into *slot and continues. Another spot a few lines later correctly clears the slot if it decides not to use the slot, so the following patch does the same. 2023-02-03 Jakub Jelinek <jakub@redhat.com> PR fortran/108451 * trans-decl.c (gfc_trans_use_stmts): Call clear_slot before doing continue. (cherry picked from commit 76f7f0eddcb7c418d1ec3dea3e2341ca99097301)
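The reserve-then-abandon problem described above can be illustrated with a toy open-addressing table in C. This is only a sketch: struct table, find_slot_insert, clear_slot and count_filled are hypothetical names, and the bookkeeping is a simplification that assumes a reservation is counted as an element immediately; GCC's real hash_table tracks empty and deleted entries in its own way.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 8

/* Toy open-addressing table; NULL marks an empty slot.  */
struct table {
    const char *slots[TABLE_SIZE];
    size_t n_elements;   /* bumped as soon as a slot is reserved */
};

static size_t hash_name(const char *name)
{
    size_t h = 0;
    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return h % TABLE_SIZE;
}

/* Find-or-insert lookup: returns the slot holding NAME, or reserves an
   empty slot for it.  The caller is expected to fill a reserved slot.  */
const char **find_slot_insert(struct table *t, const char *name)
{
    size_t start = hash_name(name);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        const char **slot = &t->slots[(start + probes) % TABLE_SIZE];
        if (*slot == NULL) {
            t->n_elements++;
            return slot;
        }
        if (strcmp(*slot, name) == 0)
            return slot;
    }
    return NULL;   /* table full */
}

/* Undo a reservation that the caller decided not to fill.  */
void clear_slot(struct table *t, const char **slot)
{
    *slot = NULL;
    t->n_elements--;
}

/* Number of slots that actually hold an entry.  */
size_t count_filled(const struct table *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TABLE_SIZE; i++)
        if (t->slots[i] != NULL)
            n++;
    return n;
}
```

If a caller reserves a slot and then continues without storing into it or clearing it, n_elements and count_filled disagree, which is the kind of inconsistency extra checking can detect.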
2023-05-03nested, openmp: Wrap OMP_CLAUSE_*_GIMPLE_SEQ into GIMPLE_BIND for ↵Jakub Jelinek2-16/+34
declare_vars [PR108435] When gimplifying OMP_CLAUSE_{LASTPRIVATE,LINEAR}_STMT, we wrap it always into a GIMPLE_BIND, but when putting statements directly into OMP_CLAUSE_{LASTPRIVATE,LINEAR}_GIMPLE_SEQ, we do it only if needed (there are any temporaries that need to be declared in the sequence). convert_nonlocal_omp_clauses was relying on the GIMPLE_BIND to be there always because it called declare_vars on it. The following patch wraps it into GIMPLE_BIND in tree-nested if we need to declare_vars on it on demand. 2023-02-02 Jakub Jelinek <jakub@redhat.com> PR middle-end/108435 * tree-nested.c (convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LASTPRIVATE>: If info->new_local_var_chain and *seq is not a GIMPLE_BIND, wrap the sequence into a new GIMPLE_BIND before calling declare_vars. (convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LINEAR>: Merge with the OMP_CLAUSE_LASTPRIVATE handling except for whether seq is initialized to &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (clause) or &OMP_CLAUSE_LINEAR_GIMPLE_SEQ (clause). * gcc.dg/gomp/pr108435.c: New test. (cherry picked from commit 0f349928e16fdc7dba52561e8d40347909f9f0ff)
2023-05-03ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]Jakub Jelinek2-2/+22
The PR78437 r7-4871 changes made combine_reaching_defs punt on WORD_REGISTER_OPERATIONS targets if a setter of smaller than word register has wider uses. This unfortunately breaks -fcompare-debug, because if such a use appears only in DEBUG_INSN(s), while all other uses aren't wider than the setter, we can REE optimize it without -g and not with -g. Such decisions shouldn't be based on debug instructions. We could try to reset them or adjust in some other way after we decide to perform the change, but at least on the testcase which used to fail on riscv64-linux the (debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160]) (const_int 1 [0x1])) 0) (subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151]) (debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1 (nil)) clearly doesn't care about the upper bits and I have a hard time imagining how one could end up with a DEBUG_INSN which actually cares about those upper bits. So, the following patch just ignores uses on DEBUG_INSNs in this case, if we run into something where we'd need to do something further later on, let's deal with it when we have a testcase for it. 2023-02-01 Jakub Jelinek <jakub@redhat.com> PR debug/108573 * ree.c (combine_reaching_defs): Don't return false for paradoxical subregs in DEBUG_INSNs. * gcc.dg/pr108573.c: New test. (cherry picked from commit e4473d7cf871c8ddf8f22d105c5af6375ebe37bf)
2023-05-03c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression ↵Jakub Jelinek2-0/+77
evaluation [PR108607] While potential_constant_expression_1 handled most of OMP_* codes (by saying that they aren't potential constant expressions), OMP_SCOPE was missing in that list. I've also added OMP_SCAN, though that is less important (similarly to OMP_SECTION it ought to appear solely inside of OMP_{FOR,SIMD} resp. OMP_SECTIONS). As the testcase shows, it isn't enough, potential_constant_expression_1 can catch only some cases, as soon as one uses switch or ifs where at least one of the possible paths could be constant expression, we can run into the same codes during cxx_eval_constant_expression, so this patch handles those there as well. 2023-02-01 Jakub Jelinek <jakub@redhat.com> PR c++/108607 * constexpr.c (cxx_eval_constant_expression): Handle OMP_* and OACC_* constructs as non-constant. (potential_constant_expression_1): Handle OMP_SCAN. * g++.dg/gomp/pr108607.C: New test. (cherry picked from commit bfc070595bfb00abef88a002eee5d9117f5b86a7)
2023-05-03bbpart: Fix up ICE on asm goto [PR108596]Jakub Jelinek2-1/+46
On the following testcase we have asm goto in hot block with 2 successors, one cold to which it both falls through and has one of the label pointing to it and another hot successor with another label. Now, during bbpart we want to ensure that no blocks from one partition fall through into a block in a different partition. fix_up_fall_thru_edges does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge, calling force_nonfallthru and then depending on whether it created a new bb either set EDGE_CROSSING on the single successor edge from the new bb (the new bb is kept in the same partition as the predecessor block), or if no new bb has been created setting EDGE_CROSSING back on the fallthru edge which has been forced non-EDGE_FALLTHRU. For asm goto this doesn't always work, force_nonfallthru can create a new bb and change the fallthrough edge to point to that, but if the original fallthru destination block has its label referenced among the asm goto labels, it will create a new non-fallthru edge for the label(s). But because we've temporarily cheated and cleared EDGE_CROSSING on the edge, it is cleared on the new edge as well, then the caller sees we've created a new bb and just sets EDGE_CROSSING on the single fallthru edge from the new bb. But the direct edge from cur_bb to fallthru edge's destination isn't handled and fails afterwards consistency checks, because it crosses partitions. The following patch notes the case and sets EDGE_CROSSING on that edge too. 2023-01-31 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/108596 * bb-reorder.c (fix_up_fall_thru_edges): Handle the case where cur_bb ends with asm goto and has a crossing fallthrough edge to the same bb that contains at least one of its labels by restoring EDGE_CROSSING flag even on possible edge from cur_bb to new_bb successor. * gcc.c-torture/compile/pr108596.c: New test. (cherry picked from commit 603a6fbcaac1e80aa90d1d26318c881a53473066)
2023-05-03doc: Fix up return type of __builtin_va_arg_pack_len [PR108560]Jakub Jelinek1-1/+1
__builtin_va_arg_pack_len as implemented returned int since its introduction in 2007. The initial documentation didn't mention any return type, which changed in 2010 in r0-103077-gab940b73bfabe2cec4 during some documentation formatting cleanups https://gcc.gnu.org/legacy-ml/gcc-patches/2010-09/msg01632.html I can understand that for formatting some type was needed there but what exactly hasn't been really discussed. So, I think we should change documentation to match the implementation, rather than change implementation to match the documentation. Most people don't use more than 2147483647 arguments to inline functions, and on poor targets with 16-bit ints I bet even having more than 65535 arguments to inline functions would be highly unexpected. 2023-01-27 Jakub Jelinek <jakub@redhat.com> PR other/108560 * doc/extend.texi: Fix up return type of __builtin_va_arg_pack_len from size_t to int. (cherry picked from commit 16f30680f403891556da2ad6329fcef9dc9b47db)
2023-05-03options: fix cl_target_option_print_diff() with stringsEric Biggers1-1/+1
Fix an obvious copy-and-paste error where ptr1 was used instead of ptr2. This bug caused the dump file produced by -fdump-ipa-inline-details to not correctly show the difference in target options when a function could not be inlined due to a target option mismatch. gcc/ChangeLog: PR bootstrap/90543 * optc-save-gen.awk: Fix copy-and-paste error. Signed-off-by: Eric Biggers <ebiggers@google.com> (cherry picked from commit 9f0cb3368af735e95776769c4f28fa9cbb60eaf8)
2023-05-03c++: Fix up handling of references to anon union members in initializers ↵Jakub Jelinek2-10/+60
[PR53932] For anonymous union members we create artificial VAR_DECLs which have DECL_VALUE_EXPR for the actual COMPONENT_REF. That works just fine inside of functions (including global dynamic constructors), because during gimplification such VAR_DECLs are gimplified as their DECL_VALUE_EXPR. This is also done during regimplification. But references to these artificial vars in DECL_INITIAL expressions aren't ever replaced by the DECL_VALUE_EXPRs, so we end up either with link failures like on the testcase below, or worse ICEs with LTO. The following patch fixes those during cp_fully_fold_init where we already walk all the trees (!data->genericize means that function rather than cp_fold_function). 2023-01-19 Jakub Jelinek <jakub@redhat.com> PR c++/53932 * cp-gimplify.c (cp_fold_r): During cp_fully_fold_init replace DECL_ANON_UNION_VAR_P VAR_DECLs with their corresponding DECL_VALUE_EXPR. * g++.dg/init/pr53932.C: New test. (cherry picked from commit 9b9a989adc042b304572fd6d4ade46b47be6ccb8)
2023-05-03fortran: Fix up function types for realloc and sincos{,f,l} builtins [PR108349]Jakub Jelinek1-18/+20
As reported in the PR, the FUNCTION_TYPE for __builtin_realloc in the Fortran FE is wrong since r0-100026-gb64fca63690ad which changed -  tmp = tree_cons (NULL_TREE, pvoid_type_node, void_list_node); -  tmp = tree_cons (NULL_TREE, size_type_node, tmp); -  ftype = build_function_type (pvoid_type_node, tmp); +  ftype = build_function_type_list (pvoid_type_node, +                                    size_type_node, pvoid_type_node, +                                    NULL_TREE);    gfc_define_builtin ("__builtin_realloc", ftype, BUILT_IN_REALLOC,                       "realloc", false); The return type is correct, void *, but the first argument should be void * too and only second one size_t, while the above change changed realloc to be void *__builtin_realloc (size_t, void *); I went through all other changes from that commit and found that __builtin_sincos{,f,l} got broken as well, instead of the former void __builtin_sincos{,f,l} (ftype, ftype *, ftype *); where ftype is {double,float,long double} it is now incorrectly void __builtin_sincos{,f,l} (ftype *, ftype *); The following patch fixes that, plus some formatting issues around the spots I've changed. 2023-01-11 Jakub Jelinek <jakub@redhat.com> PR fortran/108349 * f95-lang.c (gfc_init_builtin_function): Fix up function types for BUILT_IN_REALLOC and BUILT_IN_SINCOS{F,,L}. Formatting fixes. (cherry picked from commit 0986c351aa8a9f08b3cb614baec13564dd62c114)
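For reference, the argument order the message describes can be written down as C function pointer types. This is a small sketch; realloc_fn, sincos_fn and grow_buffer are hypothetical names used only for illustration, and only realloc is actually called so that the example does not depend on a math library.

```c
#include <stdlib.h>

/* Pointer first, size second: void *realloc (void *, size_t).  */
typedef void *(*realloc_fn)(void *, size_t);

/* Shape of sincos: the value first, then the two output pointers.  */
typedef void (*sincos_fn)(double, double *, double *);

/* Grow BUF to NEW_SIZE bytes through a correctly typed pointer.
   Returns NULL on failure, in which case BUF is still valid.  */
char *grow_buffer(char *buf, size_t new_size)
{
    realloc_fn fn = realloc;
    return (char *)fn(buf, new_size);
}
```

Swapping the two realloc parameters, as in the broken type, would pass the size where the pointer is expected.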
2023-05-03generic-match-head: Don't assume GENERIC folding is done only early [PR108237]Jakub Jelinek2-1/+17
We ICE on the following testcase, because a valid V2DImode != comparison is folded into an unsupported V2DImode > comparison. The match.pd pattern which does this looks like: /* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z where ~Y + 1 == pow2 and Z = ~Y. */ (for cst (VECTOR_CST INTEGER_CST) (for cmp (eq ne) icmp (le gt) (simplify (cmp (bit_and:c@2 @0 cst@1) integer_zerop) (with { tree csts = bitmask_inv_cst_vector_p (@1); } (if (csts && (VECTOR_TYPE_P (TREE_TYPE (@1)) || single_use (@2))) (with { auto optab = VECTOR_TYPE_P (TREE_TYPE (@1)) ? optab_vector : optab_default; tree utype = unsigned_type_for (TREE_TYPE (@1)); } (if (target_supports_op_p (utype, icmp, optab) || (optimize_vectors_before_lowering_p () && (!target_supports_op_p (type, cmp, optab) || !target_supports_op_p (type, BIT_AND_EXPR, optab)))) (if (TYPE_UNSIGNED (TREE_TYPE (@1))) (icmp @0 { csts; }) (icmp (view_convert:utype @0) { csts; }))))))))) and that optimize_vectors_before_lowering_p () guarded stuff there already deals with this problem, not trying to fold a supported comparison into a non-supported one. The reason it doesn't work in this case is that it isn't GIMPLE folding which does this, but GENERIC folding done during forwprop4 - forward_propagate_into_comparison -> forward_propagate_into_comparison_1 -> combine_cond_expr_cond -> fold_binary_loc -> generic_simplify and we simply assumed that GENERIC folding happens only before gimplification. The following patch fixes that by checking cfun properties instead of always returning true in those cases. 2023-01-04 Jakub Jelinek <jakub@redhat.com> PR middle-end/108237 * generic-match-head.c: Include tree-pass.h. (canonicalize_math_p): Define to false if cfun and cfun->curr_properties has PROP_gimple_opt_math resp. PROP_gimple_lvec property set. * gcc.c-torture/compile/pr108237.c: New test. (cherry picked from commit 345dffd0d4ebff7e705dfff1a8a72017a167120a)
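The scalar form of the transform quoted above, (X & Y) != 0 into X > ~Y when ~Y + 1 is a power of two, can be checked with a short C sketch for unsigned 32-bit operands. The function names are hypothetical, and the sketch says nothing about vector modes or target support, which is what the fix is actually about.

```c
#include <stdint.h>

/* The mask Y qualifies when ~Y + 1 is a power of two, i.e. Y consists of
   a run of ones followed by a run of zeros.  */
int mask_qualifies(uint32_t y)
{
    uint32_t p = ~y + 1u;
    return p != 0 && (p & (p - 1u)) == 0;
}

/* Original form: (X & Y) != 0.  */
int masked_ne_zero(uint32_t x, uint32_t y)
{
    return (x & y) != 0;
}

/* Folded form: X > ~Y, valid only when mask_qualifies (Y) holds.  */
int folded_gt(uint32_t x, uint32_t y)
{
    return x > (uint32_t)~y;
}
```

With Y = 0xFFFFFFF0, ~Y is 15 and ~Y + 1 is 16, so (X & Y) != 0 holds exactly when X > 15.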
2023-05-03tree-ssa-dom: can_infer_simple_equiv fixes [PR108068]Jakub Jelinek4-6/+50
As reported in the PR, tree-ssa-dom.cc uses real_zerop call to find if a floating point constant is zero and it shouldn't try to infer equivalences from comparison against it if signed zeros are honored. This doesn't work at all for decimal types, because real_zerop always returns false for them (one can have different representations of decimal zero beyond -0/+0), and it doesn't work for vector compares either, as real_zerop checks if all elements are zero, while we need to avoid inferring equivalences from comparison against vector constants which have at least one zero element in it (if signed zeros are honored). Furthermore, as mentioned by Joseph, for decimal types many other values aren't singleton. So, this patch stops inferring anything if element mode is decimal, and otherwise uses instead of real_zerop a new function, real_maybe_zerop, which will work even for decimal types and for complex or vector will return true if any element is or might be zero (so it returns true for anything but constants for now). 2022-12-23 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/108068 * tree.h (real_maybe_zerop): Declare. * tree.c (real_maybe_zerop): Define. * tree-ssa-dom.c (record_edge_info): Use it instead of real_zerop or TREE_CODE (op1) == SSA_NAME || real_zerop. Always set can_infer_simple_equiv to false for decimal floating point types. * gcc.dg/dfp/pr108068.c: New test. (cherry picked from commit fd1b0aefda5b65f3f841ca6e61ccea6a72daa060)
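Why a comparison against zero does not make the two operands interchangeable when signed zeros are honored can be shown with a few lines of C, assuming IEEE 754 binary floating point. The helper name equal_but_distinguishable is hypothetical.

```c
#include <math.h>

/* Returns 1 when A == B compares true even though the two values differ
   in sign, which is the case for -0.0 and +0.0.  */
int equal_but_distinguishable(double a, double b)
{
    return a == b && (!signbit(a) != !signbit(b));
}
```

Replacing x by 0.0 after seeing x == 0.0 would therefore be wrong for x = -0.0 in any expression that observes the sign.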
2023-05-03cse: Fix up CSE const_anchor handling [PR108193]Jakub Jelinek2-5/+29
The following testcase ICEs on aarch64, because insert_const_anchor inserts invalid CONST_INT into the CSE tables - 0x80000000 for SImode. The second hunk of the patch fixes that, the first one is to avoid triggering undefined behavior at compile time during compute_const_anchors computations - performing those additions and subtractions in HOST_WIDE_INT means it can overflow for certain constants. 2022-12-22 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/108193 * cse.c (compute_const_anchors): Change n type to unsigned HOST_WIDE_INT, adjust comparison against it to avoid warnings. Formatting fix. (insert_const_anchor): Use gen_int_mode instead of GEN_INT. * gfortran.dg/pr108193.f90: New test. (cherry picked from commit 0cb5d7cdbab8e5f8359764ef5f62d93c2bc88552)
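The constant 0x80000000 is not in canonical form for a 32-bit mode, because such constants are expected to be sign-extended from the mode's width. The following C sketch shows that canonicalization using unsigned arithmetic, which wraps instead of overflowing. The name canonical_for_width is hypothetical and this is not GCC's gen_int_mode; it assumes a width between 1 and 64 and a compiler such as GCC that converts out-of-range unsigned values to int64_t modulo 2^64.

```c
#include <stdint.h>

/* Sign-extend the low WIDTH bits of VALUE (1 <= WIDTH <= 64).
   All intermediate arithmetic is unsigned, so nothing can overflow.  */
int64_t canonical_for_width(uint64_t value, unsigned width)
{
    uint64_t sign = (uint64_t)1 << (width - 1);
    uint64_t mask = sign | (sign - 1);

    value &= mask;
    /* Flipping and then subtracting the sign bit performs the extension;
       the final conversion to int64_t is implementation-defined in ISO C
       and modular on GCC.  */
    return (int64_t)((value ^ sign) - sign);
}
```

For a 32-bit width, 0x80000000 becomes -2147483648, while 0x7fffffff is unchanged.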
2023-05-03openmp: Don't try to destruct DECL_OMP_PRIVATIZED_MEMBER vars [PR108180]Jakub Jelinek2-0/+60
DECL_OMP_PRIVATIZED_MEMBER vars are artificial vars with DECL_VALUE_EXPR of this->field used just during gimplification and omp lowering/expansion to privatize individual fields in methods when needed. As the following testcase shows, when not in templates, they were handled right, but in templates we actually called cp_finish_decl on them and that can result in their destruction, which is obviously undesirable, we should only destruct the privatized copies of them created in omp lowering. Fixed thusly. 2022-12-21 Jakub Jelinek <jakub@redhat.com> PR c++/108180 * pt.c (tsubst_expr): Don't call cp_finish_decl on DECL_OMP_PRIVATIZED_MEMBER vars. * testsuite/libgomp.c++/pr108180.C: New test. (cherry picked from commit 1119902b6c7c1c50123ed85ec1def8be4772d68c)
2023-05-03testsuite: Fix up pr64536.c for LLP64 targets [PR108151]Jakub Jelinek1-2/+2
Apparently llp64 had 2 further warnings, fixed thusly. 2022-12-19 Jakub Jelinek <jakub@redhat.com> PR testsuite/108151 * gcc.dg/pr64536.c (bar): Cast long to __INTPTR_TYPE__ before casting to long *. (cherry picked from commit 6e85f89a7d59a99a3395b6e153b99262a58b2f6c)
2023-05-03testsuite: Fix up pr64536.c for LLP64 targets [PR108151]Jakub Jelinek1-2/+2
The test casts a pointer to long, which is ok for ilp32 and lp64 targets but not for llp64 targets. Nothing reads the values later, it is a link test, so all we care about is that it is the same cast on s390x-linux where it used to fail before the PR64536 fix, and that we don't warn about it. 2022-12-19 Jakub Jelinek <jakub@redhat.com> PR testsuite/108151 * gcc.dg/pr64536.c (bar): Use casts to __INTPTR_TYPE__ rather than long when casting pointer to integral type. (cherry picked from commit ea37e96a37b50dad17b91d46edc518bbb9132d8e)
2023-05-03loop-invariant: Split preheader edge if the preheader bb ends with jump ↵Jakub Jelinek2-0/+19
[PR106751] The RTL loop passes only request simple preheaders, but don't require fallthru preheaders, while move_invariant_reg apparently assumes the latter, that it can just append instruction(s) to the end of the preheader basic block. The following patch fixes that by splitting the preheader edge if the preheader bb ends with a JUMP_INSN (asm goto in this case). Without that we get control flow in the middle of a bb. 2022-12-16 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/106751 * loop-invariant.c (move_invariant_reg): If preheader bb ends with a JUMP_INSN, split the preheader edge and emit invariants into the new preheader basic block. * gcc.c-torture/compile/pr106751.c: New test. (cherry picked from commit ddcaa60983b50378bde1b7e327086fe0ce101795)
2023-05-03c++: Ensure !!var is not an lvalue [PR107065]Jakub Jelinek3-3/+24
The TRUTH_NOT_EXPR case in cp_build_unary_op is one of the spots where we somewhat fold immediately using invert_truthvalue_loc. I've tried using return build1_loc (location, TRUTH_NOT_EXPR, boolean_type_node, arg); in there instead, but unfortunately that regressed Wlogical-not-parentheses-*.c pr49706.c pr62199.c pr65120.c sequence-pt-1.C tests, so at least for backporting that doesn't seem to be a way to go. So, this patch instead wraps it into NON_LVALUE_EXPR if needed (which also needs a tweak for some tests in the pr47906.c test, but nothing major), with the intent to make it backportable, and later I'll try to do further steps to avoid folding here prematurely. Most of the problems with build1 TRUTH_NOT_EXPR are that it doesn't even invert comparisons as the most common case and lots of warning code isn't able to deal with ! around comparisons; so perhaps one way to do this would be fold by hand only invertible comparisons and for the rest create TRUTH_NOT_EXPR. 2022-12-15 Jakub Jelinek <jakub@redhat.com> PR c++/107065 gcc/cp/ * typeck.c (cp_build_unary_op) <case TRUTH_NOT_EXPR>: If invert_truthvalue_loc returns obvalue_p, wrap it into NON_LVALUE_EXPR. * parser.c (cp_parser_binary_expression): Don't call warn_logical_not_parentheses if current.lhs is a NON_LVALUE_EXPR of a decl with boolean type. gcc/testsuite/ * g++.dg/cpp0x/pr107065.C: New test. (cherry picked from commit 8b775b4c48a3cc4ef5c50e56144aea02da2e9cc6)
2023-05-03ivopts: Fix IP_END handling for asm goto [PR107997]Jakub Jelinek2-0/+30
The following testcase ICEs, because the latch bb ends with asm goto which has both fallthrough to the header and one or more labels in the header too. In that case there is just a single edge out of the latch block, but still the asm goto is stmt_ends_bb_p statement, yet ivopts decides to emit an IV bump at the IP_END position and inserts it into the same bb as the asm goto after it, which then fails verification (control flow in the middle of bb). The following patch fixes it by splitting the latch -> header edge in that case and inserting into the newly created bb, where split_edge -> redirect_edge_and_branch is able to deal with this case correctly. 2022-12-10 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/107997 * tree-ssa-loop-ivopts.c: Include cfganal.h. (create_new_iv) <case IP_END>: If ip_end_pos bb is non-empty and ends with a stmt which ends bb, instead of adding iv update after it split the latch edge and insert iterator into the new latch bb. * gcc.c-torture/compile/pr107997.c: New test. (cherry picked from commit 7676235f690e624b7ed41a22b22ce8ccfac1492f)
2023-05-03cfgbuild: Fix DEBUG_INSN handling in find_bb_boundaries [PR106719]Jakub Jelinek2-2/+60
The following testcase FAILs on aarch64-linux. We have some atomic instruction followed by 2 DEBUG_INSNs (if -g only of course) followed by NOTE_INSN_EPILOGUE_BEG followed by some USE insn. Now, split3 pass replaces the atomic instruction with a code sequence which ends with a conditional jump and the split3 pass calls find_many_sub_basic_blocks. For -g0, find_bb_boundaries sees the flow_transfer_insn (the new conditional jump), then NOTE_INSN_EPILOGUE_BEG which can live in between basic blocks and then the USE insn, so splits block after the NOTE_INSN_EPILOGUE_BEG and puts the NOTE in between the blocks. For -g, it sees a DEBUG_INSN after the flow_transfer_insn, so sets debug_insn to it, then walks over another DEBUG_INSN, NOTE_INSN_EPILOGUE_BEG until it finally sees the USE insn, and triggers the: rtx_insn *prev = PREV_INSN (insn); /* If the first non-debug inside_basic_block_p insn after a control flow transfer is not a label, split the block before the debug insn instead of before the non-debug insn, so that the debug insns are not lost. */ if (debug_insn && code != CODE_LABEL && code != BARRIER) prev = PREV_INSN (debug_insn); code I've added for PR81325. If there are only DEBUG_INSNs, that is the right thing to do, but if in between debug_insn and insn there are notes which can stay in between basic blocks or similarly JUMP_TABLE_DATA or their associated CODE_LABELs, it causes -fcompare-debug differences. The following patch fixes it by clearing debug_insn if JUMP_TABLE_DATA or associated CODE_LABEL is seen (I'm afraid there is no good answer what to do with DEBUG_INSNs before those; the code then removes them: /* Clean up the bb field for the insns between the blocks. */ for (x = NEXT_INSN (flow_transfer_insn); x != BB_HEAD (fallthru->dest); x = next) { next = NEXT_INSN (x); /* Debug insns should not be in between basic blocks, drop them on the floor. 
*/ if (DEBUG_INSN_P (x)) delete_insn (x); else if (!BARRIER_P (x)) set_block_for_insn (x, NULL); } but if there are NOTEs, the patch just reorders the NOTEs and DEBUG_INSNs, such that the NOTEs come first (so that they stay in between basic blocks like with -g0) and DEBUG_INSNs after those (so that bb is split before them, so they will be in the basic block after NOTE_INSN_BASIC_BLOCK). 2022-12-08 Jakub Jelinek <jakub@redhat.com> PR debug/106719 * cfgbuild.c (find_bb_boundaries): If there are NOTEs in between debug_insn (seen after flow_transfer_insn) and insn, move NOTEs before all the DEBUG_INSNs and split after NOTEs. If there are other insns like jump table data, clear debug_insn. * gcc.dg/pr106719.c: New test. (cherry picked from commit d9f9d5d30feb33c359955d7030cc6be50ef6dc0a)
2023-05-03asan: Fix up error recovery for too large frames [PR107317]Jakub Jelinek2-0/+19
asan_emit_stack_protection and functions it calls have various asserts that verify sanity of the stack protection instrumentation. But, that verification can easily fail if we've diagnosed a frame offset overflow. asan_emit_stack_protection just emits some extra code in the prologue, if we've reported errors, we aren't producing assembly, so it doesn't really matter if we don't include the protection code, compilation is going to fail anyway. 2022-11-24 Jakub Jelinek <jakub@redhat.com> PR middle-end/107317 * asan.c: Include diagnostic-core.h. (asan_emit_stack_protection): Return NULL early if seen_error (). * gcc.dg/asan/pr107317.c: New test. (cherry picked from commit b6330a7685476fc30b8ae9bbf3fca1a9b0d4be95)
2023-05-03i386: Uglify some local identifiers in *intrin.h [PR107748]Jakub Jelinek1-6/+7
While reporting PR107748 (where there is a problem with non-uglified names, but I've left it out because it needs fixing anyway), I've noticed various spots where identifiers in *intrin.h headers weren't uglified. The following patch fixed those that are related to unions (I've grepped for [a-zA-Z]\.[a-zA-Z] spots). The reason we need those to be uglified is the same as why the arguments of the inlines are __ prefixed and most of the automatic vars in the inlines - say a, v or u aren't part of implementation namespace and so users could #define u whatever->something #include <x86intrin.h> and it should still work, as long as u is not e.g. one of the names of the functions/macros the header provides (_mm* etc.). 2022-11-21 Jakub Jelinek <jakub@redhat.com> PR target/107748 * config/i386/smmintrin.h (_mm_extract_ps): Uglify names of local variables and union members. (cherry picked from commit ec8ec09f9414be871e322fecf4ebf53e3687bd22)
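The effect of uglified names can be seen in a small C sketch. The helper float_bits is hypothetical and only mimics the style of a header inline; it is not the actual _mm_extract_ps. The sketch assumes a 32-bit int and IEEE 754 single precision, and it uses double-underscore names the way a system header would, even though such names are reserved for the implementation.

```c
/* A user macro that a header must tolerate being defined before it is
   included.  It is never expanded below because no identifier is 'u'.  */
#define u whatever->something

/* Header-style inline: every local name lives in the implementation
   namespace, so the user macro above cannot interfere.  */
static inline int float_bits(float __val)
{
    union { int __i; float __f; } __tmp;

    __tmp.__f = __val;
    return __tmp.__i;
}

/* Keep the demonstration macro from leaking into later code.  */
#undef u
```

Had the union members been named i and f and the variable u, the #define would have rewritten the body into something that does not compile.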