path: root/gcc
Age | Commit message | Author | Files | Lines
2023-02-22Fortran/OpenMP: Fix mapping of array descriptors and deferred-length stringsTobias Burnus5-124/+248
Previously, array descriptors might have been mapped as 'alloc' instead of 'to' for 'alloc', not updating the array bounds. The 'alloc' could also appear for 'data exit', failing with a libgomp assert. In some cases, either array descriptors or deferred-length string's length variable was not mapped. And, finally, some offset calculations with array-sections mappings went wrong. The testcases contain some commented-out tests which require follow-up work and for which PRs exist. Those mostly relate to deferred-length strings which have several issues beyond OpenMP support. This is the OG12 variant of the submitted but unreviewed GCC 13/mainline patch at https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612387.html gcc/fortran/ChangeLog: * trans-decl.cc (gfc_get_symbol_decl): Add attributes such as 'declare target' also to hidden artificial variable for deferred-length character variables. * trans-openmp.cc (gfc_trans_omp_array_section, gfc_trans_omp_clauses, gfc_trans_omp_target_exit_data): Improve mapping of array descriptors and deferred-length string variables. gcc/ChangeLog: * gimplify.cc (gimplify_scan_omp_clauses): Remove Fortran special case. libgomp/ChangeLog: * testsuite/libgomp.fortran/target-enter-data-3.f90: Uncomment 'target exit data'. * testsuite/libgomp.fortran/target-enter-data-4.f90: New test. * testsuite/libgomp.fortran/target-enter-data-5.f90: New test. * testsuite/libgomp.fortran/target-enter-data-6.f90: New test. * testsuite/libgomp.fortran/target-enter-data-7.f90: New test.
2023-02-22Fix: Fortran/OpenMP: align/allocator modifiers to the allocate clauseTobias Burnus2-8/+13
When merging r13-4584-gb2e1c49b4a4 to OG12 as commit 58e0579ed87, the 'align' handling seemingly ended up in the wrong clause. (Result: libgomp.fortran/allocate-2a.f90 FAILED; now fixed.) gcc/fortran/ * trans-openmp.cc (gfc_trans_omp_clauses): Move align modifier handling from OMP_LIST_ALLOCATOR to OMP_LIST_ALLOCATE.
2023-02-22Daily bump.GCC Administrator1-1/+1
2023-02-21Daily bump.GCC Administrator4-1/+39
2023-02-20c++: ICE with redundant capture [PR108829]Marek Polacek3-3/+28
Here we crash in is_capture_proxy: /* Location wrappers should be stripped or otherwise handled by the caller before using this predicate. */ gcc_checking_assert (!location_wrapper_p (decl)); We only crash with the redundant capture: int abyPage = [=, abyPage] { ... } because prune_lambda_captures is only called when there was a default capture, and with [=] only abyPage won't be in LAMBDA_EXPR_CAPTURE_LIST. The problem is that LAMBDA_CAPTURE_EXPLICIT_P wasn't propagated correctly and so var_to_maybe_prune proceeded where it shouldn't. Co-Authored by: Patrick Palka <ppalka@redhat.com> PR c++/108829 gcc/cp/ChangeLog: * pt.cc (prepend_one_capture): Set LAMBDA_CAPTURE_EXPLICIT_P. (tsubst_lambda_expr): Pass LAMBDA_CAPTURE_EXPLICIT_P to prepend_one_capture. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/lambda/lambda-108829-2.C: New test. * g++.dg/cpp0x/lambda/lambda-108829.C: New test. (cherry picked from commit 02d8ab3e4e2f3d9dc12157a98c976d6698e71e29)
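For illustration only (not part of the commit message above), a reduced form of the invalid code described — a [=] default capture combined with a redundant explicit capture of the same variable — might look like the following; the actual lambda-108829.C testcase may differ:

  // Deliberately invalid code: abyPage is captured by the [=] default and
  // again explicitly, and it also appears in its own initializer.  GCC
  // rejects the redundant capture; previously it additionally ICEd in
  // is_capture_proxy while pruning captures during error recovery.
  void f ()
  {
    int abyPage = [=, abyPage] { return abyPage + 1; } ();
  }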
2023-02-20aarch64: Fix up bfmlal lane pattern [PR104921]Alex Coplan4-1/+28
As the testcase shows, this pattern had an incorrect constraint leading to GCC's output getting rejected by the assembler. This patch fixes the constraint accordingly. The test is split into two: one that can run without bf16 support from the assembler and another that checks that the output actually assembles when such support is available. gcc/ChangeLog: PR target/104921 * config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf): Use correct constraint for operand 3. gcc/testsuite/ChangeLog: PR target/104921 * gcc.target/aarch64/pr104921-1.c: New test. * gcc.target/aarch64/pr104921-2.c: New test. * gcc.target/aarch64/pr104921.x: Include file for new tests. (cherry picked from commit 277e1f30a5e4e634304a7b8a532825119f0ea47f)
2023-02-20Merge branch 'releases/gcc-12' into devel/omp/gcc-12Tobias Burnus17-30/+228
Merge up to r12-9189-gc6e3ecca0e3dcf567d0c843a4987e52591041372 (20th Feb 2023)
2023-02-20Daily bump.GCC Administrator1-1/+1
2023-02-19Daily bump.GCC Administrator2-1/+14
2023-02-18LoongArch: Fix multiarch tuple canonizationXi Ruoyao2-8/+8
Multiarch tuple will be coded in file or directory names in multiarch-aware distros, so one ABI should have only one multiarch tuple. For example, "--target=loongarch64-linux-gnu --with-abi=lp64s" and "--target=loongarch64-linux-gnusf" should both set multiarch tuple to "loongarch64-linux-gnusf". Before this commit, "--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib" will produce wrong result (loongarch64-linux-gnu). A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some non-technical reason [1]. Note that we cannot make "loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because to implement such an alias, we must create thousands of symlinks in the distro and doing so would be completely unpractical. This commit also aligns GCC with the revision. Tested by building cross compilers with --enable-multiarch and multiple combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d}, and --{enable,disable}-multilib; and run "xgcc --print-multiarch" then manually verify the result with eyesight. [1]: https://github.com/loongson/LoongArch-Documentation/pull/80 gcc/ChangeLog: * config.gcc (triplet_abi): Set its value based on $with_abi, instead of $target. (la_canonical_triplet): Set it after $triplet_abi is set correctly. * config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the multiarch tuple for lp64d "loongarch64-linux-gnu" (without "f64" suffix). (cherry picked from commit 017849d9d88f021770a90f12fffec9aa2425ed27)
2023-02-18Daily bump.GCC Administrator1-1/+1
2023-02-17Daily bump.GCC Administrator3-1/+18
2023-02-16amdgcn, libgomp: low-latency allocatorAndrew Stubbs3-1/+24
This implements the OpenMP low-latency memory allocator for AMD GCN using the small per-team LDS memory (Local Data Store). Since addresses can now refer to LDS space, the "Global" address space is no-longer compatible. This patch therefore switches the backend to use entirely "Flat" addressing (which supports both memories). A future patch will re-enable "global" instructions for cases where it is known to be safe to do so. gcc/ChangeLog: * config/gcn/gcn-builtins.def (DISPATCH_PTR): New built-in. * config/gcn/gcn.cc (gcn_init_machine_status): Disable global addressing. (gcn_expand_builtin_1): Implement GCN_BUILTIN_DISPATCH_PTR. libgomp/ChangeLog: * config/gcn/libgomp-gcn.h (TEAM_ARENA_START): Move to here. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. (GCN_LOWLAT_HEAP): New. * config/gcn/team.c (LITTLEENDIAN_CPU): New, and import hsa.h. (__gcn_lowlat_init): New prototype. (gomp_gcn_enter_kernel): Initialize the low-latency heap. * libgomp.h (TEAM_ARENA_START): Move to libgomp.h. (TEAM_ARENA_FREE): Likewise. (TEAM_ARENA_END): Likewise. * plugin/plugin-gcn.c (lowlat_size): New variable. (print_kernel_dispatch): Label the group_segment_size purpose. (init_environment_variables): Read GOMP_GCN_LOWLAT_POOL. (create_kernel_dispatch): Pass low-latency head allocation to kernel. (run_kernel): Use shadow; don't assume values. * testsuite/libgomp.c/allocators-7.c: Enable for amdgcn. * config/gcn/allocator.c: New file.
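The following is a usage sketch only (not from the commit), assuming the predefined OpenMP 5.x allocator omp_low_lat_mem_alloc is the one backed by this LDS pool on GCN; the pool size is controlled by the GOMP_GCN_LOWLAT_POOL environment variable mentioned above.

  // Allocate small scratch space from the low-latency pool inside a target
  // region, falling back gracefully if the pool is exhausted.
  #include <omp.h>

  void scale (float *x, int n)
  {
  #pragma omp target teams distribute parallel for map(tofrom: x[0:n])
    for (int i = 0; i < n; i++)
      {
        float *tmp = (float *) omp_alloc (sizeof (float), omp_low_lat_mem_alloc);
        float v = x[i] * 2.0f;
        if (tmp)
          {
            *tmp = v;   // intended to live in the fast, team-local pool
            v = *tmp;
          }
        x[i] = v;
        omp_free (tmp, omp_low_lat_mem_alloc);
      }
  }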
2023-02-16Fortran: error recovery on invalid assumed size reference [PR104554]Steve Kargl2-3/+16
gcc/fortran/ChangeLog: PR fortran/104554 * resolve.cc (check_assumed_size_reference): Avoid NULL pointer dereference. gcc/testsuite/ChangeLog: PR fortran/104554 * gfortran.dg/pr104554.f90: New test. (cherry picked from commit a418129273725fd02e881e6fb5e0877287a1356c)
2023-02-16Daily bump.GCC Administrator3-1/+26
2023-02-15Fix PR target/90458Eric Botcazou1-3/+8
This is the incompatibility of -fstack-clash-protection with Windows SEH. Now the Windows ports always enable TARGET_STACK_PROBE, which means that the stack is always probed (out of line) so -fstack-clash-protection does nothing more. gcc/ PR target/90458 * config/i386/i386.cc (ix86_compute_frame_layout): Disable the effects of -fstack-clash-protection for TARGET_STACK_PROBE. (ix86_expand_prologue): Likewise.
2023-02-15warn-access: wrong -Wdangling-pointer with labels [PR106080]Marek Polacek3-14/+26
-Wdangling-pointer warns when the address of a label escapes. This causes grief in OCaml (<https://github.com/ocaml/ocaml/issues/11358>) as well as in the kernel: <https://bugzilla.kernel.org/show_bug.cgi?id=215851> because it uses #define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; }) to get the PC. -Wdangling-pointer is documented to warn about pointers to objects. However, it uses is_auto_decl which checks DECL_P, but DECL_P is also true for a label/enumerator/function declaration, none of which is an object. Rather, it should use auto_var_p which correctly checks VAR_P and PARM_DECL. PR middle-end/106080 gcc/ChangeLog: * gimple-ssa-warn-access.cc (is_auto_decl): Remove. Use auto_var_p instead. gcc/testsuite/ChangeLog: * c-c++-common/Wdangling-pointer-10.c: New test. * c-c++-common/Wdangling-pointer-9.c: New test. (cherry picked from commit d482b20fd346482635a770281a164a09d608b058)
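For illustration (not from the commit), here is the kernel-style pattern quoted above wrapped into a compilable function; with the fix, taking the address of the local label no longer triggers -Wdangling-pointer, since a label is not an object:

  /* GNU extension: a statement expression yielding the address of a local
     label, as in the kernel's _THIS_IP_ macro (renamed here only to avoid
     the reserved identifier).  */
  #define CURRENT_IP ({ __label__ __here; __here: (unsigned long) &&__here; })

  unsigned long get_ip (void)
  {
    return CURRENT_IP;
  }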
2023-02-15OpenMP/Fortran: Fix loop-iter var privatization with !$OMP LOOP [PR108512]Tobias Burnus7-4/+133
For 'parallel', loop-iteration variables are marked as 'private', unless they either appear in an omp do/simd loop or a data-sharing clause already exists for those on 'parallel'. 'omp loop' wasn't handled, leading to (potentially) multiple data-sharing clauses in gfc_resolve_do_iterator as omp_current_ctx pointed to the 'parallel' directive, ignoring the in-between 'loop' directive. The latter led to a bogus diagnostic - or rather an ICE as the source location var contained only '\0'. Additionally, several 'case EXEC_OMP...LOOP' have been added to call the right resolution function and likewise for '{masked,master} taskloop'. gcc/fortran/ChangeLog: PR fortran/108512 * openmp.cc (gfc_resolve_omp_parallel_blocks): Handle combined 'loop' directives. (gfc_resolve_do_iterator): Set a source location for added 'private'-clause arguments. * resolve.cc (gfc_resolve_code): Call gfc_resolve_omp_do_blocks also for EXEC_OMP_LOOP and gfc_resolve_omp_parallel_blocks for combined directives with loop + '{masked,master} taskloop (simd)'. gcc/testsuite/ChangeLog: PR fortran/108512 * gfortran.dg/gomp/loop-5.f90: New test. * gfortran.dg/gomp/loop-2.f90: Update dg-error. * gfortran.dg/gomp/taskloop-2.f90: Update dg-error. (cherry picked from commit 7a8cada824c5e45ea729c112f3d1d29956067b7b)
2023-02-15Daily bump.GCC Administrator4-1/+25
2023-02-14c++: fix ICE in joust_maybe_elide_copy [PR106675]Marek Polacek2-0/+23
joust_maybe_elide_copy checks that the last conversion in the ICS for the first argument is ck_ref_bind, which is reasonable, because we've checked that we're dealing with a copy/move constructor. But it can also happen that we couldn't figure out which conversion function is better to convert the argument, as in this testcase: joust couldn't decide if we should go with operator foo &() or operator foo const &() so we get a ck_ambig, which then upsets joust_maybe_elide_copy. Since a ck_ambig can validly occur, I think we should just return early, as in the patch below. PR c++/106675 gcc/cp/ChangeLog: * call.cc (joust_maybe_elide_copy): Return false for ck_ambig. gcc/testsuite/ChangeLog: * g++.dg/cpp0x/overload-conv-5.C: New test. (cherry picked from commit cce62625025380c2ea2a220deb10f8f355f83abf)
2023-02-14openmp: Add support for 'present' modifier in the Fortran parse tree dumpKwok Cheung Yeung2-0/+22
2023-02-14 Kwok Cheung Yeung <kcy@codesourcery.com> gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Display 'present' map modifier. (show_omp_clauses): Display 'present' motion modifier for 'to' and 'from' clauses.
2023-02-14Fix small regression in AdaEric Botcazou2-1/+9
gcc/ * gimplify.cc (gimplify_save_expr): Add missing guard. gcc/testsuite/ * gnat.dg/shift2.adb: New test.
2023-02-14Daily bump.GCC Administrator3-1/+59
2023-02-13gomp/openmp-simd-8.f90: Remove .ASSUME tree-dump checkTobias Burnus2-3/+5
While mainline (GCC 13) converts assumptions to the ASSUME internal function, OG12 has parsing-only support. Thus, remove the failing dump scan from the testcase. gcc/testsuite/ * gfortran.dg/gomp/openmp-simd-8.f90: Remove dump test.
2023-02-13Merge branch 'releases/gcc-12' into devel/omp/gcc-12Tobias Burnus110-145/+2986
Merge up to r12-9170-gcb6861acc4074fd2c30a96b52d68c2cd33b9e94d (13th Feb 2023)
2023-02-12rs6000: Fix typo on vec_vsubcuq in rs6000-overload.def [PR108396]Kewen Lin2-1/+15
As Andrew pointed out in PR108396, there is one typo in rs6000-overload.def on built-in function vec_vsubcuq: [VEC_VSUBCUQ, vec_vsubcuqP, __builtin_vec_vsubcuq] "vec_vsubcuqP" should be "vec_vsubcuq". This typo caused us to define vec_vsubcuqP in rs6000-vecdefines.h instead of vec_vsubcuq, so the compiler was no longer able to recognize the built-in function name vec_vsubcuq. Co-authored-By: Andrew Pinski <apinski@marvell.com> PR target/108396 gcc/ChangeLog: * config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo vec_vsubcuqP with vec_vsubcuq. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108396.c: New test. (cherry picked from commit aaf29ae6cdbaad58b709a77784375d15138174b3)
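As a usage sketch (hypothetical, not the pr108396.c testcase itself), code of this shape would previously fail to resolve the overloaded built-in on a Power8 or later target (compiled with -mcpu=power8 -maltivec):

  #include <altivec.h>

  // Carry-out of an unsigned 128-bit vector subtraction; vec_vsubcuq is the
  // overloaded built-in whose name was broken by the typo.
  __vector unsigned __int128
  sub_carry (__vector unsigned __int128 a, __vector unsigned __int128 b)
  {
    return vec_vsubcuq (a, b);
  }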
2023-02-12rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]Kewen Lin3-4/+61
PR108348 shows one special case that MMA opaque types are used in function arguments and treated as pass by reference, it results in one copying from argument to a temp variable, since this copying happens before rs6000_function_arg check, it can cause ICE without MMA support then. This patch is to teach function rs6000_opaque_type_invalid_use_p to check if any function argument in a gcall stmt has the invalid use of MMA opaque types. btw, I checked the handling on return value, it doesn't have this kind of issue as its checking and error emission is quite early, so this doesn't handle function return value. PR target/108348 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses of MMA opaque type in function arguments. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108348-1.c: New test. * gcc.target/powerpc/pr108348-2.c: New test. (cherry picked from commit 5d9529687deb9ed009361a16c02a7f6c3e2ebbf3)
2023-02-12rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]Kewen Lin5-10/+110
As PR108272 shows, there are some invalid uses of MMA opaque types in inline asm statements. This patch is to teach the function rs6000_opaque_type_invalid_use_p for inline asm, check and error any invalid use of MMA opaque types in input and output operands. PR target/108272 gcc/ChangeLog: * config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the support for invalid uses in inline asm, factor out the checking and erroring to lambda function check_and_error_invalid_use. gcc/testsuite/ChangeLog: * gcc.target/powerpc/pr108272-1.c: New test. * gcc.target/powerpc/pr108272-2.c: New test. * gcc.target/powerpc/pr108272-3.c: New test. * gcc.target/powerpc/pr108272-4.c: New test. (cherry picked from commit 074b0c03eabeb8e9c8de813c81bf87a1f88fdb65)
2023-02-13Daily bump.GCC Administrator1-1/+1
2023-02-12Daily bump.GCC Administrator3-1/+10
2023-02-11Suppress -fstack-protector warning on hppa.John David Anglin2-0/+6
Some package builds enable -fstack-protector and -Werror. Since -fstack-protector is not supported on hppa because the stack grows up, these packages must check for the warning generated by -fstack-protector and suppress it on hppa. This is problematic since hppa is the only significant architecture where the stack grows up. 2022-12-16 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.cc (pa_option_override): Disable -fstack-protector. gcc/testsuite/ChangeLog: * lib/target-supports.exp (check_effective_target_static): Return 0 on hppa*-*-*.
2023-02-11Daily bump.GCC Administrator7-1/+742
2023-02-10i386: Fix up -Wuninitialized warnings in avx512erintrin.h [PR105593]Jakub Jelinek2-13/+7
As reported in the PR, there are some -Wuninitialized warnings in avx512erintrin.h. One can see that by compiling sse-23.c testcase with -Wuninitialized (or when actually using those intrinsics). Those 6 spots use an uninitialized variable and pass it as one of the argument to a builtin with constant mask -1, because there is no unmasked builtin. It is true that expansion of those builtins into RTL will see mask is all ones and ignore the unneeded argument, but -Wuninitialized is diagnosed on GIMPLE and on GIMPLE these builtins are just builtin calls. avx512fintrin.h and other headers use in these cases the _mm*_undefined_* () intrinsics, like: return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X, (__v8di) __Y, (__v8di) _mm512_undefined_epi32 (), (__mmask8) -1); etc. and the following patch does the same for avx512erintrin.h. With the recent changes in C++ FE and the _mm*_undefined_* intrinsics, we don't emit -Wuninitialized warnings for those (previously we didn't just in C due to self-initialization). Of course we could also just self-initialize these uninitialized vars and add the #pragma GCC diagnostic dances around it, but using the intrinsics is consistent with the rest and IMHO cleaner. 2023-01-31 Jakub Jelinek <jakub@redhat.com> PR c++/105593 * config/i386/avx512erintrin.h (_mm512_exp2a23_round_pd, _mm512_exp2a23_round_ps, _mm512_rcp28_round_pd, _mm512_rcp28_round_ps, _mm512_rsqrt28_round_pd, _mm512_rsqrt28_round_ps): Use _mm512_undefined_pd () or _mm512_undefined_ps () instead of using uninitialized automatic variable __W. * gcc.target/i386/sse-23.c: Add -Wuninitialized to dg-options. (cherry picked from commit 41602390456901c14ecdfa2fa64c3cebd5b6ff09)
2023-02-10x86: Avoid -Wuninitialized warnings on _mm*_undefined_* in C++ [PR105593]Jakub Jelinek6-0/+56
In https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609844.html I've posted a patch to allow ignoring -Winit-self using GCC diagnostic pragmas, such that one can mark self-initialization as intentional disabling of -Wuninitialized warnings. The following incremental patch uses that in the x86 intrinsic headers. 2023-01-16 Jakub Jelinek <jakub@redhat.com> PR c++/105593 gcc/ * config/i386/xmmintrin.h (_mm_undefined_ps): Temporarily disable -Winit-self using pragma GCC diagnostic ignored. * config/i386/emmintrin.h (_mm_undefined_pd, _mm_undefined_si128): Likewise. * config/i386/avxintrin.h (_mm256_undefined_pd, _mm256_undefined_ps, _mm256_undefined_si256): Likewise. * config/i386/avx512fintrin.h (_mm512_undefined_pd, _mm512_undefined_ps, _mm512_undefined_epi32): Likewise. * config/i386/avx512fp16intrin.h (_mm_undefined_ph, _mm256_undefined_ph, _mm512_undefined_ph): Likewise. gcc/testsuite/ * g++.target/i386/pr105593.C: New test. (cherry picked from commit 6b0907b4fc455377e5f8109f427d97da02b6aec9)
2023-02-10c, c++: Allow ignoring -Winit-self through pragmas [PR105593]Jakub Jelinek5-2/+110
As mentioned in the PR, various x86 intrinsics need to return an uninitialized vector. Currently they use self initialization to avoid -Wuninitialized warnings, which works fine in C, but doesn't work in C++ where -Winit-self is enabled in -Wall. We don't have an attribute to mark a variable as knowingly uninitialized (the uninitialized attribute exists but means something else, only in the -ftrivial-auto-var-init context), and trying to suppress either -Wuninitialized or -Winit-self inside of the _mm_undefined_ps etc. intrinsic definitions doesn't work, one needs to currently disable through pragmas -Wuninitialized warning at the point where _mm_undefined_ps etc. result is actually used, but that goes against the intent of those intrinsics. The -Winit-self warning option actually doesn't do any warning, all we do is record a suppression for -Winit-self if !warn_init_self on the decl definition and later look that up in uninit pass. The following patch changes those !warn_init_self tests which are true only based on the command line option setting, not based on GCC diagnostic pragma overrides to !warning_enabled_at (DECL_SOURCE_LOCATION (decl), OPT_Winit_self) such that it takes them into account. 2023-01-16 Jakub Jelinek <jakub@redhat.com> PR c++/105593 gcc/c/ * c-parser.cc (c_parser_initializer): Check warning_enabled_at at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead of warn_init_self. gcc/cp/ * decl.cc (cp_finish_decl): Check warning_enabled_at at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead of warn_init_self. gcc/testsuite/ * c-c++-common/Winit-self3.c: New test. * c-c++-common/Winit-self4.c: New test. * c-c++-common/Winit-self5.c: New test. (cherry picked from commit 98b41fd4045b7856e7b85dd58d67c600bd909379)
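A sketch of what the change enables (mirroring the intrinsics-header usage described above, not one of the Winit-self3/4/5.c testcases): with the patch, a diagnostic pragma at the point of the definition now suppresses the later uninitialized-use report for a deliberate self-initialization.

  // Compile with -Wall (which enables -Winit-self in C++): the deliberate
  // self-initialization below is no longer reported as uninitialized.
  int make_undefined_value (void)
  {
  #pragma GCC diagnostic push
  #pragma GCC diagnostic ignored "-Winit-self"
    int v = v;   // intentionally indeterminate, like _mm_undefined_ps ()
  #pragma GCC diagnostic pop
    return v;
  }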
2023-02-10c-family: Honor -Wno-init-self for cv-qual vars [PR102633]Marek Polacek5-16/+85
Since r11-5188-g32934a4f45a721, we drop qualifiers during l-to-r conversion by creating a NOP_EXPR. For e.g. const int i = i; that means that the DECL_INITIAL is '(int) i' and not 'i' anymore. Consequently, we don't suppress_warning here: 711 case DECL_EXPR: 715 if (VAR_P (DECL_EXPR_DECL (*expr_p)) 716 && !DECL_EXTERNAL (DECL_EXPR_DECL (*expr_p)) 717 && !TREE_STATIC (DECL_EXPR_DECL (*expr_p)) 718 && (DECL_INITIAL (DECL_EXPR_DECL (*expr_p)) == DECL_EXPR_DECL (*expr_p)) 719 && !warn_init_self) 720 suppress_warning (DECL_EXPR_DECL (*expr_p), OPT_Winit_self); because of the check on line 718 -- (int) i is not i. So -Wno-init-self doesn't disable the warning as it's supposed to. The following patch fixes it by moving the suppress_warning call from c_gimplify_expr to the front ends, at points where we haven't created the NOP_EXPR yet. PR middle-end/102633 gcc/c-family/ChangeLog: * c-gimplify.cc (c_gimplify_expr) <case DECL_EXPR>: Don't call suppress_warning here. gcc/c/ChangeLog: * c-parser.cc (c_parser_initializer): Add new tree parameter. Use it. Call suppress_warning. (c_parser_declaration_or_fndef): Pass d down to c_parser_initializer. (c_parser_omp_declare_reduction): Pass omp_priv down to c_parser_initializer. gcc/cp/ChangeLog: * decl.cc (cp_finish_decl): Call suppress_warning. gcc/testsuite/ChangeLog: * c-c++-common/Winit-self1.c: New test. * c-c++-common/Winit-self2.c: New test. (cherry picked from commit 04ce2400b35225302e0d6883bb0817378180f5d7)
2023-02-10c++: Handle structured bindings like anon unions in initializers [PR108474]Jakub Jelinek3-2/+72
As reported by Andrew Pinski, structured bindings (with the exception of the ones using std::tuple_{size,element} and get which are really standalone variables in addition to the binding one) also use DECL_VALUE_EXPR and needs the same treatment in static initializers. On Sun, Jan 22, 2023 at 07:19:07PM -0500, Jason Merrill wrote: > Though, actually, why not instead fix expand_expr_real_1 (and staticp) to > look through DECL_VALUE_EXPR? Doing it when emitting the initializers seems to be too late to me, we in various spots try to put parts of the static var DECL_INITIAL expressions into the IL, or e.g. for varpool purposes remember which vars are referenced there. This patch moves it to record_reference, which is called from varpool_node::analyze and so about the same time as gimplification of the bodies which also replaces DECL_VALUE_EXPRs. 2023-01-24 Jakub Jelinek <jakub@redhat.com> PR c++/108474 * cp-gimplify.cc (cp_fold_r): Handle structured bindings vars like anon union artificial vars. * g++.dg/cpp1z/decomp57.C: New test. * g++.dg/cpp1z/decomp58.C: New test. (cherry picked from commit b84e21115700523b4d0ac44275443f7b9c670344)
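For illustration (hypothetical; the decomp57.C/decomp58.C testcases may differ), the kind of code affected is a namespace-scope structured binding whose DECL_VALUE_EXPR ends up inside another variable's static initializer:

  struct pair { int first, second; };
  pair p { 1, 2 };
  auto [x, y] = p;   // x and y are DECL_VALUE_EXPRs into a hidden object
  int *q = &y;       // static initializer referencing the binding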
2023-02-10forwprop: Further fixes for simplify_rotate [PR108440]Jakub Jelinek3-9/+170
As mentioned in the simplify_rotate comment, for e.g. ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1)))) we already emit X r<< (Y & (B - 1)) as replacement. This PR is about the ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y))) ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y))) forms if T2 is wider than T. Unlike e.g. (X << Y) OP (X >> (B - Y)) which is valid just for Y in [1, B - 1], the above 2 forms are actually valid and do the rotates for Y in [0, B] - for Y 0 the X value is preserved by the left shift and right logical shift by B adds just zeros (but because the shift is in wider precision B is still valid shift count), while for Y equal to B X is preserved through the latter shift and the former adds just zeros. Now, it is unclear if we in the middle-end treat rotates with rotate count equal or larger than precision as UB or not, unlike shifts there are less reasons to do so, but e.g. expansion of X r<< Y if there is no rotate optab for the mode is emitted as (X << Y) | (((unsigned) X) >> ((-Y) & (B - 1))) and so with UB on Y == B. The following patch does multiple things: 1) for the above 2, asks the ranger if Y could be equal to B and if so, instead of using X r<< Y uses X r<< (Y & (B - 1)) 2) for the ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1)))) ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1)))) forms that were fixed 2 days ago it only punts if Y might be in the [B,B2-1] range but isn't known to be in the [0,B][2*B,2*B][3*B,3*B]... range. Because for Y which is a multiple of B but smaller than B2 it acts as a rotate too, left shift provides 0 and (-Y) & (B - 1) is 0 and so preserves X. Though, for the cases where Y is not known to be in [0,B-1] the patch also uses X r<< (Y & (B - 1)) rather than X r<< Y 3) as discussed with Aldy, instead of using global ranger it uses a pass specific copy but lazily created on first simplify_rotate that needs it; this e.g. handles rotate inside of if body where the guarding condition limits the shift count to some range which will not work with the global ranger (unless there is some SSA_NAME to attach the range to). Note, e.g. on x86 X r<< (Y & (B - 1)) and X r<< Y actually emit the same assembly because rotates work the same even for larger rotate counts, but that is handled only during combine. 2023-01-19 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/108440 * tree-ssa-forwprop.cc: Include gimple-range.h. (simplify_rotate): For the forms with T2 wider than T and shift counts of Y and B - Y add & (B - 1) masking for the rotate count if Y could be equal to B. For the forms with T2 wider than T and shift counts of Y and (-Y) & (B - 1), don't punt if range could be [B, B2], but only if range doesn't guarantee Y < B or Y = N * B. If range doesn't guarantee Y < B, also add & (B - 1) masking for the rotate count. Use lazily created pass specific ranger instead of get_global_range_query. (pass_forwprop::execute): Disable that ranger at the end of pass if it has been created. * c-c++-common/rotate-10.c: New test. * c-c++-common/rotate-11.c: New test. (cherry picked from commit 05b9868b182bb9ed2013b39a0bc6297354a0db49)
2023-02-10forwprop: Fix up rotate pattern matching [PR106523]Jakub Jelinek6-4/+335
The comment above simplify_rotate roughly describes what patterns are matched into what: We are looking for X with unsigned type T with bitsize B, OP being +, | or ^, some type T2 wider than T. For: (X << CNT1) OP (X >> CNT2) iff CNT1 + CNT2 == B ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2)) iff CNT1 + CNT2 == B transform these into: X r<< CNT1 Or for: (X << Y) OP (X >> (B - Y)) (X << (int) Y) OP (X >> (int) (B - Y)) ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y))) ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y))) (X << Y) | (X >> ((-Y) & (B - 1))) (X << (int) Y) | (X >> (int) ((-Y) & (B - 1))) ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1)))) ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1)))) transform these into (last 2 only if ranger can prove Y < B): X r<< Y Or for: (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1))) (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1))) ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1)))) ((T) ((T2) X << (int) (Y & (B - 1)))) \ | ((T) ((T2) X >> (int) ((-Y) & (B - 1)))) transform these into: X r<< (Y & (B - 1)) The following testcase shows that 2 of these are problematic. If T2 is wider than T, then the 2 which use (-Y) & (B - 1) on one of the shift counts but Y on the other can do something different from rotate. E.g.: __attribute__((noipa)) unsigned char f7 (unsigned char x, unsigned int y) { unsigned int t = x; return (t << y) | (t >> ((-y) & 7)); } if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U] then it is UB, but for y in [9, 31] the left shift in this case will never leave any bits in the result, while in a rotate they are left there. Say for y 5 and x 0xaa the expression gives 0x55 which is the same thing as rotate, while for y 19 and x 0xaa it gives 0x5, which is different. Now, I believe the ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y))) ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y))) forms are ok, because B - Y still needs to be a valid shift count, and if Y > B then B - Y should be either negative or very large positive (for unsigned types). And similarly the last 2 cases above which use & (B - 1) on both shift operands are definitely ok. The following patch disables the ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1)))) ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1)))) unless ranger says Y is not in [B, B2 - 1] range. And, looking at it again this morning, actually the Y equal to B case is still fine, if Y is equal to 0, then it is (T) (((T2) X << 0) | ((T2) X >> 0)) and so X, for Y == B it is (T) (((T2) X << B) | ((T2) X >> 0)) which is the same as (T) (0 | ((T2) X >> 0)) which is also X. So instead of the [B, B2 - 1] range we could use [B + 1, B2 - 1]. And, if we wanted to go further, even multiples of B are ok if they are smaller than B2, so we could construct a detailed int_range_max if we wanted. 2023-01-17 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/106523 * tree-ssa-forwprop.cc (simplify_rotate): For the patterns with (-Y) & (B - 1) in one operand's shift count and Y in another, if T2 has wider precision than T, punt if Y could have a value in [B, B2 - 1] range. * c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16, f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using __builtin_unreachable about shift count. * c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16, f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using __builtin_unreachable about shift count. * c-c++-common/rotate-4b.c: New test. * gcc.c-torture/execute/pr106523.c: New test. (cherry picked from commit 001121e8921d5d1a439ce0e64ab04c5959b0bfd8)
2023-02-10c++: Avoid incorrect shortening of divisions [PR108365]Jakub Jelinek3-2/+28
The following testcase is miscompiled, because we shorten the division in a case where it should not be shortened. Divisions (and modulos) can be shortened if it is unsigned division/modulo, or if it is signed division/modulo where we can prove the dividend will not be the minimum signed value or divisor will not be -1, because e.g. on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets (-2147483647 - 1) / -1 is UB but (int) (-2147483648LL / -1LL) is not, it is -2147483648. The primary aim of both the C and C++ FE division/modulo shortening I assume was for the implicit integral promotions of {,signed,unsigned} {char,short} and because at this point we have no VRP information etc., the shortening is done if the integral promotion is from unsigned type for the divisor or if the dividend is an integer constant other than -1. This works fine for char/short -> int promotions when char/short have smaller precision than int - unsigned char -> int or unsigned short -> int will always be a positive int, so never the most negative. Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either the same as orig_op0 or that promoted to int, I think that works fine, if it isn't promoted, either the division/modulo common type will have the same precision as op0 but then the division/modulo is unsigned and so without UB, or it will be done in wider precision (e.g. because op1 has wider precision), but then op0 can't be minimum signed value. Or it has been promoted to int, but in that case it was again from narrower type and so never minimum signed int. But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED. First of all, not sure if the operand of NOP_EXPR couldn't be non-integral type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly, even if it is a cast from unsigned integral type, we only know it can't be minimum signed value if it is a widening cast, if it is same precision or narrowing cast, we know nothing. So, the following patch for the NOP_EXPR cases checks just in case that it is from integral type and more importantly checks it is a widening conversion. 2023-01-14 Jakub Jelinek <jakub@redhat.com> PR c++/108365 * typeck.cc (cp_build_binary_op): For integral division or modulo, shorten if type0 is unsigned, or op0 is cast from narrower unsigned integral type or stripped_op1 is INTEGER_CST other than -1. * g++.dg/opt/pr108365.C: New test. * g++.dg/warn/pr108365.C: New test. (cherry picked from commit 5b3a88640f962d4ffca31ae651bed2d8672f1a8c)
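A worked illustration of the distinction the text relies on (not the pr108365.C testcases), assuming a target with 32-bit int and 64-bit long long: dividing INT_MIN by -1 is undefined in int but well defined when carried out in the wider type and only then truncated, which is why shortening must be restricted to dividends that provably cannot be the minimum signed value.

  long long wide_div (void)
  {
    long long n = -2147483647LL - 1;   // INT_MIN as a 64-bit value
    return n / -1LL;                   // well defined: 2147483648
  }

  int narrow_div (void)
  {
    int n = -2147483647 - 1;           // INT_MIN
    return n / -1;                     // undefined behaviour: overflows int
  }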
2023-02-10match.pd: When simplifying BFR of an insert, require a mode precision integral type [PR108688]Andrew Pinski2-1/+18
The same problem as PR 88739 has crept in but this time in match.pd when simplifying bit_field_ref of a bit_insert. That is, we are generating a BIT_FIELD_REF of a non-mode-precision integral type. PR tree-optimization/108688 * match.pd (bit_field_ref [bit_insert]): Avoid generating BIT_FIELD_REFs of non-mode-precision integral operands. * gcc.c-torture/compile/pr108688-1.c: New test. (cherry picked from commit 44f308e59bfa0f93ae05b17e257d8563c12399fd)
2023-02-10vect-patterns: Fix up vect_widened_op_tree [PR108692]Jakub Jelinek2-1/+50
The following testcase is miscompiled on aarch64-linux since r11-5160. Given <bb 3> [local count: 955630225]: # i_22 = PHI <i_20(6), 0(5)> # r_23 = PHI <r_19(6), 0(5)> ... a.0_5 = (unsigned char) a_15; _6 = (int) a.0_5; b.1_7 = (unsigned char) b_17; _8 = (int) b.1_7; c_18 = _6 - _8; _9 = ABS_EXPR <c_18>; r_19 = _9 + r_23; ... where SSA_NAMEs 15/17 have signed char, 5/7 unsigned char and rest is int we first pattern recognize c_18 as patt_34 = (a.0_5) w- (b.1_7); which is still correct, 5/7 are unsigned char subtracted in wider type, but then vect_recog_sad_pattern turns it into SAD_EXPR <a_15, b_17, r_23> which is incorrect, because 15/17 are signed char and so it is sum of absolute signed differences rather than unsigned sum of absolute unsigned differences. The reason why this happens is that vect_recog_sad_pattern calls vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree calls vect_look_through_possible_promotion on the operands of the WIDEN_MINUS_EXPR, which looks through the further casts. vect_look_through_possible_promotion has careful code to stop when there would be nested casts that need to be preserved, but the problem here is that the WIDEN_*_EXPR operation itself has an implicit cast on the operands already - in this case of WIDEN_MINUS_EXPR the unsigned char 5/7 SSA_NAMEs are widened to unsigned short before the subtraction, and vect_look_through_possible_promotion obviously isn't told about that. Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had to look through possible promotions already when creating those and so vect_look_through_possible_promotion again isn't really needed, all we need to do is arrange what that function will do if the operand isn't result of any cast. Other option would be let vect_look_through_possible_promotion know about the implicit promotion from the WIDEN_*_EXPR, but I'm afraid that would be much harder. 2023-02-08 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/108692 * tree-vect-patterns.cc (vect_widened_op_tree): If rhs_code is widened_code which is different from code, don't call vect_look_through_possible_promotion but instead just check op is SSA_NAME with integral type for which vect_is_simple_use is true and call set_op on this_unprom. * gcc.dg/pr108692.c: New test. (cherry picked from commit 6ad1c1027628f094260037536f6b6fcdb63b5add)
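Roughly the shape of the loop described by the GIMPLE above (illustrative, not the pr108692.c testcase itself): the inputs are signed char, but the absolute differences are taken on their unsigned char values, so a signed SAD_EXPR is the wrong reduction.

  int sum_abs_diff (signed char *a, signed char *b, int n)
  {
    int r = 0;
    for (int i = 0; i < n; i++)
      {
        // differences of the unsigned char values, accumulated in int
        int c = (unsigned char) a[i] - (unsigned char) b[i];
        r += c < 0 ? -c : c;
      }
    return r;
  }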
2023-02-10fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]Jakub Jelinek1-1/+5
The first testcase in the PR (which I haven't included in the patch because it is unclear to me if it is supposed to be valid or not) ICEs since extra hash table checking has been added recently. The problem is that gfc_trans_use_stmts does tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash, INSERT); if (*slot == NULL) and later on doesn't store anything into *slot and continues. Another spot a few lines later correctly clears the slot if it decides not to use the slot, so the following patch does the same. 2023-02-03 Jakub Jelinek <jakub@redhat.com> PR fortran/108451 * trans-decl.cc (gfc_trans_use_stmts): Call clear_slot before doing continue. (cherry picked from commit 76f7f0eddcb7c418d1ec3dea3e2341ca99097301)
2023-02-10nested, openmp: Wrap OMP_CLAUSE_*_GIMPLE_SEQ into GIMPLE_BIND for declare_vars [PR108435]Jakub Jelinek2-16/+34
When gimplifying OMP_CLAUSE_{LASTPRIVATE,LINEAR}_STMT, we always wrap it into a GIMPLE_BIND, but when putting statements directly into OMP_CLAUSE_{LASTPRIVATE,LINEAR}_GIMPLE_SEQ, we do it only if needed (i.e. if there are any temporaries that need to be declared in the sequence). convert_nonlocal_omp_clauses was relying on the GIMPLE_BIND to be there always because it called declare_vars on it. The following patch wraps it into GIMPLE_BIND in tree-nested if we need to declare_vars on it on demand. 2023-02-02 Jakub Jelinek <jakub@redhat.com> PR middle-end/108435 * tree-nested.cc (convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LASTPRIVATE>: If info->new_local_var_chain and *seq is not a GIMPLE_BIND, wrap the sequence into a new GIMPLE_BIND before calling declare_vars. (convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LINEAR>: Merge with the OMP_CLAUSE_LASTPRIVATE handling except for whether seq is initialized to &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (clause) or &OMP_CLAUSE_LINEAR_GIMPLE_SEQ (clause). * gcc.dg/gomp/pr108435.c: New test. (cherry picked from commit 0f349928e16fdc7dba52561e8d40347909f9f0ff)
2023-02-10ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]Jakub Jelinek2-2/+22
The PR78437 r7-4871 changes made combine_reaching_defs punt on WORD_REGISTER_OPERATIONS targets if a setter of smaller than word register has wider uses. This unfortunately breaks -fcompare-debug, because if such a use appears only in DEBUG_INSN(s), while all other uses aren't wider than the setter, we can REE optimize it without -g and not with -g. Such decisions shouldn't be based on debug instructions. We could try to reset them or adjust in some other way after we decide to perform the change, but at least on the testcase which used to fail on riscv64-linux the (debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160]) (const_int 1 [0x1])) 0) (subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151]) (debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1 (nil)) clearly doesn't care about the upper bits and I have hard time imaging how could one end up with DEBUG_INSN which actually cares about those upper bits. So, the following patch just ignores uses on DEBUG_INSNs in this case, if we run into something where we'd need to do something further later on, let's deal with it when we have a testcase for it. 2023-02-01 Jakub Jelinek <jakub@redhat.com> PR debug/108573 * ree.cc (combine_reaching_defs): Don't return false for paradoxical subregs in DEBUG_INSNs. * gcc.dg/pr108573.c: New test. (cherry picked from commit e4473d7cf871c8ddf8f22d105c5af6375ebe37bf)
2023-02-10c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression evaluation [PR108607]Jakub Jelinek2-0/+94
While potential_constant_expression_1 handled most of the OMP_* codes (by saying that they aren't potential constant expressions), OMP_SCOPE was missing in that list. I've also added OMP_SCAN, though that is less important (similarly to OMP_SECTION it ought to appear solely inside of OMP_{FOR,SIMD} resp. OMP_SECTIONS). As the testcase shows, that isn't enough: potential_constant_expression_1 can catch only some cases; as soon as one uses switch or ifs where at least one of the possible paths could be a constant expression, we can run into the same codes during cxx_eval_constant_expression, so this patch handles those there as well. 2023-02-01 Jakub Jelinek <jakub@redhat.com> PR c++/108607 * constexpr.cc (cxx_eval_constant_expression): Handle OMP_* and OACC_* constructs as non-constant. (potential_constant_expression_1): Handle OMP_SCAN and OMP_SCOPE. * g++.dg/gomp/pr108607.C: New test. (cherry picked from commit bfc070595bfb00abef88a002eee5d9117f5b86a7)
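A sketch of the kind of code the description covers (hypothetical; the pr108607.C testcase may differ), compiled with -fopenmp: the function remains a potential constant expression because the branch without the OpenMP region can be taken, and if evaluation does reach the region it is now cleanly treated as non-constant instead of being mishandled.

  constexpr int maybe_scoped (int x)
  {
    int r = 0;
    if (x)
      {
  #pragma omp scope
        { r = x + 1; }   // OpenMP construct: non-constant when reached
      }
    return r;
  }

  constexpr int ok = maybe_scoped (0);   // OpenMP region never reached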
2023-02-10i386: Fix up ix86_convert_const_wide_int_to_broadcast [PR108599]Jakub Jelinek2-1/+35
The following testcase is miscompiled. The problem is that during RTL DSE we see a V4DI register is being loaded { 16, 16, 0, 0 } value and DSE mostly works in terms of scalar modes, so it calls movoi to set an OImode REG to (const_wide_int 0x100000000000000010) and ix86_convert_const_wide_int_to_broadcast thinks it can compute that value by broadcasting DImode 0x10. While it is true that for TImode result the broadcast could be used, for OImode/XImode it can't be, because all but the lowest 2 HOST_WIDE_INTs aren't present (so are 0 or -1 depending on sign), not 0x10 in this case. The function checks if the least significant HOST_WIDE_INT elt of the CONST_WIDE_INT is broadcastable from QI/HI/SI/DImode and then /* Check if OP can be broadcasted from VAL. */ for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++) if (val != CONST_WIDE_INT_ELT (op, i)) return nullptr; That is needed of course, but nothing checks that CONST_WIDE_INT_NUNITS (op) isn't too small for the mode in question. I think if op would be 0 or -1, it ought to be never CONST_WIDE_INT, but CONST_INT and so we can just punt whenever the number of CONST_WIDE_INT elts is not the expected one. 2023-01-31 Jakub Jelinek <jakub@redhat.com> PR target/108599 * config/i386/i386-expand.cc (ix86_convert_const_wide_int_to_broadcast): Return nullptr if CONST_WIDE_INT_NUNITS (op) times HOST_BITS_PER_WIDE_INT isn't equal to bitsize of mode. * gcc.target/i386/avx2-pr108599.c: New test. (cherry picked from commit 963315a922e228c4f6853826666151fc540f111a)
2023-02-10bbpart: Fix up ICE on asm goto [PR108596]Jakub Jelinek2-1/+46
On the following testcase we have asm goto in hot block with 2 successors, one cold to which it both falls through and has one of the label pointing to it and another hot successor with another label. Now, during bbpart we want to ensure that no blocks from one partition fall through into a block in a different partition. fix_up_fall_thru_edges does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge, calling force_nonfallthru and then depending on whether it created a new bb either set EDGE_CROSSING on the single successor edge from the new bb (the new bb is kept in the same partition as the predecessor block), or if no new bb has been created setting EDGE_CROSSING back on the fallthru edge which has been forced non-EDGE_FALLTHRU. For asm goto this doesn't always work, force_nonfallthru can create a new bb and change the fallthrough edge to point to that, but if the original fallthru destination block has its label referenced among the asm goto labels, it will create a new non-fallthru edge for the label(s). But because we've temporarily cheated and cleared EDGE_CROSSING on the edge, it is cleared on the new edge as well, then the caller sees we've created a new bb and just sets EDGE_CROSSING on the single fallthru edge from the new bb. But the direct edge from cur_bb to fallthru edge's destination isn't handled and fails afterwards consistency checks, because it crosses partitions. The following patch notes the case and sets EDGE_CROSSING on that edge too. 2023-01-31 Jakub Jelinek <jakub@redhat.com> PR rtl-optimization/108596 * bb-reorder.cc (fix_up_fall_thru_edges): Handle the case where cur_bb ends with asm goto and has a crossing fallthrough edge to the same bb that contains at least one of its labels by restoring EDGE_CROSSING flag even on possible edge from cur_bb to new_bb successor. * gcc.c-torture/compile/pr108596.c: New test. (cherry picked from commit 603a6fbcaac1e80aa90d1d26318c881a53473066)
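Roughly the shape of code involved (illustrative; the actual pr108596.c testcase differs), compiled with -O2 -freorder-blocks-and-partition: an asm goto that both falls through to a cold block and lists that block's label among its targets, while its other label stays in the hot partition.

  void keep (void);
  __attribute__ ((cold)) void cold_path (void);

  void f (int x)
  {
    asm goto ("" : : "r" (x) : : cold_lab, hot_lab);
    /* falls through into the cold block below */
  cold_lab:
    cold_path ();
    return;
  hot_lab:
    keep ();
  }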
2023-02-10doc: Fix up return type of __builtin_va_arg_pack_len [PR108560]Jakub Jelinek1-1/+1
__builtin_va_arg_pack_len as implemented returned int since its introduction in 2007. The initial documentation didn't mention any return type, which changed in 2010 in r0-103077-gab940b73bfabe2cec4 during some documentation formatting cleanups https://gcc.gnu.org/legacy-ml/gcc-patches/2010-09/msg01632.html I can understand that for formatting some type was needed there but what exactly hasn't been really discussed. So, I think we should change documentation to match the implementation, rather than change implementation to match the documentation. Most people don't use more than 2147483647 arguments to inline functions, and on poor targets with 16-bit ints I bet even having more than 65535 arguments to inline functions would be highly unexpected. 2023-01-27 Jakub Jelinek <jakub@redhat.com> PR other/108560 * doc/extend.texi: Fix up return type of __builtin_va_arg_pack_len from size_t to int. (cherry picked from commit 16f30680f403891556da2ad6329fcef9dc9b47db)
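A usage sketch of the built-in whose documented return type is being corrected (the wrapper and callee names here are made up): it may only be used inside always_inline variadic functions that forward their argument pack.

  extern int my_snprintf (char *, unsigned long, const char *, ...);

  static inline __attribute__ ((always_inline)) int
  log_msg (char *buf, unsigned long len, const char *fmt, ...)
  {
    /* Number of forwarded varargs; an int, per the (now fixed) docs.  */
    int nargs = __builtin_va_arg_pack_len ();
    if (nargs == 0)
      return my_snprintf (buf, len, "%s", fmt);
    return my_snprintf (buf, len, fmt, __builtin_va_arg_pack ());
  }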
2023-02-10store-merging: Disable string_concatenate mode if start or end aren't byte aligned [PR108498]Jakub Jelinek3-2/+179
The first of the following testcases is miscompiled on powerpc64-linux -O2 -m64 at least, the latter at least on x86_64-linux -m32/-m64. Since GCC 11 store-merging has a separate string_concatenation mode which turns stores into setting a MEM_REF from a STRING_CST. This mode is triggered if at least one of the to be merged stores is a STRING_CST store and either the first store (to earliest address) is that STRING_CST store or the first store is 8-bit INTEGER_CST store and then there are some rules when to turn that mode off or not merge further stores into it. The problem with these 2 testcases is that the actual implementation relies on start/width of the store to be at byte boundaries, as it simply creates a char array, MEM_REF can be only on byte boundaries and the char array too, plus obviously STRING_CST as well. But as can be easily seen in the second testcase, nothing verifies this, while the first store has to be a STRING_CST (which will be aligned) or 8-bit INTEGER_CST, that 8-bit INTEGER_CST store could be a bitfield store, nothing verifies any stores in between whether they actually are 8-bit and aligned, the only major requirement is that all the stores are consecutive. For GCC 14 I think we should reconsider this, simply treat STRING_CST stores during the merging like INTEGER_CST stores and deal with it only during split_group where we can create multiple parts, this part would be a normal store, this part would be STRING_CST store, this part another normal store etc. But that is quite a lot of work, the following patch just disables the string_concatenate mode if boundaries aren't byte aligned in the spot where we disable it if it is too short too. If that happens, we'll just try to do the merging using normal 1/2/4/8 etc. byte stores as usual with RMW masking for any bits that shouldn't be touched or punt if we end up with too many stores compared to the original. Note, an original STRING_CST store will count as one store in that case, something we might want to reconsider later too (but, after all, CONSTRUCTOR stores (aka zeroing) already have the same problem, they can be large and expensive and we still count them as one store). 2023-01-25 Jakub Jelinek <jakub@redhat.com> PR tree-optimization/108498 * gimple-ssa-store-merging.cc (class store_operand_info): End comment with full stop rather than comma. (split_group): Likewise. (merged_store_group::apply_stores): Clear string_concatenation if start or end aren't on a byte boundary. * gcc.c-torture/execute/pr108498-1.c: New test. * gcc.c-torture/execute/pr108498-2.c: New test. (cherry picked from commit 617be7ba436bcbf9d7b883968c6b3c011206b56c)