Age | Commit message | Author | Files | Lines
2023-06-07 | RA: Constrain class of pic offset table pseudo to general regs | Vladimir N. Makarov | 2 | -0/+33
On some targets an integer pseudo can be assigned to a FP reg. For the pic offset table pseudo it means we will reload the pseudo in this case and, as a consequence, memory containing the pseudo might be recognized as a wrong one. The patch fixes this problem. PR target/109541 gcc/ChangeLog: * ira-costs.cc: (find_costs_and_classes): Constrain classes of pic offset table pseudo to a general reg subset. gcc/testsuite/ChangeLog: * gcc.target/sparc/pr109541.c: New.
2023-06-07 | aarch64: Represent SQXTUN with RTL operations | Kyrylo Tkachov | 3 | -14/+56
This patch removes UNSPEC_SQXTUN and uses organic RTL codes to represent the operation. SQXTUN is an odd one. It's described in the architecture as "Signed saturating extract Unsigned Narrow". It's not a straightforward ss_truncate nor a us_truncate. It is a sort of truncating signed clamp operation with limits derived from the unsigned extrema of the narrow mode: (truncate:N (smin:M (smax:M (reg:M) (const_int 0)) (const_int <unsigned-max-for-mode-N>))) This patch implements these semantics. I've checked that the vqmovun tests in advsimd-intrinsics.exp now get constant-folded and still pass validation, so I'm pretty confident in the semantics. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_sqmovun<mode><vczle><vczbe>): Rename to... (*aarch64_sqmovun<mode>_insn<vczle><vczbe>): ... This. Reimplement with RTL codes. (aarch64_sqmovun<mode> [SD_HSDI]): Reimplement with RTL codes. (aarch64_sqxtun2<mode>_le): Likewise. (aarch64_sqxtun2<mode>_be): Likewise. (aarch64_sqxtun2<mode>): Adjust for the above. (aarch64_sqmovun<mode>): New define_expand. * config/aarch64/iterators.md (UNSPEC_SQXTUN): Delete. (half_mask): New mode attribute. * config/aarch64/predicates.md (aarch64_simd_umax_half_mode): New predicate.
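The clamp-then-truncate semantics described above can be sketched in scalar C. This is only an illustrative model for one 16-bit to 8-bit lane, not code from the patch; the function name `sqxtun_s16_to_u8` is hypothetical.

```c
#include <stdint.h>

/* Scalar model of SQXTUN for a 16-bit -> 8-bit narrowing (hypothetical
   helper, not part of GCC): clamp the signed input to [0, UINT8_MAX],
   then truncate to the narrow mode.  */
uint8_t sqxtun_s16_to_u8(int16_t x)
{
    int16_t lo = x > 0 ? x : 0;                         /* smax (x, 0) */
    int16_t clamped = lo < UINT8_MAX ? lo : UINT8_MAX;  /* smin (..., 255) */
    return (uint8_t)clamped;                            /* truncate */
}
```

Negative inputs saturate to 0 and inputs above 255 saturate to 255, which is why neither a plain ss_truncate nor a us_truncate describes the operation.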
2023-06-07 | aarch64: Improve RTL representation of ADDP instructions | Kyrylo Tkachov | 1 | -7/+63
Similar to the ADDLP instructions, the non-widening ADDP ones can be represented by adding the odd lanes with the even lanes of a vector. These instructions take two vector inputs and the architecture spec describes the operation as concatenating them together before performing pairwise additions on the result. This patch chooses to represent ADDP on 64-bit and 128-bit input vectors slightly differently, for reasons explained in the comments in aarch64-simd.md. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_addp<mode><vczle><vczbe>): Reimplement as... (aarch64_addp<mode>_insn): ... This... (aarch64_addp<mode><vczle><vczbe>_insn): ... And this. (aarch64_addp<mode>): New define_expand.
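As a rough scalar illustration of the "concatenate, then add adjacent pairs" description above, the following sketch models a non-widening pairwise add. The names `cat_lane` and `addp_model` are hypothetical, the lane count is assumed to be even, and unsigned lanes are used so that wraparound is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Lane j of the 2n-lane concatenation of a (low half) and b (high half).
   Hypothetical helper for illustration only.  */
static uint32_t cat_lane(const uint32_t *a, const uint32_t *b,
                         size_t n, size_t j)
{
    return j < n ? a[j] : b[j - n];
}

/* Scalar model of a non-widening pairwise add: result lane i is the sum
   of even lane 2i and odd lane 2i+1 of the concatenated input.
   Assumes n is even, so no pair straddles the a/b boundary.  */
void addp_model(const uint32_t *a, const uint32_t *b, size_t n,
                uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = cat_lane(a, b, n, 2 * i) + cat_lane(a, b, n, 2 * i + 1);
}
```

The low half of the result holds the pair sums of the first input and the high half holds those of the second.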
2023-06-07 | Revert "libstdc++: Use AS_IF in configure.ac" | Jonathan Wakely | 2 | -590/+578
This reverts commit 97a5e8a2a48d162744a5bd60a012ce6fca13cbbe. libstdc++-v3/ChangeLog: * configure: Regenerate. * configure.ac:
2023-06-07 | Fix expected test output on hppa | Jeff Law | 1 | -1/+1
Recent changes in the hoisting code change the optimized gimple for the shadd-3 testcase on the PA. That in turn changes the number of expected shadd instructions. I'm not entirely sure the test is actually testing what we want anymore since I don't see a CSE for postreload to discover. But I did verify that the number of shadd instructions is sane, so I just changed the count in the obvious way. gcc/testsuite * gcc.target/hppa/shadd-3.c: Update expected output.
2023-06-07 | testsuite/libgomp.*/target-present-*.{c,f90}: Improve and fix | Tobias Burnus | 6 | -25/+35
One of the testcases lacked variables in a map clause such that the failure occurred too early. Additionally, it would have failed for all those non-host devices where 'present' is always true, i.e. non-host devices which can access all of the host memory (shared-memory devices). [There are currently none.] The commit now runs the code on all devices, which should succeed for host fallback and for shared-memory devices, finding potential issues that way. Additionally, a checkpoint (required stdout output) is used to ensure that the execution won't fail (with the same error) before reaching the expected fail location. 2023-06-07 Thomas Schwinge <thomas@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> libgomp/ * testsuite/libgomp.c-c++-common/target-present-1.c: Run code also for non-offload_device targets; check that it runs successfully for those and for all until a checkpoint for all * testsuite/libgomp.c-c++-common/target-present-2.c: Likewise. * testsuite/libgomp.c-c++-common/target-present-3.c: Likewise. * testsuite/libgomp.fortran/target-present-1.f90: Likewise. * testsuite/libgomp.fortran/target-present-3.f90: Likewise. * testsuite/libgomp.fortran/target-present-2.f90: Likewise; add missing vars to map clause.
2023-06-07 | Support 'UNSUPPORTED: [...]: exception handling disabled' for libstdc++ testing | Thomas Schwinge | 1 | -0/+12
Verbatim copy of what was added to 'gcc/testsuite/lib/gcc-dg.exp:gcc-dg-prune' in Subversion r279246 (Git commit a9046e9853024206bec092dd63e21e152cb5cbca) "[MSP430] -Add fno-exceptions multilib". This greatly improves 'make check-target-libstdc++-v3' results for, for example, x86_64-pc-linux-gnu with: RUNTESTFLAGS='--target_board=unix/-fno-exceptions\{,-m32\}' libstdc++-v3/ * testsuite/lib/prune.exp (libstdc++-dg-prune): Support 'UNSUPPORTED: [...]: exception handling disabled'.
2023-06-07 | modula2: Fix bootstrap | Jakub Jelinek | 1 | -0/+2
internal-fn.h since yesterday includes insn-opinit.h, which is a generated header. One of my bootstraps today failed because some m2 sources started compiling before insn-opinit.h has been generated. Normally, gcc/Makefile.in has # In order for parallel make to really start compiling the expensive # objects from $(OBJS) as early as possible, build all their # prerequisites strictly before all objects. $(ALL_HOST_OBJS) : | $(generated_files) rule which ensures that all the generated files are generated before any $(ALL_HOST_OBJS) objects start, but use order-only dependency for this because we don't want to rebuild most of the objects whenever one generated header is regenerated. After the initial build in an empty directory we'll have .deps/ files contain the detailed dependencies. $(ALL_HOST_OBJS) includes even some FE files, I think in the m2 case would be m2_OBJS, but m2/Make-lang.in doesn't define those. The following patch just adds a similar rule to m2/Make-lang.in. Another option would be to set m2_OBJS variable in m2/Make-lang.in to something, but not really sure to which exactly and why it isn't done. 2023-06-07 Jakub Jelinek <jakub@redhat.com> * Make-lang.in: Build $(generated_files) before building all $(GM2_C_OBJS).
2023-06-07 | RISC-V: Support RVV VLA SLP auto-vectorization | Juzhe-Zhong | 26 | -37/+1010
This patch enables basic VLA SLP auto-vectorization.

Consider the following case:

    void
    f (uint8_t *restrict a, uint8_t *restrict b)
    {
      for (int i = 0; i < 100; ++i)
        {
          a[i * 8 + 0] = b[i * 8 + 7] + 1;
          a[i * 8 + 1] = b[i * 8 + 7] + 2;
          a[i * 8 + 2] = b[i * 8 + 7] + 8;
          a[i * 8 + 3] = b[i * 8 + 7] + 4;
          a[i * 8 + 4] = b[i * 8 + 7] + 5;
          a[i * 8 + 5] = b[i * 8 + 7] + 6;
          a[i * 8 + 6] = b[i * 8 + 7] + 7;
          a[i * 8 + 7] = b[i * 8 + 7] + 3;
        }
    }

To enable VLA SLP auto-vectorization, we need to be able to handle the following const vectors:

1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
   { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
   { 1, 2, 8, 4, 5, 6, 7, 3, ... }

These vectors can be generated in the prologue.

After this patch, we end up with the following codegen:

Prologue:
    ...
    vsetvli a7,zero,e16,m2,ta,ma
    vid.v v4
    vsrl.vi v4,v4,3
    li a3,8
    vmul.vx v4,v4,a3
    ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
    ...
    li t1,67633152
    addi t1,t1,513
    li a3,50790400
    addi a3,a3,1541
    slli a3,a3,32
    add a3,a3,t1
    vsetvli t1,zero,e64,m1,ta,ma
    vmv.v.x v3,a3
    ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
    ...

LoopBody:
    ...
    min a3,...
    vsetvli zero,a3,e8,m1,ta,ma
    vle8.v v2,0(a6)
    vsetvli a7,zero,e8,m1,ta,ma
    vrgatherei16.vv v1,v2,v4
    vadd.vv v1,v1,v3
    vsetvli zero,a3,e8,m1,ta,ma
    vse8.v v1,0(a2)
    add a6,a6,a4
    add a2,a2,a4
    mv a3,a5
    add a5,a5,t1
    bgtu a3,a4,.L3
    ...

Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8, since "vrgatherei16.vv" can cover a larger index range than "vrgather.vv" (whose 8-bit indices can only reach a maximum element index of 255).

Epilogue:
    lbu a5,799(a1)
    addiw a4,a5,1
    sb a4,792(a0)
    addiw a4,a5,2
    sb a4,793(a0)
    addiw a4,a5,8
    sb a4,794(a0)
    addiw a4,a5,4
    sb a4,795(a0)
    addiw a4,a5,5
    sb a4,796(a0)
    addiw a4,a5,6
    sb a4,797(a0)
    addiw a4,a5,7
    sb a4,798(a0)
    addiw a5,a5,3
    sb a5,799(a0)
    ret

One last thing remains to be done: the "Epilogue auto-vectorization", which needs VLS modes support. I will support VLS modes for "Epilogue auto-vectorization" in the future.

gcc/ChangeLog:

    * config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
    * config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
    (rvv_builder::single_step_npatterns_p): New function.
    (rvv_builder::npatterns_all_equal_p): Ditto.
    (const_vec_all_in_range_p): Support POLY handling.
    (gen_const_vector_dup): Ditto.
    (emit_vlmax_gather_insn): Add vrgatherei16.
    (emit_vlmax_masked_gather_mu_insn): Ditto.
    (expand_const_vector): Add VLA SLP const vector support.
    (expand_vec_perm): Support POLY.
    (struct expand_vec_perm_d): New struct.
    (shuffle_generic_patterns): New function.
    (expand_vec_perm_const_1): Ditto.
    (expand_vec_perm_const): Ditto.
    * config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
    (TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA vectorizer.
    * gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
    * gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp-6.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp-7.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-1.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-2.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-3.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-4.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-5.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-6.c: New test.
    * gcc.target/riscv/rvv/autovec/partial/slp_run-7.c: New test.
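The two prologue constants in the RISC-V message above can be reproduced with a small scalar sketch. It assumes lane 0 sits in the least significant byte of the 64-bit scalar broadcast by vmv.v.x; the function names `make_gather_index` and `pack_addends` are hypothetical and do not appear in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the vid.v / vsrl.vi 3 / vmul.vx 8 sequence: element i of the
   index vector is (i >> 3) * 8, giving { 0 x8, 8 x8, 16 x8, ... }.  */
void make_gather_index(uint16_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        idx[i] = (uint16_t)((i >> 3) * 8);
}

/* Pack eight 8-bit addends into one 64-bit scalar, lane 0 in the low
   byte (an assumption about lane order), so that broadcasting the
   scalar with e64 repeats the 8-element pattern.  */
uint64_t pack_addends(const uint8_t addend[8])
{
    uint64_t v = 0;
    for (int k = 0; k < 8; k++)
        v |= (uint64_t)addend[k] << (8 * k);
    return v;
}
```

For the addends { 1, 2, 8, 4, 5, 6, 7, 3 } the packed value matches what the li/addi/slli/add sequence in the prologue builds: (50790400 + 1541) shifted left by 32, plus (67633152 + 513).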
2023-06-06 | Handle const_int in expand_single_bit_test | Andrew Pinski | 3 | -3/+45
After expanding directly to rtl instead of creating a tree, we could end up with a const_int which is not ready to be handled by extract_bit_field. So we need to do the constant folding here instead. OK? bootstrapped and tested on x86_64-linux-gnu with no regressions. PR middle-end/110117 gcc/ChangeLog: * expr.cc (expand_single_bit_test): Handle const_int from expand_expr. gcc/testsuite/ChangeLog: * gcc.dg/pr110117-1.c: New test. * gcc.dg/pr110117-2.c: New test.
2023-06-06 | Improve do_store_flag for single bit when there is no non-zero bits | Andrew Pinski | 1 | -17/+11
In r14-1534-g908e5ab5c11c, I forgot you could turn off CCP or turn off the bit tracking part of CCP, so we would lose out on what TER was able to do beforehand. This moves around the TER code so that it is used instead of just the nonzerobits. It also makes it easier to remove the TER part of the code later on too. OK? Bootstrapped and tested on x86_64-linux-gnu. Note it reintroduces PR 110117 (which was accidentally fixed after r14-1534-g908e5ab5c11c). The next patch in series will fix that. gcc/ChangeLog: * expr.cc (do_store_flag): Rearrange the TER code so that it overrides the nonzero bits info if we had `a & POW2`.
2023-06-06 | For the `-A CMP -B -> B CMP A` pattern allow EQ/NE for all integer types | Andrew Pinski | 5 | -2/+80
I noticed while looking at some code generation issue, that forwprop was not handling `-a == 0` for unsigned types and I was confused why it was not. r6-1814-g66e1cacf608045 removed these from fold because they were supposed to be already handled by the match.pd patterns but it was missed that the match.pd patterns checked TYPE_OVERFLOW_UNDEFINED while fold didn't do that for NE/EQ. This patch removes the restriction on NE/EQ on TYPE_OVERFLOW_UNDEFINED. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: PR tree-optimization/110134 * match.pd (-A CMP -B -> B CMP A): Allow EQ/NE for all integer types. (-A CMP CST -> B CMP (-CST)): Likewise. gcc/testsuite/ChangeLog: PR tree-optimization/110134 * gcc.dg/tree-ssa/negneq-1.c: New test. * gcc.dg/tree-ssa/negneq-2.c: New test. * gcc.dg/tree-ssa/negneq-3.c: New test. * gcc.dg/tree-ssa/negneq-4.c: New test.
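A small exhaustive check can illustrate why EQ/NE are safe for wrapping (unsigned) types while ordered comparisons are not. This is only a sketch over 8-bit values; the function names are hypothetical and nothing here is taken from match.pd.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check over all 8-bit unsigned pairs that (-a == -b) holds exactly when
   (b == a); negation modulo 2^8 is a bijection, so equality survives.  */
bool neg_eq_matches_plain_eq(void)
{
    for (unsigned a = 0; a < 256; a++)
        for (unsigned b = 0; b < 256; b++) {
            uint8_t na = (uint8_t)(0u - a);
            uint8_t nb = (uint8_t)(0u - b);
            if ((na == nb) != (b == a))
                return false;
        }
    return true;
}

/* Counterexample for an ordered comparison with wrapping arithmetic:
   for a = 0, b = 1 we get -a = 0 and -b = 255, so (-a < -b) is true
   while (b < a) is false.  Returns true when the two results differ.  */
bool neg_lt_differs_for_unsigned(void)
{
    uint8_t a = 0, b = 1;
    uint8_t na = (uint8_t)(0u - a);
    uint8_t nb = (uint8_t)(0u - b);
    return (na < nb) != (b < a);
}
```

The second function shows why the overflow-undefined restriction still matters for comparisons such as `<`.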
2023-06-06 | libiberty: writeargv: Simplify function error mode. | Costas Argyris | 1 | -3/+1
You are right, this is also a remnant of the old function design that I completely missed. Here is the follow-up patch for that. Thanks for pointing it out. Costas On Tue, 6 Jun 2023 at 04:12, Jeff Law <jeffreyalaw@gmail.com> wrote: On 6/5/23 08:37, Costas Argyris via Gcc-patches wrote: > writeargv can be simplified by getting rid of the error exit mode > that was only relevant many years ago when the function used > to open the file descriptor internally. [ ... ] Thanks. I've pushed this to the trunk. You could (as a follow-up) simplify it even further. There's no need for the status variable as far as I can tell. You could just have the final return be "return 0;" instead of "return status;". libiberty/ * argv.c (writeargv): Constant propagate "0" for "status", simplifying the code slightly.
2023-06-06 | Add match patterns for `a ? onezero : onezero` where one of the two operands are constant | Andrew Pinski | 10 | -11/+165
This adds match patterns for boolean values that optimize `a ? onezero : 0` to `a & onezero` and `a ? 1 : onezero` to `a | onezero`. This was reported a few times and I thought I would finally add the match pattern for this. This hits a few times in GCC itself too. Notes on the testcases: * phi-opt-2.c: This now is optimized to `a & b` in phiopt rather than ifcombine * phi-opt-25b.c: The test part that was failing was parity which now gets `x & y` treatment. * ssa-thread-21.c: there is no longer a threading opportunity, so need to disable phiopt. Note PR 109957 is filed for the now missing optimization in that testcase too. gcc/ChangeLog: PR tree-optimization/89263 PR tree-optimization/99069 PR tree-optimization/20083 PR tree-optimization/94898 * match.pd: Add patterns to optimize `a ? onezero : onezero` where one of the operands is constant. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/phi-opt-2.c: Adjust the testcase. * gcc.dg/tree-ssa/phi-opt-25b.c: Adjust the testcase. * gcc.dg/tree-ssa/ssa-thread-21.c: Disable phiopt. * gcc.dg/tree-ssa/phi-opt-27.c: New test. * gcc.dg/tree-ssa/phi-opt-28.c: New test. * gcc.dg/tree-ssa/phi-opt-29.c: New test. * gcc.dg/tree-ssa/phi-opt-30.c: New test. * gcc.dg/tree-ssa/phi-opt-31.c: New test. * gcc.dg/tree-ssa/phi-opt-32.c: New test.
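The two rewrites named in the message can be checked exhaustively for zero-or-one operands. The sketch below is only an illustration of the identities, with a hypothetical function name; it is not the match.pd pattern itself.

```c
#include <stdbool.h>

/* For a, b restricted to {0, 1}, verify that
     a ? b : 0  ==  a & b
     a ? 1 : b  ==  a | b
   which are the two simplifications described in the commit message.  */
bool select_matches_bitops(void)
{
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++) {
            if ((a ? b : 0) != (a & b))
                return false;
            if ((a ? 1 : b) != (a | b))
                return false;
        }
    return true;
}
```

The identities rely on both operands being zero-or-one valued; for wider values `a & b` would not equal `a ? b : 0` in general.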
2023-06-06 | Match: zero_one_valued_p should match 0 constants too | Andrew Pinski | 1 | -0/+5
While working on `bool0 ? bool1 : bool2` I noticed that zero_one_valued_p does not match on the constant zero as in that case tree_nonzero_bits will return 0 and that is different from 1. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * match.pd (zero_one_valued_p): Match 0 integer constant too.
2023-06-07 | RISC-V: Fix ICE when include riscv_vector.h with rv64gcv | Pan Li | 1 | -33/+33
This patch fixes the incorrect requirement of the vector builtin types for the ZVFH/ZVFHMIN extension. The incorrect requirement results in an ops mismatch with the iterators, and an ICE is then triggered if ZVFH/ZVFHMIN is not given. Sorry for the inconvenience. Signed-off-by: Pan Li <pan2.li@intel.com> gcc/ChangeLog: * config/riscv/riscv-vector-builtins-types.def (vfloat32mf2_t): Take RVV_REQUIRE_ELEN_FP_16 as requirement. (vfloat32m1_t): Ditto. (vfloat32m2_t): Ditto. (vfloat32m4_t): Ditto. (vfloat32m8_t): Ditto. (vint16mf4_t): Ditto. (vint16mf2_t): Ditto. (vint16m1_t): Ditto. (vint16m2_t): Ditto. (vint16m4_t): Ditto. (vint16m8_t): Ditto. (vuint16mf4_t): Ditto. (vuint16mf2_t): Ditto. (vuint16m1_t): Ditto. (vuint16m2_t): Ditto. (vuint16m4_t): Ditto. (vuint16m8_t): Ditto. (vint32mf2_t): Ditto. (vint32m1_t): Ditto. (vint32m2_t): Ditto. (vint32m4_t): Ditto. (vint32m8_t): Ditto. (vuint32mf2_t): Ditto. (vuint32m1_t): Ditto. (vuint32m2_t): Ditto. (vuint32m4_t): Ditto. (vuint32m8_t): Ditto.
2023-06-06 | c++: Add -Wnrvo | Jason Merrill | 4 | -2/+61
While looking at PRs about cases where we don't perform the named return value optimization, it occurred to me that it might be useful to have a warning for that. This does not fix PR58487, but might be interesting to people watching it. PR c++/58487 gcc/c-family/ChangeLog: * c.opt: Add -Wnrvo. gcc/ChangeLog: * doc/invoke.texi: Document it. gcc/cp/ChangeLog: * typeck.cc (want_nrvo_p): New. (check_return_expr): Handle -Wnrvo. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv25.C: New test.
2023-06-06 | c++: enable NRVO from inner block [PR51571] | Jason Merrill | 6 | -28/+64
Our implementation of the named return value optimization has been limited to variables declared in the outermost block of the function, to avoid needing to handle the case where the variable needs to be destroyed due to going out of scope. PR92407 pointed out a case we were missing, where the variable goes out of scope due to a goto and we were failing to destroy it. It occurred to me that this problem is the flip side of PR33799, where we need to be sure to destroy the return value if a cleanup throws on return; here we want to avoid destroying the return value when exiting the variable's scope on return. We can use the same flag to indicate to both cleanups that we're returning. This implements the guaranteed copy elision specified by P2025 (which is not yet part of the draft standard). PR c++/51571 PR c++/92407 gcc/cp/ChangeLog: * decl.cc (finish_function): Simplify NRV handling. * except.cc (maybe_set_retval_sentinel): Also set if NRV. (maybe_splice_retval_cleanup): Don't add the cleanup region if we don't need it. * semantics.cc (nrv_data): Add simple field. (finalize_nrv): Set it. (finalize_nrv_r): Check it and retval sentinel. * cp-tree.h (finalize_nrv): Adjust declaration. * typeck.cc (check_return_expr): Remove named_labels check. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv23.C: New test.
2023-06-06 | c++: NRV and goto [PR92407] | Jason Merrill | 2 | -0/+33
Here our named return value optimization was breaking the required destructor when the goto takes 'a' out of scope. The simplest fix is to disable the optimization in the presence of user labels. We could do better by disabling the optimization only if there is a backward goto across the variable declaration, but we don't currently track that. PR c++/92407 gcc/cp/ChangeLog: * typeck.cc (check_return_expr): Prevent NRV in the presence of named labels. gcc/testsuite/ChangeLog: * g++.dg/opt/nrv22.C: New test.
2023-06-06 | c++: fix throwing cleanup with label | Jason Merrill | 4 | -15/+49
While looking at PR92407 I noticed that the expectations of maybe_splice_retval_cleanup weren't being met; an sk_cleanup level was confusing its attempt to recognize the outer block of the function. And even if I fixed the detection, it failed to actually wrap the body of the function because the STATEMENT_LIST it got only had the label, not anything after it. So I moved the call after poplevel does pop_stmt_list on all the sk_cleanup levels. PR c++/33799 gcc/cp/ChangeLog: * except.cc (maybe_splice_retval_cleanup): Change recognition of function body and try scopes. * semantics.cc (do_poplevel): Call it after poplevel. (at_try_scope): New. * cp-tree.h (maybe_splice_retval_cleanup): Adjust. gcc/testsuite/ChangeLog: * g++.dg/eh/return1.C: Add label cases.
2023-06-06 | c++: fix contracts with NRV | Jason Merrill | 2 | -2/+39
The NRV implementation was blindly replacing the operand of RETURN_EXPR, clobbering anything that check_return_expr might have added on to the actual initialization, such as checking the postcondition. gcc/cp/ChangeLog: * semantics.cc (finalize_nrv_r): [RETURN_EXPR]: Only replace the INIT_EXPR. gcc/testsuite/ChangeLog: * g++.dg/contracts/contracts-post7.C: New test.
2023-06-06 | c++: add NRV testcase [PR58050] | Jason Merrill | 1 | -0/+18
This was fixed in GCC 10. PR c++/58050 gcc/testsuite/ChangeLog: * g++.dg/opt/nrv24.C: New test.
2023-06-07 | PR modula2/110019 Reported line numbers off by 1 when cpp invoked. | Gaius Mulley | 5 | -10/+56
Fix off by one in m2.flex when the line number is set via cpp. gcc/m2/ChangeLog: PR modula2/110019 * gm2-compiler/SymbolKey.mod (SearchAndDo): Reformatted. (ForeachNodeDo): Reformatted. * gm2-compiler/SymbolTable.mod (AddListify): Join list with "," or "and" if more than one word is in the list. * m2.flex: Remove -1 from atoi(yytext) line number. gcc/testsuite/ChangeLog: PR modula2/110019 * gm2/cpp/fail/cpp-fail.exp: New test. * gm2/cpp/fail/foocpp.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-06-07 | Daily bump. | GCC Administrator | 10 | -1/+406
2023-06-07 | Add RTX codes for BITREVERSE and COPYSIGN. | Roger Sayle | 3 | -2/+64
An analysis of backend UNSPECs reveals that two of the most common UNSPECs across target backends are for copysign and bit reversal. This patch adds RTX codes for these expressions to allow their representation to be standardized, and to allow them to be optimized by the middle-end RTL optimizers. 2023-06-07 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * doc/rtl.texi (bitreverse, copysign): Document new RTX codes. * rtl.def (BITREVERSE, COPYSIGN): Define new RTX codes. * simplify-rtx.cc (simplify_unary_operation_1): Optimize NOT (BITREVERSE x) as BITREVERSE (NOT x). Optimize POPCOUNT (BITREVERSE x) as POPCOUNT x. Optimize PARITY (BITREVERSE x) as PARITY x. Optimize BITREVERSE (BITREVERSE x) as x. (simplify_const_unary_operation) <case BITREVERSE>: Evaluate BITREVERSE of a constant integer at compile-time. (simplify_binary_operation_1) <case COPYSIGN>: Optimize COPYSIGN (x, x) as x. Optimize COPYSIGN (x, C) as ABS x or NEG (ABS x) for constant C. Optimize COPYSIGN (ABS x, y) and COPYSIGN (NEG x, y) as COPYSIGN (x, y). Optimize COPYSIGN (x, ABS y) as ABS x. Optimize COPYSIGN (COPYSIGN (x, y), z) as COPYSIGN (x, z). Optimize COPYSIGN (x, COPYSIGN (y, z)) as COPYSIGN (x, z). (simplify_const_binary_operation): Evaluate COPYSIGN of constant arguments at compile-time.
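The BITREVERSE identities listed in the ChangeLog above can be spot-checked with a plain C model. This is a sketch for 32-bit values only, with hypothetical helper names `bitreverse32` and `popcount32`; it does not cover the COPYSIGN simplifications and is not the simplify-rtx.cc implementation.

```c
#include <stdint.h>

/* Reverse the order of the 32 bits of x (bit 0 becomes bit 31, etc.).  */
uint32_t bitreverse32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);  /* shift in the current lowest bit of x */
        x >>= 1;
    }
    return r;
}

/* Count the set bits of x; parity is the low bit of this count.  */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        n += x & 1u;
        x >>= 1;
    }
    return n;
}
```

Bit reversal only permutes bit positions, so it commutes with NOT, preserves the population count (and hence parity), and is its own inverse.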
2023-06-06 | reload1: Change return type of predicate function from int to bool | Uros Bizjak | 2 | -3/+3
gcc/ChangeLog: * rtl.h (function_invariant_p): Change return type from int to bool. * reload1.cc (function_invariant_p): Change return type from int to bool and adjust function body accordingly.
2023-06-06 | libgomp: plugin-gcn - support 'unified_address' | Tobias Burnus | 2 | -6/+7
Effectively, for GCN (as for nvptx) there is a common address space between host and device, whether or not a given address is accessible. Thus, this commit permits the use of 'omp requires unified_address' with GCN devices. (nvptx accepts this requirement since r13-3460-g131d18e928a3ea.) libgomp/ * plugin/plugin-gcn.c (GOMP_OFFLOAD_get_num_devices): Regard unified_address requirement as supported. * libgomp.texi (OpenMP 5.0, AMD Radeon, nvptx): Remove 'unified_address' from the not-supported requirements.
2023-06-06 | libstdc++: Update list of known symbol versions for abi-check | Jonathan Wakely | 1 | -5/+2
Add the recently added CXXABI_1.3.15 version. Also remove two "frozen" versions from the latestp list, as no more symbols should be added to those now. libstdc++-v3/ChangeLog: * testsuite/util/testsuite_abi.cc (check_version): Add CXXABI_1.3.15 symver and make it the latestp. Remove GLIBCXX_IEEE128_3.4.31 and GLIBCXX_LDBL_3.4.31 from latestp.
2023-06-06 | libstdc++: Make std::numeric_limits<__float128> more portable [PR104772] | Jonathan Wakely | 2 | -17/+80
This redefines std::numeric_limits<__float128> so that it works with non-GCC compilers. The previous definition didn't work with Clang, due to it not supporting __builtin_high_valq, __builtin_nanq, and __builtin_nansq. It also didn't work in strict modes, due to using Q literal suffixes. The new definition uses the Q suffixes when supported, or calculates the correct values using __float128 arithmetic from double values. Ideally the values would be defined as hexadecimal-floating-point-literals, but that won't work for C++14 and older. The only member that can't be defined this way is signaling_NaN() which still requires a built-in. If __builtin_nansq is not supported, try to use __builtin_nansf128 (with a possibly-redundant bit_cast) and if that isn't supported, return a quiet NaN and define has_signaling_NaN and is_iec754 to be false. libstdc++-v3/ChangeLog: PR libstdc++/104772 * include/std/limits: (numeric_limits<__float128>): Define for __STRICT_ANSI__ as well. * testsuite/18_support/numeric_limits/128bit.cc: Remove check for __STRICT_ANSI__. Co-authored-by: Jakub Jelinek <jakub@redhat.com>
2023-06-06 | libstdc++: Use AS_IF in configure.ac | Jonathan Wakely | 2 | -578/+590
This ensures that anything that depends on AC_REQUIRE is hoisted out of the conditional block. The always-false test x"long_double_math_on_this_cpu" = x"yes" condition is not altered by this commit, only changed to use the AS_IF syntax. libstdc++-v3/ChangeLog: * configure.ac: Use AS_IF. * configure: Regenerate.
2023-06-06 | RISC-V: Add RVV vwmacc/vwmaccu/vwmaccsu combine lowering optmization | Juzhe-Zhong | 8 | -0/+346
Fixed according to Robin's comments on the V1 patch. This patch adds a combine optimization for the following case:

    __attribute__ ((noipa)) void
    vwmaccsu (int16_t *__restrict dst, int8_t *__restrict a, uint8_t *__restrict b, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] += (int16_t) a[i] * (int16_t) b[i];
    }

Before this patch:
    ...
    vsext.vf2
    vzext.vf2
    vmadd.vv
    ..

After this patch:
    ...
    vwmaccsu.vv
    ...

gcc/ChangeLog: * config/riscv/autovec-opt.md (*<optab>_fma<mode>): New pattern. (*single_<optab>mult_plus<mode>): Ditto. (*double_<optab>mult_plus<mode>): Ditto. (*sign_zero_extend_fma): Ditto. (*zero_sign_extend_fma): Ditto. * config/riscv/riscv-protos.h (enum insn_type): New enum. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/widen/widen-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-9.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-5.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen-complicate-6.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-8.c: New test. * gcc.target/riscv/rvv/autovec/widen/widen_run-9.c: New test.
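To make the mixed signedness of the vwmaccsu case concrete, here is a one-lane scalar sketch of the loop body from the message: a signed 8-bit operand is sign-extended, an unsigned 8-bit operand is zero-extended, and the product is accumulated in 16 bits. The function name `wmaccsu_lane` is hypothetical, and the sketch assumes the result fits in int16_t.

```c
#include <stdint.h>

/* One lane of a signed-by-unsigned widening multiply-accumulate:
   a is sign-extended, b is zero-extended, and the 16-bit product is
   added to the accumulator.  Assumes no overflow of int16_t.  */
int16_t wmaccsu_lane(int16_t acc, int8_t a, uint8_t b)
{
    return (int16_t)(acc + (int16_t)a * (int16_t)b);
}
```

With a = -1 and b = 255 the product is -255; treating b as signed would instead give (-1) * (-1) = 1, which is why the two extensions must differ.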
2023-06-06 | openmp: Add support for the 'present' modifier | Tobias Burnus | 32 | -64/+1064
This implements support for the OpenMP 5.1 'present' modifier, which can be used in map clauses in the 'target', 'target data', 'target enter data' and 'target exit data' constructs, and in the 'to' and 'from' clauses of the 'target update' construct. It is also supported in defaultmap. The modifier triggers a fatal runtime error if the data specified by the clause is not already present on the target device. It can also be combined with 'always' in map clauses. 2023-06-06 Kwok Cheung Yeung <kcy@codesourcery.com> Tobias Burnus <tobias@codesourcery.com> gcc/c/ * c-parser.cc (c_parser_omp_clause_defaultmap, c_parser_omp_clause_map): Parse 'present'. (c_parser_omp_clause_to, c_parser_omp_clause_from): Remove. (c_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (c_parser_omp_all_clauses): Update call. (c_parser_omp_target_data, c_parser_omp_target_enter_data, c_parser_omp_target_exit_data): Handle new map enum values for 'present' mapping. gcc/cp/ * parser.cc (cp_parser_omp_clause_defaultmap, cp_parser_omp_clause_map): Parse 'present'. (cp_parser_omp_clause_from_to): New; parse to/from clauses with optional 'present' modifier. (cp_parser_omp_all_clauses): Update call. (cp_parser_omp_target_data, cp_parser_omp_target_enter_data, cp_parser_omp_target_exit_data): Handle new enum value for 'present' mapping. * semantics.cc (finish_omp_target): Likewise. gcc/fortran/ * dump-parse-tree.cc (show_omp_namelist): Display 'present' map modifier. (show_omp_clauses): Display 'present' motion modifier for 'to' and 'from' clauses. * gfortran.h (enum gfc_omp_map_op): Add entries with 'present' modifiers. (struct gfc_omp_namelist): Add 'present_modifer'. * openmp.cc (gfc_match_motion_var_list): New, handles optional 'present' modifier for to/from clauses. (gfc_match_omp_clauses): Call it for to/from clauses; parse 'present' in defaultmap and map clauses. 
(resolve_omp_clauses): Allow 'present' modifiers on 'target', 'target data', 'target enter' and 'target exit' directives. * trans-openmp.cc (gfc_trans_omp_clauses): Apply 'present' modifiers to tree node for 'map', 'to' and 'from' clauses. Apply 'present' for defaultmap. gcc/ * gimplify.cc (omp_notice_variable): Apply GOVD_MAP_ALLOC_ONLY flag and defaultmap flags if the defaultmap has GOVD_MAP_FORCE_PRESENT flag set. (omp_get_attachment): Handle map clauses with 'present' modifier. (omp_group_base): Likewise. (gimplify_scan_omp_clauses): Reorder present maps to come first. Set GOVD flags for present defaultmaps. (gimplify_adjust_omp_clauses_1): Set map kind for present defaultmaps. * omp-low.cc (scan_sharing_clauses): Handle 'always, present' map clauses. (lower_omp_target): Handle map clauses with 'present' modifier. Handle 'to' and 'from' clauses with 'present'. * tree-core.h (enum omp_clause_defaultmap_kind): Add OMP_CLAUSE_DEFAULTMAP_PRESENT defaultmap kind. * tree-pretty-print.cc (dump_omp_clause): Handle 'map', 'to' and 'from' clauses with 'present' modifier. Handle present defaultmap. * tree.h (OMP_CLAUSE_MOTION_PRESENT): New #define. include/ * gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_5): New. (GOMP_MAP_FLAG_FORCE): Redefine. (GOMP_MAP_FLAG_PRESENT, GOMP_MAP_FLAG_ALWAYS_PRESENT): New. (enum gomp_map_kind): Add map kinds with 'present' modifiers. (GOMP_MAP_COPY_TO_P, GOMP_MAP_COPY_FROM_P): Evaluate to true for map variants with 'present' (GOMP_MAP_ALWAYS_TO_P, GOMP_MAP_ALWAYS_FROM_P): Evaluate to true for map variants with 'always, present' modifiers. (GOMP_MAP_ALWAYS): Redefine. (GOMP_MAP_FORCE_P, GOMP_MAP_PRESENT_P): New. libgomp/ * libgomp.texi (OpenMP 5.1 Impl. status): Set 'present' support for defaultmap to 'Y', add 'Y' entry for 'present' on to/from/map clauses. * target.c (gomp_to_device_kind_p): Add map kinds with 'present' modifier. (gomp_map_vars_existing): Use new GOMP_MAP_FORCE_P macro. 
(gomp_map_vars_internal, gomp_update, gomp_target_rev): Emit runtime error if memory region not present. * testsuite/libgomp.c-c++-common/target-present-1.c: New test. * testsuite/libgomp.c-c++-common/target-present-2.c: New test. * testsuite/libgomp.c-c++-common/target-present-3.c: New test. * testsuite/libgomp.fortran/target-present-1.f90: New test. * testsuite/libgomp.fortran/target-present-2.f90: New test. * testsuite/libgomp.fortran/target-present-3.f90: New test. gcc/testsuite/ * c-c++-common/gomp/map-6.c: Update dg-error, extend to test for duplicated 'present' and extend scan-dump tests for 'present'. * gfortran.dg/gomp/defaultmap-1.f90: Update dg-error. * gfortran.dg/gomp/map-7.f90: Extend parse and dump test for 'present'. * gfortran.dg/gomp/map-8.f90: Extend for duplicate 'present' modifier checking. * c-c++-common/gomp/defaultmap-4.c: New test. * c-c++-common/gomp/map-9.c: New test. * c-c++-common/gomp/target-update-1.c: New test. * gfortran.dg/gomp/defaultmap-8.f90: New test. * gfortran.dg/gomp/map-11.f90: New test. * gfortran.dg/gomp/map-12.f90: New test. * gfortran.dg/gomp/target-update-1.f90: New test.
2023-06-06libstdc++: Avoid vector casts while still avoiding PR90424Matthias Kretz1-25/+15
Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: PR libstdc++/109822 * include/experimental/bits/simd_builtin.h (_S_store): Rewrite to avoid casts to other vector types. Implement store as succession of power-of-2 sized memcpy to avoid PR90424.
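A minimal sketch of the "succession of power-of-2 sized memcpy" idea follows. It is not the `_S_store` code from simd_builtin.h: the helper name `store_pow2_chunks` and the 64-byte starting chunk are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not libstdc++'s _S_store): copy n bytes from src to
// dst using only power-of-2 sized memcpy calls, largest chunk first.
// Returns the number of memcpy calls issued.
inline int store_pow2_chunks(void* dst, const void* src, std::size_t n)
{
  auto* d = static_cast<unsigned char*>(dst);
  auto* s = static_cast<const unsigned char*>(src);
  int calls = 0;
  // Assumed upper bound of 64 bytes per chunk; halve the chunk each round.
  for (std::size_t chunk = 64; chunk != 0; chunk >>= 1)
    while (n >= chunk)
      {
        std::memcpy(d, s, chunk);  // chunk is always a power of 2
        d += chunk;
        s += chunk;
        n -= chunk;
        ++calls;
      }
  return calls;
}
```

With this sketch a 12-byte store is split into an 8-byte and a 4-byte copy, so no copy ever has a non-power-of-2 size.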
2023-06-06libstdc++: Replace use of incorrect non-temporal storeMatthias Kretz2-37/+7
The call to the base implementation sometimes didn't find a matching signature because the _Abi parameter of _SimdImpl* was "wrong" after conversion. It has to call into <new ABI tag>::_SimdImpl instead of the current ABI tag's _SimdImpl. This also reduces the number of possible template instantiations. Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: PR libstdc++/110054 * include/experimental/bits/simd_builtin.h (_S_masked_store): Call into deduced ABI's SimdImpl after conversion. * include/experimental/bits/simd_x86.h (_S_masked_store_nocvt): Don't use _mm_maskmoveu_si128. Use the generic fall-back implementation. Also fix masked stores without SSE2, which were not doing anything before.
2023-06-06rs6000: genfusion: Delete dead codeSegher Boessenkool1-3/+0
2023-06-06 Segher Boessenkool <segher@kernel.crashing.org> * config/rs6000/genfusion.pl: Delete some dead code.
2023-06-06rs6000: genfusion: Rewrite load/compare codeSegher Boessenkool1-82/+103
This makes the code more readable, more digestible, more maintainable, more extensible. That kind of thing. It does that by pulling things apart a bit, but also making what stays together more cohesive lumps. The original function was a bunch of loops and early-outs, and then quite a bit of stuff done per iteration, with the iterations essentially independent of each other. This patch moves the stuff done for one iteration to a new _one function. The second big thing is the stuff printed to the .md file is done in "here documents" now, which is a lot more readable than having to quote and escape and double-escape pieces of text. Whitespace inside the here-document is significant (will be printed as-is), which is a bit awkward sometimes, or might take some getting used to, but it is also one of the benefits of using them. Local variables are declared at first use (or close to first use). There also shouldn't be many at all, often you can write easier to read and manage code by omitting to name something that is hard to name in the first place. Finally some things are done in more typical, more modern, and tighter Perl style, for example REs in "if"s or "qw" for lists of constants. 2023-06-06 Segher Boessenkool <segher@kernel.crashing.org> * config/rs6000/genfusion.pl (gen_ld_cmpi_p10_one): New, rewritten and split out from... (gen_ld_cmpi_p10): ... this.
2023-06-06libstdc++: Protect against macrosMatthias Kretz1-4/+4
Signed-off-by: Matthias Kretz <m.kretz@gsi.de> libstdc++-v3/ChangeLog: * include/experimental/bits/simd.h (__bit_cast): Use __gnu__::__vector_size__ instead of gnu::vector_size.
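The reason for the reserved spelling can be illustrated with a small sketch. The macro and the alias name `float4` are hypothetical. The point is that a user macro named `vector_size` cannot interfere with `__gnu__::__vector_size__`, because double-underscore names are reserved for the implementation. This assumes GCC or Clang, which support the vector_size attribute.

```cpp
#include <cassert>

// A user program is allowed to define a macro with this name. It would
// break a header that spells the attribute as [[gnu::vector_size(...)]].
#define vector_size user_defined_macro

// The reserved spelling is unaffected by the macro above.
using float4 [[__gnu__::__vector_size__(16)]] = float;

static_assert(sizeof(float4) == 16, "four packed floats");
```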
2023-06-06libstdc++: Fix ambiguous expression in std::array<T, 0>::front() [PR110139]Jonathan Wakely2-5/+10
For 32-bit targets, using -pedantic (or using Clang) makes the expression _M_elems[0] ambiguous. The overloaded operator[] that we want to call has a size_t parameter, but 0 is type ptrdiff_t for many ILP32 targets, so using the implicit conversion from _M_elems to T* and then subscripting that is also viable. Change the 0 to (size_type)0 and also make the conversion to T* explicit, so that it's not viable here. The latter change requires a static_cast in data() where we really do want to convert _M_elems to a pointer. libstdc++-v3/ChangeLog: PR libstdc++/110139 * include/std/array (__array_traits<T, 0>::operator T*()): Make conversion operator explicit. (array::front): Use size_type as subscript operand. (array::data): Use static_cast to make conversion explicit. * testsuite/23_containers/array/element_access/110139.cc: New test.
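The shape of the fix can be sketched with stand-in types. `EmptyStorage` and `EmptyArray` are hypothetical names, not the libstdc++ classes, and the sketch's operator[] returns a dummy object so that it can be exercised, whereas indexing a zero-sized std::array is undefined.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the storage type of a zero-sized array.
template<typename T>
struct EmptyStorage
{
  // The overload that elems[i] is meant to select.
  T& operator[](std::size_t) const noexcept
  {
    static T dummy{};  // sketch only, so the call has something to return
    return dummy;
  }

  // Explicit, so elems[0] can no longer go through T* and the built-in
  // pointer subscript.
  constexpr explicit operator T*() const noexcept { return nullptr; }
};

template<typename T>
struct EmptyArray
{
  using size_type = std::size_t;
  EmptyStorage<T> elems;

  // The subscript operand has exactly the parameter type of operator[].
  T& front() noexcept { return elems[(size_type)0]; }

  // Here the conversion to a pointer is wanted, so it is spelled out.
  T* data() noexcept { return static_cast<T*>(elems); }
};
```

If the conversion operator were implicit, `elems[0]` with a literal 0 would have two viable candidates on targets where ptrdiff_t is int, which is the ambiguity described above.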
2023-06-06libstdc++: Do not assume existence of char8_t codecvt facetJoseph Faulls1-3/+0
It is not required that the codecvt<char8_t, char, mbstate_t> facet be supported by the locale, nor is it added as part of the default locale. This can lead to dangerous behaviour when the facet is accessed through a static_cast. libstdc++-v3/ChangeLog: * include/bits/locale_classes.tcc: Remove check for codecvt<char8_t, char, mbstate_t> facet.
2023-06-06libstdc++: Use close-on-exec for file descriptors in filesystem::copy_fileJonathan Wakely1-6/+7
libstdc++-v3/ChangeLog: * src/filesystem/ops-common.h (do_copy_file) [O_CLOEXEC]: Set close-on-exec flag on file descriptors.
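A minimal sketch of the technique, assuming a POSIX system. The helper names `open_cloexec` and `is_cloexec` are hypothetical and are not taken from ops-common.h.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Open path read-only. Where O_CLOEXEC is available, the close-on-exec
// flag is set atomically by open() itself, so the descriptor cannot leak
// into a child process started by a concurrent fork+exec.
inline int open_cloexec(const char* path)
{
  int flags = O_RDONLY;
#ifdef O_CLOEXEC
  flags |= O_CLOEXEC;
#endif
  return ::open(path, flags);
}

// Report whether the close-on-exec flag is set on fd.
inline bool is_cloexec(int fd)
{
  int fl = ::fcntl(fd, F_GETFD);
  return fl != -1 && (fl & FD_CLOEXEC) != 0;
}
```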
2023-06-06libstdc++: Make std::filesystem::copy_file work for procfs [PR108178]Jonathan Wakely2-5/+43
The size reported by stat is always zero for some special files such as those under /proc, which means the current copy_file implementation thinks there is nothing to copy. Instead of trusting the stat value, try to read a character from a streambuf and check for EOF. libstdc++-v3/ChangeLog: PR libstdc++/108178 * src/filesystem/ops-common.h (do_copy_file): Check for empty files by trying to read a character. * testsuite/27_io/filesystem/operations/copy_file_108178.cc: New test.
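The emptiness check described above can be sketched as follows. The function name `probe_file` and the `Probe` enum are hypothetical. In the usage shown after the block, /dev/zero stands in for a procfs file, on the assumption that it also reports a size of zero through stat while still yielding data when read.

```cpp
#include <cassert>
#include <fstream>
#include <ios>

enum class Probe { error, empty, has_data };

// Decide whether a file is empty by trying to read one character from a
// streambuf, instead of trusting the size reported by stat.
inline Probe probe_file(const char* path)
{
  std::filebuf in;
  if (!in.open(path, std::ios::in | std::ios::binary))
    return Probe::error;
  using traits = std::filebuf::traits_type;
  // sgetc() peeks at the next character without consuming it.
  return traits::eq_int_type(in.sgetc(), traits::eof())
           ? Probe::empty
           : Probe::has_data;
}
```

For example, `probe_file("/dev/null")` reports `Probe::empty`, while `probe_file("/dev/zero")` reports `Probe::has_data`.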
2023-06-06libstdc++: Use copy_file_range for filesystem::copy_fileJannik Glückert4-0/+141
copy_file_range is a recent-ish syscall for copying files. It is similar to sendfile but allows filesystem-specific optimizations. Common are:
    Reflinks: BTRFS, XFS, ZFS (does not implement the syscall yet)
    Server-side copy: NFS, SMB, Ceph
If copy_file_range is not available for the given files, fall back to sendfile / userspace copy.
libstdc++-v3/ChangeLog: * acinclude.m4 (_GLIBCXX_USE_COPY_FILE_RANGE): Define. * config.h.in: Regenerate. * configure: Regenerate. * src/filesystem/ops-common.h (copy_file_copy_file_range): Define new function. (do_copy_file): Use it. Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
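A rough sketch of the "try the fast path, then fall back" structure, assuming Linux with a glibc that declares copy_file_range. The function names are hypothetical. For brevity the sketch falls back on any failure of the first call and goes straight to a userspace copy rather than to sendfile. A stricter version would inspect errno (for example ENOSYS or EXDEV).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Last-resort fallback: plain read/write loop until EOF.
inline bool copy_userspace(int in, int out)
{
  char buf[4096];
  for (;;)
    {
      ssize_t n = ::read(in, buf, sizeof buf);
      if (n == 0)
        return true;  // EOF
      if (n < 0)
        {
          if (errno == EINTR) continue;
          return false;
        }
      for (ssize_t off = 0; off < n; )
        {
          ssize_t w = ::write(out, buf + off, n - off);
          if (w < 0)
            {
              if (errno == EINTR) continue;
              return false;
            }
          off += w;
        }
    }
}

// Copy len bytes from in to out, preferring copy_file_range.
inline bool copy_fd(int in, int out, std::size_t len)
{
#if defined(__linux__)
  std::size_t left = len;
  while (left > 0)
    {
      ssize_t n = ::copy_file_range(in, nullptr, out, nullptr, left, 0);
      if (n > 0) { left -= n; continue; }
      if (n == 0) return true;   // source ended early, nothing more to copy
      if (left == len) break;    // first call failed: use the fallback
      return false;              // failed after a partial copy
    }
  if (left == 0)
    return true;
#endif
  return copy_userspace(in, out);
}
```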
2023-06-06libstdc++: Also use sendfile for big filesJannik Glückert4-84/+170
We were previously only using sendfile for files smaller than 2GB, as sendfile needs to be called repeatedly for files bigger than that.
Some quick numbers, copying a 16GB file, average of 10 repetitions:
    old:
        real: 13.4s
        user: 0.14s
        sys : 7.43s
    new:
        real: 8.90s
        user: 0.00s
        sys : 3.68s
libstdc++-v3/ChangeLog: * acinclude.m4 (_GLIBCXX_HAVE_LSEEK): Define. * config.h.in: Regenerate. * configure: Regenerate. * src/filesystem/ops-common.h (copy_file_sendfile): Define new function for sendfile logic. Loop to support large files. Skip zero-length files. (do_copy_file): Use it. Signed-off-by: Jannik Glückert <jannik.glueckert@gmail.com>
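The loop can be sketched as below, assuming Linux (the code needs <sys/sendfile.h>). `sendfile_all` is a hypothetical name, not the `copy_file_sendfile` function from the patch. The `max_chunk` parameter exists only so the looping can be observed with small files. Its default is the largest count a single Linux sendfile call transfers.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

// Copy len bytes from in to out, calling sendfile repeatedly.
// Returns the number of successful sendfile calls, or -1 on failure.
inline int sendfile_all(int in, int out, off_t len,
                        std::size_t max_chunk = 0x7ffff000)
{
  if (len == 0)
    return 0;  // skip zero-length files
  int calls = 0;
  off_t left = len;
  while (left > 0)
    {
      std::size_t want = left < (off_t)max_chunk ? (std::size_t)left
                                                 : max_chunk;
      // A null offset pointer makes sendfile use and advance the file
      // offset of in.
      ssize_t n = ::sendfile(out, in, nullptr, want);
      if (n < 0)
        {
          if (errno == EINTR) continue;
          return -1;
        }
      if (n == 0)
        break;  // unexpected EOF
      left -= n;
      ++calls;
    }
  return left == 0 ? calls : -1;
}
```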
2023-06-06rs6000: Remove duplicate expression [PR106907]Jeevitha Palanisamy1-1/+0
PR106907 lists a few warnings spotted by cppcheck. This patch addresses the duplicate-expression warning: the same expression is used twice in a logical AND (&&) operation, which yields the same result, so the duplicate is removed. 2023-06-06 Jeevitha Palanisamy <jeevitha@linux.ibm.com> gcc/ PR target/106907 * config/rs6000/rs6000.cc (vec_const_128bit_to_bytes): Remove duplicate expression.
2023-06-06aarch64: Improve representation of vpaddd intrinsicsKyrylo Tkachov4-14/+3
The aarch64_addpdi pattern is redundant as the reduc_plus_scal_<mode> pattern can already generate the required form of the ADDP instruction, and is mostly folded to GIMPLE early on so can benefit from more optimisations. Though it turns out that we were missing the folding for the unsigned variants. This patch adds that and wires up the vpaddd_u64 and vpaddd_s64 intrinsics through the above pattern instead so that we can remove a redundant pattern and get more optimisation earlier. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-builtins.cc (aarch64_general_gimple_fold_builtin): Handle unsigned reduc_plus_scal_ builtins. * config/aarch64/aarch64-simd-builtins.def (addp): Delete DImode instances. * config/aarch64/aarch64-simd.md (aarch64_addpdi): Delete. * config/aarch64/arm_neon.h (vpaddd_s64): Reimplement with __builtin_aarch64_reduc_plus_scal_v2di. (vpaddd_u64): Reimplement with __builtin_aarch64_reduc_plus_scal_v2di_uu.
2023-06-06aarch64: Reimplement URSHR,SRSHR patterns with standard RTL codesKyrylo Tkachov2-7/+93
Having converted the patterns for the URSRA,SRSRA instructions to standard RTL codes we can also easily convert the non-accumulating forms URSHR,SRSHR. This patch does that, reusing the various helpers and predicates from that patch in a straightforward way. This allows GCC to perform the optimisations in the testcase, matching what Clang does. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur>shr_n<mode>): Delete. (aarch64_<sra_op>rshr_n<mode><vczle><vczbe>_insn): New define_insn. (aarch64_<sra_op>rshr_n<mode>): New define_expand. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/vrshr_1.c: New test.
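For reference, a scalar model of what a rounding shift right computes, assuming the usual architectural definition of URSHR and SRSHR: add 2^(n-1), then shift right by n, with the addition done in a wider type so that it cannot overflow. The function names are hypothetical, and this models one 32-bit lane only.

```cpp
#include <cassert>
#include <cstdint>

// URSHR model for one 32-bit lane, 1 <= n <= 32.
inline std::uint32_t urshr32(std::uint32_t x, unsigned n)
{
  std::uint64_t wide = x;  // zero-extend
  return static_cast<std::uint32_t>((wide + (std::uint64_t{1} << (n - 1))) >> n);
}

// SRSHR model for one 32-bit lane, 1 <= n <= 32.
// Relies on >> of a negative value being an arithmetic shift, which is
// guaranteed since C++20 and is what GCC does in earlier modes.
inline std::int32_t srshr32(std::int32_t x, unsigned n)
{
  std::int64_t wide = x;  // sign-extend
  return static_cast<std::int32_t>((wide + (std::int64_t{1} << (n - 1))) >> n);
}
```

The widening matters for inputs near the top of the range: `urshr32(0xFFFFFFFF, 1)` yields 0x80000000 rather than wrapping to 0.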
2023-06-06aarch64: Simplify SHRN, RSHRN expanders and patternsKyrylo Tkachov2-81/+14
Now that we've got the <vczle><vczbe> annotations we can get rid of explicit !BYTES_BIG_ENDIAN and BYTES_BIG_ENDIAN patterns for the narrowing shift instructions. This allows us to clean up the expanders as well. Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_shrn<mode>_insn_le): Delete. (aarch64_shrn<mode>_insn_be): Delete. (*aarch64_<srn_op>shrn<mode>_vect): Rename to... (*aarch64_<srn_op>shrn<mode><vczle><vczbe>): ... This. (aarch64_shrn<mode>): Remove reference to the above deleted patterns. (aarch64_rshrn<mode>_insn_le): Delete. (aarch64_rshrn<mode>_insn_be): Delete. (aarch64_rshrn<mode><vczle><vczbe>_insn): New define_insn. (aarch64_rshrn<mode>): Remove references to the above deleted patterns. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/pr99195_5.c: Add testing for shrn_n, rshrn_n intrinsics.
2023-06-06aarch64: Improve representation of ADDLV instructionsKyrylo Tkachov6-11/+168
We've received requests to optimise the attached intrinsics testcase. We currently generate:
    foo_1:
            uaddlp  v0.4s, v0.8h
            uaddlv  d31, v0.4s
            fmov    x0, d31
            ret
    foo_2:
            uaddlp  v0.4s, v0.8h
            addv    s31, v0.4s
            fmov    w0, s31
            ret
    foo_3:
            saddlp  v0.4s, v0.8h
            addv    s31, v0.4s
            fmov    w0, s31
            ret
The widening pair-wise addition addlp instructions can be omitted if we're just doing an ADDV afterwards. Making this optimisation would be quite simple if we had a standard RTL PLUS vector reduction code. As we don't, we can use UNSPEC_ADDV as a stand in. This patch expresses the SADDLV and UADDLV instructions as an UNSPEC_ADDV over a widened input, thus removing the need for separate UNSPEC_SADDLV and UNSPEC_UADDLV codes. To optimise the testcases involved we add two splitters that match a vector addition where all participating elements are taken and widened from the same vector and then fed into an UNSPEC_ADDV. In that case we can just remove the vector PLUS and just emit the simple RTL for SADDLV/UADDLV. Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/ChangeLog: * config/aarch64/aarch64-protos.h (aarch64_parallel_select_half_p): Define prototype. (aarch64_pars_overlap_p): Likewise. * config/aarch64/aarch64-simd.md (aarch64_<su>addlv<mode>): Express in terms of UNSPEC_ADDV. (*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): Likewise. (*aarch64_<su>addlv<mode>_reduction): Define. (*aarch64_uaddlv<mode>_reduction_2): Likewise. * config/aarch64/aarch64.cc (aarch64_parallel_select_half_p): Define. (aarch64_pars_overlap_p): Likewise. * config/aarch64/iterators.md (UNSPEC_SADDLV, UNSPEC_UADDLV): Delete. (VQUADW): New mode attribute. (VWIDE2X_S): Likewise. (USADDLV): Delete. (su): Delete handling of UNSPEC_SADDLV, UNSPEC_UADDLV. * config/aarch64/predicates.md (vect_par_cnst_select_half): Define.
gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/addlv_1.c: New test.
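Why the pairwise step is redundant can be seen from a scalar model of the foo_2 sequence. The sketch below uses plain arrays rather than NEON intrinsics so that it runs on any host. The function names mirror the instruction mnemonics but are hypothetical models, not GCC or ACLE APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V8H = std::array<std::uint16_t, 8>;  // models v0.8h
using V4S = std::array<std::uint32_t, 4>;  // models v0.4s

// UADDLP model: add adjacent 16-bit pairs into 32-bit lanes.
inline V4S uaddlp(const V8H& v)
{
  V4S r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = std::uint32_t(v[2 * i]) + std::uint32_t(v[2 * i + 1]);
  return r;
}

// ADDV model: sum the four 32-bit lanes, modulo 2^32.
inline std::uint32_t addv(const V4S& v)
{
  std::uint32_t sum = 0;
  for (std::uint32_t lane : v)
    sum += lane;
  return sum;
}

// UADDLV model: widen every 16-bit lane to 32 bits and sum them.
inline std::uint32_t uaddlv(const V8H& v)
{
  std::uint32_t sum = 0;
  for (std::uint16_t lane : v)
    sum += lane;
  return sum;
}
```

In this model `addv(uaddlp(v))` and `uaddlv(v)` agree for every input, because the sum of eight 16-bit values always fits in 32 bits. This is the equivalence that lets a single UADDLV replace the two-instruction sequence.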
2023-06-06middle-end/110055 - avoid CLOBBERing static variablesRichard Biener2-1/+19
The gimplifier can elide initialized constant automatic variables to static storage in which case TARGET_EXPR gimplification needs to avoid emitting a CLOBBER for them since their lifetime is no longer limited. Failing to do so causes spurious dangling-pointer diagnostics on the added testcase for some targets. PR middle-end/110055 * gimplify.cc (gimplify_target_expr): Do not emit CLOBBERs for variables which have static storage duration after gimplifying their initializers. * g++.dg/warn/Wdangling-pointer-pr110055.C: New testcase.
2023-06-06tree-optimization/109143 - improve PTA compile timeRichard Biener1-13/+25
The following improves solution_set_expand to require one less iteration over the bitmap and avoid changing the bitmap we iterate over. Plus we handle adjacent subvars in the ID space (the common case) and use bitmap_set_range. This cuts a bit less than 10% off the PTA time from the testcase in the PR. PR tree-optimization/109143 * tree-ssa-structalias.cc (solution_set_expand): Avoid one bitmap iteration and optimize bit range setting.