2023-04-19  Remove special-cased edges when solving copies  (Richard Biener; 1 file, -11/+14)

The following makes sure to remove the copy edges we ignore or need to
special-case only once.

	* tree-ssa-structalias.cc (solve_graph): Remove self-copy edges,
	remove edges from escaped after special-casing them.
2023-04-19  Fix do_sd_constraint escape special-casing  (Richard Biener; 1 file, -1/+1)

The following fixes the escape special-casing to test the proper
variable IDs.

	* tree-ssa-structalias.cc (do_sd_constraint): Fix up escape
	special-casing.
2023-04-19  Remove senseless store in do_sd_constraint  (Richard Biener; 1 file, -4/+1)

	* tree-ssa-structalias.cc (do_sd_constraint): Do not write to
	the LHS varinfo solution member.
2023-04-19  Avoid non-unified nodes on the topological sorting for PTA solving  (Richard Biener; 1 file, -2/+3)

Since we do not update successor edges when merging nodes we have to
deal with this in the users.  The following avoids putting those on
the topo order vector.

	* tree-ssa-structalias.cc (topo_visit): Look at the real
	destination of edges.
2023-04-19  tree-optimization/44794 - avoid excessive RTL unrolling on epilogues  (Richard Biener; 1 file, -0/+6)

The following adjusts tree_[transform_and_]unroll_loop to set an upper
bound on the number of iterations of the epilogue loop it creates.
For the testcase at hand, which involves array prefetching, this
avoids applying RTL unrolling to those epilogues when -funroll-loops
is specified.

Other users of this API include predictive commoning and
unroll-and-jam.

	PR tree-optimization/44794
	* tree-ssa-loop-manip.cc (tree_transform_and_unroll_loop):
	If an epilogue loop is required set its iteration upper bound.
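For illustration (an assumed example, not the PR testcase): after
unrolling by a factor of four, the epilogue can iterate at most three
times, and recording that bound is what keeps the RTL unroller from
unrolling it again.

/* Sketch: if the loop below is unrolled 4x at the GIMPLE level, the
   generated epilogue runs n % 4 < 4 iterations, so an iteration
   upper bound of 3 can be recorded on it.  */
void
f (int *a, int n)
{
  for (int i = 0; i < n; i++)
    a[i] += 1;
}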
2023-04-19  LoongArch: Improve cpymemsi expansion [PR109465]  (Xi Ruoyao; 7 files, -49/+91)

We'd been generating really bad block move sequences, which kernel
developers who tried __builtin_memcpy recently complained about.  To
improve it:

1. Take advantage of -mno-strict-align.  When it is set, set the mode
   size to UNITS_PER_WORD regardless of the alignment.
2. Halve the mode size when (block size) % (mode size) != 0, instead
   of falling back to ld.bu/st.b at once.
3. Limit the length of the block move sequence by the number of
   instructions, not the size of the block.  When -mstrict-align is
   set and the block is not aligned, the old size limit for the
   straight-line implementation (64 bytes) was definitely too large
   (we don't have 64 registers anyway).

Change since v1: add a comment about the calculation of num_reg.

gcc/ChangeLog:

	PR target/109465
	* config/loongarch/loongarch-protos.h
	(loongarch_expand_block_move): Add a parameter as alignment RTX.
	* config/loongarch/loongarch.h:
	(LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER): Remove.
	(LARCH_MAX_MOVE_BYTES_STRAIGHT): Remove.
	(LARCH_MAX_MOVE_OPS_PER_LOOP_ITER): Define.
	(LARCH_MAX_MOVE_OPS_STRAIGHT): Define.
	(MOVE_RATIO): Use LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
	LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
	* config/loongarch/loongarch.cc (loongarch_expand_block_move):
	Take the alignment from the parameter, but set it to
	UNITS_PER_WORD if !TARGET_STRICT_ALIGN.  Limit the length of
	the straight-line implementation with
	LARCH_MAX_MOVE_OPS_STRAIGHT instead of
	LARCH_MAX_MOVE_BYTES_STRAIGHT.
	(loongarch_block_move_straight): When there are left-over
	bytes, halve the mode size instead of falling back to byte mode
	at once.
	(loongarch_block_move_loop): Limit the length of the loop body
	with LARCH_MAX_MOVE_OPS_PER_LOOP_ITER instead of
	LARCH_MAX_MOVE_BYTES_PER_LOOP_ITER.
	* config/loongarch/loongarch.md (cpymemsi): Pass the alignment
	to loongarch_expand_block_move.

gcc/testsuite/ChangeLog:

	PR target/109465
	* gcc.target/loongarch/pr109465-1.c: New test.
	* gcc.target/loongarch/pr109465-2.c: New test.
	* gcc.target/loongarch/pr109465-3.c: New test.
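As an illustration of point 2 above, a hedged sketch (the exact
sequence depends on flags and alignment; this is not one of the
committed tests): a 15-byte copy can proceed by repeatedly halving the
access size instead of dropping straight to byte copies.

/* With -mno-strict-align on LA64, this 15-byte copy can expand to
   one 8-byte, one 4-byte, one 2-byte and one 1-byte access
   (ld.d/st.d, ld.w/st.w, ld.h/st.h, ld.b/st.b) rather than 15 byte
   accesses.  */
void
copy15 (char *dst, const char *src)
{
  __builtin_memcpy (dst, src, 15);
}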
2023-04-19  LoongArch: Improve GAR store for va_list  (Xi Ruoyao; 2 files, -1/+27)

The LoongArch backend used to save all GARs for a function with
variable arguments.  But sometimes a function only accepts variable
arguments for a purpose like C++ function overloading.

For example, POSIX defines open() as:

	int open (const char *path, int oflag, ...);

But only two forms are actually used:

	int open (const char *pathname, int flags);
	int open (const char *pathname, int flags, mode_t mode);

So it's obviously a waste to save all 8 GARs in open().  We can use
the cfun->va_list_gpr_size field set by the stdarg pass to only save
the GARs necessary to be saved.

If the va_list escapes (for example, in fprintf() we pass it to
vfprintf()), stdarg would set cfun->va_list_gpr_size to 255, so we
don't need a special case.

With this patch, only one GAR ($a2/$r6) is saved in open().  Ideally
even this stack store should be omitted too, but doing so is not
trivial, and AFAIK no compilers (for any target) perform the "ideal"
optimization here, see https://godbolt.org/z/n1YqWq9c9.

Bootstrapped and regtested on loongarch64-linux-gnu.
Ok for trunk (GCC 14 or now)?

gcc/ChangeLog:

	* config/loongarch/loongarch.cc
	(loongarch_setup_incoming_varargs): Don't save more GARs than
	cfun->va_list_gpr_size / UNITS_PER_WORD.

gcc/testsuite/ChangeLog:

	* gcc.target/loongarch/va_arg.c: New test.
2023-04-19  Avoid unnecessary epilogues from tree_unroll_loop  (Richard Biener; 1 file, -1/+1)

The following fixes the condition determining whether we need an
epilogue.

	* tree-ssa-loop-manip.cc (determine_exit_conditions): Fix no
	epilogue condition.
2023-04-19  Simplify gimple_assign_load  (Richard Biener; 2 files, -17/+21)

The following simplifies and outlines gimple_assign_load.  In
particular it is not necessary to get at the base of the possibly
loaded expression but just handle the case of a single handled
component wrapping a non-memory operand.

	* gimple.h (gimple_assign_load): Outline...
	* gimple.cc (gimple_assign_load): ... here.  Avoid
	get_base_address and instead just strip the outermost handled
	component, treating a remaining handled component as a load.
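A hedged sketch of the resulting shape (sketch only, not the committed
code; the tree predicates used are the usual GCC ones):

/* Return true if GS is an assignment that loads from memory.  A
   single handled component (e.g. a COMPONENT_REF) wrapping a
   non-memory operand is what gets stripped here instead of walking
   to the base address.  */
bool
gimple_assign_load (const gimple *gs)
{
  if (!gimple_assign_single_p (gs))
    return false;
  tree rhs = gimple_assign_rhs1 (gs);
  if (handled_component_p (rhs))
    rhs = TREE_OPERAND (rhs, 0);
  return (handled_component_p (rhs)
	  || DECL_P (rhs)
	  || TREE_CODE (rhs) == MEM_REF
	  || TREE_CODE (rhs) == TARGET_MEM_REF);
}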
2023-04-19  aarch64: Delete __builtin_aarch64_neg* builtins and their use  (Kyrylo Tkachov; 2 files, -4/+1)

I don't think we need to keep the __builtin_aarch64_neg* builtins
around.  They are only used once, in the vnegh_f16 intrinsic in
arm_fp16.h, and AFAICT they were added this way only for the sake of
orthogonality in
https://gcc.gnu.org/g:d7f33f07d88984cbe769047e3d07fc21067fbba9

We already use normal "-" negation in the other vneg* intrinsics, so
do so here as well.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64-simd-builtins.def (neg): Delete
	builtins definition.
	* config/aarch64/arm_fp16.h (vnegh_f16): Reimplement using
	normal negation.
2023-04-19  tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]  (Jakub Jelinek; 2 files, -25/+171)

For __builtin_popcountll tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves the vectorized code.
Without it the vectorization is always multi-type vectorization in
the loop (at least int and long long types) where we emit two
.POPCOUNT calls with long long arguments and int return value and
then widen to long long, so effectively after vectorization we do the
V?DImode -> V?DImode popcount twice, then pack the result into
V?SImode and immediately unpack it.

The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
builtins as well (as long as there is an optab for them; more to come
later).

x86 can do __builtin_popcountll with -mavx512vpopcntdq and
__builtin_clzll with -mavx512cd; ppc can do __builtin_popcountll and
__builtin_clzll with -mpower8-vector and __builtin_ctzll with
-mpower9-vector; s390 can do __builtin_{popcount,clz,ctz}ll with
-march=z13 -mzarch (i.e. VX).

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109011
	* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename
	to ...
	(vect_recog_popcount_clz_ctz_ffs_pattern): ... this.  Handle
	also CLZ, CTZ and FFS.  Remove vargs variable, use
	gimple_build_call_internal rather than
	gimple_build_call_internal_vec.
	(vect_vect_recog_func_ptrs): Adjust popcount entry.
	* gcc.dg/vect/pr109011-1.c: New test.
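The kind of loop the pattern now handles looks like this (a hedged
sketch; the committed test is gcc.dg/vect/pr109011-1.c):

/* With a suitable optab (e.g. -mavx512cd on x86), the pattern lets
   this vectorize as a single V?DImode CLZ per vector instead of the
   widen/compute-twice/pack-unpack dance described above.  */
void
clz_loop (const unsigned long long *a, int *b, int n)
{
  for (int i = 0; i < n; i++)
    b[i] = __builtin_clzll (a[i]);
}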
2023-04-19  dse: Use SUBREG_REG for copy_to_mode_reg in DSE replace_read for WORD_REGISTER_OPERATIONS targets [PR109040]  (Jakub Jelinek; 1 file, -1/+13)

While we've agreed this is not the right fix for the PR109040 bug, the
patch clearly improves the generated code (at least on the testcase
from the PR), so I'd like to propose this as an optimization
heuristics improvement for GCC 14.

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

	PR target/109040
	* dse.cc (replace_read): If read_reg is a SUBREG of a word mode
	REG, for WORD_REGISTER_OPERATIONS copy SUBREG_REG of it into a
	new REG rather than the SUBREG.
2023-04-19  [aarch64] Use wzr/xzr for assigning 0 to vector element.  (Prathamesh Kulkarni; 2 files, -0/+54)

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md
	(aarch64_simd_vec_set_zero<mode>): New pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/vec-set-zero.c: New test.
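A hedged sketch of the kind of source that benefits (the committed
test is gcc.target/aarch64/vec-set-zero.c; this is only a guess at its
shape):

#include <arm_neon.h>

/* Setting lane 0 to zero can now use "ins v0.s[0], wzr" instead of
   first materializing 0.0 in a floating-point register.  */
float32x4_t
zero_lane0 (float32x4_t v)
{
  return vsetq_lane_f32 (0.0f, v, 0);
}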
2023-04-19  aarch64: PR target/108840 Simplify register shift RTX costs and eliminate shift amount masking  (Kyrylo Tkachov; 2 files, -52/+49)

In this PR we fail to eliminate explicit &31 operations for variable
shifts such as in:

void
bar (int x[3], int y)
{
  x[0] <<= (y & 31);
  x[1] <<= (y & 31);
  x[2] <<= (y & 31);
}

This is rejected by RTX costs that end up giving too high a cost for:

(set (reg:SI 96)
    (ashift:SI (reg:SI 98)
        (subreg:QI
            (and:SI (reg:SI 99)
                (const_int 31 [0x1f])) 0)))

There is code to handle the AND-31 case in RTX costs, but it gets
confused by the subreg.  It's easy enough to fix by looking inside the
subreg when costing the expression.

While doing that I noticed that the ASHIFT case and the other
shift-like cases are almost identical and we should just merge them.
This code will only be used for valid insns anyway, so the code after
this patch should do the Right Thing (TM) for all such shift cases.

With this patch there are no more "and wn, wn, 31" instructions left
in the testcase.

Bootstrapped and tested on aarch64-none-linux-gnu.

	PR target/108840

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_rtx_costs): Merge ASHIFT
	and ROTATE, ROTATERT, LSHIFTRT, ASHIFTRT cases.  Handle subregs
	in op1.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/pr108840.c: New test.
2023-04-19  rtl-optimization/109237 - quadraticness in delete_trivially_dead_insns  (Richard Biener; 1 file, -15/+24)

The following addresses quadraticness in processing debug insns in
delete_trivially_dead_insns and insn_live_p by using TREE_VISITED on
the INSN_VAR_LOCATION_DECL to indicate a later debug bind with the
same decl and no intervening real insn or debug marker.

That gets rid of the NEXT_INSN walk in insn_live_p in favor of
clearing TREE_VISITED in the first loop over the insns, plus
book-keeping of the decls whose bit we set, since we need to clear
those bits when visiting a real or debug marker insn.

That improves the time spent in delete_trivially_dead_insns from
10.6s to 2.2s for the testcase.

	PR rtl-optimization/109237
	* cse.cc (insn_live_p): Remove NEXT_INSN walk, instead check
	TREE_VISITED on INSN_VAR_LOCATION_DECL.
	(delete_trivially_dead_insns): Maintain TREE_VISITED on active
	debug bind INSN_VAR_LOCATION_DECL.
2023-04-19  rtl-optimization/109237 - speedup bb_is_just_return  (Richard Biener; 1 file, -2/+2)

For the testcase bb_is_just_return is on top of the profile; changing
it to walk the BB insns backwards puts it off the profile.  That's
because in a forward walk you have to process possibly many debug
insns, but in a backward walk you very likely run into control insns
first.

	PR rtl-optimization/109237
	* cfgcleanup.cc (bb_is_just_return): Walk insns backwards.
2023-04-19  testsuite: Fix up pr109524.C for -std=c++23 [PR109524]  (Jakub Jelinek; 1 file, -1/+1)

This testcase was reduced such that it isn't valid C++23, so with my
usual testing with GXX_TESTSUITE_STDS=98,11,14,17,20,2b it fails:

FAIL: g++.dg/pr109524.C  -std=gnu++2b (test for excess errors)
.../gcc/testsuite/g++.dg/pr109524.C: In function 'nn hh(nn)':
.../gcc/testsuite/g++.dg/pr109524.C:35:12: error: cannot bind non-const lvalue reference of type 'nn&' to an rvalue of type 'nn'
.../gcc/testsuite/g++.dg/pr109524.C:17:6: note: initializing argument 1 of 'nn::nn(nn&)'

The following patch fixes that and I've verified it doesn't change
anything on what the test was testing; it still ICEs in r13-7198 and
passes in r13-7203, now in all language modes (except for 98 where it
is intentionally UNSUPPORTED).

2023-04-19  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109524
	* g++.dg/pr109524.C (nn::nn): Change argument type from nn &
	to const nn &.
2023-04-19  install.texi: Document --enable-decimal-float for AArch64  (Christophe Lyon; 1 file, -7/+8)

When I committed the patches to enable support for DFP on AArch64, I
forgot to update the installation documentation.  This patch adds
AArch64 as needed (same as i386/x86_64).

2023-04-17  Christophe Lyon  <christophe.lyon@arm.com>

gcc/
	* doc/install.texi (enable-decimal-float): Add AArch64.
2023-04-19  Check hard_regno_mode_ok before setting lowest memory move cost for the mode with different reg classes  (liuhongt; 1 file, -0/+4)

There's a potential performance issue when the backend returns some
unreasonable value for a mode that can never be allocated with the
reg class.

gcc/ChangeLog:

	PR rtl-optimization/109351
	* ira.cc (setup_class_subset_and_memory_move_costs): Check
	hard_regno_mode_ok before setting lowest memory move cost for
	the mode with different reg classes.
2023-04-19  Daily bump.  (GCC Administrator; 4 files, -1/+339)
2023-04-18  doc: remove stray @gol  (Jason Merrill; 1 file, -1/+1)

@gol was removed in r13-6778; new doc additions can't use it.

gcc/ChangeLog:

	* doc/invoke.texi: Remove stray @gol.
2023-04-18  ifcvt.cc: Prevent excessive if-conversion for conditional moves  (Takayuki 'January June' Suwa; 1 file, -1/+1)

gcc/
	* ifcvt.cc (cond_move_process_if_block): Consider the result of
	targetm.noce_conversion_profitable_p() when replacing the
	original sequence with the converted one.
2023-04-18  Add -gcodeview option  (Mark Harmstone; 4 files, -0/+18)

gcc/
	* common.opt (gcodeview): Add new option.
	* gcc.cc (driver_handle_option): Handle OPT_gcodeview.
	* opts.cc (command_handle_option): Similarly.
	* doc/invoke.texi: Add documentation for -gcodeview.
2023-04-18  PHIOPT: Move tree_ssa_cs_elim into pass_cselim::execute.  (Andrew Pinski; 1 file, -61/+57)

This moves the code for tree_ssa_cs_elim around slightly, improving
code readability and removing declarations that are no longer needed.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (tree_ssa_phiopt_worker): Remove
	declaration.
	(make_pass_phiopt): Make execute out of line.
	(tree_ssa_cs_elim): Move code into ...
	(pass_cselim::execute): here.
2023-04-18  gcc: Drop obsolete INCLUDE_PTHREAD_H  (Sam James; 1 file, -4/+0)

gcc/ChangeLog:

	* system.h: Drop unused INCLUDE_PTHREAD_H.
2023-04-18  vect: Verify that GET_MODE_NUNITS is greater than one for vect_grouped_store_supported  (Kevin Lee; 1 file, -0/+2)

gcc/ChangeLog:

	* tree-vect-data-refs.cc (vect_grouped_store_supported): Add
	new condition.
2023-04-18  Add TARGET_ZBKB to the condition of bswapsi2, bswapdi2 and rotr<mode>3 patterns  (Sinan Lin; 1 file, -3/+3)

gcc/
	* config/riscv/bitmanip.md (rotr<mode>3 expander): Enable for
	ZBKB.
	(bswapdi2, bswapsi2): Similarly.
2023-04-18  i386: Improve permutations with INSERTPS instruction [PR94908]  (Uros Bizjak; 8 files, -9/+164)

INSERTPS can select any element from src and insert it into any place
in the dest.  For SSE4.1 targets, the compiler can generate e.g.

	insertps $64, %xmm0, %xmm1

to insert element 1 from %xmm1 into element 0 of %xmm0.

gcc/ChangeLog:

	PR target/94908
	* config/i386/i386-builtin.def (__builtin_ia32_insertps128):
	Use CODE_FOR_sse4_1_insertps_v4sf.
	* config/i386/i386-expand.cc (expand_vec_perm_insertps): New.
	(expand_vec_perm_1): Call expand_vec_perm_insertps.
	* config/i386/i386.md ("unspec"): Declare UNSPEC_INSERTPS here.
	* config/i386/mmx.md (mmxscalarmode): New mode attribute.
	(@sse4_1_insertps_<mode>): New insn pattern.
	* config/i386/sse.md (@sse4_1_insertps_<mode>): Macroize insn
	pattern from sse4_1_insertps using VI4F_128 mode iterator.

gcc/testsuite/ChangeLog:

	PR target/94908
	* gcc.target/i386/pr94908.c: New test.
	* gcc.target/i386/sse4_1-insertps-5.c: New test.
	* gcc.target/i386/vperm-v4sf-2-sse4.c: New test.
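A hedged sketch of a permutation that can now map to a single
insertps (the typedefs and function are illustrative, not from the
committed tests):

typedef float v4sf __attribute__ ((vector_size (16)));
typedef int v4si __attribute__ ((vector_size (16)));

/* Element 0 of the result comes from src element 1 (shuffle index 5
   selects the second vector's element 1); the rest stays dst.  With
   -msse4.1 this is the shape insertps $64 can implement.  */
v4sf
insert (v4sf dst, v4sf src)
{
  return __builtin_shuffle (dst, src, (v4si) { 5, 1, 2, 3 });
}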
2023-04-18  Add GTY support for vrange.  (Aldy Hernandez; 2 files, -37/+99)

IPA currently puts *some* iranges in GC memory.  When I contribute
support for generic ranges in IPA, we'll need to change this to
vrange.  This patch adds GTY support for both vrange and frange.

gcc/ChangeLog:

	* value-range.cc (gt_ggc_mx): New.
	(gt_pch_nx): New.
	* value-range.h (class vrange): Add GTY marker.
	(class frange): Same.
	(gt_ggc_mx): Remove.
	(gt_pch_nx): Remove.
2023-04-18  constraint: fix relaxed memory and repeated constraint handling  (Victor L. Do Nascimento; 2 files, -4/+38)

The function `constrain_operands' lacked the logic to consider relaxed
memory constraints when "traditional" memory constraints were not
satisfied, creating potential issues as observed during the reload
compilation pass.

In addition, it was observed that while `constrain_operands' chooses
to disregard constraints when more than one alternative is provided,
e.g. "m,r", using CONSTRAINT__UNKNOWN, it has no checks in place to
determine whether the multiple constraints in a given string are in
fact repetitions of the same constraint and should thus be treated as
a single constraint, as ought to be the case for something like "m,m".

Both of these issues are dealt with here, thus ensuring that we get
appropriate pattern matching.

gcc/
	* lra-constraints.cc (constraint_unique): New.
	(process_address_1): Apply constraint_unique test.
	* recog.cc (constrain_operands): Allow relaxed memory
	constraints.
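A hedged sketch of the idea behind constraint_unique (illustrative,
not the committed implementation): "m,m" repeats one alternative and
can be handled as the single constraint "m", while "m,r" genuinely
offers two alternatives.

#include <stdbool.h>
#include <string.h>

/* Return true if every comma-separated alternative in CONSTRAINT is
   identical to the first one.  */
static bool
all_alternatives_identical_p (const char *constraint)
{
  size_t len = strcspn (constraint, ",");
  for (const char *p = constraint + len; *p == ','; p += len)
    {
      p++;
      if (strncmp (p, constraint, len) != 0
	  || (p[len] != ',' && p[len] != '\0'))
	return false;
    }
  return true;
}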
2023-04-18  Docs: Add doc for RISC-V vector intrinsics  (Kito Cheng; 1 file, -0/+9)

Document which version of the RISC-V vector intrinsics has been
implemented in GCC.

gcc/ChangeLog:

	* doc/extend.texi (Target Builtins): Add RISC-V Vector
	Intrinsics.
	(RISC-V Vector Intrinsics): Document which version of the
	RISC-V vector intrinsics GCC implements, with a reference.
2023-04-18  middle-end/108786 - add bitmap_clear_first_set_bit  (Richard Biener; 9 files, -20/+48)

This adds bitmap_clear_first_set_bit and uses it where previously
bitmap_clear_bit followed bitmap_first_set_bit.  The advantage is
speeding up the search and avoiding clobbering ->current.

	PR middle-end/108786
	* bitmap.h (bitmap_clear_first_set_bit): New.
	* bitmap.cc (bitmap_first_set_bit_worker): Rename from
	bitmap_first_set_bit and add optional clearing of the bit.
	(bitmap_first_set_bit): Wrap bitmap_first_set_bit_worker.
	(bitmap_clear_first_set_bit): Likewise.
	* df-core.cc (df_worklist_dataflow_doublequeue): Use
	bitmap_clear_first_set_bit.
	* graphite-scop-detection.cc (scop_detection::merge_sese):
	Likewise.
	* sanopt.cc (sanitize_asan_mark_unpoison): Likewise.
	(sanitize_asan_mark_poison): Likewise.
	* tree-cfgcleanup.cc (cleanup_tree_cfg_noloop): Likewise.
	* tree-into-ssa.cc (rewrite_blocks): Likewise.
	* tree-ssa-dce.cc (simple_dce_from_worklist): Likewise.
	* tree-ssa-sccvn.cc (do_rpo_vn_1): Likewise.
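The before/after usage shape, as a sketch based on the description
above (the converted callers are worklist-style pop-a-bit loops):

/* Before: two traversals of the bitmap per popped bit.  */
unsigned i = bitmap_first_set_bit (worklist);
bitmap_clear_bit (worklist, i);

/* After: one combined operation that also leaves the bitmap's
   ->current cache alone.  */
unsigned j = bitmap_clear_first_set_bit (worklist);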
2023-04-18  Shrink points-to analysis dumps when not dumping with -details  (Richard Biener; 19 files, -46/+53)

The following allows getting PTA stats with -stats without blowing up
your filesystem, by guarding constraint and solution dumping with
TDF_DETAILS and the SSA points-to info with TDF_DETAILS or TDF_ALIAS.

	* tree-ssa-structalias.cc (dump_sa_stats): Split out from...
	(dump_sa_points_to_info): ... this function.
	(compute_points_to_sets): Guard large dumps with TDF_DETAILS,
	and call dump_sa_stats guarded with TDF_STATS.
	(ipa_pta_execute): Likewise.
	(compute_may_aliases): Guard dump_alias_info with
	TDF_DETAILS|TDF_ALIAS.
	* gcc.dg/ipa/ipa-pta-16.c: Use -details for dump.
	* gcc.dg/tm/alias-1.c: Likewise.
	* gcc.dg/tm/alias-2.c: Likewise.
	* gcc.dg/torture/ipa-pta-1.c: Likewise.
	* gcc.dg/torture/pr39074-2.c: Likewise.
	* gcc.dg/torture/pr39074.c: Likewise.
	* gcc.dg/torture/pta-callused-1.c: Likewise.
	* gcc.dg/torture/pta-escape-1.c: Likewise.
	* gcc.dg/torture/pta-ptrarith-1.c: Likewise.
	* gcc.dg/torture/pta-ptrarith-2.c: Likewise.
	* gcc.dg/torture/pta-ptrarith-3.c: Likewise.
	* gcc.dg/torture/pta-structcopy-1.c: Likewise.
	* gcc.dg/torture/ssa-pta-fn-1.c: Likewise.
	* gcc.dg/tree-ssa/alias-19.c: Likewise.
	* gcc.dg/tree-ssa/pta-callused.c: Likewise.
	* gcc.dg/tree-ssa/pta-fp.c: Likewise.
	* gcc.dg/tree-ssa/pta-ptrarith-1.c: Likewise.
	* gcc.dg/tree-ssa/pta-ptrarith-2.c: Likewise.
2023-04-18  PHIOPT: add folding/simplification detail to the dump  (Andrew Pinski; 1 file, -0/+29)

While debugging PHI-OPT with match-and-simplify, I found that adding
more dumping to the debug dumps made it easier to understand what was
going on than stepping through in the debugger, so this adds them.
Note I used TDF_FOLDING rather than TDF_DETAILS as these debug
messages can be chatty and are only needed if you are debugging match
and simplify with PHI-OPT; match and simplify uses TDF_FOLDING as its
check.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (gimple_simplify_phiopt): Dump the
	expression that is being tried when TDF_FOLDING is true.
	(phiopt_worker::match_simplify_replacement): Dump the sequence
	which was created by gimple_simplify_phiopt when TDF_FOLDING is
	true.
2023-04-18  PHIOPT: small cleanup in match_simplify_replacement  (Andrew Pinski; 1 file, -3/+2)

We know that the statement we are moving already has an SSA_NAME on
the lhs, so we don't need to check that and can just call
reset_flow_sensitive_info with the name we already got.

OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (match_simplify_replacement): Simplify
	code that does the movement slightly.
2023-04-18  aarch64: Use standard RTL codes for __rev16 intrinsic expansion  (Kyrylo Tkachov; 1 file, -9/+17)

I noticed that for the expansion of the __rev16* arm_acle.h
intrinsics we don't need to use an unspec just because the operation
doesn't match neatly to a bswap RTL code.  We have organic combine
patterns for it that we can reuse.

This patch removes the define_insn using UNSPEC_REV (should it have
been an UNSPEC_REV16?) and adds an expander to emit the patterns we
have for rev16 using standard RTL codes.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64.md (@aarch64_rev16<mode>): Change to
	define_expand.
	(rev16<mode>2): Rename to...
	(aarch64_rev16<mode>2_alt1): ... This.
	(rev16<mode>2_alt): Rename to...
	(*aarch64_rev16<mode>2_alt2): ... This.
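For reference, the bit manipulation those combine patterns recognize
looks like this at the source level (a hedged sketch of the canonical
rev16 idiom, not the committed RTL):

/* Swap the bytes within each 16-bit half of X; combine can match
   this shape and emit a single rev16.  */
unsigned int
rev16 (unsigned int x)
{
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}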
2023-04-18  Declare dconstm0 to go along with dconst0 and friends.  (Aldy Hernandez; 5 files, -9/+11)

Negating dconst0 is getting pretty old, and we keep adding copies of
the same idiom.  Fix this by adding a dconstm0 constant to go along
with dconst1, dconstm1, etc.

gcc/ChangeLog:

	* emit-rtl.cc (init_emit_once): Initialize dconstm0.
	* gimple-range-op.cc (class cfn_signbit): Remove dconstm0
	declaration.
	* range-op-float.cc (zero_range): Use dconstm0.
	(zero_to_inf_range): Same.
	* real.h (dconstm0): New.
	* value-range.cc (frange::flush_denormals_to_zero): Use
	dconstm0.
	(frange::set_zero): Do not declare dconstm0.
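The repeated idiom being replaced, as a hedged sketch (the variable
name is illustrative; exact call sites vary):

/* Before: build -0.0 locally every time it is needed.  */
REAL_VALUE_TYPE neg_zero = real_value_negate (&dconst0);

/* After: refer to the shared constant directly.  */
const REAL_VALUE_TYPE &m0 = dconstm0;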
2023-04-18  RAII auto_mpfr and auto_mpz  (Richard Biener; 4 files, -15/+41)

The following adds two RAII classes, one for mpz_t and one for
mpfr_t, making object lifetime management easier.  Both otherwise
require explicit initialization with {mpz,mpfr}_init and release with
{mpz,mpfr}_clear.  I've converted two example places (where lifetime
is trivial).

	* system.h (class auto_mpz): New.
	* realmpfr.h (class auto_mpfr): Likewise.
	* fold-const-call.cc (do_mpfr_arg1): Use auto_mpfr.
	(do_mpfr_arg2): Likewise.
	* tree-ssa-loop-niter.cc (bound_difference): Use auto_mpz.
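A minimal sketch of the mpz_t wrapper (assumed shape; the committed
class in system.h may differ in detail, e.g. in how it disallows
copying):

#include <gmp.h>

class auto_mpz
{
public:
  auto_mpz () { mpz_init (m_mpz); }
  ~auto_mpz () { mpz_clear (m_mpz); }

  /* Let the wrapper be passed wherever an mpz_t is expected.  */
  operator mpz_t & () { return m_mpz; }

private:
  /* An owning wrapper should not be copied.  */
  auto_mpz (const auto_mpz &) = delete;
  auto_mpz &operator= (const auto_mpz &) = delete;

  mpz_t m_mpz;
};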
2023-04-18  aarch64: Use intrinsic flags information rather than hardcoding FLAG_AUTO_FP  (Kyrylo Tkachov; 1 file, -1/+1)

We record the flags to use for the intrinsics in
aarch64_simd_intrinsic_data, so use them when initialising the
intrinsics rather than a hardcoded FLAG_AUTO_FP.  The current
vreinterpret intrinsics use FLAG_AUTO_FP anyway, so this patch is an
NFC, but it will be needed as we migrate more builtins into the
intrinsics infrastructure.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64-builtins.cc
	(aarch64_init_simd_intrinsics): Take builtin flags from
	intrinsic data rather than hardcoded FLAG_AUTO_FP.
2023-04-18  Fixed typo.  (Jin Ma; 1 file, -1/+1)

gcc/ada
	* gcc-interface/utils.cc (unchecked_convert): Fixed typo.
2023-04-18  Return true from operator== for two identical ranges containing NAN.  (Aldy Hernandez; 1 file, -10/+0)

The == operator for ranges signifies that two ranges contain the same
thing, not that they are ultimately equal.  So [2,4] == [2,4], even
though one may be a 2 and the other may be a 3.  Similarly with two
VARYING ranges.

There is an oversight in frange::operator== where we are returning
false for two identical NANs.  This is causing us to never cache NANs
in sbr_sparse_bitmap::set_bb_range.

gcc/ChangeLog:

	* value-range.cc (frange::operator==): Adjust for NAN.
	(range_tests_nan): Remove some NAN tests.
2023-04-18  Add inchash support for vrange.  (Aldy Hernandez; 4 files, -0/+95)

This patch provides inchash support for vrange.  It is along the
lines of the streaming support I just posted and will be used for IPA
hashing of ranges.

gcc/ChangeLog:

	* inchash.cc (hash::add_real_value): New.
	* inchash.h (class hash): Add add_real_value.
	* value-range.cc (add_vrange): New.
	* value-range.h (inchash::add_vrange): New.
2023-04-18  tree-optimization/109539 - restrict PHI handling in access diagnostics  (Richard Biener; 1 file, -11/+45)

Access diagnostics visits the SSA def-use chains to diagnose things
like dangling pointer uses.  When that runs into PHIs, it tries to
prove that all incoming pointers, of which one is the currently
visited use, are related, to decide whether to keep looking for uses
of the PHI def.  That turns out to be overly optimistic and thus
costly.

The following scraps the existing handling and simply requires that
we eventually visit all incoming pointers of the PHI during the
def-use chain analysis, and only then processes uses of the PHI def.

Note this handles backedges of natural loops optimistically,
diagnosing the first iteration.  There's gcc.dg/Wuse-after-free-2.c
containing a testcase requiring this.

	PR tree-optimization/109539
	* gimple-ssa-warn-access.cc (pass_waccess::check_pointer_uses):
	Re-implement pointer relatedness for PHIs.
2023-04-18  amdgcn: HardFP divide  (Andrew Stubbs; 4 files, -97/+144)

Implement FP division using hardware instructions.  This replaces
both the softfp library calls and the -ffast-math inaccurate division
we had previously.

The GCN architecture does not have a single divide instruction, but
it does have a number of support instructions designed to make
multiply-by-reciprocal sufficiently accurate for non-fast-math usage.

gcc/ChangeLog:

	* config/gcn/gcn-valu.md (SV_SFDF): New iterator.
	(SV_FP): New iterator.
	(scalar_mode, SCALAR_MODE): Add identity mappings for scalar
	modes.
	(recip<mode>2): Unify the two patterns using SV_FP.
	(div_scale<mode><exec_vcc>): New insn.
	(div_fmas<mode><exec>): New insn.
	(div_fixup<mode><exec>): New insn.
	(div<mode>3): Unify the two expanders and rewrite using hardfp.
	* config/gcn/gcn.cc (gcn_md_reorg): Support "vccwait"
	attribute.
	* config/gcn/gcn.md (unspec): Add UNSPEC_DIV_SCALE,
	UNSPEC_DIV_FMAS, and UNSPEC_DIV_FIXUP.
	(vccwait): New attribute.

gcc/testsuite/ChangeLog:

	* gcc.target/gcn/fpdiv.c: Remove the -ffast-math requirement.
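The general shape of multiply-by-reciprocal division, as a hedged
sketch (the actual GCN sequence uses the div_scale/div_fmas/div_fixup
instructions above to keep the result correctly rounded; this only
illustrates the idea):

/* One Newton-Raphson refinement of a reciprocal estimate: given
   r ~= 1/b, the error of r' = r * (2 - b*r) is roughly the square
   of the error of r.  */
float
div_via_recip (float a, float b)
{
  float r = 1.0f / b;          /* stand-in for the recip instruction */
  r = r * (2.0f - b * r);      /* refinement step */
  return a * r;
}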
2023-04-18  aarch64: Give hint for -mcpu options that match -march instead  (Kyrylo Tkachov; 2 files, -0/+19)

We should redirect users of the erroneous -mcpu=armv8.2-a to use
-march instead.  There is an equivalent hint for -march used with a
CPU name.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_validate_mcpu): Add hint
	to use -march if the argument matches that.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/spellcheck_11.c: New test.
2023-04-18  aarch64: Add QI -> HI zero-extension for LDAPR  (Kyrylo Tkachov; 2 files, -3/+11)

This patch is a straightforward extension of the zero-extending LDAPR
pattern to represent QI -> HI load-extends.  This maps down to a
LDAPRB-W instruction.  This lets us remove a redundant zero-extend in
the new test function.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:

	* config/aarch64/atomics.md
	(*aarch64_atomic_load<ALLX:mode>_rcpc_zext): Use SD_HSDI for
	destination mode iterator.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ldapr-zext.c: Add test for u8 to u16
	extension.
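A hedged sketch of the kind of source the extended pattern covers
(the committed test is in gcc.target/aarch64/ldapr-zext.c; this is
only a guess at its shape):

#include <stdatomic.h>

/* An acquire load of a u8 consumed as a u16: with the extended
   pattern this can be a single ldaprb into a w register, with no
   separate zero-extend.  */
unsigned short
load_u8_as_u16 (_Atomic unsigned char *p)
{
  return atomic_load_explicit (p, memory_order_acquire);
}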
2023-04-18  RISC-V: Adjust the parsing order of extensions to be consistent with riscv-spec and binutils.  (Jin Ma; 2 files, -7/+7)

The order in which gcc and binutils parse extensions is currently
inconsistent.  According to the latest RISC-V spec, the canonical
order in which extension names must appear in the name string,
specified in Table 29.1, is different from before.  In the latest
table, non-standard extensions must be listed after all standard
extensions.  To keep consistent, we now change the parsing order.

Related llvm patch links:
https://reviews.llvm.org/D148315

gcc/ChangeLog:

	* common/config/riscv/riscv-common.cc
	(multi_letter_subset_rank): Swap the order of z-extensions and
	s-extensions.
	(riscv_subset_list::parse): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/arch-5.c: Likewise.
2023-04-18  match.pd: Improve fneg/fadd optimization [PR109240]  (Jakub Jelinek; 3 files, -54/+175)

match.pd has, mostly for AArch64, an optimization which turns certain
forms of __builtin_shuffle of x + y and x - y vectors into fneg using
twice as wide an element type, so that every other sign is changed,
followed by fadd.

The following patch extends that optimization so that it can handle
other forms as well, using the same fneg but fsub instead of fadd.

As plus is commutative and minus is not, and I want to handle
vec_perm with plus-minus and minus-plus order preferably in one
pattern, I had to do the matching operand checks by hand.

2023-04-18  Jakub Jelinek  <jakub@redhat.com>

	PR tree-optimization/109240
	* match.pd (fneg/fadd): Rewrite such that it handles both plus
	as first vec_perm operand and minus as second using fneg/fadd
	and minus as first vec_perm operand and plus as second using
	fneg/fsub.
	* gcc.target/aarch64/simd/addsub_2.c: New test.
	* gcc.target/aarch64/sve/addsub_2.c: New test.
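The kind of source the optimization targets looks like this (a hedged
sketch; the committed tests are the addsub_2.c files named above):

/* Alternating subtract/add across lanes; the pattern now handles
   both this lane order and the swapped one, as an fneg of
   alternating lanes followed by a single fadd or fsub.  */
void
f (double *restrict r, const double *restrict x,
   const double *restrict y)
{
  for (int i = 0; i < 1024; i += 2)
    {
      r[i] = x[i] - y[i];
      r[i + 1] = x[i + 1] + y[i + 1];
    }
}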
2023-04-18  Abstract out REAL_VALUE_TYPE streaming.  (Aldy Hernandez; 4 files, -25/+38)

In upcoming patches I will contribute code to stream out franges as
well as vranges.  This patch abstracts the REAL_VALUE_TYPE streaming
into its own functions, so that they may be used elsewhere.

gcc/ChangeLog:

	* data-streamer.cc (bp_pack_real_value): New.
	(bp_unpack_real_value): New.
	* data-streamer.h (bp_pack_real_value): New.
	(bp_unpack_real_value): New.
	* tree-streamer-in.cc (unpack_ts_real_cst_value_fields): Use
	bp_unpack_real_value.
	* tree-streamer-out.cc (pack_ts_real_cst_value_fields): Use
	bp_pack_real_value.
2023-04-18  Abstract out calculation of max HWIs per wide int.  (Aldy Hernandez; 1 file, -5/+7)

I'm about to add one more use of the same snippet of code, for a
total of 4 identical calculations in the code base.

gcc/ChangeLog:

	* wide-int.h (WIDE_INT_MAX_HWIS): New.
	(class fixed_wide_int_storage): Use it.
	(trailing_wide_ints <N>::set_precision): Use it.
	(trailing_wide_ints <N>::extra_size): Use it.
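The calculation being named is the usual ceiling division of a bit
precision into host wide ints; a hedged sketch of the macro's shape
(the committed definition in wide-int.h may differ in detail):

/* Number of HOST_WIDE_INTs needed to store PRECISION bits, rounding
   up.  */
#define WIDE_INT_MAX_HWIS(PRECISION) \
  (((PRECISION) + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT)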