2023-04-21  configure: Only create serdep.tmp if needed  (Peter Foley; 2 files, -0/+4)

There's no reason to create this file if none of the serial configure
options are passed.

ChangeLog:
    * configure: Regenerate.
    * configure.ac: Only create serdep.tmp if needed.
2023-04-21  gcc/m2: Drop references to $(P)  (Arsen Arsenović; 2 files, -3/+3)

$(P) seems to have been a workaround for some old, proprietary make
implementations that we no longer support.  It was removed in
r0-31149-gb8dad04b688e9c.

gcc/m2/ChangeLog:
    * Make-lang.in: Remove references to $(P).
    * Make-maintainer.in: Ditto.
2023-04-21  Adjust x86 testsuite for recent if-conversion cost checking  (Jeff Law; 1 file, -1/+4)

gcc/testsuite
    PR testsuite/109549
    * gcc.target/i386/cmov6.c: No longer expect this test to generate
    'cmov' instructions.
2023-04-21  aarch64: Emit single-instruction for smin (x, 0) and smax (x, 0)  (Kyrylo Tkachov; 3 files, -15/+97)

Motivated by https://reviews.llvm.org/D148249, we can expand to a single
instruction for the SMIN (x, 0) and SMAX (x, 0) cases using the combined
AND/BIC and ASR operations.  Given that we already have well-fitting
TARGET_CSSC patterns and expanders for the min/max codes in the backend,
this patch does some minor refactoring to ensure we emit the right
SMAX/SMIN RTL codes for TARGET_CSSC, fall back to the generic expanders,
or emit a simple SMIN/SMAX with a 0 RTX for !TARGET_CSSC that is now
matched by a separate pattern.

Bootstrapped and tested on aarch64-none-linux-gnu.

gcc/ChangeLog:
    * config/aarch64/aarch64.md (aarch64_umax<mode>3_insn): Delete.
    (umax<mode>3): Emit raw UMAX RTL instead of going through gen_
    function for umax.
    (<optab><mode>3): New define_expand for MAXMIN_NOUMAX codes.
    (*aarch64_<optab><mode>3_zero): Define.
    (*aarch64_<optab><mode>3_cssc): Likewise.
    * config/aarch64/iterators.md (maxminand): New code attribute.

gcc/testsuite/ChangeLog:
    * gcc.target/aarch64/sminmax-asr_1.c: New test.
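As a standalone sketch (not code from the commit; the instruction names
in the comments are inferred from the AND/BIC-with-ASR description
above), the scalar identities being exploited can be checked directly:

    #include <cassert>
    #include <cstdint>

    // smax (x, 0): x >> 63 is all-ones exactly when x is negative, so
    // AND-with-complement (BIC) clears negative values to 0.
    int64_t smax_zero (int64_t x) { return x & ~(x >> 63); }

    // smin (x, 0): AND with the same mask keeps only negative values.
    int64_t smin_zero (int64_t x) { return x & (x >> 63); }

    int main ()
    {
      assert (smax_zero (5) == 5 && smax_zero (-5) == 0);
      assert (smin_zero (5) == 0 && smin_zero (-5) == -5);
      return 0;
    }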
2023-04-21  PR target/108779 aarch64: Implement -mtp= option  (Kyrylo Tkachov; 11 files, -1/+94)

A user has requested that we support the -mtp= option in aarch64 GCC for
changing the TPIDR register to read for TLS accesses.  I'm not a big fan
of the option name, but we already support it in the arm port, and Clang
already supports it for AArch64, where it accepts the 'el0', 'el1',
'el2', 'el3' values.  This patch implements the same functionality in
GCC.

Bootstrapped and tested on aarch64-none-linux-gnu.  Confirmed with
godbolt that the sequences and options are the same as what Clang
accepts/generates.

gcc/ChangeLog:
    PR target/108779
    * config/aarch64/aarch64-opts.h (enum aarch64_tp_reg): Define.
    * config/aarch64/aarch64-protos.h (aarch64_output_load_tp): Define
    prototype.
    * config/aarch64/aarch64.cc (aarch64_tpidr_register): Declare.
    (aarch64_override_options_internal): Handle the above.
    (aarch64_output_load_tp): New function.
    * config/aarch64/aarch64.md (aarch64_load_tp_hard): Call
    aarch64_output_load_tp.
    * config/aarch64/aarch64.opt (aarch64_tp_reg): Define enum.
    (mtp=): New option.
    * doc/invoke.texi (AArch64 Options): Document -mtp=.

gcc/testsuite/ChangeLog:
    PR target/108779
    * gcc.target/aarch64/mtp.c: New test.
    * gcc.target/aarch64/mtp_1.c: New test.
    * gcc.target/aarch64/mtp_2.c: New test.
    * gcc.target/aarch64/mtp_3.c: New test.
    * gcc.target/aarch64/mtp_4.c: New test.
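For illustration only (this snippet is not from the patch, and the
system-register mapping is an assumption based on the el0-el3 option
values above): a minimal TLS access whose thread-pointer read the
option would redirect.

    // Built as e.g.:  g++ -O2 -mtp=el1 tp.cc
    // The TLS access below would then read the EL1 thread-pointer
    // register rather than the default EL0 one.
    __thread int counter;

    int bump ()
    {
      return ++counter;
    }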
2023-04-21  aarch64: PR target/99195 Add scheme to optimise away vec_concat with zeroes on 64-bit Advanced SIMD ops  (Kyrylo Tkachov; 3 files, -6/+87)

I finally got around to trying out the define_subst approach for PR
target/99195.

The problem we have is that many Advanced SIMD instructions have 64-bit
vector variants that clear the top half of the 128-bit Q register.
This would allow the compiler to avoid generating explicit zeroing
instructions to concat the 64-bit result with zeroes for code like:

  vcombine_u16 (vadd_u16 (a, b), vdup_n_u16 (0))

We've been getting user reports of GCC missing this optimisation in
real-world code, so it's worth doing something about it.

The straightforward approach that we've been taking so far is adding
extra patterns in aarch64-simd.md that match the 64-bit result in a
vec_concat with zeroes.  Unfortunately for big-endian the vec_concat
operands to match have to be the other way around, so we would end up
adding two extra define_insns.  This would lead to too much bloat in
aarch64-simd.md.

This patch defines a pair of define_subst constructs that allow us to
annotate patterns in aarch64-simd.md with the <vczle> and <vczbe>
subst_attrs, and the compiler will automatically produce the vec_concat
widening patterns, properly gated for BYTES_BIG_ENDIAN when needed.
This seems like the least intrusive way to describe the extra zeroing
semantics.

I've had a look at the generated insn-*.cc files in the build directory
and it seems that define_subst does what we want it to do when applied
multiple times on a pattern in terms of insn conditions and modes.

This patch adds the define_subst machinery and adds the annotations to
some of the straightforward binary and unary integer operations.  Many
more such annotations are possible, and I aim to add them in future
patches if this approach is acceptable.

Bootstrapped and tested on aarch64-none-linux-gnu and on
aarch64_be-none-elf.

gcc/ChangeLog:
    PR target/99195
    * config/aarch64/aarch64-simd.md (add_vec_concat_subst_le): Define.
    (add_vec_concat_subst_be): Likewise.
    (vczle): Likewise.
    (vczbe): Likewise.
    (add<mode>3): Rename to...
    (add<mode>3<vczle><vczbe>): ... This.
    (sub<mode>3): Rename to...
    (sub<mode>3<vczle><vczbe>): ... This.
    (mul<mode>3): Rename to...
    (mul<mode>3<vczle><vczbe>): ... This.
    (and<mode>3): Rename to...
    (and<mode>3<vczle><vczbe>): ... This.
    (ior<mode>3): Rename to...
    (ior<mode>3<vczle><vczbe>): ... This.
    (xor<mode>3): Rename to...
    (xor<mode>3<vczle><vczbe>): ... This.
    * config/aarch64/iterators.md (VDZ): Define.

gcc/testsuite/ChangeLog:
    PR target/99195
    * gcc.target/aarch64/simd/pr99195_1.c: New test.
2023-04-21  c++, tree: optimize walk_tree_1 and cp_walk_subtrees  (Patrick Palka; 2 files, -118/+119)

These functions currently repeatedly dereference tp during the subtree
walks, dereferences which the compiler can't CSE because it can't
guarantee that the subtree walking doesn't modify *tp.

But we already implicitly require that TREE_CODE (*tp) remains the same
throughout the subtree walks, so it doesn't seem to be a huge leap to
strengthen that to requiring *tp remains the same.

So this patch manually CSEs the dereferences of *tp.  This means that a
callback function can no longer replace *tp with another tree (of the
same TREE_CODE) when walking one of its subtrees, but that doesn't
sound like a useful capability anyway.

gcc/cp/ChangeLog:
    * tree.cc (cp_walk_subtrees): Avoid repeatedly dereferencing tp.
    <case DECLTYPE_TYPE>: Use cp_unevaluated and WALK_SUBTREE.
    <case ALIGNOF_EXPR etc>: Likewise.

gcc/ChangeLog:
    * tree.cc (walk_tree_1): Avoid repeatedly dereferencing tp
    and type_p.
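A standalone caricature of the transformation (the types and names are
illustrative, not GCC's):

    // Before: the walker re-read *tp around every recursive walk.
    // After: *tp is loaded once into a local, which is valid because
    // callbacks may no longer replace *tp itself (only its operands).
    struct node { int code; node *kid[2]; };

    void walk (node **tp)
    {
      node *t = *tp;            // the manual CSE: one dereference
      for (node *&k : t->kid)   // reuse t; never re-read *tp
        if (k)
          walk (&k);
    }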
2023-04-21  Add Ajit Kumar Agarwal to write after approval  (“ajit.kumar.agarwal@ibm.com”; 1 file, -0/+1)

2023-04-21  Ajit Kumar Agarwal  <aagarwa1@linux.ibm.com>

ChangeLog:
    * MAINTAINERS (Write After Approval): Add myself.
2023-04-21  Fix bootstrap failure in tree-ssa-loop-ch.cc  (Jan Hubicka; 1 file, -7/+9)

I managed to mix up the patch and its WIP version in the previous
commit.  This patch adds the missing edge iterator and also fixes a
side case where the new loop header would have multiple latches.

gcc/ChangeLog:
    * tree-ssa-loop-ch.cc (ch_base::copy_headers): Fix previous
    commit.
2023-04-21  expansion: make layout of x_shift*cost[][][] more efficient  (Vineet Gupta; 1 file, -14/+13)

While debugging expmed.[ch] for PR/108987, I saw that some of the cost
arrays have a less than ideal layout:

  x_shift*cost[0..63][speed][modes]

We would want speed to be the first index, since a typical compile will
have that fixed, followed by mode and then the shift values.

This should be non-functional from the compiler-semantics point of
view, except executing slightly faster due to better locality of shift
values for a given speed and mode.  It is also a bit more intuitive
when debugging.

gcc/ChangeLog:
    * expmed.h (x_shift*_cost): Convert to int [speed][mode][shift].
    (shift*_cost_ptr ()): Access x_shift*_cost array directly.

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
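A sketch of the index reordering (array bounds illustrative, not the
exact expmed.h declarations):

    const int NUM_MODES = 130;        // stand-in for NUM_MACHINE_MODES

    // old: the costs for one fixed (speed, mode) pair are strided far
    // apart, since the shift count is the outermost index
    int x_shift_cost_old[64][2][NUM_MODES];

    // new: the 64 shift costs for a given (speed, mode) pair sit in
    // one contiguous run, improving locality
    int x_shift_cost_new[2][NUM_MODES][64];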
2023-04-21  MAINTAINERS: add Vineet Gupta to write after approval  (Vineet Gupta; 1 file, -0/+1)

ChangeLog:
    * MAINTAINERS (Write After Approval): Add myself.

Ref: <680c7bbe-5d6e-07cd-8468-247afc65e1dd@gmail.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
2023-04-21  [aarch64] Use force_reg instead of copy_to_mode_reg.  (Prathamesh Kulkarni; 1 file, -6/+6)

Use force_reg instead of copy_to_mode_reg in aarch64_simd_dup_constant
and aarch64_expand_vector_init to avoid creating a pseudo if the
original value is already in a register.

gcc/ChangeLog:
    * config/aarch64/aarch64.cc (aarch64_simd_dup_constant): Use
    force_reg instead of copy_to_mode_reg.
    (aarch64_expand_vector_init): Likewise.
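A standalone analog of the two helpers' contracts (not GCC code; the
struct and counter below are purely illustrative):

    #include <cassert>

    struct rtx_ { bool is_reg; };
    int pseudos_created = 0;

    // copy_to_mode_reg-like: always makes a fresh pseudo
    rtx_ *copy_like (rtx_ *x) { ++pseudos_created; return new rtx_{true}; }

    // force_reg-like: reuses x when it is already a register
    rtx_ *force_like (rtx_ *x)
    {
      if (x->is_reg)
        return x;
      ++pseudos_created;
      return new rtx_{true};
    }

    int main ()
    {
      rtx_ r{true};
      force_like (&r);
      assert (pseudos_created == 0);  // no new pseudo created
      copy_like (&r);
      assert (pseudos_created == 1);  // unconditional copy
      return 0;
    }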
2023-04-21  i386: Remove REG_OK_FOR_INDEX/REG_OK_FOR_BASE and their derivatives  (Uros Bizjak; 3 files, -50/+34)

x86 was converted to TARGET_LEGITIMATE_ADDRESS_P long ago.  Remove
remnants of the conversion.  Also, clean up the remaining macros a bit
by introducing the INDEX_REGNO_P macro.

No functional change.

gcc/ChangeLog:

2023-04-21  Uroš Bizjak  <ubizjak@gmail.com>

    * config/i386/i386.h (REG_OK_FOR_INDEX_P, REG_OK_FOR_BASE_P): Remove.
    (REG_OK_FOR_INDEX_NONSTRICT_P, REG_OK_FOR_BASE_NONSTRICT_P): Ditto.
    (REG_OK_FOR_INDEX_STRICT_P, REG_OK_FOR_BASE_STRICT_P): Ditto.
    (FIRST_INDEX_REG, LAST_INDEX_REG): New defines.
    (LEGACY_INDEX_REG_P, LEGACY_INDEX_REGNO_P): New macros.
    (INDEX_REG_P, INDEX_REGNO_P): Ditto.
    (REGNO_OK_FOR_INDEX_P): Use INDEX_REGNO_P predicates.
    (REGNO_OK_FOR_INDEX_NONSTRICT_P): New macro.
    (REGNO_OK_FOR_BASE_NONSTRICT_P): Ditto.
    * config/i386/predicates.md (index_register_operand): Use
    REGNO_OK_FOR_INDEX_P and REGNO_OK_FOR_INDEX_NONSTRICT_P macros.
    * config/i386/i386.cc (ix86_legitimate_address_p): Use
    REGNO_OK_FOR_BASE_P, REGNO_OK_FOR_BASE_NONSTRICT_P,
    REGNO_OK_FOR_INDEX_P and REGNO_OK_FOR_INDEX_NONSTRICT_P macros.
2023-04-21  Fix latent bug in loop header copying which forgets to update the loop header pointer  (Jan Hubicka; 1 file, -0/+13)

gcc/ChangeLog:

2023-04-21  Jan Hubicka  <hubicka@ucw.cz>
            Ondrej Kubanek  <kubanek0ondrej@gmail.com>

    * tree-ssa-loop-ch.cc (ch_base::copy_headers): Update loop header
    and latch.
2023-04-21  Add safe_is_a  (Richard Biener; 1 file, -0/+13)

The following adds safe_is_a, an is_a check handling nullptr
gracefully.

    * is-a.h (safe_is_a): New.
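A self-contained sketch of the shape such a helper can take (the
in-tree version builds on GCC's is_a machinery in is-a.h; everything
below is an illustrative analog):

    #include <cassert>

    struct base { int kind; };
    struct derived : base {};

    // stand-in for GCC's is_a, which requires a non-null pointer
    template <typename T>
    bool is_a (base *p) { return p->kind == 1; }

    // like is_a, but gracefully false for nullptr
    template <typename T>
    bool safe_is_a (base *p) { return p ? is_a<T> (p) : false; }

    int main ()
    {
      derived d; d.kind = 1;
      assert (safe_is_a<derived> (&d));
      assert (!safe_is_a<derived> (nullptr));  // no dereference
      return 0;
    }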
2023-04-21  Add operator* to gimple_stmt_iterator and gphi_iterator  (Richard Biener; 1 file, -0/+4)

This allows STL-style iterator dereference.  It's the same as
gsi_stmt () or .phi ().

    * gimple-iterator.h (gimple_stmt_iterator::operator*): Add.
    (gphi_iterator::operator*): Likewise.
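Illustrative use, assuming the usual gsi loop idiom (process is a
hypothetical consumer, not a GCC function):

    // for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
    //      !gsi_end_p (gsi); gsi_next (&gsi))
    //   process (*gsi);   // same statement as gsi_stmt (gsi)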
2023-04-21  Stabilize inliner  (Jan Hubicka; 1 file, -13/+70)

The Fibonacci heap can change its behaviour quite significantly for no
good reason when multiple edges with the same key occur.  This is quite
common for small functions.  This patch stabilizes the order by adding
edge uids into the info.  Again, I think this is a good idea regardless
of the incremental WPA project, since we do not want random changes in
inline decisions.

gcc/ChangeLog:

2023-04-21  Jan Hubicka  <hubicka@ucw.cz>
            Michal Jires  <michal@jires.eu>

    * ipa-inline.cc (class inline_badness): New class.
    (edge_heap_t, edge_heap_node_t): Use inline_badness for badness
    instead of sreal.
    (update_edge_key): Update.
    (lookup_recursive_calls): Likewise.
    (recursive_inlining): Likewise.
    (add_new_edges_to_heap): Likewise.
    (inline_small_functions): Likewise.
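A standalone sketch of the stabilization idea: tie-break equal badness
keys with the edge uid so heap order is deterministic (field names
illustrative, not the in-tree class layout):

    struct inline_badness
    {
      long badness;   // stand-in for the sreal badness key
      int edge_uid;   // the deterministic tie-breaker the patch adds

      bool operator< (const inline_badness &o) const
      {
        if (badness != o.badness)
          return badness < o.badness;
        return edge_uid < o.edge_uid;  // equal keys no longer tie
      }
    };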
2023-04-21  Cleanup odr_types_equivalent_p  (Jan Hubicka; 1 file, -6/+9)

gcc/ChangeLog:

2023-04-21  Jan Hubicka  <hubicka@ucw.cz>

    * ipa-devirt.cc (odr_types_equivalent_p): Cleanup warned checks.
2023-04-21  PR modula2/109586 cc1gm2 ICE when compiling large source files.  (Gaius Mulley; 1 file, -2/+2)

The function m2block_RememberConstant calls m2tree_IsAConstant.
However, IsAConstant does not recognise TREE_CODE (t) == CONSTRUCTOR
as a constant.  Without this patch, CONSTRUCTOR constants are garbage
collected (and not preserved), resulting in a corrupt tree and a crash.

gcc/m2/ChangeLog:
    PR modula2/109586
    * gm2-gcc/m2tree.cc (m2tree_IsAConstant): Add
    (TREE_CODE (t) == CONSTRUCTOR) to expression.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-04-21  tree-optimization/109573 - avoid ICEing on unexpected live def  (Richard Biener; 2 files, -3/+95)

The following relaxes the assert in vectorizable_live_operation, where
we catch currently unhandled cases, to also allow an intermediate copy
as happens here, and also demotes the assert to checking only.

    PR tree-optimization/109573
    * tree-vect-loop.cc (vectorizable_live_operation): Allow
    unhandled SSA copy as well.  Demote assert to checking only.

    * g++.dg/vect/pr109573.cc: New testcase.
2023-04-21  Use correct CFG orders for DF worklist processing  (Richard Biener; 1 file, -16/+20)

This adjusts the remaining three RPO computes in DF.  The DF_FORWARD
problems should use an RPO on the forward graph; the DF_BACKWARD
problems should use an RPO on the inverted graph.  Conveniently,
inverted_rev_post_order_compute now computes an RPO.  We still use
post_order_compute and reverse its order for its side effect of
deleting unreachable blocks.

This results in an overall reduction of visited blocks on cc1files by
5.2%.  Because most regions are irreducible on the reverse CFG, there
are few cases where the number of visited blocks increases.  For the
set of cc1files I have, this happens for et-forest.i, graph.i, hwint.i,
tree-ssa-dom.i, tree-ssa-loop-ch.i and tree-ssa-threadedge.i.  For
tree-ssa-dse.i it's off-noise, and I've investigated more closely and
figured it is really bad luck due to the irreducibility.

    * df-core.cc (df_analyze): Compute RPO on the reverse graph for
    DF_BACKWARD problems.
    (loop_post_order_compute): Rename to ...
    (loop_rev_post_order_compute): ... this, compute a RPO.
    (loop_inverted_post_order_compute): Rename to ...
    (loop_inverted_rev_post_order_compute): ... this, compute a RPO.
    (df_analyze_loop): Use RPO on the forward graph for DF_FORWARD
    problems, RPO on the inverted graph for DF_BACKWARD.
2023-04-21  change inverted_post_order_compute to inverted_rev_post_order_compute  (Richard Biener; 6 files, -44/+53)

The following changes the inverted_post_order_compute API back to a
plain C array interface and makes it compute a reverse post order,
since that's what's always required.  It will make massaging DF to use
the correct iteration orders easier.  Elsewhere it requires turning
backward iteration over the computed order into forward iteration.

    * cfganal.h (inverted_rev_post_order_compute): Rename from ...
    (inverted_post_order_compute): ... this.  Add struct function
    argument, change allocation to a C array.
    * cfganal.cc (inverted_rev_post_order_compute): Likewise.
    * lcm.cc (compute_antinout_edge): Adjust.
    * lra-lives.cc (lra_create_live_ranges_1): Likewise.
    * tree-ssa-dce.cc (remove_dead_stmt): Likewise.
    * tree-ssa-pre.cc (compute_antic): Likewise.
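For reference, the textbook computation both APIs converge on: a DFS
post-order, reversed; for backward problems the same walk runs on the
inverted (predecessor) graph.  A standalone sketch, not the cfganal.cc
implementation:

    #include <algorithm>
    #include <vector>

    static void dfs (int u, const std::vector<std::vector<int>> &succ,
                     std::vector<bool> &seen, std::vector<int> &post)
    {
      seen[u] = true;
      for (int v : succ[u])
        if (!seen[v])
          dfs (v, succ, seen, post);
      post.push_back (u);  // finished last, visited first after reversal
    }

    // RPO visits a block before its successors (in a DAG), which is
    // what forward dataflow problems want.
    std::vector<int>
    rev_post_order (const std::vector<std::vector<int>> &succ, int entry)
    {
      std::vector<bool> seen (succ.size (), false);
      std::vector<int> post;
      dfs (entry, succ, seen, post);
      std::reverse (post.begin (), post.end ());
      return post;
    }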
2023-04-21  change DF to use the proper CFG order for DF_FORWARD problems  (Richard Biener; 2 files, -32/+34)

This changes DF to use RPO on the forward graph for DF_FORWARD
problems.  While that naturally maps to pre_and_rev_postorder_compute,
we use the existing (wrong) CFG order for DF_BACKWARD problems computed
by post_order_compute, since that provides the required side effect of
deleting unreachable blocks.

The change requires turning the inconsistent vec<int> vs. int * back to
a consistent int *.  A followup patch will change the
inverted_post_order_compute API and change the DF_BACKWARD problem to
use the correct RPO on the backward graph, together with statistics I
produced last year for the combined effect.

    * df.h (df_d::postorder_inverted): Change back to int *,
    clarify comments.
    * df-core.cc (rest_of_handle_df_finish): Adjust.
    (df_analyze_1): Likewise.
    (df_analyze): For DF_FORWARD problems use RPO on the forward
    graph.  Adjust.
    (loop_inverted_post_order_compute): Adjust API.
    (df_analyze_loop): Adjust.
    (df_get_n_blocks): Likewise.
    (df_get_postorder): Likewise.
2023-04-21  RISC-V: Defer vsetvli insertion to later if possible [PR108270]  (Juzhe-Zhong; 5 files, -3/+47)

Fix issue: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108270.

Consider the following testcase:

void f (void * restrict in, void * restrict out, int l, int n, int m)
{
  for (int i = 0; i < l; i++){
    for (int j = 0; j < m; j++){
      for (int k = 0; k < n; k++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, 17);
          __riscv_vse8_v_i8mf8 (out + i + j, v, 17);
        }
    }
  }
}

Compile option: -O3

Before this patch:
  mv a7,a2
  mv a6,a0
  mv t1,a1
  mv a2,a3
  vsetivli zero,17,e8,mf8,ta,ma
  ble a7,zero,.L1
  ble a4,zero,.L1
  ble a3,zero,.L1
  ...

After this patch:
  mv a7,a2
  mv a6,a0
  mv t1,a1
  mv a2,a3
  ble a7,zero,.L1
  ble a4,zero,.L1
  ble a3,zero,.L1
  add a1,a0,a4
  li a0,0
  vsetivli zero,17,e8,mf8,ta,ma
  ...

This issue is a missed optimization produced by Phase 3 (global
backward demand fusion) rather than by LCM; this patch fixes the poor
placement of the vsetvl.  The placement is selected not by LCM but by
Phase 3 (VL/VTYPE demand info backward fusion and propagation), which
I introduced into the VSETVL PASS to enhance LCM and improve vsetvl
instruction performance.

This patch suppresses Phase 3's too-aggressive backward fusion and
propagation to the top of the function when there is no defining
instruction of the AVL (the AVL is a 0 ~ 31 immediate, since the
vsetivli instruction allows an immediate value instead of a register).

You may want to ask why we need Phase 3 to do the job.  Well, we have
so many situations that pure LCM fails to optimize; here I can show you
a simple case to demonstrate it:

void f (void * restrict in, void * restrict out, int n, int m, int cond)
{
  size_t vl = 101;
  for (size_t j = 0; j < m; j++){
    if (cond) {
      for (size_t i = 0; i < n; i++)
        {
          vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + j, vl);
          __riscv_vse8_v_i8mf8 (out + i, v, vl);
        }
    } else {
      for (size_t i = 0; i < n; i++)
        {
          vint32mf2_t v = __riscv_vle32_v_i32mf2 (in + i + j, vl);
          v = __riscv_vadd_vv_i32mf2 (v, v, vl);
          __riscv_vse32_v_i32mf2 (out + i, v, vl);
        }
    }
  }
}

You can see: the first inner loop needs vsetvli e8,mf8 for vle+vse;
the second inner loop needs vsetvli e32,mf2 for vle+vadd+vse.

If we don't have Phase 3 (only handled by LCM (Phase 4)), we will end
up with:

  outerloop:
    ...
    vsetvli e8mf8
    inner loop 1:
      ....
    vsetvli e32mf2
    inner loop 2:
      ....

However, if we have Phase 3, Phase 3 is going to fuse the vsetvli
e32,mf2 of inner loop 2 into the vsetvli e8,mf8, so we will end up with
this result after Phase 3:

  outerloop:
    ...
    inner loop 1:
      vsetvli e32mf2
      ....
    inner loop 2:
      vsetvli e32mf2
      ....

Then this demand information after Phase 3 will be well optimized by
Phase 4 (LCM); the result after Phase 4 is:

  vsetvli e32mf2
  outerloop:
    ...
    inner loop 1:
      ....
    inner loop 2:
      ....

You can see this is the optimal codegen for the current VSETVL PASS
(Phase 3: demand backward fusion and propagation + Phase 4: LCM).
This was a known issue when I started to implement the VSETVL PASS.

gcc/ChangeLog:
    PR target/108270
    * config/riscv/riscv-vsetvl.cc
    (vector_infos_manager::all_empty_predecessor_p): New function.
    (pass_vsetvl::backward_demand_fusion): Ditto.
    * config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:
    PR target/108270
    * gcc.target/riscv/rvv/vsetvl/imm_bb_prop-1.c: Adapt testcase.
    * gcc.target/riscv/rvv/vsetvl/imm_conflict-3.c: Ditto.
    * gcc.target/riscv/rvv/vsetvl/pr108270.c: New test.
2023-04-21  riscv: Fix <bitmanip_insn> fallout.  (Robin Dapp; 1 file, -1/+1)

PR109582: Since r14-116, generic.md uses standard names instead of the
types defined in the <bitmanip_insn> iterator (which match instruction
names).  Change this.

gcc/ChangeLog:
    PR target/109582
    * config/riscv/generic.md: Change standard names to insn names.
2023-04-21  rs6000: xfail float128 comparison test case that fails on powerpc64.  (Haochen Gui; 1 file, -0/+1)

This patch xfails a float128 comparison test case on powerpc64 that
fails due to a longstanding issue with floating-point compares.  See
PR58684 for more information.

When float128 hardware is enabled (-mfloat128-hardware), xscmpuqp is
generated for the comparison, which is unexpected.  When float128
software emulation is enabled (-mno-float128-hardware), we still have
to xfail the hardware version (__lekf2_hw), which finally generates
xscmpuqp.

gcc/testsuite/
    PR target/108728
    * gcc.dg/torture/float128-cmp-invalid.c: Add xfail.
2023-04-21  testsuite: make ppc_cpu_supports_hw an effective target keyword [PR108728]  (Haochen Gui; 1 file, -0/+1)

gcc/testsuite/
    PR target/108728
    * lib/target-supports.exp (is-effective-target-keyword): Add
    ppc_cpu_supports_hw.
2023-04-21  Fix LCM dataflow CFG order  (Richard Biener; 1 file, -23/+24)

The following fixes the initial order in which the LCM dataflow
routines process BBs.  For a forward problem you want reverse
postorder; for a backward problem you want reverse postorder on the
inverted graph.

The LCM iteration has very many other issues, but this allows
inverted_post_order_compute to be turned into computing a reverse
postorder more easily.

    * lcm.cc (compute_antinout_edge): Use RPO on the inverted graph.
    (compute_laterin): Use RPO.
    (compute_available): Likewise.
2023-04-21  LoongArch: Fix MUSL_DYNAMIC_LINKER  (Peng Fan; 1 file, -1/+6)

Musl-based systems have no '/lib64', so change it.  The
"Multilib/multi-arch" section of
https://wiki.musl-libc.org/guidelines-for-distributions.html
introduces this.

gcc/
    * config/loongarch/gnu-user.h (MUSL_DYNAMIC_LINKER): Redefine.

Signed-off-by: Peng Fan <fanpeng@loongson.cn>
Suggested-by: Xi Ruoyao <xry111@xry111.site>
2023-04-21  RISC-V: Add local user vsetvl instruction elimination [PR109547]  (Juzhe-Zhong; 4 files, -3/+85)

This patch is to enhance optimization for auto-vectorization.

Before this patch:
  Loop:
    vsetvl a5,a2...
    vsetvl zero,a5...
    vle

After this patch:
  Loop:
    vsetvl a5,a2
    vle

gcc/ChangeLog:
    PR target/109547
    * config/riscv/riscv-vsetvl.cc (local_eliminate_vsetvl_insn): New
    function.
    (vector_insn_info::skip_avl_compatible_p): Ditto.
    (vector_insn_info::merge): Remove default value.
    (pass_vsetvl::compute_local_backward_infos): Ditto.
    (pass_vsetvl::cleanup_insns): Add local vsetvl elimination.
    * config/riscv/riscv-vsetvl.h: Ditto.

gcc/testsuite/ChangeLog:
    PR target/109547
    * gcc.target/riscv/rvv/vsetvl/pr109547.c: New.
    * gcc.target/riscv/rvv/vsetvl/vsetvl-17.c: Update scan condition.
2023-04-21  Daily bump.  (GCC Administrator; 6 files, -1/+541)
2023-04-20  update_web_docs_git: Allow setting TEXI2*, add git build default  (Arsen Arsenović; 1 file, -3/+14)

maintainer-scripts/ChangeLog:
    * update_web_docs_git: Add a mechanism to override makeinfo,
    texi2dvi and texi2pdf, and default them to
    /home/gccadmin/texinfo/install-git/bin/${tool}, if present.
2023-04-20  c++: simplify TEMPLATE_TYPE_PARM level lowering  (Patrick Palka; 1 file, -21/+16)

1. Don't bother recursing when level lowering a cv-qualified type
   template parameter.
2. Get rid of the recursive loop breaker when level lowering a
   constrained auto, and enable the TEMPLATE_PARM_DESCENDANTS cache in
   this case too.  This should be safe to do so now that we no longer
   substitute constraints on an auto.

gcc/cp/ChangeLog:
    * pt.cc (tsubst) <case TEMPLATE_TYPE_PARM>: Don't recurse when
    level lowering a cv-qualified type template parameter.  Remove
    recursive loop breaker in the level lowering case for constrained
    autos.  Use the TEMPLATE_PARM_DESCENDANTS cache in this case as
    well.
2023-04-20  c++: use TREE_VEC for trailing args of variadic built-in traits  (Patrick Palka; 6 files, -28/+43)

This patch makes us use TREE_VEC instead of TREE_LIST to represent the
trailing arguments of a variadic built-in trait.  These built-ins are
typically passed a simple pack expansion as the second argument, e.g.

  __is_constructible (T, Ts...)

and the main benefit of this representation change is that substituting
into this argument list is now basically free, since
tsubst_template_args makes sure we reuse the TREE_VEC of the
corresponding ARGUMENT_PACK when expanding such a pack expansion.  In
the previous TREE_LIST representation we would need to convert the
expanded pack expansion into a TREE_LIST (via tsubst_tree_list).

Note that an empty set of trailing arguments is now represented as an
empty TREE_VEC instead of NULL_TREE, so now TRAIT_TYPE/EXPR_TYPE2 will
be empty only for unary traits.

gcc/cp/ChangeLog:
    * constraint.cc (diagnose_trait_expr): Convert a TREE_VEC of
    arguments into a TREE_LIST for sake of pretty printing.
    * cxx-pretty-print.cc (pp_cxx_trait): Handle TREE_VEC instead of
    TREE_LIST of trailing variadic trait arguments.
    * method.cc (constructible_expr): Likewise.
    (is_xible_helper): Likewise.
    * parser.cc (cp_parser_trait): Represent trailing variadic trait
    arguments as a TREE_VEC instead of TREE_LIST.
    * pt.cc (value_dependent_expression_p): Handle TREE_VEC instead
    of TREE_LIST of trailing variadic trait arguments.
    * semantics.cc (finish_type_pack_element): Likewise.
    (check_trait_type): Likewise.
2023-04-20  c++: make strip_typedefs generalize strip_typedefs_expr  (Patrick Palka; 1 file, -59/+25)

Currently if we have a TREE_VEC of types that we want to strip of
typedefs, we unintuitively need to call strip_typedefs_expr instead of
strip_typedefs, since only strip_typedefs_expr handles TREE_VEC, and it
also dispatches to strip_typedefs when given a type.  But this seems
backwards: arguably strip_typedefs_expr should be the more specialized
function, which strip_typedefs dispatches to (and thus generalizes).

So this patch makes strip_typedefs subsume strip_typedefs_expr rather
than vice versa, which allows for some simplifications.

gcc/cp/ChangeLog:
    * tree.cc (strip_typedefs): Move TREE_LIST handling to
    strip_typedefs_expr.  Dispatch to strip_typedefs_expr for
    non-type 't'.
    <case TYPENAME_TYPE>: Remove manual dispatching to
    strip_typedefs_expr.
    <case TRAIT_TYPE>: Likewise.
    (strip_typedefs_expr): Replace calls to strip_typedefs_expr with
    strip_typedefs throughout.  Don't dispatch to strip_typedefs for
    type 't'.
    <case TREE_LIST>: Replace this with the better version from
    strip_typedefs.
2023-04-20  doc: Remove repeated word (typo)  (Alejandro Colomar; 1 file, -1/+1)

gcc/ChangeLog:
    * doc/extend.texi (Common Function Attributes): Remove duplicate
    word.

Signed-off-by: Alejandro Colomar <alx@kernel.org>
2023-04-20  Do not ignore UNDEFINED ranges when determining PHI equivalences.  (Andrew MacLeod; 5 files, -10/+117)

Do not ignore UNDEFINED name arguments when registering two-way
equivalences from PHIs.

    PR tree-optimization/109564

gcc/
    * gimple-range-fold.cc (fold_using_range::range_of_phi): Do not
    ignore UNDEFINED range names when deciding if all PHI arguments
    are the same.

gcc/testsuite/
    * gcc.dg/torture/pr109564-1.c: New testcase.
    * gcc.dg/torture/pr109564-2.c: Likewise.
    * gcc.dg/tree-ssa/evrp-ignore.c: XFAIL.
    * gcc.dg/tree-ssa/vrp06.c: Likewise.
2023-04-20  tree-vect-patterns: One small vect_recog_ctz_ffs_pattern tweak [PR109011]  (Jakub Jelinek; 1 file, -1/+1)

I've noticed I've made a typo: ifn in this function this late is always
only IFN_CTZ or IFN_FFS, never IFN_CLZ.  Due to this typo, we weren't
using the originally intended

  .CTZ (X) = .POPCOUNT ((X - 1) & ~X)

but

  .CTZ (X) = PREC - .POPCOUNT (X | -X)

instead when we want to emit __builtin_ctz*/.CTZ using .POPCOUNT.  Both
compute the same value, both are defined at 0 with the same value
(PREC), and both have the same number of GIMPLE statements, but I think
the former ought to be preferred, because lots of targets have andn as
a single operation rather than two, and also putting a -1 constant into
a vector register is often cheaper than a vector with a broadcast PREC
power-of-two value.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>

    PR tree-optimization/109011
    * tree-vect-patterns.cc (vect_recog_ctz_ffs_pattern): Use
    .CTZ (X) = .POPCOUNT ((X - 1) & ~X) in preference to
    .CTZ (X) = PREC - .POPCOUNT (X | -X).
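Both identities can be checked directly; a standalone verification for
32-bit values (PREC = 32), not code from the patch:

    #include <cassert>
    #include <cstdint>
    #include <initializer_list>

    static int ctz_new (uint32_t x) { return __builtin_popcount ((x - 1) & ~x); }
    static int ctz_old (uint32_t x) { return 32 - __builtin_popcount (x | -x); }

    int main ()
    {
      // the two forms agree everywhere, including the defined-at-zero
      // value PREC
      for (uint32_t x : { 0u, 1u, 2u, 8u, 0x50u, 0x80000000u })
        assert (ctz_new (x) == ctz_old (x));
      assert (ctz_new (0) == 32);
      return 0;
    }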
2023-04-20  c: Avoid -Wenum-int-mismatch warning for redeclaration of builtin acc_on_device [PR107041]  (Jakub Jelinek; 2 files, -1/+31)

The new -Wenum-int-mismatch warning triggers with -Wsystem-headers in
<openacc.h>: for obvious reasons, the builtin acc_on_device uses an int
argument rather than an enum that isn't defined yet when the builtin is
created, while the OpenACC spec requires it to have an acc_device_t
enum argument.  The header makes sure it has int underlying type by
using negative and __INT_MAX__ enumerators.

I've tried to make the builtin typegeneric or just varargs, but that
changes behavior e.g. when one calls it with some C++ class which has a
cast operator to acc_device_t, so the following patch instead disables
the warning for this builtin.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>

    PR c/107041
    * c-decl.cc (diagnose_mismatched_decls): Avoid -Wenum-int-mismatch
    warning on acc_on_device declaration.

    * gcc.dg/goacc/pr107041.c: New test.
2023-04-20  [LRA]: Exclude some hard regs for multi-reg inout reload pseudos used in asm in different mode  (Vladimir N. Makarov; 1 file, -0/+28)

See gcc.c-torture/execute/20030222-1.c.  Consider the code for a
32-bit (e.g. BE) target:

  int i, v;
  long x;
  x = v;
  asm ("" : "=r" (i) : "0" (x));

We generate the following RTL with reload insns:

  1. subreg:si(x:di, 0) = 0;
  2. subreg:si(x:di, 4) = v:si;
  3. t:di = x:di, dead x;
  4. asm ("" : "=r" (subreg:si(t:di,4)) : "0" (t:di))
  5. i:si = subreg:si(t:di,4);

If we assign the hard reg of x to t, dead code elimination will remove
insn #2, and we will use an uninitialized hard reg.  So exclude the
hard reg of x for t.  We could ignore this problem for a non-empty asm
using all of x's value, but it is hard to check that the asm is
expanded into insns really using x and setting the output.  The old
reload pass used the same approach.

gcc/ChangeLog:
    * lra-constraints.cc (match_reload): Exclude some hard regs for
    multi-reg inout reload pseudos used in asm in different mode.
2023-04-20  arch: Use VIRTUAL_REGISTER_P predicate.  (Uros Bizjak; 7 files, -20/+11)

gcc/ChangeLog:
    * config/arm/arm.cc (thumb1_legitimate_address_p): Use
    VIRTUAL_REGISTER_P predicate.
    (arm_eliminable_register): Ditto.
    * config/avr/avr.md (push<mode>_1): Ditto.
    * config/bfin/predicates.md (register_no_elim_operand): Ditto.
    * config/h8300/predicates.md (register_no_sp_elim_operand): Ditto.
    * config/i386/predicates.md (register_no_elim_operand): Ditto.
    * config/iq2000/predicates.md (call_insn_operand): Ditto.
    * config/microblaze/microblaze.h (CALL_INSN_OP): Ditto.
2023-04-20  i386: Handle sign-extract for QImode operations with high registers [PR78952]  (Uros Bizjak; 3 files, -186/+237)

Introduce the extract_operator predicate to handle both zero-extract
and sign-extract operations with expressions like:

  (subreg:QI
    (zero_extract:SWI248
      (match_operand 1 "int248_register_operand" "0")
      (const_int 8)
      (const_int 8)) 0)

As shown in the testcase, this will enable generation of QImode
instructions with high registers when signed arguments are used.

gcc/ChangeLog:
    PR target/78952
    * config/i386/predicates.md (extract_operator): New predicate.
    * config/i386/i386.md (any_extract): Remove code iterator.
    (*cmpqi_ext<mode>_1_mem_rex64): Use extract_operator predicate.
    (*cmpqi_ext<mode>_1): Ditto.
    (*cmpqi_ext<mode>_2): Ditto.
    (*cmpqi_ext<mode>_3_mem_rex64): Ditto.
    (*cmpqi_ext<mode>_3): Ditto.
    (*cmpqi_ext<mode>_4): Ditto.
    (*extzvqi_mem_rex64): Ditto.
    (*extzvqi): Ditto.
    (*insvqi_2): Ditto.
    (*extendqi<SWI24:mode>_ext_1): Ditto.
    (*addqi_ext<mode>_0): Ditto.
    (*addqi_ext<mode>_1): Ditto.
    (*addqi_ext<mode>_2): Ditto.
    (*subqi_ext<mode>_0): Ditto.
    (*subqi_ext<mode>_2): Ditto.
    (*testqi_ext<mode>_1): Ditto.
    (*testqi_ext<mode>_2): Ditto.
    (*andqi_ext<mode>_0): Ditto.
    (*andqi_ext<mode>_1): Ditto.
    (*andqi_ext<mode>_1_cc): Ditto.
    (*andqi_ext<mode>_2): Ditto.
    (*<any_or:code>qi_ext<mode>_0): Ditto.
    (*<any_or:code>qi_ext<mode>_1): Ditto.
    (*<any_or:code>qi_ext<mode>_2): Ditto.
    (*xorqi_ext<mode>_1_cc): Ditto.
    (*negqi_ext<mode>_2): Ditto.
    (*ashlqi_ext<mode>_2): Ditto.
    (*<any_shiftrt:insn>qi_ext<mode>_2): Ditto.

gcc/testsuite/ChangeLog:
    PR target/78952
    * gcc.target/i386/pr78952-4.c: New test.
2023-04-20  [PR target/108248] [RISC-V] Break down some bitmanip insn types  (Raphael Zinsly; 3 files, -6/+8)

This is primarily Raphael's work.  All I did was adjust it to apply to
the trunk and add the new types to generic.md's scheduling model.

The basic idea here is to make sure we have the ability to schedule the
bitmanip instructions with a finer degree of control.  Some of the
bitmanip instructions are likely to have differing scheduler
characteristics across different implementations.  So rather than
assign these instructions a generic "bitmanip" type, this patch assigns
them a type based on their RTL code by using the <bitmanip_insn>
iterator for the type.  Naturally we have to add a few new types.  It
affects clz, ctz, cpop, min, max.

We didn't do this for things like shNadd, single bit manipulation, etc.
We certainly could if the need presents itself.

I threw all the new types into the generic_alu bucket in the generic
scheduling model.  Seems as good a place as any.  Someone who knows the
sifive uarch should probably add these types (and bitmanip) to the
sifive scheduling model.

We also noticed that the recently added orc.b didn't have a type at
all, so we added it as a generic bitmanip type.

This has been bootstrapped in a gcc-12 base, and I've built and run the
testsuite without regressions on the trunk.

Given it was primarily Raphael's work, I could probably approve and
commit it.  But I'd like to give the other RISC-V folks a chance to
chime in.

    PR target/108248

gcc/
    * config/riscv/bitmanip.md (clz, ctz, pcnt, min, max patterns): Use
    <bitmanip_insn> as the type to allow for fine grained control of
    scheduling these insns.
    * config/riscv/generic.md (generic_alu): Add bitmanip, clz, ctz,
    pcnt, min, max.
    * config/riscv/riscv.md (type attribute): Add types for clz, ctz,
    pcnt, signed and unsigned min/max.
2023-04-20  RISC-V: Fix RVV register order  (Juzhe-Zhong; 4 files, -31/+50)

This patch fixes the issue of incorrect register order of RVV.  The new
register order comes from Kito's original RVV GCC implementation.

Consider this case:

void f (void *base, void *base2, void *out, size_t vl, int n)
{
  vuint64m8_t bindex = __riscv_vle64_v_u64m8 (base + 100, vl);
  for (int i = 0; i < n; i++)
    {
      vbool8_t m = __riscv_vlm_v_b8 (base + i, vl);
      vuint64m8_t v = __riscv_vluxei64_v_u64m8_m (m, base, bindex, vl);
      vuint64m8_t v2 = __riscv_vle64_v_u64m8_tu (v, base2 + i, vl);
      vint8m1_t v3 = __riscv_vluxei64_v_i8m1_m (m, base, v, vl);
      vint8m1_t v4 = __riscv_vluxei64_v_i8m1_m (m, base, v2, vl);
      __riscv_vse8_v_i8m1 (out + 100 * i, v3, vl);
      __riscv_vse8_v_i8m1 (out + 222 * i, v4, vl);
    }
}

Before this patch:

f:
  csrr t0,vlenb
  slli t1,t0,3
  sub sp,sp,t1
  addi a5,a0,100
  vsetvli zero,a3,e64,m8,ta,ma
  vle64.v v24,0(a5)
  vs8r.v v24,0(sp)
  ble a4,zero,.L1
  mv a6,a0
  add a4,a4,a0
  mv a5,a2
.L3:
  vsetvli zero,zero,e64,m8,ta,ma
  vl8re64.v v24,0(sp)
  vlm.v v0,0(a6)
  vluxei64.v v24,(a0),v24,v0.t
  addi a6,a6,1
  vsetvli zero,zero,e8,m1,tu,ma
  vmv8r.v v16,v24
  vluxei64.v v8,(a0),v24,v0.t
  vle64.v v16,0(a1)
  vluxei64.v v24,(a0),v16,v0.t
  vse8.v v8,0(a2)
  vse8.v v24,0(a5)
  addi a1,a1,1
  addi a2,a2,100
  addi a5,a5,222
  bne a4,a6,.L3
.L1:
  csrr t0,vlenb
  slli t1,t0,3
  add sp,sp,t1
  jr ra

After this patch:

f:
  addi a5,a0,100
  vsetvli zero,a3,e64,m8,ta,ma
  vle64.v v24,0(a5)
  ble a4,zero,.L1
  mv a6,a0
  add a4,a4,a0
  mv a5,a2
.L3:
  vsetvli zero,zero,e64,m8,ta,ma
  vlm.v v0,0(a6)
  addi a6,a6,1
  vluxei64.v v8,(a0),v24,v0.t
  vsetvli zero,zero,e8,m1,tu,ma
  vmv8r.v v16,v8
  vluxei64.v v2,(a0),v8,v0.t
  vle64.v v16,0(a1)
  vluxei64.v v1,(a0),v16,v0.t
  vse8.v v2,0(a2)
  vse8.v v1,0(a5)
  addi a1,a1,1
  addi a2,a2,100
  addi a5,a5,222
  bne a4,a6,.L3
.L1:
  ret

The redundant register spills are eliminated.  However, there is one
more issue that needs to be addressed, which is the redundant move
instruction "vmv8r.v".  That is another story, and it will be fixed by
another patch (fine-tuning the RVV machine description RA constraints).

gcc/ChangeLog:
    * config/riscv/riscv.h (enum reg_class): Fix RVV register order.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/rvv/base/spill-4.c: Adapt testcase.
    * gcc.target/riscv/rvv/base/spill-6.c: Adapt testcase.
    * gcc.target/riscv/rvv/base/reg_order-1.c: New test.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
2023-04-20  RISC-V: Fix riscv/arch-19.c with different ISA spec version  (Kito Cheng; 1 file, -2/+2)

In the newer ISA spec, F implies Zicsr; add that to the -march option
to prevent different test results with different default -misa-spec
versions.

gcc/testsuite/
    * gcc.target/riscv/arch-19.c: Add -misa-spec.
2023-04-20  RISC-V: Fix wrong check of register occurrences [PR109535]  (Ju-Zhe Zhong; 3 files, -1/+168)

count_occurrences will only count identical RTXes (same code and same
mode), but what we want to track is occurrences of a register; a
register might appear in an insn with a different mode or be contained
in a SUBREG.

Testcase coming from Kito.

gcc/ChangeLog:
    PR target/109535
    * config/riscv/riscv-vsetvl.cc (count_regno_occurrences): New
    function.
    (pass_vsetvl::cleanup_insns): Fix bug.

gcc/testsuite/ChangeLog:
    PR target/109535
    * g++.target/riscv/rvv/base/pr109535.C: New test.
    * gcc.target/riscv/rvv/base/pr109535.c: New test.

Signed-off-by: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
Co-authored-by: kito-cheng <kito.cheng@sifive.com>
2023-04-20  RISC-V: Fix simplify_ior_optimization.c on rv32  (Kito Cheng; 1 file, -1/+1)

GCC will complain if the target ABI doesn't have a corresponding
multilib on a glibc toolchain; use stdint-gcc.h to suppress that.

gcc/testsuite/ChangeLog:
    * gcc.target/riscv/simplify_ior_optimization.c: Use stdint-gcc.h
    rather than stdint.h.
2023-04-20  amdgcn: bug fix ldexp insn  (Andrew Stubbs; 1 file, -16/+9)

The vop3 instructions don't support "B" constraint immediates.  Also,
take the opportunity to use the SV_FP iterator to delete a redundant
pattern.

gcc/ChangeLog:
    * config/gcn/gcn-valu.md (vnsi, VnSI): Add scalar modes.
    (ldexp<mode>3): Delete.
    (ldexp<mode>3<exec>): Change "B" to "A".
2023-04-20  amdgcn: update target-supports.exp  (Andrew Stubbs; 1 file, -5/+10)

The backend can now vectorize more things.

gcc/testsuite/ChangeLog:
    * lib/target-supports.exp
    (check_effective_target_vect_call_copysignf): Add amdgcn.
    (check_effective_target_vect_call_sqrtf): Add amdgcn.
    (check_effective_target_vect_call_ceilf): Add amdgcn.
    (check_effective_target_vect_call_floor): Add amdgcn.
    (check_effective_target_vect_logical_reduc): Add amdgcn.
2023-04-20  tree: Add 3+ argument fndecl_built_in_p  (Jakub Jelinek; 10 files, -33/+48)

On Wed, Feb 22, 2023 at 09:52:06AM +0000, Richard Biener wrote:
> > The following testcase ICEs because we still have some spots that
> > treat BUILT_IN_UNREACHABLE specially but not BUILT_IN_UNREACHABLE_TRAP
> > the same.

This patch uses

  (fndecl_built_in_p (node, BUILT_IN_UNREACHABLE)
   || fndecl_built_in_p (node, BUILT_IN_UNREACHABLE_TRAP))

a lot, and from grepping around we do something like that in lots of
other places, or in some spots instead as

  (fndecl_built_in_p (node, BUILT_IN_NORMAL)
   && (DECL_FUNCTION_CODE (node) == BUILT_IN_WHATEVER1
       || DECL_FUNCTION_CODE (node) == BUILT_IN_WHATEVER2))

The following patch adds an overload for this case, so we can write it
in a shorter way, using C++11 argument packs so that it supports as
many codes as one needs.

2023-04-20  Jakub Jelinek  <jakub@redhat.com>
            Jonathan Wakely  <jwakely@redhat.com>

    * tree.h (built_in_function_equal_p): New helper function.
    (fndecl_built_in_p): Turn into variadic template to support
    1 or more built_in_function arguments.
    * builtins.cc (fold_builtin_expect): Use 3 argument
    fndecl_built_in_p.
    * gimplify.cc (goa_stabilize_expr): Likewise.
    * cgraphclones.cc (cgraph_node::create_clone): Likewise.
    * ipa-fnsummary.cc (compute_fn_summary): Likewise.
    * omp-low.cc (setjmp_or_longjmp_p): Likewise.
    * cgraph.cc (cgraph_edge::redirect_call_stmt_to_callee,
    cgraph_update_edges_for_call_stmt_node,
    cgraph_edge::verify_corresponds_to_fndecl,
    cgraph_node::verify_node): Likewise.
    * tree-stdarg.cc (optimize_va_list_gpr_fpr_size): Likewise.
    * gimple-ssa-warn-access.cc (matching_alloc_calls_p): Likewise.
    * ipa-prop.cc (try_make_edge_direct_virtual_call): Likewise.
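A standalone sketch of the argument-pack pattern (the helper name
mirrors the ChangeLog, but the bodies here are illustrative; the
in-tree version operates on trees):

    #include <cassert>

    enum built_in_function
    {
      BUILT_IN_UNREACHABLE,
      BUILT_IN_UNREACHABLE_TRAP,
      BUILT_IN_EXPECT
    };

    // base case: one candidate code
    inline bool
    built_in_function_equal_p (built_in_function name, built_in_function fn)
    {
      return name == fn;
    }

    // recursive case: peel one candidate off the pack
    template <typename... F>
    inline bool
    built_in_function_equal_p (built_in_function name, built_in_function fn1,
                               built_in_function fn2, F... rest)
    {
      return name == fn1 || built_in_function_equal_p (name, fn2, rest...);
    }

    int main ()
    {
      assert (built_in_function_equal_p (BUILT_IN_UNREACHABLE_TRAP,
                                         BUILT_IN_UNREACHABLE,
                                         BUILT_IN_UNREACHABLE_TRAP));
      assert (!built_in_function_equal_p (BUILT_IN_EXPECT,
                                          BUILT_IN_UNREACHABLE,
                                          BUILT_IN_UNREACHABLE_TRAP));
      return 0;
    }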