Age | Commit message | Author | Files | Lines
2023-05-05 | arm: [MVE intrinsics] rework vqrdmulhq | Christophe Lyon | 4 | -213/+3
Implement vqrdmulhq using the new MVE builtins framework. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm-mve-builtins-base.cc (vqrdmulhq): New. * config/arm/arm-mve-builtins-base.def (vqrdmulhq): New. * config/arm/arm-mve-builtins-base.h (vqrdmulhq): New. * config/arm/arm_mve.h (vqrdmulhq): Remove. (vqrdmulhq_m): Remove. (vqrdmulhq_s8): Remove. (vqrdmulhq_n_s8): Remove. (vqrdmulhq_s16): Remove. (vqrdmulhq_n_s16): Remove. (vqrdmulhq_s32): Remove. (vqrdmulhq_n_s32): Remove. (vqrdmulhq_m_n_s8): Remove. (vqrdmulhq_m_n_s32): Remove. (vqrdmulhq_m_n_s16): Remove. (vqrdmulhq_m_s8): Remove. (vqrdmulhq_m_s32): Remove. (vqrdmulhq_m_s16): Remove. (__arm_vqrdmulhq_s8): Remove. (__arm_vqrdmulhq_n_s8): Remove. (__arm_vqrdmulhq_s16): Remove. (__arm_vqrdmulhq_n_s16): Remove. (__arm_vqrdmulhq_s32): Remove. (__arm_vqrdmulhq_n_s32): Remove. (__arm_vqrdmulhq_m_n_s8): Remove. (__arm_vqrdmulhq_m_n_s32): Remove. (__arm_vqrdmulhq_m_n_s16): Remove. (__arm_vqrdmulhq_m_s8): Remove. (__arm_vqrdmulhq_m_s32): Remove. (__arm_vqrdmulhq_m_s16): Remove. (__arm_vqrdmulhq): Remove. (__arm_vqrdmulhq_m): Remove.
2023-05-05 | arm: [MVE intrinsics] factorize vqshlq vshlq | Christophe Lyon | 3 | -81/+51
Factorize vqshlq and vshlq so that they use the same pattern. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/iterators.md (MVE_SHIFT_M_R, MVE_SHIFT_M_N) (MVE_SHIFT_N, MVE_SHIFT_R): New. (mve_insn): Add vqshl, vshl. * config/arm/mve.md (mve_vqshlq_n_<supf><mode>) (mve_vshlq_n_<supf><mode>): Merge into ... (@mve_<mve_insn>q_n_<supf><mode>): ... this. (mve_vqshlq_r_<supf><mode>, mve_vshlq_r_<supf><mode>): Merge into ... (@mve_<mve_insn>q_r_<supf><mode>): ... this. (mve_vqshlq_m_r_<supf><mode>, mve_vshlq_m_r_<supf><mode>): Merge into ... (@mve_<mve_insn>q_m_r_<supf><mode>): ... this. (mve_vqshlq_m_n_<supf><mode>, mve_vshlq_m_n_<supf><mode>): Merge into ... (@mve_<mve_insn>q_m_n_<supf><mode>): ... this. * config/arm/vec-common.md (mve_vshlq_<supf><mode>): Transform into ... (@mve_<mve_insn>q_<supf><mode>): ... this.
2023-05-05 | arm: [MVE intrinsics] rework vrshlq vqrshlq | Christophe Lyon | 5 | -952/+9
Implement vrshlq, vqrshlq using the new MVE builtins framework. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm-mve-builtins-base.cc (vqrshlq, vrshlq): New. * config/arm/arm-mve-builtins-base.def (vqrshlq, vrshlq): New. * config/arm/arm-mve-builtins-base.h (vqrshlq, vrshlq): New. * config/arm/arm-mve-builtins.cc (has_inactive_argument): Handle vqrshlq, vrshlq. * config/arm/arm_mve.h (vrshlq): Remove. (vrshlq_m_n): Remove. (vrshlq_m): Remove. (vrshlq_x): Remove. (vrshlq_u8): Remove. (vrshlq_n_u8): Remove. (vrshlq_s8): Remove. (vrshlq_n_s8): Remove. (vrshlq_u16): Remove. (vrshlq_n_u16): Remove. (vrshlq_s16): Remove. (vrshlq_n_s16): Remove. (vrshlq_u32): Remove. (vrshlq_n_u32): Remove. (vrshlq_s32): Remove. (vrshlq_n_s32): Remove. (vrshlq_m_n_u8): Remove. (vrshlq_m_n_s8): Remove. (vrshlq_m_n_u16): Remove. (vrshlq_m_n_s16): Remove. (vrshlq_m_n_u32): Remove. (vrshlq_m_n_s32): Remove. (vrshlq_m_s8): Remove. (vrshlq_m_s32): Remove. (vrshlq_m_s16): Remove. (vrshlq_m_u8): Remove. (vrshlq_m_u32): Remove. (vrshlq_m_u16): Remove. (vrshlq_x_s8): Remove. (vrshlq_x_s16): Remove. (vrshlq_x_s32): Remove. (vrshlq_x_u8): Remove. (vrshlq_x_u16): Remove. (vrshlq_x_u32): Remove. (__arm_vrshlq_u8): Remove. (__arm_vrshlq_n_u8): Remove. (__arm_vrshlq_s8): Remove. (__arm_vrshlq_n_s8): Remove. (__arm_vrshlq_u16): Remove. (__arm_vrshlq_n_u16): Remove. (__arm_vrshlq_s16): Remove. (__arm_vrshlq_n_s16): Remove. (__arm_vrshlq_u32): Remove. (__arm_vrshlq_n_u32): Remove. (__arm_vrshlq_s32): Remove. (__arm_vrshlq_n_s32): Remove. (__arm_vrshlq_m_n_u8): Remove. (__arm_vrshlq_m_n_s8): Remove. (__arm_vrshlq_m_n_u16): Remove. (__arm_vrshlq_m_n_s16): Remove. (__arm_vrshlq_m_n_u32): Remove. (__arm_vrshlq_m_n_s32): Remove. (__arm_vrshlq_m_s8): Remove. (__arm_vrshlq_m_s32): Remove. (__arm_vrshlq_m_s16): Remove. (__arm_vrshlq_m_u8): Remove. (__arm_vrshlq_m_u32): Remove. (__arm_vrshlq_m_u16): Remove. (__arm_vrshlq_x_s8): Remove. (__arm_vrshlq_x_s16): Remove. 
(__arm_vrshlq_x_s32): Remove. (__arm_vrshlq_x_u8): Remove. (__arm_vrshlq_x_u16): Remove. (__arm_vrshlq_x_u32): Remove. (__arm_vrshlq): Remove. (__arm_vrshlq_m_n): Remove. (__arm_vrshlq_m): Remove. (__arm_vrshlq_x): Remove. (vqrshlq): Remove. (vqrshlq_m_n): Remove. (vqrshlq_m): Remove. (vqrshlq_u8): Remove. (vqrshlq_n_u8): Remove. (vqrshlq_s8): Remove. (vqrshlq_n_s8): Remove. (vqrshlq_u16): Remove. (vqrshlq_n_u16): Remove. (vqrshlq_s16): Remove. (vqrshlq_n_s16): Remove. (vqrshlq_u32): Remove. (vqrshlq_n_u32): Remove. (vqrshlq_s32): Remove. (vqrshlq_n_s32): Remove. (vqrshlq_m_n_u8): Remove. (vqrshlq_m_n_s8): Remove. (vqrshlq_m_n_u16): Remove. (vqrshlq_m_n_s16): Remove. (vqrshlq_m_n_u32): Remove. (vqrshlq_m_n_s32): Remove. (vqrshlq_m_s8): Remove. (vqrshlq_m_s32): Remove. (vqrshlq_m_s16): Remove. (vqrshlq_m_u8): Remove. (vqrshlq_m_u32): Remove. (vqrshlq_m_u16): Remove. (__arm_vqrshlq_u8): Remove. (__arm_vqrshlq_n_u8): Remove. (__arm_vqrshlq_s8): Remove. (__arm_vqrshlq_n_s8): Remove. (__arm_vqrshlq_u16): Remove. (__arm_vqrshlq_n_u16): Remove. (__arm_vqrshlq_s16): Remove. (__arm_vqrshlq_n_s16): Remove. (__arm_vqrshlq_u32): Remove. (__arm_vqrshlq_n_u32): Remove. (__arm_vqrshlq_s32): Remove. (__arm_vqrshlq_n_s32): Remove. (__arm_vqrshlq_m_n_u8): Remove. (__arm_vqrshlq_m_n_s8): Remove. (__arm_vqrshlq_m_n_u16): Remove. (__arm_vqrshlq_m_n_s16): Remove. (__arm_vqrshlq_m_n_u32): Remove. (__arm_vqrshlq_m_n_s32): Remove. (__arm_vqrshlq_m_s8): Remove. (__arm_vqrshlq_m_s32): Remove. (__arm_vqrshlq_m_s16): Remove. (__arm_vqrshlq_m_u8): Remove. (__arm_vqrshlq_m_u32): Remove. (__arm_vqrshlq_m_u16): Remove. (__arm_vqrshlq): Remove. (__arm_vqrshlq_m_n): Remove. (__arm_vqrshlq_m): Remove.
2023-05-05 | arm: [MVE intrinsics] factorize vqrshlq vrshlq | Christophe Lyon | 2 | -39/+24
Factorize vqrshlq, vrshlq so that they use the same pattern. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/iterators.md (MVE_RSHIFT_M_N, MVE_RSHIFT_N): New. (mve_insn): Add vqrshl, vrshl. * config/arm/mve.md (mve_vqrshlq_n_<supf><mode>) (mve_vrshlq_n_<supf><mode>): Merge into ... (@mve_<mve_insn>q_n_<supf><mode>): ... this. (mve_vqrshlq_m_n_<supf><mode>, mve_vrshlq_m_n_<supf><mode>): Merge into ... (@mve_<mve_insn>q_m_n_<supf><mode>): ... this.
2023-05-05 | arm: [MVE intrinsics] add binary_round_lshift shape | Christophe Lyon | 2 | -0/+62
This patch adds the binary_round_lshift shape description. 2022-09-08 Christophe Lyon <christophe.lyon@arm.com> gcc/ * config/arm/arm-mve-builtins-shapes.cc (binary_round_lshift): New. * config/arm/arm-mve-builtins-shapes.h (binary_round_lshift): New.
2023-05-05 | RISC-V: Fix PR109615 | Juzhe-Zhong | 4 | -66/+56
This patch fixes the following case:

  void f (int8_t * restrict in, int8_t * restrict out, int n, int m, int cond)
  {
    size_t vl = 101;
    if (cond)
      vl = m * 2;
    else
      vl = m * 2 * vl;

    for (size_t i = 0; i < n; i++)
      {
        vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i, vl);
        __riscv_vse8_v_i8mf8 (out + i, v, vl);

        vbool64_t mask = __riscv_vlm_v_b64 (in + i + 100, vl);

        vint8mf8_t v2 = __riscv_vle8_v_i8mf8_tumu (mask, v, in + i + 100, vl);
        __riscv_vse8_v_i8mf8 (out + i + 100, v2, vl);
      }

    for (size_t i = 0; i < n; i++)
      {
        vint8mf8_t v = __riscv_vle8_v_i8mf8 (in + i + 300, vl);
        __riscv_vse8_v_i8mf8 (out + i + 300, v, vl);
      }
  }

The value of "vl" comes from different blocks, so it is wrapped in a PHI node in each block. In the first loop, the "vl" source is a PHI node from bb 4. In the second loop, the "vl" source is a PHI node from bb 5. Since bb 5 is dominated by bb 4, the PHI input of "vl" in the second loop is the PHI node of "vl" in bb 4. So when two "vl" PHI nodes are both degenerate (phi->num_inputs () == 1) and their only inputs are the same, it is safe to consider them compatible.

This patch only optimizes degenerate PHIs, since that is a safe and simple optimization. Non-degenerate PHIs are considered incompatible unless they are the same PHI in RTL_SSA.

TODO: non-degenerate PHIs are complicated; we can support them in the future if necessary.

Before this patch:

  ...
  .L2:
          addi    a4,a1,100
          add     t1,a0,a2
          mv      t0,a0
          beq     a2,zero,.L1
          vsetvli zero,a3,e8,mf8,tu,mu
  .L4:
          addi    a6,t0,100
          addi    a7,a4,-100
          vle8.v  v1,0(t0)
          addi    t0,t0,1
          vse8.v  v1,0(a7)
          vlm.v   v0,0(a6)
          vle8.v  v1,0(a6),v0.t
          vse8.v  v1,0(a4)
          addi    a4,a4,1
          bne     t0,t1,.L4
          addi    a0,a0,300
          addi    a1,a1,300
          add     a2,a0,a2
          vsetvli zero,a3,e8,mf8,ta,ma
  .L5:
          vle8.v  v2,0(a0)
          addi    a0,a0,1
          vse8.v  v2,0(a1)
          addi    a1,a1,1
          bne     a2,a0,.L5
  .L1:
          ret

After this patch (the second vsetvli is gone):

  ...
  .L2:
          addi    a4,a1,100
          add     t1,a0,a2
          mv      t0,a0
          beq     a2,zero,.L1
          vsetvli zero,a3,e8,mf8,tu,mu
  .L4:
          addi    a6,t0,100
          addi    a7,a4,-100
          vle8.v  v1,0(t0)
          addi    t0,t0,1
          vse8.v  v1,0(a7)
          vlm.v   v0,0(a6)
          vle8.v  v1,0(a6),v0.t
          vse8.v  v1,0(a4)
          addi    a4,a4,1
          bne     t0,t1,.L4
          addi    a0,a0,300
          addi    a1,a1,300
          add     a2,a0,a2
  .L5:
          vle8.v  v2,0(a0)
          addi    a0,a0,1
          vse8.v  v2,0(a1)
          addi    a1,a1,1
          bne     a2,a0,.L5
  .L1:
          ret

PR target/109615

gcc/ChangeLog:
  * config/riscv/riscv-vsetvl.cc (avl_info::multiple_source_equal_p): Add degenerate PHI optimization.

gcc/testsuite/ChangeLog:
  * gcc.target/riscv/rvv/vsetvl/avl_single-74.c: Adapt testcase.
  * gcc.target/riscv/rvv/vsetvl/vsetvl-11.c: Ditto.
  * gcc.target/riscv/rvv/vsetvl/pr109615.c: New test.
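The compatibility rule this commit message describes can be modelled in a few lines of plain C. This is only an illustrative sketch: `struct phi_node`, `is_degenerate` and `phis_compatible` are hypothetical names invented here, not the RTL-SSA API that riscv-vsetvl.cc actually uses.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA PHI node: it merges `num_inputs`
   incoming definitions, each identified here by an opaque pointer.  */
struct phi_node
{
  size_t num_inputs;
  const void *inputs[4];
};

/* A PHI with exactly one input is degenerate: it only forwards that input.  */
static bool
is_degenerate (const struct phi_node *phi)
{
  return phi->num_inputs == 1;
}

/* Two AVL sources are treated as compatible when they are the same PHI, or
   when both are degenerate and forward the same single input.  Any other
   pair is conservatively rejected.  */
static bool
phis_compatible (const struct phi_node *a, const struct phi_node *b)
{
  if (a == b)
    return true;
  if (is_degenerate (a) && is_degenerate (b))
    return a->inputs[0] == b->inputs[0];
  return false;
}
```

Under this model, two distinct single-input PHIs that forward the same definition compare as compatible, which is the situation that lets the second `vsetvli` be dropped in the example.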
2023-05-05 | i386: Rename index_register_operand predicate to register_no_SP_operand | Uros Bizjak | 2 | -9/+9
Rename index_register_operand predicate to what it really does. No functional change. gcc/ChangeLog: * config/i386/predicates.md (register_no_SP_operand): Rename from index_register_operand. (call_register_operand): Update for rename. * config/i386/i386.md (*lea<mode>_general_[1234]): Update for rename.
2023-05-05 | match.pd: Use splits in makefile and make configurable. | Tamar Christina | 3 | -22/+89
This updates the build system to split up match.pd files into chunks of 10. This also introduces a new flag --with-matchpd-partitions which can be used to change the number of partitions. For the analysis of why 10 please look at the previous patch in the series. gcc/ChangeLog: PR bootstrap/84402 * Makefile.in (NUM_MATCH_SPLITS, MATCH_SPLITS_SEQ, GIMPLE_MATCH_PD_SEQ_SRC, GIMPLE_MATCH_PD_SEQ_O, GENERIC_MATCH_PD_SEQ_SRC, GENERIC_MATCH_PD_SEQ_O): New. (OBJS, MOSTLYCLEANFILES, .PRECIOUS): Use them. (s-match): Split into s-generic-match and s-gimple-match. * configure.ac (with-matchpd-partitions, DEFAULT_MATCHPD_PARTITIONS): New. * configure: Regenerate.
2023-05-05 | match.pd: automatically partition *-match.cc files. | Tamar Christina | 1 | -36/+190
Following on from Richi's RFC[1] this is another attempt to split up match.pd into multiple gimple-match and generic-match files. This version is fully automated and requires no human intervention.

First things first, some perf numbers. The following shows the effect of the patch on my desktop doing parallel compilation of gimple-match:

+--------+------------------+--------+------------------+
| splits | rel. improvement | splits | rel. improvement |
+--------+------------------+--------+------------------+
| 1      | 0.00%            | 33     | 91.03%           |
| 2      | 71.77%           | 34     | 84.02%           |
| 3      | 100.71%          | 35     | 83.42%           |
| 4      | 143.08%          | 36     | 78.80%           |
| 5      | 176.18%          | 37     | 74.06%           |
| 6      | 174.40%          | 38     | 55.76%           |
| 7      | 176.62%          | 39     | 66.90%           |
| 8      | 168.35%          | 40     | 18.25%           |
| 9      | 189.80%          | 41     | 16.55%           |
| 10     | 171.77%          | 42     | 47.02%           |
| 11     | 152.82%          | 43     | 15.29%           |
| 12     | 112.20%          | 44     | 21.63%           |
| 13     | 158.57%          | 45     | 41.53%           |
| 14     | 158.57%          | 46     | 21.98%           |
| 15     | 152.07%          | 47     | -42.74%          |
| 16     | 151.70%          | 48     | -32.62%          |
| 17     | 131.52%          | 49     | 11.81%           |
| 18     | 133.11%          | 50     | 34.07%           |
| 19     | 137.33%          | 51     | 2.71%            |
| 20     | 103.83%          | 52     | -22.23%          |
| 21     | 132.47%          | 53     | 32.30%           |
| 22     | 116.52%          | 54     | 21.45%           |
| 23     | 112.73%          | 55     | 40.02%           |
| 24     | 111.94%          | 56     | 42.83%           |
| 25     | 112.73%          | 57     | -9.98%           |
| 26     | 104.07%          | 58     | 18.01%           |
| 27     | 113.27%          | 59     | -4.91%           |
| 28     | 96.77%           | 60     | 22.94%           |
| 29     | 93.42%           | 61     | -3.73%           |
| 30     | 87.67%           | 62     | -27.43%          |
| 31     | 89.54%           | 63     | -1.05%           |
| 32     | 84.42%           | 64     | -5.44%           |
+--------+------------------+--------+------------------+

As can be seen, there is a point of diminishing returns in doing splits. This comes from the fact that these match files consume a sizeable amount of headers. At a certain point the parsing overhead of the headers dominates and you start losing the gains.

As such, I've made the default 10 splits per file to allow some room for growth in the future without needing changes to the split amount. Since 5-10 show roughly the same gains, we can afford to double the file sizes before we need to up the split amount. This can be controlled by the configure parameter --with-matchpd-partitions=.

At 10 splits the sizes of the files are:

  1.2M  gimple-match-1.cc
  490K  gimple-match-2.cc
  459K  gimple-match-3.cc
  462K  gimple-match-4.cc
  466K  gimple-match-5.cc
  690K  gimple-match-6.cc
  517K  gimple-match-7.cc
  693K  gimple-match-8.cc
  1011K gimple-match-9.cc
  490K  gimple-match-10.cc
  210K  gimple-match-auto.h

gimple-match-1.cc is so large because it got allocated a very large function: gimple_simplify_NE_EXPR. Because of these sporadically large functions, the allocation to a split happens based on the amount of data already written to a split instead of a simple round-robin allocation (though the patch supports that too). This means that once gimple_simplify_NE_EXPR is allocated to gimple-match-1.cc, nothing uses that file again until the rest of the files catch up.

To support this split, a new header file *-match-auto.h is generated to allow the individual files to compile separately.

Lastly, for the auto-generated files I use pragmas to silence the unused predicate warnings instead of the previous Makefile way, because I couldn't find a way to set them without knowing the number of split files beforehand.

Finally, with this change bootstrap time has dropped 8 minutes on AArch64.

[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2018-04/msg01125.html

gcc/ChangeLog:
  PR bootstrap/84402
  * genmatch.cc (emit_func, SIZED_BASED_CHUNKS, get_out_file): New.
  (decision_tree::gen): Accept list of files instead of single and update to write function definition to header and main file.
  (write_predicate): Likewise.
  (write_header): Emit pragmas and new includes.
  (main): Create file buffers and cleanup.
  (showUsage, write_header_includes): New.
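The size-based allocation described in this message (as opposed to round robin) can be sketched as follows. This is a simplified model under stated assumptions, not the code in genmatch.cc: `pick_least_filled` and `assign_function` are hypothetical helper names, and the byte counts stand in for the amount of text already written to each output file.

```c
#include <stddef.h>

/* Return the index of the output file that has received the fewest bytes so
   far; ties go to the lowest index.  `n` must be at least 1.  */
static size_t
pick_least_filled (const size_t *written, size_t n)
{
  size_t best = 0;
  for (size_t i = 1; i < n; i++)
    if (written[i] < written[best])
      best = i;
  return best;
}

/* Assign a function body of `size` bytes to the least-filled file, update
   that file's running total, and return the chosen index.  */
static size_t
assign_function (size_t *written, size_t n, size_t size)
{
  size_t idx = pick_least_filled (written, n);
  written[idx] += size;
  return idx;
}
```

With this policy one very large function fills its file once, and that file is then skipped until the others have caught up, which matches the behaviour reported for gimple-match-1.cc.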
2023-05-05 | genmatch: split shared code to gimple-match-exports.cc | Tamar Christina | 4 | -1193/+1260
In preparation for automatically splitting match.pd files I split off the non-static helper functions that are shared between the match.pd functions off to another file. This file can be compiled in parallel and also allows us to later avoid duplicate symbols errors. gcc/ChangeLog: PR bootstrap/84402 * Makefile.in (OBJS): Add gimple-match-exports.o. * genmatch.cc (decision_tree::gen): Export gimple_gimplify helpers. * gimple-match-head.cc (gimple_simplify, gimple_resimplify1, gimple_resimplify2, gimple_resimplify3, gimple_resimplify4, gimple_resimplify5, constant_for_folding, convert_conditional_op, maybe_resimplify_conditional_op, gimple_match_op::resimplify, maybe_build_generic_op, build_call_internal, maybe_push_res_to_seq, do_valueize, try_conditional_simplification, gimple_extract, gimple_extract_op, canonicalize_code, commutative_binary_op_p, commutative_ternary_op_p, first_commutative_argument, associative_binary_op_p, directly_supported_p, get_conditional_internal_fn): Moved to gimple-match-exports.cc * gimple-match-exports.cc: New file.
2023-05-05 | match.pd: CSE the dump output check. | Tamar Christina | 1 | -1/+7
This is a small improvement in QoL codegen for match.pd to save time not re-evaluating the condition for printing debug information in every function. There is a small but consistent runtime and compile time win here. The runtime win comes from not having to do the condition over again, and on Arm platforms we now use the new test-and-branch support for booleans to only have a single instruction here. gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (decision_tree::gen, write_predicate): Generate new debug_dump var. (dt_simplify::gen_1): Use it.
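The idea of evaluating the dump condition once and reusing it can be sketched like this. Only the variable name `debug_dump` comes from the ChangeLog; `dump_stream`, `dump_mask`, `DUMP_FOLDING` and `simplify_example` are hypothetical stand-ins, and the real generated code may look different.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the compiler's dump state.  */
static FILE *dump_stream = NULL;
static unsigned dump_mask = 0;
enum { DUMP_FOLDING = 1u << 0 };

static int
simplify_example (int code)
{
  /* Evaluate the dump condition once per function...  */
  const bool debug_dump
    = dump_stream != NULL && (dump_mask & DUMP_FOLDING) != 0;

  if (code == 1)
    {
      /* ...and reuse the cached boolean at every site that may print.  */
      if (debug_dump)
        fprintf (dump_stream, "applying pattern 1\n");
      return 10;
    }
  if (code == 2)
    {
      if (debug_dump)
        fprintf (dump_stream, "applying pattern 2\n");
      return 20;
    }
  return 0;
}
```

Each print site then tests a single local boolean instead of repeating the two-part condition.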
2023-05-05 | match.pd: Remove commented out line pragmas unless -vv is used. | Tamar Christina | 1 | -1/+1
genmatch currently outputs commented out line directives that have no effect but the compiler still has to parse only to discard. They are however handy when debugging genmatch output. As such this moves them behind the -vv flag. gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (output_line_directive): Only emit commented directive when -vv.
2023-05-05 | match.pd: don't emit label if not needed | Tamar Christina | 1 | -8/+22
This is a small QoL codegen improvement for match.pd to not emit labels when they are not needed. The codegen is nice and there is a small (but consistent) improvement in compile time. gcc/ChangeLog: PR bootstrap/84402 * genmatch.cc (dt_simplify::gen_1): Only emit labels if used.
2023-05-05 | GCN: Silence unused-variable warning | Tobias Burnus | 1 | -2/+0
gcc/ChangeLog: * config/gcn/gcn.cc (gcn_vectorize_builtin_vectorized_function): Remove unused in_mode/in_n variables.
2023-05-05 | tree-optimization/109735 - conversion for vectorized pointer-diff | Richard Biener | 1 | -12/+13
There's handling in vectorizable_operation for POINTER_DIFF_EXPR requiring conversion of the result of the unsigned operation to a signed type. But that's conditional on the "default" kind of vectorization. In this PR it's shown the emulated vector path needs it and I think the masked operation case will, too (though we might eventually never mask an integral MINUS_EXPR). So the following makes that handling unconditional. PR tree-optimization/109735 * tree-vect-stmts.cc (vectorizable_operation): Perform conversion for POINTER_DIFF_EXPR unconditionally.
2023-05-05 | i386: Introduce mulv2si3 instruction | Uros Bizjak | 2 | -0/+76
For SSE2 targets the expander unpacks input elements into the correct position in the V4SI vector and emits PMULUDQ instruction. The output elements are then shuffled back to their positions in the V2SI vector. For SSE4 targets PMULLD instruction is emitted directly. gcc/ChangeLog: * config/i386/mmx.md (mulv2si3): New expander. (*mulv2si3): New insn pattern. gcc/testsuite/ChangeLog: * gcc.target/i386/sse2-mmx-mult-vec.c: New test.
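A scalar C model may help show why the SSE2 sequence gives the right answer. It assumes the inputs are placed in lanes 0 and 2 of the four-lane vector, which are the lanes PMULUDQ multiplies; `pmuludq_model` and `mulv2si_model` are hypothetical names for this sketch, not functions from the patch.

```c
#include <stdint.h>

/* Scalar model of PMULUDQ: multiply the unsigned 32-bit values in lanes 0
   and 2 of two 4-lane vectors, producing two 64-bit products.  */
static void
pmuludq_model (const uint32_t a[4], const uint32_t b[4], uint64_t prod[2])
{
  prod[0] = (uint64_t) a[0] * b[0];
  prod[1] = (uint64_t) a[2] * b[2];
}

/* Model of a two-element 32-bit multiply built on top of PMULUDQ.  */
static void
mulv2si_model (const uint32_t a[2], const uint32_t b[2], uint32_t out[2])
{
  /* Unpack: put the two inputs where PMULUDQ reads them (lanes 0 and 2).  */
  const uint32_t va[4] = { a[0], 0, a[1], 0 };
  const uint32_t vb[4] = { b[0], 0, b[1], 0 };
  uint64_t prod[2];

  pmuludq_model (va, vb, prod);

  /* Shuffle back: keep only the low 32 bits of each 64-bit product.  */
  out[0] = (uint32_t) prod[0];
  out[1] = (uint32_t) prod[1];
}
```

The low 32 bits of a 32x32 product are the same for signed and unsigned operands, so an unsigned widening multiply is sufficient for a plain V2SI multiply.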
2023-05-05 | [libstdc++] [testsuite] xfail double-prec from_chars for ldbl | Alexandre Oliva | 2 | -1/+6
When long double is wider than double, but from_chars is implemented in terms of double, tests that involve the full precision of long double are expected to fail. Mark them as such on aarch64-*-vxworks. for libstdc++-v3/ChangeLog * testsuite/20_util/from_chars/4.cc: Skip long double test06 on aarch64-vxworks. * testsuite/20_util/to_chars/long_double.cc: Xfail run on aarch64-vxworks.
2023-05-05 | nvptx/mkoffload.cc: Add dummy proc for OpenMP rev-offload table [PR108098] | Tobias Burnus | 1 | -0/+14
Seemingly, the ptx JIT of CUDA <= 10.2 replaces function pointers in global variables by NULL if a translation does not contain any executable code. It works with CUDA 11.1. The code of this commit is about reverse offload; having NULL values disables the side of reverse offload during image load. Solution is the same as found by Thomas for a related issue: Adding a dummy procedure. Cf. the PR of this issue and Thomas' patch "nvptx: Support global constructors/destructors via 'collect2'" https://gcc.gnu.org/pipermail/gcc-patches/2022-December/607749.html As that approach also works here: Co-authored-by: Thomas Schwinge <thomas@codesourcery.com> gcc/ PR libgomp/108098 * config/nvptx/mkoffload.cc (process): Emit dummy procedure alongside reverse-offload function table to prevent NULL values of the function addresses.
2023-05-05 | builtins: Fix comment typo mpft_t -> mpfr_t | Jakub Jelinek | 2 | -4/+4
I've noticed 4 typos in comments, fixed thusly. 2023-05-05 Jakub Jelinek <jakub@redhat.com> * builtins.cc (do_mpfr_ckconv, do_mpc_ckconv): Fix comment typo, mpft_t -> mpfr_t. * fold-const-call.cc (do_mpfr_ckconv, do_mpc_ckconv): Likewise.
2023-05-04 | PHIOPT: Fix diamond case of match_simplify_replacement | Andrew Pinski | 3 | -3/+90
So it turns out I messed up checking which edge was true/false for the diamond form. The edges e0 and e1 here are edges from the merge block, but the true/false edges are from the conditional block, and with diamond/threeway there is a bb in between on both edges. Most of the time, the check that was in match_simplify_replacement would happen to be correct for the diamond form, as most of the time the first edge in the conditional is the edge for the true side of the conditional. This is why I didn't see the issue during bootstrap/testing. I added a fragile gimple testcase which exposed the issue. Since there is no way to specify the order of the edges in the gimple FE, we have to have forwprop swap the false/true edges (not the order of them, just swapping the true/false flags) and hope not to do cleanupcfg in between forwprop and the first phiopt pass. This is the fragile part really: it is not that we will produce wrong code, just that we won't hit what was the failing case. OK? Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/109732 gcc/ChangeLog: * tree-ssa-phiopt.cc (match_simplify_replacement): Fix the selection of the argtrue/argfalse. gcc/testsuite/ChangeLog: * gcc.dg/pr109732.c: New test. * gcc.dg/pr109732-1.c: New test.
2023-05-05 | MATCH: Add ABSU<a> == 0 to a == 0 simplification | Andrew Pinski | 2 | -5/+18
There is already an `ABS<a> == 0` to `a == 0` pattern, this just extends that to ABSU too. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. PR tree-optimization/109722 gcc/ChangeLog: * match.pd: Extend the `ABS<a> == 0` pattern to cover `ABSU<a> == 0` too. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/abs-1.c: New test.
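The equivalence behind this pattern is easy to check with a small C model. `absu_model` is a hypothetical helper that mirrors what ABSU computes (the absolute value as an unsigned quantity); it is not a GCC function.

```c
#include <limits.h>
#include <stdbool.h>

/* Model of ABSU: absolute value returned as unsigned, so that the
   magnitude of INT_MIN is representable.  */
static unsigned
absu_model (int a)
{
  return a < 0 ? 0u - (unsigned) a : (unsigned) a;
}

/* The form before simplification: ABSU<a> == 0.  */
static bool
absu_is_zero (int a)
{
  return absu_model (a) == 0u;
}

/* The form after simplification: a == 0.  */
static bool
is_zero (int a)
{
  return a == 0;
}
```

Because the absolute value is zero only when the input is zero, `absu_is_zero` and `is_zero` agree for every `int`, including `INT_MIN`.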
2023-05-04 | Revert "c++: restore instantiate_decl assert" | Jason Merrill | 2 | -6/+10
In the testcase the assert fails because we use one member function from another while we're in the middle of instantiating them all, which is perfectly fine. It seems complicated to detect this situation, so let's remove the assert again. PR c++/109658 This reverts commit 95d4c0d2e6318aef88ba0bc607dfc1ec6b7a612f. gcc/testsuite/ChangeLog: * g++.dg/template/local10.C: New test.
2023-05-05 | Daily bump. | GCC Administrator | 7 | -1/+433
2023-05-04 | i386: Tighten ashift to lea splitter operand predicates [PR109733] | Uros Bizjak | 2 | -4/+9
The predicates of the ashift to lea post-reload splitter were too broad, so the splitter tried to convert the mask shift instruction. Tighten operand predicates to match only general registers. gcc/ChangeLog: PR target/109733 * config/i386/predicates.md (index_reg_operand): New predicate. * config/i386/i386.md (ashift to lea splitter): Use general_reg_operand and index_reg_operand predicates.
2023-05-04 | PR modula2/109729 cannot use a CHAR type as a FOR loop iterator | Gaius Mulley | 4 | -30/+85
This patch introduces a new quadruple ArithAddOp which is used in the construction of FOR loop to ensure that when constant folding is applied it does not concatenate two constant char operands into a string constant. Overloading only occurs with constant operands. gcc/m2/ChangeLog: PR modula2/109729 * gm2-compiler/M2GenGCC.mod (CodeStatement): Detect ArithAddOp and call CodeAddChecked. (ResolveConstantExpressions): Detect ArithAddOp and call FoldArithAdd. (FoldArithAdd): New procedure. (FoldAdd): Refactor to use FoldArithAdd. * gm2-compiler/M2Quads.def (QuadOperator): Add ArithAddOp. * gm2-compiler/M2Quads.mod: Remove commented imports. (QuadFrame): Changed comments to use GNU coding standards. (ArithPlusTok): New global variable. (BuildForToByDo): Use ArithPlusTok instead of PlusTok. (MakeOp): Detect ArithPlusTok and return ArithAddOp. (WriteQuad): Add ArithAddOp clause. (WriteOperator): Add ArithAddOp clause. (Init): Initialize ArithPlusTok. gcc/testsuite/ChangeLog: PR modula2/109729 * gm2/pim/run/pass/ForChar.mod: New test. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2023-05-04 | [2/2] aarch64: Reimplement (R){ADD,SUB}HN2 patterns with standard RTL codes | Kyrylo Tkachov | 2 | -34/+82
Similar to the previous patch, this one converts the high-half versions of the patterns. With this patch we can remove the UNSPEC_* codes involved entirely. Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn2<mode>_insn_le): Rename and reimplement with RTL codes to... (aarch64_<optab>hn2<mode>_insn_le): .. This. (aarch64_r<optab>hn2<mode>_insn_le): New pattern. (aarch64_<sur><addsub>hn2<mode>_insn_be): Rename and reimplement with RTL codes to... (aarch64_<optab>hn2<mode>_insn_be): ... This. (aarch64_r<optab>hn2<mode>_insn_be): New pattern. (aarch64_<sur><addsub>hn2<mode>): Rename and adjust expander to... (aarch64_<optab>hn2<mode>): ... This. (aarch64_r<optab>hn2<mode>): New expander. * config/aarch64/iterators.md (UNSPEC_ADDHN, UNSPEC_RADDHN, UNSPEC_SUBHN, UNSPEC_RSUBHN): Delete unspecs. (ADDSUBHN): Delete. (sur): Remove handling of the above. (addsub): Likewise.
2023-05-04 | [1/2] aarch64: Reimplement (R){ADD,SUB}HN intrinsics with RTL codes | Kyrylo Tkachov | 3 | -35/+88
We can implement the halving-narrowing add/sub patterns with standard RTL codes as well rather than relying on unspecs. This patch handles the low-part ones and the second patch does the high-part ones and removes the unspecs themselves. The operation ADDHN on V4SI, for example, is represented as (truncate:V4HI ((src1:V4SI + src2:V4SI) >> 16)) and RADDHN as (truncate:V4HI ((src1:V4SI + src2:V4SI + (1 << 15)) >> 16)). Taking this opportunity I specified the patterns returning the narrow mode and annotated them with the <vczle><vczbe> define_subst rules to get the vec_concat-zero meta-patterns too. This allows us to simplify the expanders somewhat too. Tests are added to check that the combinations work. Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf. gcc/ChangeLog: * config/aarch64/aarch64-simd.md (aarch64_<sur><addsub>hn<mode>_insn_le): Delete. (aarch64_<optab>hn<mode>_insn<vczle><vczbe>): New define_insn. (aarch64_<sur><addsub>hn<mode>_insn_be): Delete. (aarch64_r<optab>hn<mode>_insn<vczle><vczbe>): New define_insn. (aarch64_<sur><addsub>hn<mode>): Delete. (aarch64_<optab>hn<mode>): New define_expand. (aarch64_r<optab>hn<mode>): Likewise. * config/aarch64/predicates.md (aarch64_simd_raddsubhn_imm_vec): New predicate. gcc/testsuite/ChangeLog: * gcc.target/aarch64/simd/pr99195_4.c: New test.
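The two RTL expressions quoted in this message can be read as the following per-lane C model for 32-bit source elements. The function names `addhn_model` and `raddhn_model` are hypothetical; the arithmetic follows the expressions given above, with the sum wrapping at 32 bits.

```c
#include <stdint.h>

/* ADDHN on one 32-bit lane: add, then keep the high 16 bits of the sum.  */
static uint16_t
addhn_model (uint32_t a, uint32_t b)
{
  return (uint16_t) ((a + b) >> 16);
}

/* RADDHN on one 32-bit lane: as above, but add the rounding constant
   1 << 15 before taking the high half.  */
static uint16_t
raddhn_model (uint32_t a, uint32_t b)
{
  return (uint16_t) ((a + b + (UINT32_C (1) << 15)) >> 16);
}
```

For example, a sum of 0x8000 narrows to 0 with `addhn_model` but to 1 with `raddhn_model`, because the rounding constant carries into the high half.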
2023-05-04 | OpenACC: Further attach/detach clause fixes for Fortran [PR109622] | Julian Brown | 8 | -4/+132
This patch moves several tests introduced by the following patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616939.html commit r14-325-gcacf65d74463600815773255e8b82b4043432bd7 into the proper location for OpenACC testing (thanks to Thomas for spotting my mistake!), and also fixes a few additional problems -- missing diagnostics for non-pointer attaches, and a case where a pointer was incorrectly dereferenced. Tests are also adjusted for vector-length warnings on nvidia accelerators. 2023-04-29 Julian Brown <julian@codesourcery.com> PR fortran/109622 gcc/fortran/ * openmp.cc (resolve_omp_clauses): Add diagnostic for non-pointer/non-allocatable attach/detach. * trans-openmp.cc (gfc_trans_omp_clauses): Remove dereference for pointer-to-scalar derived type component attach/detach. Fix attach/detach handling for descriptors. gcc/testsuite/ * gfortran.dg/goacc/pr109622-5.f90: New test. * gfortran.dg/goacc/pr109622-6.f90: New test. libgomp/ * testsuite/libgomp.fortran/pr109622.f90: Move test... * testsuite/libgomp.oacc-fortran/pr109622.f90: ...to here. Ignore vector length warning. * testsuite/libgomp.fortran/pr109622-2.f90: Move test... * testsuite/libgomp.oacc-fortran/pr109622-2.f90: ...to here. Add missing copyin/copyout variable. Ignore vector length warnings. * testsuite/libgomp.fortran/pr109622-3.f90: Move test... * testsuite/libgomp.oacc-fortran/pr109622-3.f90: ...to here. Ignore vector length warnings. * testsuite/libgomp.oacc-fortran/pr109622-4.f90: New test.
2023-05-04 | libstdc++: Document new library version in manual | Jonathan Wakely | 2 | -3/+5
libstdc++-v3/ChangeLog: * doc/xml/manual/abi.xml (abi.versioning.history): Document libstdc++.so.6.0.32 and GLIBCXX_3.4.32 version. * doc/html/manual/abi.html: Regenerate.
2023-05-04 | libstdc++: Mention recent libgcc_s symbol versions in manual | Florian Weimer | 1 | -0/+5
GCC_11.0 is an aarch64-specific outlier. libstdc++-v3/ChangeLog: * doc/xml/manual/abi.xml (abi.versioning.history): Add GCC_7.0.0, GCC_9.0.0, GCC_11.0, GCC_12.0.0, GCC_13.0.0 for libgcc_s.
2023-05-04 | PHIOPT: Improve replace_phi_edge_with_variable for diamond shaped bb | Andrew Pinski | 7 | -12/+43
While looking at the differences between what minmax_replacement and match_simplify_replacement do, I noticed that they sometimes chose different edges to remove. I decided we should be able to do better and remove both empty basic blocks in the case of match_simplify_replacement, as that moves the statements. This also updates the testcases, as match_simplify_replacement will now remove the unused MIN/MAX_EXPR and they were checking for those. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Handle diamond form bb with forwarder only empty blocks better. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/minmax-15.c: Update test. * gcc.dg/tree-ssa/minmax-16.c: Update test. * gcc.dg/tree-ssa/minmax-3.c: Update test. * gcc.dg/tree-ssa/minmax-4.c: Update test. * gcc.dg/tree-ssa/minmax-5.c: Update test. * gcc.dg/tree-ssa/minmax-8.c: Update test.
2023-05-04 | Move copy_phi_arg_into_existing_phi to common location and use it | Andrew Pinski | 4 | -46/+30
While improving replace_phi_edge_with_variable for the diamond form bb case, I needed a way to copy phi entries from one edge to another, as I am removing a forwarding bb in between. It was pointed out to me that the jump threading code had copy_phi_arg_into_existing_phi, which I can use. I also noticed that both gimple_duplicate_sese_tail and remove_forwarder_block have similar code, so it makes sense to use that function in those two locations too. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-threadupdate.cc (copy_phi_arg_into_existing_phi): Move to ... * tree-cfg.cc (copy_phi_arg_into_existing_phi): Here and remove static. (gimple_duplicate_sese_tail): Use copy_phi_arg_into_existing_phi instead of an inline version of it. * tree-cfgcleanup.cc (remove_forwarder_block): Likewise. * tree-cfg.h (copy_phi_arg_into_existing_phi): New declaration.
2023-05-04 | PHIOPT: Improve replace_phi_edge_with_variable's dce_ssa_names slightly | Andrew Pinski | 1 | -2/+3
When I added the dce_ssa_names argument, I didn't realize bitmap was a pointer so I used the default argument value as auto_bitmap(). But instead we could just use nullptr and check if it was a nullptr before calling simple_dce_from_worklist. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * tree-ssa-phiopt.cc (replace_phi_edge_with_variable): Change the default argument value for dce_ssa_names to nullptr. Check to make sure dce_ssa_names is a non-nullptr before calling simple_dce_from_worklist.
2023-05-04 | i386: Improve index_register_operand predicate | Uros Bizjak | 1 | -24/+21
Use the same approach as in register_no_elim_operand predicate, but also reject stack_pointer_rtx operands. gcc/ChangeLog: * config/i386/predicates.md (index_register_operand): Reject arg_pointer_rtx, frame_pointer_rtx, stack_pointer_rtx and VIRTUAL_REGISTER_P operands. Allow subregs of memory before reload. (call_register_no_elim_operand): Rewrite as ... (call_register_operand): ... this. (call_insn_operand): Use call_register_operand predicate.
2023-05-04 | tree-optimization/109721 - emulated vectors | Richard Biener | 1 | -2/+8
When fixing PR109672 I noticed we let SImode AND through when target_support_p even though it isn't word_mode and I didn't want to change that but had to catch the case where SImode PLUS is supported but emulated vectors rely on it being word_mode. The following makes sure to preserve the word_mode check when !target_support_p to avoid excessive lowering later even for bit operations. PR tree-optimization/109721 * tree-vect-stmts.cc (vectorizable_operation): Make sure to test word_mode for all !target_support_p operations.
2023-05-04aarch64: PR target/99195 annotate simple ternary ops for vec-concat with zeroKyrylo Tkachov2-19/+87
We're now moving onto various simple ternary instructions, including some lane forms. These include intrinsics that map down to mla, mls, fma, aba, bsl instructions. Tests are added for lane 0 and lane 1 as for some of these instructions the lane 0 variants use separate simpler patterns that need a separate annotation. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (aarch64_<su>aba<mode>): Rename to... (aarch64_<su>aba<mode><vczle><vczbe>): ... This. (aarch64_mla<mode>): Rename to... (aarch64_mla<mode><vczle><vczbe>): ... This. (*aarch64_mla_elt<mode>): Rename to... (*aarch64_mla_elt<mode><vczle><vczbe>): ... This. (*aarch64_mla_elt_<vswap_width_name><mode>): Rename to... (*aarch64_mla_elt_<vswap_width_name><mode><vczle><vczbe>): ... This. (aarch64_mla_n<mode>): Rename to... (aarch64_mla_n<mode><vczle><vczbe>): ... This. (aarch64_mls<mode>): Rename to... (aarch64_mls<mode><vczle><vczbe>): ... This. (*aarch64_mls_elt<mode>): Rename to... (*aarch64_mls_elt<mode><vczle><vczbe>): ... This. (*aarch64_mls_elt_<vswap_width_name><mode>): Rename to... (*aarch64_mls_elt_<vswap_width_name><mode><vczle><vczbe>): ... This. (aarch64_mls_n<mode>): Rename to... (aarch64_mls_n<mode><vczle><vczbe>): ... This. (fma<mode>4): Rename to... (fma<mode>4<vczle><vczbe>): ... This. (*aarch64_fma4_elt<mode>): Rename to... (*aarch64_fma4_elt<mode><vczle><vczbe>): ... This. (*aarch64_fma4_elt_<vswap_width_name><mode>): Rename to... (*aarch64_fma4_elt_<vswap_width_name><mode><vczle><vczbe>): ... This. (*aarch64_fma4_elt_from_dup<mode>): Rename to... (*aarch64_fma4_elt_from_dup<mode><vczle><vczbe>): ... This. (fnma<mode>4): Rename to... (fnma<mode>4<vczle><vczbe>): ... This. (*aarch64_fnma4_elt<mode>): Rename to... (*aarch64_fnma4_elt<mode><vczle><vczbe>): ... This. (*aarch64_fnma4_elt_<vswap_width_name><mode>): Rename to... (*aarch64_fnma4_elt_<vswap_width_name><mode><vczle><vczbe>): ... This. 
(*aarch64_fnma4_elt_from_dup<mode>): Rename to... (*aarch64_fnma4_elt_from_dup<mode><vczle><vczbe>): ... This. (aarch64_simd_bsl<mode>_internal): Rename to... (aarch64_simd_bsl<mode>_internal<vczle><vczbe>): ... This. (*aarch64_simd_bsl<mode>_alt): Rename to... (*aarch64_simd_bsl<mode>_alt<vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_3.c: New test.
2023-05-04aarch64: PR target/99195 annotate more simple binary ops for vec-concat with zeroKyrylo Tkachov3-13/+21
More pattern annotations and tests to eliminate redundant vec-concat with zero instructions. These are for the abd family of instructions and the pairwise floating-point max/min and fadd operations too. Bootstrapped and tested on aarch64-none-linux-gnu. gcc/ChangeLog: PR target/99195 * config/aarch64/aarch64-simd.md (aarch64_<su>abd<mode>): Rename to... (aarch64_<su>abd<mode><vczle><vczbe>): ... This. (fabd<mode>3): Rename to... (fabd<mode>3<vczle><vczbe>): ... This. (aarch64_<optab>p<mode>): Rename to... (aarch64_<optab>p<mode><vczle><vczbe>): ... This. (aarch64_faddp<mode>): Rename to... (aarch64_faddp<mode><vczle><vczbe>): ... This. gcc/testsuite/ChangeLog: PR target/99195 * gcc.target/aarch64/simd/pr99195_1.c: Add testing for more binary ops. * gcc.target/aarch64/simd/pr99195_2.c: Add testing for more binary ops.
2023-05-04gcov: add GCOV format version to gcov -vMartin Liska1-1/+4
gcc/ChangeLog: * gcov.cc (GCOV_JSON_FORMAT_VERSION): New definition. (print_version): Use it. (generate_results): Likewise.
2023-05-04tree-optimization/109724 - new testcaseRichard Biener1-0/+32
The following adds a testcase for PR109724 which was caused by backporting r13-2375-gbe1b42de9c151d and fixed by r11-199-g2b42509f8b7bdf. PR tree-optimization/109724 * g++.dg/torture/pr109724.C: New testcase.
2023-05-04Rename last_stmt to last_nondebug_stmtRichard Biener18-74/+79
The following renames last_stmt to last_nondebug_stmt which is what it really does. * tree-cfg.h (last_stmt): Rename to ... (last_nondebug_stmt): ... this. * tree-cfg.cc (last_stmt): Rename to ... (last_nondebug_stmt): ... this. (assign_discriminators): Adjust. (group_case_labels_stmt): Likewise. (gimple_can_duplicate_bb_p): Likewise. (execute_fixup_cfg): Likewise. * auto-profile.cc (afdo_propagate_circuit): Likewise. * gimple-range.cc (gimple_ranger::range_on_exit): Likewise. * omp-expand.cc (workshare_safe_to_combine_p): Likewise. (determine_parallel_type): Likewise. (adjust_context_and_scope): Likewise. (expand_task_call): Likewise. (remove_exit_barrier): Likewise. (expand_omp_taskreg): Likewise. (expand_omp_for_init_counts): Likewise. (expand_omp_for_init_vars): Likewise. (expand_omp_for_static_chunk): Likewise. (expand_omp_simd): Likewise. (expand_oacc_for): Likewise. (expand_omp_for): Likewise. (expand_omp_sections): Likewise. (expand_omp_atomic_fetch_op): Likewise. (expand_omp_atomic_cas): Likewise. (expand_omp_atomic): Likewise. (expand_omp_target): Likewise. (expand_omp): Likewise. (omp_make_gimple_edges): Likewise. * trans-mem.cc (tm_region_init): Likewise. * tree-inline.cc (redirect_all_calls): Likewise. * tree-parloops.cc (gen_parallel_loop): Likewise. * tree-ssa-loop-ch.cc (do_while_loop_p): Likewise. * tree-ssa-loop-ivcanon.cc (canonicalize_loop_induction_variables): Likewise. * tree-ssa-loop-ivopts.cc (stmt_after_ip_normal_pos): Likewise. (may_eliminate_iv): Likewise. * tree-ssa-loop-manip.cc (standard_iv_increment_position): Likewise. * tree-ssa-loop-niter.cc (do_warn_aggressive_loop_optimizations): Likewise. (estimate_numbers_of_iterations): Likewise. * tree-ssa-loop-split.cc (compute_added_num_insns): Likewise. * tree-ssa-loop-unswitch.cc (get_predicates_for_bb): Likewise. (set_predicates_for_bb): Likewise. (init_loop_unswitch_info): Likewise. (hoist_guard): Likewise. * tree-ssa-phiopt.cc (match_simplify_replacement): Likewise. 
(minmax_replacement): Likewise. * tree-ssa-reassoc.cc (update_range_test): Likewise. (optimize_range_tests_to_bit_test): Likewise. (optimize_range_tests_var_bound): Likewise. (optimize_range_tests): Likewise. (no_side_effect_bb): Likewise. (suitable_cond_bb): Likewise. (maybe_optimize_range_tests): Likewise. (reassociate_bb): Likewise. * tree-vrp.cc (rvrp_folder::pre_fold_bb): Likewise.
2023-05-04i386: Fix up handling of debug insns in STV [PR109676]Jakub Jelinek2-4/+51
The following testcase ICEs because STV replaces there (debug_insn 114 47 51 8 (var_location:TI D#3 (reg:TI 91 [ p ])) -1 (nil)) with (debug_insn 114 47 51 8 (var_location:TI D#3 (reg:V1TI 91 [ p ])) -1 (nil)) which is invalid because of the mode mismatch. STV has fix_debug_reg_uses function which is supposed to fix this up and adjust such debug insns into (debug_insn 114 47 51 8 (var_location:TI D#3 (subreg:TI (reg:V1TI 91 [ p ]) 0)) -1 (nil)) but it doesn't trigger here. The IL before stv1 has: (debug_insn 114 47 51 8 (var_location:TI D#3 (reg:TI 91 [ p ])) -1 (nil)) ... (insn 63 62 64 8 (set (mem/c:TI (reg/f:DI 89 [ .result_ptr ]) [0 <retval>.mStorage+0 S16 A32]) (reg:TI 91 [ p ])) "pr109676.C":4:48 87 {*movti_internal} (expr_list:REG_DEAD (reg:TI 91 [ p ]) (nil))) in bb 8 and (insn 97 96 98 9 (set (reg:TI 91 [ p ]) (mem/c:TI (plus:DI (reg/f:DI 19 frame) (const_int -32 [0xffffffffffffffe0])) [0 p+0 S16 A128])) "pr109676.C":26:12 87 {*movti_internal} (nil)) (insn 98 97 99 9 (set (mem/c:TI (plus:DI (reg/f:DI 19 frame) (const_int -64 [0xffffffffffffffc0])) [0 tmp+0 S16 A128]) (reg:TI 91 [ p ])) "pr109676.C":26:12 87 {*movti_internal} (nil)) in bb9. PUT_MODE on a REG is done in two spots in timode_scalar_chain::convert_insn, one is: switch (GET_CODE (dst)) { case REG: if (GET_MODE (dst) == TImode) { PUT_MODE (dst, V1TImode); fix_debug_reg_uses (dst); } if (GET_MODE (dst) == V1TImode) when seeing the REG in SET_DEST and another one the hunk the patch adjusts. Because bb 8 comes first in the order the pass walks the bbs, we first notice the TImode pseudo on insn 63 where it is SET_SRC, use PUT_MODE there unconditionally, so for a shared REG it changes all other uses in the IL, and then don't call fix_debug_reg_uses because DF_REG_DEF_CHAIN (REGNO (src)) is non-NULL - the REG is set in insn 97 but we haven't processed it yet. 
Later on we process insn 97, but because the REG in SET_DEST already has V1TImode, we don't do anything, even when the src handling code earlier relied on it being done. The following patch fixes this by using similar code for both dst and src, in particular calling fix_debug_reg_uses once when we actually change REG mode from TImode to V1TImode, and not later on. 2023-05-04 Jakub Jelinek <jakub@redhat.com> PR debug/109676 * config/i386/i386-features.cc (timode_scalar_chain::convert_insn): If src is REG, change its mode to V1TImode and call fix_debug_reg_uses for it only if it still has TImode. Don't decide whether to call fix_debug_reg_uses based on whether SRC is ever set or not. * g++.target/i386/pr109676.C: New test.
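The core idea of the fix, performing the debug fix-up exactly once at the moment the register's mode actually changes, can be illustrated with a toy model. This is only a sketch: `Reg`, `Mode`, `fix_debug_uses`, and `convert_reg` are hypothetical names and do not reproduce the RTL data structures or the code in i386-features.cc.

```cpp
#include <cassert>

// Toy model of a shared pseudo register; not GCC's rtx representation.
enum class Mode { TI, V1TI };

struct Reg {
    Mode mode = Mode::TI;
    int debug_fixups = 0;  // how many times debug uses were fixed up
};

// Hypothetical stand-in for fix_debug_reg_uses.
inline void fix_debug_uses(Reg &r) { ++r.debug_fixups; }

// Used for both the SET_SRC and SET_DEST case in this sketch: change the
// mode and fix up debug uses only if the register still has TImode.
inline void convert_reg(Reg &r)
{
    if (r.mode == Mode::TI) {
        r.mode = Mode::V1TI;
        fix_debug_uses(r);
    }
    assert(r.mode == Mode::V1TI);
}
```

Because the register object is shared, whichever use is visited first triggers the fix-up, and later visits see the new mode and do nothing further.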
2023-05-04CRIS: peephole2 an "and" with a contiguous "one-sided" sequence of 1sHans-Peter Nilsson8-11/+146
This kind of transformation seems pretty generic and might be a candidate for adding to the middle-end, perhaps as part of combine. I noticed these happened more often for LRA, which is the reason I went on this track of low-hanging-fruit-microoptimizations that are such an itch when noticing them, inspecting generated code for libgcc. Unfortunately, this one improves coremark only by a few cycles at the beginning or end (<0.0005%) for cris-elf -march=v10. The size of the coremark code is down by 0.4% (0.22% pre-lra). Using an iterator from the start because other binary operations will be added and their define_peephole2's would look exactly the same for the .md part. Some existing and-peephole2-related tests suffered, because many of them were using patterns with only contiguous 1:s in them: adjusted. Also, spotted and fixed, by adding a space, some scan-assembler-strings that were prone to spurious identifier or file name matches. gcc: * config/cris/cris.cc (cris_split_constant): New function. * config/cris/cris.md (splitop): New iterator. (opsplit1): New define_peephole2. * config/cris/cris-protos.h (cris_split_constant): Declare. (cris_splittable_constant_p): New macro. gcc/testsuite: * gcc.target/cris/peep2-andsplit1.c: New test. * gcc.target/cris/peep2-andu1.c, gcc.target/cris/peep2-andu2.c, gcc.target/cris/peep2-xsrand.c, gcc.target/cris/peep2-xsrand2.c: Adjust values to avoid interference with "opsplit1" with AND. Add whitespace to match-strings that may be confused with identifiers or file names.
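As a rough illustration of why such masks are special, an AND with a run of 1s anchored at either end of the word can be expressed as a pair of logical shifts. The commit message does not spell out the replacement sequence, so the sketch below is an assumption about the general technique rather than a description of what `cris_split_constant` emits; `is_one_sided_mask` and `and_via_shifts` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// True if mask is a non-trivial run of 1s anchored at bit 0 or at bit 31.
inline bool is_one_sided_mask(uint32_t mask)
{
    if (mask == 0 || mask == UINT32_MAX)
        return false;
    uint32_t inv = ~mask;
    bool low_run = (mask & (mask + 1)) == 0;   // 0...01...1
    bool high_run = (inv & (inv + 1)) == 0;    // 1...10...0
    return low_run || high_run;
}

// Compute x & mask with two logical shifts instead of an AND.
inline uint32_t and_via_shifts(uint32_t x, uint32_t mask)
{
    assert(is_one_sided_mask(mask));
    // Count the 1 bits; the shift amount is the number of cleared bits.
    int ones = 0;
    for (uint32_t m = mask; m != 0; m >>= 1)
        ones += static_cast<int>(m & 1u);
    int k = 32 - ones;  // in 1..31 for a non-trivial mask
    if ((mask & 1u) != 0)
        return static_cast<uint32_t>(x << k) >> k;  // clear the top k bits
    return static_cast<uint32_t>(x >> k) << k;      // clear the bottom k bits
}
```

A mask such as 0x00FFFF00 is rejected because its run of 1s touches neither end of the word, so no single shift pair can reproduce it.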
2023-05-04CRIS-LRA: Define TARGET_SPILL_CLASSHans-Peter Nilsson1-0/+12
This has no effect on arith-rand-ll (which suffers badly from LRA) and marginal effects (0.01% improvement) on coremark, but the size of coremark shrinks by 0.2%. An earlier version was tested with a tree around 2023-03 which showed (marginally) that ALL_REGS is preferable to GENERAL_REGS. * config/cris/cris.cc (TARGET_SPILL_CLASS): Define to ALL_REGS.
2023-05-04PR modula2/109675 implementation of writeAddress is non portableGaius Mulley92-1134/+1139
The implementation of gcc/m2/gm2-libs/DynamicStrings.mod:writeAddress is non portable as it casts a void * into an unsigned long int. This procedure has been re-implemented to use snprintf. As it is a library the support tools 'mc' and 'pge' have been rebuilt. There have been linking changes in the library and the underlying boolean type is now bool since the last rebuild hence the size of the patch. gcc/m2/ChangeLog: PR modula2/109675 * Make-lang.in (MC-LIB-DEFS): Remove M2LINK.def. (BUILD-PGE-O): Remove GM2LINK.o. * Make-maintainer.in (PPG-DEFS): New define. (PPG-LIB-DEFS): Remove M2LINK.def. (BUILD-BOOT-PPG-H): Add PPGDEF .h files. (m2/ppg$(exeext)): Remove M2LINK.o (PGE-DEPS): New define. (m2/pg$(exeext)): Remove M2LINK.o. (m2/gm2-pge-boot/$(SRC_PREFIX)%.o): Add -Im2/gm2-pge-boot. (m2/pge$(exeext)): Remove M2LINK.o. (pge-maintainer): Re-implement. (pge-libs-push): Re-implement. (m2/m2obj3/cc1gm2$(exeext)): Remove M2LINK.o. * gm2-libs/DynamicStrings.mod (writeAddress): Re-implement using snprintf. * gm2-libs/M2Dependent.mod: Remove commented out imports. * mc-boot/GDynamicStrings.cc: Rebuild. * mc-boot/GFIO.cc: Rebuild. * mc-boot/GFormatStrings.cc: Rebuild. * mc-boot/GM2Dependent.cc: Rebuild. * mc-boot/GM2Dependent.h: Rebuild. * mc-boot/GM2RTS.cc: Rebuild. * mc-boot/GM2RTS.h: Rebuild. * mc-boot/GRTExceptions.cc: Rebuild. * mc-boot/GRTint.cc: Rebuild. * mc-boot/GSFIO.cc: Rebuild. * mc-boot/GStringConvert.cc: Rebuild. * mc-boot/Gdecl.cc: Rebuild. * pge-boot/GASCII.cc: Rebuild. * pge-boot/GASCII.h: Rebuild. * pge-boot/GArgs.cc: Rebuild. * pge-boot/GArgs.h: Rebuild. * pge-boot/GAssertion.cc: Rebuild. * pge-boot/GAssertion.h: Rebuild. * pge-boot/GBreak.h: Rebuild. * pge-boot/GCmdArgs.h: Rebuild. * pge-boot/GDebug.cc: Rebuild. * pge-boot/GDebug.h: Rebuild. * pge-boot/GDynamicStrings.cc: Rebuild. * pge-boot/GDynamicStrings.h: Rebuild. * pge-boot/GEnvironment.h: Rebuild. * pge-boot/GFIO.cc: Rebuild. * pge-boot/GFIO.h: Rebuild. * pge-boot/GFormatStrings.h:: Rebuild. 
* pge-boot/GFpuIO.h:: Rebuild. * pge-boot/GIO.cc: Rebuild. * pge-boot/GIO.h: Rebuild. * pge-boot/GIndexing.cc: Rebuild. * pge-boot/GIndexing.h: Rebuild. * pge-boot/GLists.cc: Rebuild. * pge-boot/GLists.h: Rebuild. * pge-boot/GM2Dependent.cc: Rebuild. * pge-boot/GM2Dependent.h: Rebuild. * pge-boot/GM2EXCEPTION.cc: Rebuild. * pge-boot/GM2EXCEPTION.h: Rebuild. * pge-boot/GM2RTS.cc: Rebuild. * pge-boot/GM2RTS.h: Rebuild. * pge-boot/GNameKey.cc: Rebuild. * pge-boot/GNameKey.h: Rebuild. * pge-boot/GNumberIO.cc: Rebuild. * pge-boot/GNumberIO.h: Rebuild. * pge-boot/GOutput.cc: Rebuild. * pge-boot/GOutput.h: Rebuild. * pge-boot/GPushBackInput.cc: Rebuild. * pge-boot/GPushBackInput.h: Rebuild. * pge-boot/GRTExceptions.cc: Rebuild. * pge-boot/GRTExceptions.h: Rebuild. * pge-boot/GSArgs.h: Rebuild. * pge-boot/GSEnvironment.h: Rebuild. * pge-boot/GSFIO.cc: Rebuild. * pge-boot/GSFIO.h: Rebuild. * pge-boot/GSYSTEM.h: Rebuild. * pge-boot/GScan.h: Rebuild. * pge-boot/GStdIO.cc: Rebuild. * pge-boot/GStdIO.h: Rebuild. * pge-boot/GStorage.cc: Rebuild. * pge-boot/GStorage.h: Rebuild. * pge-boot/GStrCase.cc: Rebuild. * pge-boot/GStrCase.h: Rebuild. * pge-boot/GStrIO.cc: Rebuild. * pge-boot/GStrIO.h: Rebuild. * pge-boot/GStrLib.cc: Rebuild. * pge-boot/GStrLib.h: Rebuild. * pge-boot/GStringConvert.h: Rebuild. * pge-boot/GSymbolKey.cc: Rebuild. * pge-boot/GSymbolKey.h: Rebuild. * pge-boot/GSysExceptions.h: Rebuild. * pge-boot/GSysStorage.cc: Rebuild. * pge-boot/GSysStorage.h: Rebuild. * pge-boot/GTimeString.h: Rebuild. * pge-boot/GUnixArgs.h: Rebuild. * pge-boot/Gbnflex.cc: Rebuild. * pge-boot/Gbnflex.h: Rebuild. * pge-boot/Gdtoa.h: Rebuild. * pge-boot/Gerrno.h: Rebuild. * pge-boot/Gldtoa.h: Rebuild. * pge-boot/Glibc.h: Rebuild. * pge-boot/Glibm.h: Rebuild. * pge-boot/Gpge.cc: Rebuild. * pge-boot/Gtermios.h: Rebuild. * pge-boot/Gwrapc.h: Rebuild. * mc-boot/GM2LINK.h: Removed. * pge-boot/GM2LINK.cc: Removed. * pge-boot/GM2LINK.h: Removed. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
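The portability problem named in this message, casting a void * to unsigned long int, arises because unsigned long may be narrower than a pointer on some targets. A minimal C++ sketch of the snprintf-based alternative follows; it is not the Modula-2 implementation of writeAddress, and `format_address` is a hypothetical helper introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Format an address by letting snprintf's %p conversion handle the
// pointer, instead of casting it to an integer type of assumed width.
inline std::string format_address(const void *p)
{
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%p", const_cast<void *>(p));
    // Assumption of this sketch: 64 bytes is ample for any pointer text.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}
```

The exact text produced by %p is implementation-defined, so callers should treat the result as a display string rather than parse it.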
2023-05-04Daily bump.GCC Administrator5-1/+2511
2023-05-04CRIS-LRA: Fix uses of reload_in_progressHans-Peter Nilsson3-20/+20
This shows no difference in either arith-rand-ll or coremark numbers. Comparing libgcc and newlib libc before/after, the only differences are in a few functions where it's mostly neutral (newlib's _svfprintf_r et al) and one function (__gdtoa), which improves ever so slightly (four bytes fewer; one load fewer, but one instruction reading from memory instead of a register). * config/cris/cris.cc (cris_side_effect_mode_ok): Use lra_in_progress, not reload_in_progress. * config/cris/cris.md ("movdi", "*addi_reload"): Ditto. * config/cris/constraints.md ("Q"): Ditto.
2023-05-03libstdc++: Fix up abi.exp FAILs on powerpc64le-linuxJakub Jelinek3-0/+28
This is an ABI problem on powerpc64le-linux, introduced in 13.1. When libstdc++ is configured against old glibc, the _ZSt10from_charsPKcS0_RDF128_St12chars_format@@GLIBCXX_3.4.31 _ZSt8to_charsPcS_DF128_@@GLIBCXX_3.4.31 _ZSt8to_charsPcS_DF128_St12chars_format@@GLIBCXX_3.4.31 _ZSt8to_charsPcS_DF128_St12chars_formati@@GLIBCXX_3.4.31 symbols are exported from the library, while when it is configured against new enough glibc, those symbols aren't exported and we export instead _ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format@@GLIBCXX_IEEE128_3.4.29 _ZSt8to_charsPcS_u9__ieee128@@GLIBCXX_IEEE128_3.4.29 _ZSt8to_charsPcS_u9__ieee128St12chars_format@@GLIBCXX_IEEE128_3.4.29 _ZSt8to_charsPcS_u9__ieee128St12chars_formati@@GLIBCXX_IEEE128_3.4.29 together with various other @@GLIBCXX_IEEE128_3.4.{29,30,31} and @@CXXABI_IEEE128_1.3.13 symbols. The idea was that those *IEEE128* symbol versions (similarly to *LDBL* symbol versions) are optional (but if it appears, all symbols from it up to the version of the library appears), but the base appears always. My _Float128 from_chars/to_chars changes unfortunately broke this. I believe nothing really uses those symbols if libstdc++ has been configured against old glibc, so if 13.1 wasn't already released, it might be best to make sure they aren't exported on powerpc64le-linux. But as they were exported, I think the best resolution for this ABI difference is to add those 4 symbols as aliases to the GLIBCXX_IEEE128_3.4.29 *u9__ieee128* symbols, which the following patch does. 2023-05-03 Jakub Jelinek <jakub@redhat.com> * src/c++17/floating_from_chars.cc (_ZSt10from_charsPKcS0_RDF128_St12chars_format): New alias to _ZSt10from_charsPKcS0_Ru9__ieee128St12chars_format. * src/c++17/floating_to_chars.cc (_ZSt8to_charsPcS_DF128_): New alias to _ZSt8to_charsPcS_u9__ieee128. (_ZSt8to_charsPcS_DF128_St12chars_format): New alias to _ZSt8to_charsPcS_u9__ieee128St12chars_format. 
(_ZSt8to_charsPcS_DF128_St12chars_formati): New alias to _ZSt8to_charsPcS_u9__ieee128St12chars_formati. * config/abi/post/powerpc64le-linux-gnu/baseline_symbols.txt: Updated.
2023-05-03libstdc++: Fix up abi.exp FAILs on powerpc64-linuxJakub Jelinek3-275/+6653
As discussed on IRC, my _Float128/_Float64x support changes broke abi.exp testing on powerpc64-linux. The _ZTIDF128_@@CXXABI_1.3.14 _ZTIDF64x@@CXXABI_1.3.14 _ZTIPDF128_@@CXXABI_1.3.14 _ZTIPDF64x@@CXXABI_1.3.14 _ZTIPKDF128_@@CXXABI_1.3.14 _ZTIPKDF64x@@CXXABI_1.3.14 symbols only appear on powerpc64le-linux (both when building against very old glibcs as well as contemporary glibcs), while they don't appear on powerpc64-linux, because the latter never has _Float128 or _Float64x support. But we were using the same baseline_symbols.txt file for both powerpc64-linux and powerpc64le-linux, even when it contained quite a lot of stuff specific to the latter; but that was just the IEEE128 related stuff that appears only when configured against not very old glibc. The following patch keeps those exports as is and just splits the config/abi/post/ files, copies the current one to powerpc64le-linux unmodified and removes the above mentioned symbols plus all GLIBCXX_IEEE128_3.4.{29,30,31} and CXXABI_IEEE128_1.3.13 symbols from the powerpc64-linux version. 2023-05-03 Jakub Jelinek <jakub@redhat.com> * configure.host (abi_baseline_pair): Use powerpc64le-linux-gnu rather than powerpc64-linux-gnu for powerpc64le*-linux*. * config/abi/post/powerpc64-linux-gnu/baseline_symbols.txt: Remove _ZTI*DF128_, _ZTI*DF64x symbols and symbols in GLIBCXX_IEEE128_3.4.{29,30,31} and CXXABI_IEEE128_1.3.13 symbol versions. * config/abi/post/powerpc64le-linux-gnu/baseline_symbols.txt: New file.
2023-05-03c++: over-eager friend matching [PR109649]Jason Merrill4-1/+23
A bug in the simplification I did around 91618; at this point X<int>::f has DECL_IMPLICIT_INSTANTIATION set, but we've already identified what template it corresponds to, so we don't want to call check_explicit_specialization. To distinguish this case we need to look at DECL_TI_TEMPLATE. grokfndecl has for a long time set it to the OVERLOAD in this case, while the new cases I added for 91618 were leaving DECL_TEMPLATE_INFO null; let's adjust them to match. PR c++/91618 PR c++/109649 gcc/cp/ChangeLog: * friend.cc (do_friend): Don't call check_explicit_specialization if DECL_TEMPLATE_INFO is already set. * decl2.cc (check_classfn): Set DECL_TEMPLATE_INFO. * name-lookup.cc (set_decl_namespace): Likewise. gcc/testsuite/ChangeLog: * g++.dg/template/friend77.C: New test.
2023-05-03Add stats to simple_dce_from_worklistAndrew Pinski1-1/+11
While looking to move substitute_and_fold_engine over to use simple_dce_from_worklist, I noticed that we don't record the stats of the removed stmts/phis. So this does that. OK? Bootstrapped and tested on x86_64-linux-gnu. gcc/ChangeLog: * tree-ssa-dce.cc (simple_dce_from_worklist): Record stats on removed number of statements and phis.