aboutsummaryrefslogtreecommitdiff
path: root/gcc
AgeCommit message (Collapse)AuthorFilesLines
2024-09-02ada: Small fixes for FreeBSDPatrick Bernardi2-5/+9
Size of pthread data types now need to be defined for FreeBSD ports. Traceback support for AArch64 FreeBSD is now defined. gcc/ada/ * s-oscons-tmplt.c: Define sizes of pthread data types on FreeBSD. * tracebak.c: Use GCC unwinder and adjust PC appropriately on aarch64-freebsd.
2024-09-02ada: Also reset scope for some nested declarationMarc Poulhiès1-1/+24
When changing the scope for entities found in the entry body that is mutated into a procedure, the compiler needs to look deeper than only the top level entities as expansion may produce object declarations which scopes are also the entry. For example, the tree after expansion may look like: procedure This_Is_An_Entry_Proc is ... O1 : Typ := do TMP1 : OTyp := ...; ... in TMP1; O1's scope needs to be reset to This_Is_An_Entry_Proc, but so does TMP1's scope. This change also fix a small oversight where N_Implicit_Label_Declaration scope must be reset and its content skipped. gcc/ada/ * exp_ch9.adb (Reset_Scopes_To): Adjust comment. (Reset_Scopes_To.Reset_Scope): Adjust the scope reset for object declaration. In particular, visit the children nodes if any. Also extend the handling of other declarations to N_Implicit_Label_Declaration.
2024-09-02ada: Cleanup expansion of object declarationsPiotr Trojanek1-7/+3
Replace repeated calls to Sloc with uses of local constant Loc. Code cleanup; behavior is unaffected. gcc/ada/ * exp_ch3.adb (Expand_N_Object_Declaration): Replace calls to Sloc with uses of Loc; turn variable Prag into constant.
2024-09-02ada: Remove repeated guards in validity checksPiotr Trojanek1-6/+2
Routine Insert_Valid_Check only applies checks when Expr_Known_Valid query returns False; there is no need to call this query before inserting checks. Code cleanup; behavior is unaffected. gcc/ada/ * exp_imgv.adb (Expand_User_Defined_Enumeration_Image) (Expand_Image_Attribute): Remove redundant guards.
2024-09-02ranger: Fix up range computation for CLZ [PR116486]Jakub Jelinek2-2/+29
The initial CLZ gimple-range-op.cc implementation handled just the case where second argument to .CLZ is equal to prec, but in r15-1014 I've added also handling of the -1 case. As the following testcase shows, incorrectly though for the case where the first argument has [0,0] range. If the second argument is prec, then the result should be [prec,prec] and that was handled correctly, but when the second argument is -1, the result should be [-1,-1] but instead it was incorrectly computed as [prec-1,prec-1] (when second argument is prec, mini is 0 and maxi is prec, while when second argument is -1, mini is -1 and maxi is prec-1). Fixed thusly (the actual handling is then similar to the CTZ [0,0] case). 2024-09-02 Jakub Jelinek <jakub@redhat.com> PR middle-end/116486 * gimple-range-op.cc (cfn_clz::fold_range): If lh is [0,0] and mini is -1, return [-1,-1] range rather than [prec-1,prec-1]. * gcc.dg/bitint-109.c: New test.
2024-09-02load and store-lanes with SLPRichard Biener31-172/+458
The following is a prototype for how to represent load/store-lanes within SLP. I've for now settled with having a single load node with multiple permute nodes acting as selection, one for each loaded lane and a single store node fed from all stored lanes. For for (int i = 0; i < 1024; ++i) { a[2*i] = b[2*i] + 7; a[2*i+1] = b[2*i+1] * 3; } you have the following SLP graph where I explain how things are set up and code-generated: t.c:23:21: note: SLP graph after lowering permutations: t.c:23:21: note: node 0x50dc8b0 (max_nunits=1, refcnt=1) vector(4) int t.c:23:21: note: op template: *_6 = _7; t.c:23:21: note: stmt 0 *_6 = _7; t.c:23:21: note: stmt 1 *_12 = _13; t.c:23:21: note: children 0x50dc488 0x50dc6e8 This is the store node, it's marked with ldst_lanes = true during SLP discovery. This node code-generates vect_array.65[0] = vect__7.61_29; vect_array.65[1] = vect__13.62_28; MEM <int[8]> [(int *)vectp_a.63_27] = .STORE_LANES (vect_array.65); ... t.c:23:21: note: node 0x50dc520 (max_nunits=4, refcnt=2) vector(4) int t.c:23:21: note: op: VEC_PERM_EXPR t.c:23:21: note: stmt 0 _5 = *_4; t.c:23:21: note: lane permutation { 0[0] } t.c:23:21: note: children 0x50dc948 t.c:23:21: note: node 0x50dc780 (max_nunits=4, refcnt=2) vector(4) int t.c:23:21: note: op: VEC_PERM_EXPR t.c:23:21: note: stmt 0 _11 = *_10; t.c:23:21: note: lane permutation { 0[1] } t.c:23:21: note: children 0x50dc948 These are the selection nodes, marked with ldst_lanes = true. They code generate nothing. t.c:23:21: note: node 0x50dc948 (max_nunits=4, refcnt=3) vector(4) int t.c:23:21: note: op template: _5 = *_4; t.c:23:21: note: stmt 0 _5 = *_4; t.c:23:21: note: stmt 1 _11 = *_10; t.c:23:21: note: load permutation { 0 1 } This is the load node, marked with ldst_lanes = true (the load permutation is only accurate when taking into account the lane permute in the selection nodes). It code generates vect_array.58 = .LOAD_LANES (MEM <int[8]> [(int *)vectp_b.56_33]); vect__5.59_31 = vect_array.58[0]; vect__5.60_30 = vect_array.58[1]; This scheme allows to leave code generation in vectorizable_load/store mostly as-is. While this should support both load-lanes and (masked) store-lanes the decision to do either is done during SLP discovery time and cannot be reversed without altering the SLP tree - as-is the SLP tree is not usable for non-store-lanes on the store side, the load side is OK representation-wise but will very likely fail permute handling as the lowering to deal with the two input vector restriction isn't done - but of course since the permute node is marked as to be ignored that doesn't work out. So I've put restrictions in place that fail vectorization if a load/store-lane SLP tree is later classified differently by get_load_store_type. I'll note that for example gcc.target/aarch64/sve/mask_struct_store_3.c will not get SLP store-lanes used because the full store SLPs just fine though we then fail to handle the "splat" load-permutation t2.c:5:21: note: node 0x4db2630 (max_nunits=4, refcnt=2) vector([4,4]) int t2.c:5:21: note: op template: _6 = *_5; t2.c:5:21: note: stmt 0 _6 = *_5; t2.c:5:21: note: stmt 1 _6 = *_5; t2.c:5:21: note: stmt 2 _6 = *_5; t2.c:5:21: note: stmt 3 _6 = *_5; t2.c:5:21: note: load permutation { 0 0 0 0 } the load permute lowering code currently doesn't consider it worth lowering single loads from a group (or in this case not grouped loads). The expectation is the target can handle this by two interleaves with itself. So what we see here is that while the explicit SLP representation is helpful in some cases, in cases like this it would require changing it when we make decisions how to vectorize. My idea is that this all will change a lot when we re-do SLP discovery (for loops) and when we get rid of non-SLP as I think vectorizable_* should be allowed to alter the SLP graph during analysis. The patch also removes the code cancelling SLP if we can use load/store-lanes from the main loop vector analysis code and re-implements it as re-discovering the SLP instance with forced single-lane splits so SLP load/store-lanes scheme can be used. This is now done after SLP discovery and SLP pattern recog are complete to not disturb the latter but per SLP instance instead of being a global decision on the whole loop. This is a behavioral change that for example shows in gcc.dg/vect/slp-perm-6.c on ARM where we formerly used SLP permutes but now a mix of SLP without permutes and load/store lanes. The previous flaky heuristic is now flaky in a different way. Testing on RISC-V and aarch64 reveal several testcases that require adjustment as to now expect SLP even when load/store lanes are being used. If in doubt I've adjusted them to the final expectation which will lead to one or two new FAILs where we still do the SLP cancelling. I have a followup that implements that while remaining in SLP that's in final testing. Note that gcc.dg/vect/slp-42.c and gcc.dg/vect/pr68445.c will FAIL on aarch64 with SVE because for some odd reason vect_stridedN is true for any N for check_effective_target_vect_fully_masked targets but SVE cannot do ld8 while risc-v can. I have not bothered to adjust target tests that now fail assembly-scan. * tree-vectorizer.h (_slp_tree::ldst_lanes): New flag to mark load, store and permute nodes. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize ldst_lanes. (vect_build_slp_instance): For stores iff the target prefers store-lanes discover single-lane sub-groups, do not perform interleaving lowering but mark the node with ldst_lanes. Also allow i == 0 - fatal failure - for splitting up a store group when we're not doing single-lane discovery already. (vect_lower_load_permutations): When the target supports load lanes and the loads all fit the pattern split out a single level of permutes only and mark the load and permute nodes with ldst_lanes. (vectorizable_slp_permutation_1): Handle the load-lane permute forwarding of vector defs. (vect_analyze_slp): After SLP pattern recog is finished see if there are any SLP instances that would benefit from using load/store-lanes and re-discover those with forced single lanes. * tree-vect-stmts.cc (get_group_load_store_type): Support load/store-lanes for SLP. (vectorizable_store): Support SLP code generation for store-lanes. (vectorizable_load): Support SLP code generation for load-lanes. * tree-vect-loop.cc (vect_analyze_loop_2): Do not cancel SLP when store-lanes can be used. * gcc.dg/vect/slp-55.c: New testcase. * gcc.dg/vect/slp-56.c: Likewise. * gcc.dg/vect/slp-11c.c: Adjust. * gcc.dg/vect/slp-53.c: Likewise. * gcc.dg/vect/slp-cond-1.c: Likewise. * gcc.dg/vect/vect-complex-5.c: Likewise. * gcc.dg/vect/slp-1.c: Likewise. * gcc.dg/vect/slp-54.c: Remove riscv XFAIL. * gcc.dg/vect/slp-perm-5.c: Adjust. * gcc.dg/vect/slp-perm-7.c: Likewise. * gcc.dg/vect/slp-perm-8.c: Likewise. * gcc.dg/vect/slp-multitypes-11.c: Likewise. * gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise. * gcc.dg/vect/slp-perm-9.c: Remove expected SLP fail due to three-vector permute. * gcc.dg/vect/slp-perm-6.c: Remove XFAIL. * gcc.dg/vect/slp-perm-1.c: Adjust. * gcc.dg/vect/slp-perm-2.c: Likewise. * gcc.dg/vect/slp-perm-3.c: Likewise. * gcc.dg/vect/slp-perm-4.c: Likewise. * gcc.dg/vect/pr68445.c: Likewise. * gcc.dg/vect/slp-11b.c: Likewise. * gcc.dg/vect/slp-2.c: Likewise. * gcc.dg/vect/slp-23.c: Likewise. * gcc.dg/vect/slp-33.c: Likewise. * gcc.dg/vect/slp-42.c: Likewise. * gcc.dg/vect/slp-46.c: Likewise. * gcc.dg/vect/slp-perm-10.c: Likewise.
2024-09-02lower SLP load permutation to interleavingRichard Biener5-4/+378
The following emulates classical interleaving for SLP load permutes that we are unlikely handling natively. This is to handle cases where interleaving (or load/store-lanes) is the optimal choice for vectorizing even when we are doing that within SLP. An example would be void foo (int * __restrict a, int * b) { for (int i = 0; i < 16; ++i) { a[4*i + 0] = b[4*i + 0] * 3; a[4*i + 1] = b[4*i + 1] + 3; a[4*i + 2] = (b[4*i + 2] * 3 + 3); a[4*i + 3] = b[4*i + 3] * 3; } } where currently the SLP store is merging four single-lane SLP sub-graphs but none of the loads in it can be code-generated with V4SImode vectors and a VF of four as the permutes would need three vectors. The patch introduces a lowering phase after SLP discovery but before SLP pattern recognition or permute optimization that analyzes all loads from the same dataref group and creates an interleaving scheme starting from an unpermuted load. What can be handled is power-of-two group size and a group size of three. The possibility for doing the interleaving with a load-lanes like instruction is done as followup. For a group-size of three this is done by using the non-interleaving fallback code which then creates at VF == 4 from { { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } } the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 } to produce { c0, c1, c2, c3 }. This turns out to be more effective than the scheme implemented for non-SLP for SSE and only slightly worse for AVX512 and a bit more worse for AVX2. It seems to me that this would extend to other non-power-of-two group-sizes though (but the patch does not). Optimal schemes are likely difficult to lay out in VF agnostic form. I'll note that while the lowering assumes even/odd extract is generally available for all vector element sizes (which is probably a good assumption), it doesn't in any way constrain the other permutes it generates based on target availability. Again difficult to do in a VF agnostic way (but at least currently the vector type is fixed). I'll also note that the SLP store side merges lanes in a way producing three-vector permutes for store group-size of three, so the testcase uses a store group-size of four. The patch has a fallback for when there are multi-lane groups and the resulting permutes to not fit interleaving. Code generation is not optimal when this triggers and might be worse than doing single-lane group interleaving. The patch handles gaps by representing them with NULL entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node. The SLP discovery changes could be elided if we manually build the load node instead. SLP load nodes covering enough lanes to not need intermediate permutes are retained as having a load-permutation and do not use the single SLP load node for each dataref group. That's something we might want to change, making load-permutation something purely local to SLP discovery (but then SLP discovery could do part of the lowering). The patch misses CSEing intermediate generated permutes and registering them with the bst_map which is possibly required for SLP pattern detection in some cases - this re-spin of the patch moves the lowering after SLP pattern detection. * tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt. (vect_build_slp_tree_2): Likewise. Release load permutation when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's no actual permutation in that case. (vllp_cmp): New function. (vect_lower_load_permutations): Likewise. (vect_analyze_slp): Call it. * gcc.dg/vect/slp-11a.c: Expect SLP. * gcc.dg/vect/slp-12a.c: Likewise. * gcc.dg/vect/slp-51.c: New testcase. * gcc.dg/vect/slp-52.c: New testcase.
2024-09-01[PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.Xianmiao Qu2-0/+18
Currently, in RV32, even with the D extension enabled, the cost of DFmode register moves is still set to 'COSTS_N_INSNS (2)'. This results in the 'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG register moves, leading to the generation of many redundant instructions. As an example, consider the following test case: double foo (int t, double a, double b) { if (t > 0) return a; else return b; } When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated: .cfi_startproc addi sp,sp,-32 .cfi_def_cfa_offset 32 fsd fa0,8(sp) fsd fa1,16(sp) lw a4,8(sp) lw a5,12(sp) lw a2,16(sp) lw a3,20(sp) bgt a0,zero,.L1 mv a4,a2 mv a5,a3 .L1: sw a4,24(sp) sw a5,28(sp) fld fa0,24(sp) addi sp,sp,32 .cfi_def_cfa_offset 0 jr ra .cfi_endproc After adjust the DFmode register move's cost to 'COSTS_N_INSNS (1)', the generated code is as follows, with a significant reduction in the number of instructions. .cfi_startproc ble a0,zero,.L5 ret .L5: fmv.d fa0,fa1 ret .cfi_endproc gcc/ * config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the DFmode register move for RV32. gcc/testsuite/ * gcc.target/riscv/rv32-movdf-cost.c: New test.
2024-09-01[committed][PR rtl-optimization/116544] Fix test for promoted subregsJeff Law2-1/+23
This is a small bug in the ext-dce code's handling of promoted subregs. Essentially when we see a promoted subreg we need to make additional bit groups live as various parts of the RTL path know that an extension of a suitably promoted subreg can be trivially eliminated. When I added support for dealing with this quirk I failed to account for the larger modes properly and it ignored the case when the size of the inner object was > 32 bits. Opps. This does _not_ fix the outstanding x86 issue. That's caused by something completely different and more concerning ;( Bootstrapped and regression tested on x86. Obviously fixes the testcase on riscv as well. Pushing to the trunk. PR rtl-optimization/116544 gcc/ * ext-dce.cc (ext_dce_process_uses): Fix thinko in promoted subreg handling. gcc/testsuite/ * gcc.dg/torture/pr116544.c: New test.
2024-09-02i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2Levy Hsu4-0/+63
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode for int mask cmp. * config/i386/sse.md (vec_cmp<mode><avx512fmaskmodelower>): New vec_cmp expand for VBF modes. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Ditto.
2024-09-02i386: Support vectorized BF16 sqrt with AVX10.2 instructionLevy Hsu1-5/+8
gcc/ChangeLog: * config/i386/sse.md: Expand VF2H to VF2HB with VBF modes.
2024-09-02i386: Support vectorized BF16 smaxmin with AVX10.2 instructionsLevy Hsu3-0/+63
gcc/ChangeLog: * config/i386/sse.md (<code><mode>3): New define expand pattern for BF smaxmin. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test.
2024-09-02i386: Support vectorized BF16 FMA with AVX10.2 instructionsLevy Hsu3-1/+101
gcc/ChangeLog: * config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test.
2024-09-02i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructionsLevy Hsu3-8/+162
AVX10.2 introduces several non-exception instructions for BF16 vector. Enable vectorized BF add/sub/mul/div operation by supporting standard optab for them. gcc/ChangeLog: * config/i386/sse.md (div<mode>3): New expander for BFmode div. (VF_BHSD): New mode iterator with vector BFmodes. (<insn><mode>3<mask_name><round_name>): Change mode to VF_BHSD. (mul<mode>3<mask_name><round_name>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-operations-1.c: Ditto.
2024-09-02i386: Optimize generate insn for AVX10.2 compareHu, Lin15-2/+147
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_expand_fp_compare): Add UNSPEC to support the optimization. * config/i386/i386.cc (ix86_fp_compare_code_to_integer): Add NE/EQ. * config/i386/i386.md (*cmpx<unord><MODEF:mode>): New define_insn. (*cmpx<unord>hf): Ditto. * config/i386/predicates.md (ix86_trivial_fp_comparison_operator): Add ne/eq. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-compare-1b.c: New test.
2024-09-02i386: Optimize ordered and nonequalHu, Lin12-0/+12
Currently, when we input !__builtin_isunordered (a, b) && (a != b), gcc will emit ucomiss %xmm1, %xmm0 movl $1, %ecx setp %dl setnp %al cmovne %ecx, %edx andl %edx, %eax movzbl %al, %eax In fact, xorl %eax, %eax ucomiss %xmm1, %xmm0 setne %al is better. gcc/ChangeLog: * match.pd: Optimize (and ordered non-equal) to (not (or unordered equal)) gcc/testsuite/ChangeLog: * gcc.target/i386/optimize_one.c: New test.
2024-09-02i386: Auto vectorize sdot_prod, usdot_prod, udot_prod with AVX10.2 instructionsHaochen Jiang7-78/+86
gcc/ChangeLog: * config/i386/sse.md (VI1_AVX512VNNIBW): New. (VI2_AVX10_2): Ditto. (sdot_prod<mode>): Add AVX10.2 to auto vectorize and combine 512 bit part. (udot_prod<mode>): Ditto. (sdot_prodv64qi): Removed. (udot_prodv64qi): Ditto. (usdot_prod<mode>): Add AVX10.2 to auto vectorize. (udot_prod<mode>): Ditto. gcc/testsuite/ChangeLog: * gcc.target/i386/vnniint16-auto-vectorize-2.c: Only define TEST when not defined. * gcc.target/i386/vnniint8-auto-vectorize-2.c: Ditto. * gcc.target/i386/vnniint16-auto-vectorize-3.c: New test. * gcc.target/i386/vnniint16-auto-vectorize-4.c: Ditto. * gcc.target/i386/vnniint8-auto-vectorize-3.c: Ditto. * gcc.target/i386/vnniint8-auto-vectorize-4.c: Ditto.
2024-09-02RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 3Pan Li6-0/+102
This patch would like to add test cases for the unsigned scalar quad and oct .SAT_TRUNC form 3. Aka: Form 3: #define DEF_SAT_U_TRUC_FMT_3(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_3 (WT x) \ { \ WT max = (WT)(NT)-1; \ return x <= max ? (NT)x : (NT) max; \ } QUAD: DEF_SAT_U_TRUC_FMT_3 (uint16_t, uint64_t) DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint32_t) OCT: DEF_SAT_U_TRUC_FMT_3 (uint8_t, uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_trunc-16.c: New test. * gcc.target/riscv/sat_u_trunc-17.c: New test. * gcc.target/riscv/sat_u_trunc-18.c: New test. * gcc.target/riscv/sat_u_trunc-run-16.c: New test. * gcc.target/riscv/sat_u_trunc-run-17.c: New test. * gcc.target/riscv/sat_u_trunc-run-18.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2Pan Li6-0/+102
This patch would like to add test cases for the unsigned scalar quad and oct .SAT_TRUNC form 2. Aka: Form 2: #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ { \ WT max = (WT)(NT)-1; \ return x > max ? (NT) max : (NT)x; \ } QUAD: DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) OCT: DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_trunc-10.c: New test. * gcc.target/riscv/sat_u_trunc-11.c: New test. * gcc.target/riscv/sat_u_trunc-12.c: New test. * gcc.target/riscv/sat_u_trunc-run-10.c: New test. * gcc.target/riscv/sat_u_trunc-run-11.c: New test. * gcc.target/riscv/sat_u_trunc-run-12.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02RISC-V: Add testcases for form 4 of unsigned vector .SAT_ADD IMMPan Li9-0/+188
This patch would like to add test cases for the unsigned vector .SAT_ADD when one of the operand is IMM. Form 4: #define DEF_VEC_SAT_U_ADD_IMM_FMT_4(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_4 (T *out, T *in, unsigned limit) \ { \ unsigned i; \ T ret; \ for (i = 0; i < limit; i++) \ { \ out[i] = __builtin_add_overflow (in[i], IMM, &ret) == 0 ? ret : -1; \ } \ } DEF_VEC_SAT_U_ADD_IMM_FMT_4(uint64_t, 123) The below test are passed for this patch. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-13.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-14.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-15.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-16.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-13.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-14.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-15.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-16.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02RISC-V: Add testcases for form 3 of unsigned vector .SAT_ADD IMMPan Li8-0/+168
This patch would like to add test cases for the unsigned vector .SAT_ADD when one of the operand is IMM. Form 3: #define DEF_VEC_SAT_U_ADD_IMM_FMT_3(T, IMM) \ T __attribute__((noinline)) \ vec_sat_u_add_imm##IMM##_##T##_fmt_3 (T *out, T *in, unsigned limit) \ { \ unsigned i; \ T ret; \ for (i = 0; i < limit; i++) \ { \ out[i] = __builtin_add_overflow (in[i], IMM, &ret) ? -1 : ret; \ } \ } DEF_VEC_SAT_U_ADD_IMM_FMT_3(uint64_t, 123) The below test are passed for this patch. * The rv64gcv fully regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-10.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-11.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-12.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-9.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-10.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-11.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-12.c: New test. * gcc.target/riscv/rvv/autovec/binop/vec_sat_u_add_imm-run-9.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02RISC-V: Refactor gen zero_extend rtx for SAT_* when expand SImode in RV64Pan Li25-53/+118
In previous, we have some specially handling for both the .SAT_ADD and .SAT_SUB for unsigned int. There are similar to take care of SImode in RV64 for zero extend. Thus refactor these two helper function into one for possible code duplication. The below test suite are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/riscv.cc (riscv_gen_zero_extend_rtx): Merge the zero_extend handing from func riscv_gen_unsigned_xmode_reg. (riscv_gen_unsigned_xmode_reg): Remove. (riscv_expand_ussub): Leverage riscv_gen_zero_extend_rtx instead of riscv_gen_unsigned_xmode_reg. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_sub-11.c: Adjust asm check. * gcc.target/riscv/sat_u_sub-15.c: Ditto. * gcc.target/riscv/sat_u_sub-19.c: Ditto. * gcc.target/riscv/sat_u_sub-23.c: Ditto. * gcc.target/riscv/sat_u_sub-27.c: Ditto. * gcc.target/riscv/sat_u_sub-3.c: Ditto. * gcc.target/riscv/sat_u_sub-31.c: Ditto. * gcc.target/riscv/sat_u_sub-35.c: Ditto. * gcc.target/riscv/sat_u_sub-39.c: Ditto. * gcc.target/riscv/sat_u_sub-43.c: Ditto. * gcc.target/riscv/sat_u_sub-47.c: Ditto. * gcc.target/riscv/sat_u_sub-7.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-11.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-11_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-11_2.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-15.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-15_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-15_2.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-3.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-3_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-3_2.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-7.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-7_1.c: Ditto. * gcc.target/riscv/sat_u_sub_imm-7_2.c: Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-02Daily bump.GCC Administrator3-1/+27
2024-08-31slsr: Use simple_dce_from_worklist in SLSR [PR116554]Andrew Pinski1-18/+41
While working on a phiopt patch, it was noticed that SLSR would leave around some unused ssa names. Let's add simple_dce_from_worklist usage to SLSR to remove the dead statements. This should give a small improvemnent for passes afterwards. Boostrapped and tested on x86_64. gcc/ChangeLog: PR tree-optimization/116554 * gimple-ssa-strength-reduction.cc: Include tree-ssa-dce.h. (replace_mult_candidate): Add sdce_worklist argument, mark the rhs1/rhs2 for maybe dceing. (replace_unconditional_candidate): Add sdce_worklist argument, Update call to replace_mult_candidate. (replace_conditional_candidate): Add sdce_worklist argument, update call to replace_mult_candidate. (replace_uncond_cands_and_profitable_phis): Add sdce_worklist argument, update call to replace_conditional_candidate, replace_unconditional_candidate, and replace_uncond_cands_and_profitable_phis. (replace_one_candidate): Add sdce_worklist argument, mark the orig_rhs1/orig_rhs2 for maybe dceing. (replace_profitable_candidates): Add sdce_worklist argument, update call to replace_one_candidate and replace_profitable_candidates. (analyze_candidates_and_replace): Call simple_dce_from_worklist and update calls to replace_profitable_candidates, and replace_uncond_cands_and_profitable_phis. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-09-01testsuite: Prune compilation messages for modules testsHans-Peter Nilsson1-0/+10
All testsuite compiler-calls pass default_target_compile in the dejagnu installation (typically /usr/share/dejagnu/target.exp) which also calls the dejagnu-installed prune_warnings. Normally, tests using the dg framework (most or all tests these days) compile and link by calling various wrappers that end up calling dg-test in the dejagnu installation, typically installed as /usr/share/dejagnu/dg.exp. That, besides the compiler call, also calls ${tool}-dg-prune (g++-dg-prune) on the messages, which in turn ends up calling prune_gcc_output in gcc/testsuite/lib/prune.exp. That gcc-specific "pruning" function handles more cases than the dejagnu prune_warnings, and also has updated patterns. But, module_do_it in modules.exp calls the lower-level ${tool}_target_compile "directly", i.e. g++_target_compile defined in gcc/testsuite/lib/g++.exp. That does not call ${tool}-dg-prune, meaning those test-cases miss the gcc-specific pruning. Noticed while testing a dejagnu update that handled the miniscule "in" in the warning (line-breaks added below besides the original one after "(void*)':") "/path/to/cris-elf/bin/ld: /gccobj/cris-elf/./libstdc++-v3/src/.libs/libstdc++.a(random.o): in function `std::(anonymous namespace)::__libc_getentropy(void*)': /gccsrc/libstdc++-v3/src/c++11/random.cc:183: warning: _getentropy is not implemented and will always fail" The line saying "in function" rather than "In function" (from the binutils linker since 2018) is pruned by prune_gcc_output. The prune_warnings in dejagnu-1.6.3 and earlier handles the second line separately. It's an unfortunate wart that neither consumes the delimiting line-break, leaving to the callers to prune residual empty lines. See prune_warnings in dejagnu (default_target_compile and dg-test) for those other line-break fixups, as alluded in the comment. * g++.dg/modules/modules.exp (module_do_it): Prune compilation messages.
2024-09-01Daily bump.GCC Administrator7-1/+183
2024-08-31i386: Support read-modify-write memory operands in STV.Roger Sayle3-3/+16
This patch enables STV when the first operand of a TImode binary logic operand (AND, IOR or XOR) is a memory operand, which is commonly the case with read-modify-write instructions. A different motivating example from the one given previously is: __int128 m, p, q; void foo() { m ^= (p & q); } Currently with -O2 -mavx the RMW instructions are rejected by STV, resulting in scalar code: foo: movq p(%rip), %rax movq p+8(%rip), %rdx andq q(%rip), %rax andq q+8(%rip), %rdx xorq %rax, m(%rip) xorq %rdx, m+8(%rip) ret With this patch they become scalar-to-vector candidates: foo: vmovdqa p(%rip), %xmm0 vpand q(%rip), %xmm0, %xmm0 vpxor m(%rip), %xmm0, %xmm0 vmovdqa %xmm0, m(%rip) ret 2024-08-31 Roger Sayle <roger@nextmovesoftware.com> gcc/ChangeLog * config/i386/i386-features.cc (timode_scalar_to_vector_candidate_p): Support the first operand of AND, IOR and XOR being MEM_P, i.e. a read-modify-write insn. gcc/testsuite/ChangeLog * gcc.target/i386/movti-2.c: Change dg-options to -Os. * gcc.target/i386/movti-4.c: Expected output of original movti-2.c.
2024-08-31AVR: Run pass avr-fuse-add a second time after pass_cprop_hardreg.Georg-Johann Lay2-0/+28
gcc/ * config/avr/avr-passes.cc (avr_pass_fuse_add) <clone>: Override. * config/avr/avr-passes.def (avr_pass_fuse_add): Run again after pass_cprop_hardreg.
2024-08-31AVR: Tidy pass avr-fuse-add.Georg-Johann Lay4-43/+12
gcc/ * config/avr/avr-protos.h (avr_split_tiny_move): Rename to avr_split_fake_addressing_move. * config/avr/avr-passes.def: Same. * config/avr/avr-passes.cc: Same. (avr_pass_data_fuse_add) <tv_id>: Set to TV_MACH_DEP. * config/avr/avr.md (split-lpmx): Remove a define_split. Such splits are performed by avr_split_fake_addressing_move.
2024-08-31testsuite, c++, coroutines: Avoid 'unused' warnings [NFC].Iain Sandoe6-7/+7
The 'torture' section of the coroutine tests is primarily about checking correct operation of the generated code. It should, ideally, be possible to run this part of the testsuite with '-Wall' and expect no fails. In the case that we wish to test for a specific diagnostic (and that it does not appear over a range of optimisation/debug conditions) then we should make that explict (as done, for example, in pr109867.C). The tests amended here have warnings because of unused entities; in many cases those are relevant to the test, and so we just mark them with __attribute__((__unused__)). We amend the debug output in coro.h to avoid similar warnings when print output is disabled (the default). gcc/testsuite/ChangeLog: * g++.dg/coroutines/coro.h: Use a variadic macro for PRINTF to avoid unused warnings when output is disabled. * g++.dg/coroutines/torture/co-await-04-control-flow.C: Avoid unused warnings. * g++.dg/coroutines/torture/co-ret-13-template-2.C: Likewise. * g++.dg/coroutines/torture/exceptions-test-01-n4849-a.C: Likewise. * g++.dg/coroutines/torture/local-var-04-hiding-nested-scopes.C: Likewise. * g++.dg/coroutines/torture/pr109867.C: Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-31testsuite, c++, coroutines: Correct a test intent.Iain Sandoe1-8/+7
The intention of the series of tests numberef pr95615-* is to verify that entities created by the ramp and potentially needing destruction are correctly handled when exceptions are thrown. Because of a typo, one case was not being checked correctly (the return object). This patch amends the check to test that the returned object is properly deleted. gcc/testsuite/ChangeLog: * g++.dg/coroutines/torture/pr95615.inc: Check tha the task object produced by get_return_object is correctly deleted on exception. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-31c++, coroutines: Make and use a frame access helper.Iain Sandoe1-47/+44
In the review of earlier patches it was suggested that we might make use of finish_class_access_expr instead of doing a lookup for the member and then a build_class_access_expr call. finish_class_access_expr does a lot more work than we need and ends up calling build_class_access_expr anyway. So, instead, this patch makes a new helper to do the lookup and build and uses that helper everywhere except instances in the ramp function that we are going to handle separately. gcc/cp/ChangeLog: * coroutines.cc (coro_build_frame_access_expr): New. (transform_await_expr): Use coro_build_frame_access_expr. (transform_local_var_uses): Likewise. (build_actor_fn): Likewise. (build_destroy_fn): Likewise. (cp_coroutine_transform::build_ramp_function): Likewise. Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
2024-08-31hppa: Enable PA 2.0 symbolic operands on ELF32 targetsJohn David Anglin3-27/+25
The GNU ELF32 linker has been fixed and it can now handle PA 2.0 symbolic relocations. This only affects non-pic code generation. 2024-08-31 John David Anglin <danglin@gcc.gnu.org> gcc/ChangeLog: * config/pa/pa.cc (pa_emit_move_sequence): Remove symbolic memory work arounds for TARGET_ELF32. (pa_legitimate_address_p): Likewise. Allow symbolic operands. Adjust comment. * config/pa/pa.md: Replace reg_or_0_or_nonsymb_mem_operand with reg_or_0_or_mem_operand predicate in various unnamed move insns. * config/pa/predicates.md (floating_point_store_memory_operand): Update comment. Remove symbolic memory work arounds for TARGET_ELF32. (nonsymb_mem_operand): Rename to mem_operand. Allow symbolic memory operands. (reg_or_0_or_nonsymb_mem_operand): Rename to reg_or_0_or_mem_operand. Allow symbolic memory operands.
2024-08-31phiopt: Ignore some nop statements in heursics [PR116098]Andrew Pinski3-2/+119
The heurstics that was added for PR71016, try to search to see if the conversion was being moved away from its definition. The problem is the heurstics would stop if there was a non GIMPLE_ASSIGN (and already ignores debug statements) and in this case we would have a GIMPLE_LABEL that was not being ignored. So we should need to ignore GIMPLE_NOP, GIMPLE_LABEL and GIMPLE_PREDICT. Note this is now similar to how gimple_empty_block_p behaves. Note this fixes the wrong code that was reported by moving the VCE (conversion) out before the phiopt/match could convert it into an bit_ior and move the VCE out with the VCE being conditionally valid. Bootstrapped and tested on x86_64-linux-gnu. Also built and tested for aarch64-linux-gnu. PR tree-optimization/116098 gcc/ChangeLog: * tree-ssa-phiopt.cc (factor_out_conditional_operation): Ignore nops, labels and predicts for heuristic for conversion with a constant. gcc/testsuite/ChangeLog: * c-c++-common/torture/pr116098-1.c: New test. * gcc.target/aarch64/csel-1.c: New test. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-31testsuite: Change what is being tested for pr66726-2.cAndrew Pinski1-1/+1
r14-575-g6d6c17e45f62cf changed the debug dump message but the testcase pr66726-2.c was not updated for the change. The testcase was searching to make sure we didn't factor out a conversion but the testcase was no longer testing that so we needed to update what was being searched for. Tested on x86_64-linux. gcc/testsuite/ChangeLog: * gcc.dg/tree-ssa/pr66726-2.c: Update scan dump message. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-31Fortran: downgrade use associated namelist group name to legacy extensionHarald Anlauf2-3/+4
The Fortran standard disallows use associated names as namelist group name (e.g. F2003:C581, but also later standards). This feature is a gfortran legacy extension, and we should give a warning even for -std=gnu. gcc/fortran/ChangeLog: * match.cc (gfc_match_namelist): Downgrade feature from GNU to legacy extension. gcc/testsuite/ChangeLog: * gfortran.dg/pr88169_3.f90: Adjust pattern.
2024-08-31c++: Add unsequenced C++ testcaseJakub Jelinek1-0/+53
This is the testcase I wrote originally and which on top of the https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659154.html patch didn't behave the way I wanted (no warning and no optimizations of [[unsequenced]] function templates which don't have pointer/reference arguments. Posting this separately, because it depends on the above mentioned patch as well as the PR116175 https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659157.html patch. 2024-08-31 Jakub Jelinek <jakub@redhat.com> * g++.dg/ext/attr-unsequenced-1.C: New test.
2024-08-31c: Add support for unsequenced and reproducible attributesJakub Jelinek25-9/+979
C23 added in N2956 ( https://open-std.org/JTC1/SC22/WG14/www/docs/n2956.htm ) two new attributes, which are described as similar to GCC const and pure attributes, but they aren't really same and it seems that even the paper is missing some of the differences. The paper says unsequenced is the same as const on functions without pointer arguments and reproducible is the same as pure on such functions (except that they are function type attributes rather than function declaration ones), but it seems the paper doesn't consider the finiteness GCC relies on (aka non-DECL_LOOPING_CONST_OR_PURE_P) - the paper only talks about using the attributes for CSE etc., not for DCE. The following patch introduces (for now limited) support for those attributes, both as standard C23 attributes and as GNU extensions (the difference is that the patch is then less strict on where it allows them, like other function type attributes they can be specified on function declarations as well and apply to the type, while C23 standard ones must go on the function declarators (i.e. after closing paren after function parameters) or in type specifiers of function type. If function doesn't have any pointer/reference arguments, the patch adds additional internal attribute with " noptr" suffix which then is used by flags_from_decl_or_type to handle those easy cases as ECF_CONST|ECF_LOOPING_CONST_OR_PURE or ECF_PURE|ECF_LOOPING_CONST_OR_PURE The harder cases aren't handled right now, I'd hope they can be handled incrementally. I wonder whether we shouldn't emit a warning for the gcc.dg/c23-attr-{reproducible,unsequenced}-5.c cases, while the standard clearly specifies that composite types should union the attributes and it is what GCC implements for decades, for ?: that feels dangerous for the new attributes, it would be much better to be conservative on say (cond ? unsequenced_function : normal_function) (args) There is no diagnostics on incorrect [[unsequenced]] or [[reproducible]] function definitions, while I think diagnosing non-const static/TLS declarations in the former could be easy, the rest feels hard. E.g. the const/pure discovery can just punt on everything it doesn't understand, but complete diagnostics would need to understand it. 2024-08-31 Jakub Jelinek <jakub@redhat.com> PR c/116130 gcc/ * doc/extend.texi (unsequenced, reproducible): Document new function type attributes. * calls.cc (flags_from_decl_or_type): Handle "unsequenced noptr" and "reproducible noptr" attributes. gcc/c-family/ * c-attribs.cc (c_common_gnu_attributes): Add entries for "unsequenced", "reproducible", "unsequenced noptr" and "reproducible noptr" attributes. (handle_unsequenced_attribute): New function. (handle_reproducible_attribute): Likewise. * c-common.h (handle_unsequenced_attribute): Declare. (handle_reproducible_attribute): Likewise. * c-lex.cc (c_common_has_attribute): Return 202311 for standard unsequenced and reproducible attributes. gcc/c/ * c-decl.cc (handle_std_unsequenced_attribute): New function. (handle_std_reproducible_attribute): Likewise. (std_attributes): Add entries for "unsequenced" and "reproducible" attributes. (c_warn_type_attributes): Add TYPE argument. Allow unsequenced or reproducible attributes if it is FUNCTION_TYPE. (groktypename): Adjust c_warn_type_attributes caller. (grokdeclarator): Likewise. (finish_declspecs): Likewise. * c-parser.cc (c_parser_declaration_or_fndef): Likewise. * c-tree.h (c_warn_type_attributes): Add TYPE argument. gcc/testsuite/ * c-c++-common/attr-reproducible-1.c: New test. * c-c++-common/attr-reproducible-2.c: New test. * c-c++-common/attr-unsequenced-1.c: New test. * c-c++-common/attr-unsequenced-2.c: New test. * gcc.dg/c23-attr-reproducible-1.c: New test. * gcc.dg/c23-attr-reproducible-2.c: New test. * gcc.dg/c23-attr-reproducible-3.c: New test. * gcc.dg/c23-attr-reproducible-4.c: New test. * gcc.dg/c23-attr-reproducible-5.c: New test. * gcc.dg/c23-attr-reproducible-5-aux.c: New file. * gcc.dg/c23-attr-unsequenced-1.c: New test. * gcc.dg/c23-attr-unsequenced-2.c: New test. * gcc.dg/c23-attr-unsequenced-3.c: New test. * gcc.dg/c23-attr-unsequenced-4.c: New test. * gcc.dg/c23-attr-unsequenced-5.c: New test. * gcc.dg/c23-attr-unsequenced-5-aux.c: New file. * gcc.dg/c23-has-c-attribute-2.c: Add tests for unsequenced and reproducible attributes.
2024-08-31AVR: Don't print a space after , when printing instructions.Georg-Johann Lay1-12/+12
gcc/ * config/avr/avr.cc: Follow the convention to not add a space after comma when printing instructions.
2024-08-31Optimize initialization of small padded objectsAlexandre Oliva6-9/+132
When small objects containing padding bits (or bytes) are fully initialized, we will often store them in registers, and setting bitfields and other small fields will attempt to preserve the uninitialized padding bits, which tends to be expensive. Zero-initializing registers, OTOH, tends to be cheap. So, if we're optimizing, zero-initialize such small padded objects even if that's not needed for correctness. We can't zero-initialize all such padding objects, though: if there's no padding whatsoever, and all fields are initialized with nonzero, the zero initialization would be flagged as dead. That's why we introduce machinery to detect whether objects have padding bits. I considered distinguishing between bitfields, units and larger padding elements, but I didn't pursue that distinction. Since the object's zero-initialization subsumes fields' zero-initialization, the empty string test in builtin-snprintf-6.c's test_assign_aggregate would regress without the addition of native_encode_constructor. for gcc/ChangeLog * expr.cc (categorize_ctor_elements_1): Change p_complete to int, to distinguish complete initialization in presence or absence of uninitialized padding bits. (categorize_ctor_elements): Likewise. Adjust all callers... * expr.h (categorize_ctor_elements): ... and declaration. (type_has_padding_at_level_p): New. * gimple-fold.cc (type_has_padding_at_level_p): New. * fold-const.cc (native_encode_constructor): New. (native_encode_expr): Call it. * gimplify.cc (gimplify_init_constructor): Clear small non-addressable non-volatile objects with padding or other uninitialized fields as an optimization. for gcc/testsuite/ChangeLog * gcc.dg/init-pad-1.c: New.
2024-08-31Daily bump.GCC Administrator6-1/+115
2024-08-30c++: add fixed test [PR101099]Marek Polacek1-0/+6
-fconcepts-ts is no longer supported so this can't be made to ICE anymore. PR c++/101099 gcc/testsuite/ChangeLog: * g++.dg/concepts/pr101099.C: New test.
2024-08-30c++: add fixed test [PR115616]Marek Polacek1-0/+24
This got fixed by r15-2120. PR c++/115616 gcc/testsuite/ChangeLog: * g++.dg/template/friend83.C: New test.
2024-08-30c++: fix used but not defined warning for friendJason Merrill2-1/+14
Here limit_bad_template_recursion avoids instantiating foo, and then we wrongly warn that it isn't defined, because as a non-template (but templated) friend DECL_TEMPLATE_INSTANTIATION is false. gcc/cp/ChangeLog: * decl2.cc (c_parse_final_cleanups): Also check DECL_FRIEND_PSEUDO_TEMPLATE_INSTANTIATION. gcc/testsuite/ChangeLog: * g++.dg/diagnostic/used-inline1.C: New test.
2024-08-30Fortran: default-initialization of derived-type function results [PR98454]Harald Anlauf4-2/+163
gcc/fortran/ChangeLog: PR fortran/98454 * resolve.cc (resolve_symbol): Add default-initialization of non-allocatable, non-pointer derived-type function results. gcc/testsuite/ChangeLog: PR fortran/98454 * gfortran.dg/alloc_comp_class_4.f03: Remove bogus pattern. * gfortran.dg/pdt_26.f03: Adjust expected count. * gfortran.dg/derived_result_3.f90: New test.
2024-08-30gdbhooks: Fix printing of vec with vl_ptr layoutAlex Coplan1-3/+27
As it stands, the pretty printing of GCC's vecs by gdbhooks.py only handles vectors with vl_embed layout. As such, when encountering a vec with vl_ptr layout, GDB would print a diagnostic like: gdb.error: There is no member or method named m_vecpfx. when (e.g.) any such vec occurred in a backtrace. This patch extends VecPrinter.children to also handle vl_ptr vectors. gcc/ChangeLog: * gdbhooks.py (VEC_KIND_EMBED): New. (VEC_KIND_PTR): New. (get_vec_kind): New. (VecPrinter.children): Also handle vectors with vl_ptr layout.
2024-08-30Don't remove /usr/lib and /lib from when passing to the linker [PR97304/104707]Andrew Pinski1-18/+6
With newer ld, the default search library path does not include /usr/lib nor /lib but the driver decides to not pass -L down to the link for these and then in some/most cases libc is not found. This code dates from at least 1992 and it is done in a way which is not safe and does not make sense. So let's remove it. Bootstrapped and tested on x86_64-linux-gnu (which defaults to being a multilib). gcc/ChangeLog: PR driver/104707 PR driver/97304 * gcc.cc (is_directory): Don't not include /usr/lib and /lib for library directory pathes. Remove library argument. (add_to_obstack): Update call to is_directory. (driver_handle_option): Likewise. (spec_path): Likewise. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-30middle-end: Remove integer_three_node [PR116537]Andrew Pinski4-4/+1
After the small expansion patch for __builtin_prefetch, the only use of integer_three_node is inside tree-ssa-loop-prefetch.cc so let's remove it as the loop prefetch pass is not enabled these days by default and having a tree node around just for that pass is a little wasteful. Integer constants are also shared these days so calling build_int_cst will use the cached node anyways. Bootstrapped and tested on x86_64-linux. PR middle-end/116537 gcc/ChangeLog: * tree-core.h (enum tree_index): Remove TI_INTEGER_THREE * tree-ssa-loop-prefetch.cc (issue_prefetch_ref): Call build_int_cst instead of using integer_three_node. * tree.cc (build_common_tree_nodes): Remove initialization of integer_three_node. * tree.h (integer_three_node): Delete. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2024-08-30expand: Small speed up expansion of __builtin_prefetchAndrew Pinski1-14/+14
This is a small speed up of the expansion of __builtin_prefetch. Basically for the optional arguments, no reason to call expand_normal on a constant integer that we know the value, just replace it with GEN_INT/const0_rtx instead. Bootstrapped and tested on x86_64-linux. gcc/ChangeLog: * builtins.cc (expand_builtin_prefetch): Rewrite expansion of the optional arguments to not expand known constants.
2024-08-30PR modula2/116181: m2rts fix -Wodr warningGaius Mulley2-31/+7
This patch fixes the -Wodr warning seen in pge-boot/m2rts.h when building pge. gcc/m2/ChangeLog: PR modula2/116181 * pge-boot/GM2RTS.h: Regenerate. * pge-boot/m2rts.h: Ditto. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>