|
* J: Drop file that should not have been committed
|
|
gcc/ChangeLog:
* config/i386/x86-tune-sched.cc (ix86_fuse_mov_alu_p): Fix
typo.
|
|
This makes it easier to discover whether SLP load or store nodes
participate in load/store-lanes accesses.
* tree-vect-slp.cc (vect_print_slp_tree): Annotate load
and store-lanes nodes.
|
|
The following disables the use of smaller vector accesses to avoid
peeling for gaps when load-lanes are used.
* tree-vect-stmts.cc (get_group_load_store_type): Only disable
peeling for gaps by using smaller vectors when not using
load-lanes.
|
|
This patch adds support for a new fusion type in znver5, documented in the
optimization manual:
The Zen5 microarchitecture adds support to fuse reg-reg MOV Instructions
with certain ALU instructions. The following conditions need to be met for
fusion to happen:
- The MOV should be reg-reg mov with Opcode 0x89 or 0x8B
- The MOV is followed by an ALU instruction where the MOV and ALU destination register match.
- The ALU instruction may source only registers or immediate data. There cannot be any memory source.
- The ALU instruction sources either the source or dest of MOV instruction.
- If ALU instruction has 2 reg sources, they should be different.
- The following ALU instructions can fuse with an older qualified MOV instruction:
ADD ADC AND XOR OP SUB SBB INC DEC NOT SAL / SHL SHR SAR
(I assume OP is OR)
I also increased the issue rate from 4 to 6. Theoretically znver5 can do more,
but with our model we can't really make use of it.
Increasing the issue rate to 8 leads to an infinite loop in the scheduler.
Finally, I also enabled fuse_alu_and_branch since it is supported by
znver5 (I think by earlier Zens too).
The new fusion pattern moves quite a few instructions around in common code:
@@ -2210,13 +2210,13 @@
.cfi_offset 3, -32
leaq 63(%rsi), %rbx
movq %rbx, %rbp
+ shrq $6, %rbp
+ salq $3, %rbp
subq $16, %rsp
.cfi_def_cfa_offset 48
movq %rdi, %r12
- shrq $6, %rbp
- movq %rsi, 8(%rsp)
- salq $3, %rbp
movq %rbp, %rdi
+ movq %rsi, 8(%rsp)
call _Znwm
movq 8(%rsp), %rsi
movl $0, 8(%r12)
@@ -2224,8 +2224,8 @@
movq %rax, (%r12)
movq %rbp, 32(%r12)
testq %rsi, %rsi
- movq %rsi, %rdx
cmovns %rsi, %rbx
+ movq %rsi, %rdx
sarq $63, %rdx
shrq $58, %rdx
sarq $6, %rbx
which should help decoder bandwidth and perhaps also the cache, though I was
not able to measure an off-noise effect on SPEC.
gcc/ChangeLog:
* config/i386/i386.h (TARGET_FUSE_MOV_AND_ALU): New tune.
* config/i386/x86-tune-sched.cc (ix86_issue_rate): Update for znver5.
(ix86_adjust_cost): Add TODO about znver5 memory latency.
(ix86_fuse_mov_alu_p): New.
(ix86_macro_fusion_pair_p): Use it.
* config/i386/x86-tune.def (X86_TUNE_FUSE_ALU_AND_BRANCH): Add ZNVER5.
(X86_TUNE_FUSE_MOV_AND_ALU): New tune.
|
|
We already disable gathers for Zen4. It seems that gather has improved a bit
compared to Zen4, and the Zen5 optimization manual suggests "Avoid GATHER
instructions when the indices are known ahead of time. Vector loads followed
by shuffles result in a higher load bandwidth." However, the situation seems
to be more complicated.
Gather is a 5-10% loss on the parest benchmark as well as a 30% loss on sparse
dot products in TSVC. Curiously enough, breaking these out into a
microbenchmark reversed the situation, and it turns out that the performance
depends on how the indices are distributed: gather is a loss if the indices
are sequential, neutral if they are random, and a win for some strides (4, 8).
This seems to be similar to earlier Zens, so I think (especially for
backporting znver5 support) that it makes sense to be consistent and disable
gather unless we work out good heuristics on when to use it. Since we
typically do not know the indices in advance, I don't see how that can be done.
I opened PR116582 with some examples of wins and losses.
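To make the trade-off concrete, here is a minimal sketch (my own illustration,
not the TSVC or parest source) of the kind of indirect-indexed kernel where
the vectorizer would emit a gather:

/* Whether a vector gather pays off here depends on how idx is
   distributed: sequential indices lose, random ones are neutral,
   and some strides win.  */
double
sparse_dot (const double *x, const double *v, const int *idx, int n)
{
  double sum = 0.0;
  for (int i = 0; i < n; i++)
    sum += x[idx[i]] * v[i];
  return sum;
}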
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_USE_GATHER_2PARTS): Disable for
ZNVER5.
(X86_TUNE_USE_SCATTER_2PARTS): Disable for ZNVER5.
(X86_TUNE_USE_GATHER_4PARTS): Disable for ZNVER5.
(X86_TUNE_USE_SCATTER_4PARTS): Disable for ZNVER5.
(X86_TUNE_USE_GATHER_8PARTS): Disable for ZNVER5.
(X86_TUNE_USE_SCATTER_8PARTS): Disable for ZNVER5.
|
|
Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects. Tested on x86-64; there are no differences in zstd
with "-O2 -flto=auto -g" vs. "-O2 -flto=auto -g -ffat-lto-objects".
PR ipa/116410
* ipa-modref.cc (analyze_parms): Always analyze function parameter
for LTO.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
|
|
As Jovan pointed out in pr115921, we're not reassociating expressions like this
on rv64:
(x & 0x3e) << 12
It generates something like this:
li a5,258048
slli a0,a0,12
and a0,a0,a5
We have a pattern that's designed to clean this up, essentially reassociating
the operations so that we don't need to load the constant, resulting in
something like this:
andi a0,a0,63
slli a0,a0,12
That pattern wasn't working for certain constants due to its condition. The
condition is trying to avoid cases where this kind of reassociation would
hinder shadd generation on rv64. That condition was just written poorly.
This patch tightens up that condition in a few ways. First, there's no need to
worry about shadd cases if ZBA is not enabled. Second, we can't use shadd if
the shift value isn't 1, 2 or 3. Finally, rather than open-coding one of the
tests, we can use an existing operand predicate.
The net is we'll start performing this transformation in more cases on rv64
while still avoiding reassociation if it would spoil shadd generation.
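A compilable sketch of the kind of function involved (illustrative only; the
committed test is gcc.target/riscv/pr115921.c and may differ):

/* On rv64 the reassociated form needs no separate constant load:
   andi + slli instead of li + slli + and.  */
unsigned long
masked_shift (unsigned long x)
{
  return (x & 0x3e) << 12;
}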
PR target/115921
gcc/
* config/riscv/riscv.md (reassociate bitwise ops): Tighten test for
cases we do not want to reassociate.
gcc/testsuite/
* gcc.target/riscv/pr115921.c: New test.
|
|
Testing matrix multiplication benchmarks shows that FMA on a critical chain
is a performance loss over separate multiply and add. While the latency of 4
is lower than multiply + add (3+2), the problem is that all values need to
be ready before the computation starts.
While on znver4 AVX512 code fared well with FMA, that was because of the split
registers. Znver5 benefits from avoiding FMA on all widths. This may be
different with the mobile version though.
On a naive matrix multiplication benchmark the difference is 8% with -O3
only, since with -Ofast loop interchange solves the problem differently.
It is a 30% win, for example, on s323 from TSVC:
real_t s323(struct args_t * func_args)
{
//    recurrences
//    coupled recurrence

    initialise_arrays(__func__);
    gettimeofday(&func_args->t1, NULL);

    for (int nl = 0; nl < iterations/2; nl++) {
        for (int i = 1; i < LEN_1D; i++) {
            a[i] = b[i-1] + c[i] * d[i];
            b[i] = a[i] + c[i] * e[i];
        }
        dummy(a, b, c, d, e, aa, bb, cc, 0.);
    }

    gettimeofday(&func_args->t2, NULL);
    return calc_checksum(__func__);
}
gcc/ChangeLog:
* config/i386/x86-tune.def (X86_TUNE_AVOID_128FMA_CHAINS): Enable for
znver5.
(X86_TUNE_AVOID_256FMA_CHAINS): Likewise.
(X86_TUNE_AVOID_512FMA_CHAINS): Likewise.
|
|
When the ltrans units were written concurrently, e.g. via -flto=N (N > 1,
assuming sufficient partitions, e.g. via -flto-partition=max),
output_offload_tables wrote the offload tables once per fork.
PR lto/116535
gcc/ChangeLog:
* lto-cgraph.cc (output_offload_tables): Remove offload_ frees.
* lto-streamer-out.cc (lto_output): Make call to it depend on
lto_get_out_decl_state ()->output_offload_tables_p.
* lto-streamer.h (struct lto_out_decl_state): Add
output_offload_tables_p field.
* tree-pass.h (ipa_write_optimization_summaries): Add bool argument.
* passes.cc (ipa_write_summaries_1): Add bool
output_offload_tables_p arg.
(ipa_write_summaries): Update call.
(ipa_write_optimization_summaries): Accept output_offload_tables_p.
gcc/lto/ChangeLog:
* lto.cc (stream_out): Update call to
ipa_write_optimization_summaries to pass true for first partition.
|
|
The following avoids performing re-discovery with single lanes in the
attempt to make use of mask_load_lane, as rediscovery will fail since a
single lane of a mask load will appear permuted, which isn't supported.
PR tree-optimization/116575
* tree-vect-slp.cc (vect_analyze_slp): Properly compute
the mask argument for vect_load/store_lanes_supported.
When the load is masked for now avoid rediscovery.
* gcc.dg/vect/pr116575.c: New testcase.
|
|
The intrinsics for the non-optimized case got a typo in the mask type, which
will cause the high bits of __mmask32 to be unexpectedly zeroed.
The test does not fail under -O0 with the current 1b testcase since that
testcase is wrong: we need to include avx512-mask-type.h after SIZE is
defined, or it will always be __mmask8. The same problem also exists in
AVX10.2 testcases; I will write a separate patch to fix that.
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.
|
|
The following moves the assert on NUM_POLY_INT_COEFFS != 1 after
INTEGER_CST processing.
* fold-const.cc (poly_int_binop): Move assert on
NUM_POLY_INT_COEFFS after INTEGER_CST processing.
|
|
[PR116501]
The following testcase is miscompiled. The problem is in the last_ovf step.
The second operand has signed _BitInt(513) type but has the MSB clear,
so range_to_prec returns 512 for it (i.e. it fits into unsigned
_BitInt(512)). Because of that the last step actually doesn't need to get
the most significant bit from the second operand, but the code was deciding
what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0,
otherwise sign-extend the last processed bit; but that bit was set in this
case.
We don't want to treat the positive operand as if it was negative regardless
of the bit below that precision, and precN >= 0 indicates that the operand
is in the [0, inf) range.
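A hedged sketch of the kind of source affected (my illustration; the committed
test is gcc.dg/torture/bitint-73.c and may differ):

#if __BITINT_MAXWIDTH__ >= 513
signed _BitInt(513) a, b, r;

int
add_ovf (void)
{
  /* If b is non-negative and its value fits in 512 bits, the last
     overflow step must not sign-extend it as if it were negative.  */
  return __builtin_add_overflow (a, b, &r);
}
#endif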
2024-09-03 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/116501
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
In the last_ovf case, use build_zero_cst operand not just when
TYPE_UNSIGNED (typeN), but also when precN >= 0.
* gcc.dg/torture/bitint-73.c: New test.
|
|
Some ancient 32-bit ABIs, most notably that of x86/Linux, misalign double
scalars in record types, so comparing DECL_ALIGN with TYPE_ALIGN directly
may give the wrong answer for them.
gcc/ada/
* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Add kludge
to cope with ancient 32-bit ABIs.
|
|
The change causes more temporaries to be created at call sites for unaligned
actual parameters, thus revealing that the machinery does not properly deal
with unconstrained nominal subtypes for them.
gcc/ada/
* gcc-interface/trans.cc (create_temporary): Deal with types whose
size is self-referential by allocating the maximum size.
|
|
The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.
But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, unfortunately without adjusting the implementation.
gcc/ada/
* gcc-interface/trans.cc (get_atomic_access): Deal specifically with
nodes that are both Atomic and Volatile_Full_Access in Ada 2012.
|
|
This has historically been done only on platforms requiring the strict
alignment of memory references, but this can arguably be considered as
being mandated by the language on all of them.
gcc/ada/
* gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Take into
account the alignment of the field on all platforms.
|
|
When updating the size after making a packable type in gnat_to_gnu_field,
we fail to clear it again when it is not constant.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size
after updating it if it is not constant.
|
|
The procedure Note_Uplevel_Bound was implemented as a custom expression
tree walk. This change replaces this custom tree traversal by a more
idiomatic use of Traverse_Proc.
gcc/ada/
* exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor
to use the generic Traverse_Proc.
(Check_Static_Type): Adjust calls to Note_Uplevel_Bound as the
previous second parameter was unused, so removed.
|
|
The non-strict overflow checking code does a better job of eliminating
overflow checks if given an expression consisting only of predefined
operators (including relationals), literals, identifiers, and conditional
expressions. If it is both feasible and useful, rewrite a
Length attribute reference as such an expression. "Feasible" means
"index type is same type as attribute reference type, so we can rewrite without
using type conversions". "Useful" means "Overflow_Mode is something other than
Strict, so there is value in making overflow check elimination easier".
gcc/ada/
* exp_attr.adb (Expand_N_Attribute_Reference): If it makes sense
to do so, then rewrite a Length attribute reference as an
equivalent conditional expression.
|
|
The initial implementation of the GNAT aspect/pragma Volatile_Full_Access
made it incompatible with Atomic, because it was not decided whether the
read-modify-write sequences generated by Volatile_Full_Access would need
to be implemented atomically when Atomic was also specified, which would
have required a compare-and-swap primitive from the target architecture.
But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic
in the process, answering the above question by the negative, so the
incompatibility between Volatile_Full_Access and Atomic was lifted in
Ada 2012 as well, but the implementation was not entirely adjusted.
In Ada 2012, it does not make sense to warn for the partial access to an
Atomic object if the object is also declared Volatile_Full_Access, since
the object will be accessed as a whole in this case (like in Ada 2022).
gcc/ada/
* sem_res.adb (Is_Atomic_Ref_With_Address): Rename into...
(Is_Atomic_Non_VFA_Ref_With_Address): ...this and adjust the
implementation to exclude Volatile_Full_Access objects.
(Resolve_Indexed_Component): Adjust to above renaming.
(Resolve_Selected_Component): Likewise.
|
|
Implement the new legality rules of AI22-0106 which (as discussed in the AI)
are needed to disallow constructs whose semantics would otherwise be poorly
defined.
gcc/ada/
* sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new
legality rules of AI11-0106. Add code to avoid cascading error
messages.
|
|
Do not pass null for the Collection parameter when
Finalize_Storage_Only is in effect. If the collection
is null in that case, we will blow up later when we
deallocate the object.
gcc/ada/
* exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call):
Remove Finalize_Storage_Only from the code that checks whether to
pass null to the Collection parameter. Having done that, we don't
need to check for Is_Library_Level_Entity, because
No_Heap_Finalization requires that. And if we ever change
No_Heap_Finalization to allow nested access types, we will still
want to pass null. Note that the comment "Such a type lacks a
collection." is incorrect in the case of Finalize_Storage_Only;
such types have a collection.
|
|
This patch implements constant folding for svmul by calling
gimple_folder::fold_const_binary with tree_code MULT_EXPR.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svmul_n_* case.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
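As an illustration (my own example, not part of the patch), a fully predicated
multiply of constants such as the following is now expected to fold to a
constant vector at gimple time:

#include <arm_sve.h>

/* With the new folding this should simplify to a vector of 12s;
   the _n form with an immediate second operand is handled as well.  */
svint32_t
fold_mul (void)
{
  return svmul_n_s32_x (svptrue_b32 (), svdup_n_s32 (3), 4);
}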
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold):
Try constant folding.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_mul_1.c: New test.
|
|
This patch implements constant folding for svdiv:
The new function aarch64_const_binop was created, which - in contrast to
int_const_binop - does not treat operations as overflowing. This function is
passed as callback to vector_const_binop from the new gimple_folder
method fold_const_binary, if the predicate is ptrue or predication is _x.
From svdiv_impl::fold, fold_const_binary is called with TRUNC_DIV_EXPR as
tree_code.
In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0
for division by 0, as defined in the semantics for svdiv.
Tests were added to check the produced assembly for different
predicates, signed and unsigned integers, and the svdiv_n_* case.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
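For illustration (again my own example, not the committed test), the
division-by-zero rule described above means a constant division like this
should fold to a vector of zeros:

#include <arm_sve.h>

/* TRUNC_DIV_EXPR by zero folds to 0 per the svdiv semantics.  */
svint32_t
fold_div_by_zero (void)
{
  return svdiv_n_s32_x (svptrue_b32 (), svdup_n_s32 (5), 0);
}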
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold):
Try constant folding.
* config/aarch64/aarch64-sve-builtins.h: Declare
gimple_folder::fold_const_binary.
* config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop):
New function to fold binary SVE intrinsics without overflow.
(gimple_folder::fold_const_binary): New helper function for
constant folding of SVE intrinsics.
gcc/testsuite/
* gcc.target/aarch64/sve/const_fold_div_1.c: New test.
|
|
This patch sets the stage for constant folding of binary operations for SVE
intrinsics:
In fold-const.cc, the code for folding vector constants was moved from
const_binop to a new function vector_const_binop. This function takes a
function pointer as argument specifying how to fold the vector elements.
The intention is to call vector_const_binop from the backend with an
aarch64-specific callback function.
The code in const_binop for folding operations where the first operand is a
vector constant and the second argument is an integer constant was also moved
into vector_const_binop to allow folding of binary SVE intrinsics where
the second operand is an integer (_n).
To allow calling poly_int_binop from the backend, the latter was made public.
The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
* fold-const.h: Declare vector_const_binop.
* fold-const.cc (const_binop): Remove cases for vector constants.
(vector_const_binop): New function that folds vector constants
element-wise.
(int_const_binop): Remove call to wide_int_binop.
(poly_int_binop): Add call to wide_int_binop.
|
|
The following makes sure we handle a SLP load/store group from
a structure with complex and scalar members. This for example
happens in gcc.target/i386/pr106010-9a.c.
* tree-vect-slp.cc (vect_build_slp_tree_1): Handle mixing
all of handled components besides ARRAY_RANGE_REF, drop
handling of INDIRECT_REF.
|
|
Currently vect_get_vector_types_for_stmt only special-cases
IFN_MASK_STORE but there are now very many variants and simply
passing analysis without setting *VECTYPE will ICE during SLP
discovery (noticed with IFN_SCATTER_STORE). The following
properly uses internal_store_fn_p. I also noticed we're
unnecessarily handling those again to determine the scalar type
but there should always be a data reference for them.
* tree-vect-stmts.cc (vect_get_vector_types_for_stmt):
Handle all internal_store_fn_p the same. Remove special-casing
for the scalar_type of IFN_MASK_STORE.
|
|
This patch supports sminmax for partially vectorized V2BF/V4BF.
gcc/ChangeLog:
* config/i386/mmx.md (<code><mode>3): New define_expand for V2BF/V4BF smaxmin.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: New test.
|
|
This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions.
gcc/ChangeLog:
* config/i386/mmx.md (VBF_32_64): New mode iterator for partial vectorized V2BF/V4BF.
(<insn><mode>3): New define_expand for plusminusmultdiv.
(sqrt<mode>2): New define_expand for sqrt.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: New test.
* gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: New test.
|
|
This patch would like to support the scalar signed ssadd pattern
for the RISC-V backend. Aka
Form 1:
#define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline)) \
sat_s_add_##T##_fmt_1 (T x, T y) \
{ \
T sum = (UT)x + (UT)y; \
return (x ^ y) < 0 \
? sum \
: (sum ^ x) >= 0 \
? sum \
: x < 0 ? MIN : MAX; \
}
DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)
Before this patch:
sat_s_add_int64_t_fmt_1:
	mv	a5,a0
	add	a0,a0,a1
	xor	a1,a5,a1
	not	a1,a1
	xor	a4,a5,a0
	and	a1,a1,a4
	blt	a1,zero,.L5
	ret
.L5:
	srai	a5,a5,63
	li	a0,-1
	srli	a0,a0,1
	xor	a0,a5,a0
	ret
After this patch:
sat_s_add_int64_t_fmt_1:
	add	a2,a0,a1
	xor	a1,a0,a1
	xor	a5,a0,a2
	srli	a5,a5,63
	srli	a1,a1,63
	xori	a1,a1,1
	and	a5,a5,a1
	srai	a4,a0,63
	li	a3,-1
	srli	a3,a3,1
	xor	a3,a3,a4
	neg	a4,a5
	and	a3,a3,a4
	addi	a5,a5,-1
	and	a0,a2,a5
	or	a0,a0,a3
	ret
The below test suites are passed for this patch:
1. The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func
decl for expanding ssadd.
* config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func
impl to gen the max int rtx.
(riscv_expand_ssadd): Add new func impl to expand the ssadd.
* config/riscv/riscv.md (ssadd<mode>3): Add new pattern for
signed integer .SAT_ADD.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data.
* gcc.target/riscv/sat_s_add-1.c: New test.
* gcc.target/riscv/sat_s_add-2.c: New test.
* gcc.target/riscv/sat_s_add-3.c: New test.
* gcc.target/riscv/sat_s_add-4.c: New test.
* gcc.target/riscv/sat_s_add-run-1.c: New test.
* gcc.target/riscv/sat_s_add-run-2.c: New test.
* gcc.target/riscv/sat_s_add-run-3.c: New test.
* gcc.target/riscv/sat_s_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
|
|
|
|
We have SHF.fmt and HADD_S/U.fmt with MSA, which can be used for vector
reduction.
For min/max of U8/S8, we can do:
  SHF.B W1, W0, 0xb1   # swap bytes within each halfword
  MIN.B W1, W1, W0
  SHF.H W2, W1, 0xb1   # swap halfwords within each word
  MIN.B W2, W2, W1
  SHF.W W3, W2, 0xb1   # swap words within each doubleword
  MIN.B W4, W3, W2
  SHF.W W4, W4, 0x4e   # swap the two doublewords
  MIN.B W4, W4, W3
For plus of S8/U8, we can use HADD:
  HADD.H W0, W0, W0
  HADD.W W0, W0, W0
  HADD.D W0, W0, W0
  SHF.W W1, W0, 0x4e   # swap the two doublewords
  ADDV.D W1, W1, W0
  COPY_S.B T0, W1      # COPY_U.B for U8
We can do similarly for S16/U16/S32/U32/S64/U64/FLOAT/DOUBLE.
gcc
* config/mips/mips-msa.md: (MSA_NO_HADD): we have HADD for
S8/U8/S16/U16/S32/U32 only.
(reduc_smin_scal_<mode>): New define pattern.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_plus_scal_v4si): Ditto.
(reduc_plus_scal_v8hi): Ditto.
(reduc_plus_scal_v16qi): Ditto.
(reduc_<optab>_scal_<mode>): Ditto.
* config/mips/mips-protos.h: New function mips_expand_msa_reduc.
* config/mips/mips.cc: New function mips_expand_msa_reduc.
* config/mips/mips.md: Define any_bitwise iterator.
gcc/testsuite:
* gcc.target/mips/msa-reduc.c: New tests.
|
|
The test FAILs on i686-linux because -mfpmath=sse is used without
-msse2 being enabled.
2024-09-02 Jakub Jelinek <jakub@redhat.com>
* gcc.target/i386/optimize_one.c: Add -msse2 to dg-options.
|
|
The D testsuite shows it was a mistake to assume that
dg-additional-sources are never to be used for compilation tests.
Even if an output file is specified for compilation, extra module
files can be named and used in the compilation without being flagged
as errors.
Introduce a 'linkonly' flag for dg-additional-sources, and use it in
pr95401.cc and other vector tests that default to run, so that its
additional sources get discarded when vector tests downgrade to
compile-only. This reverts previous workarounds for this very
circumstance, that relied on being able to run vector tests anyway,
even after failing to detect runtime or hardware vector support.
for gcc/ChangeLog
PR d/115295
* doc/sourcebuild.texi (dg-additional-sources): Add linkonly.
for gcc/testsuite/ChangeLog
PR d/115295
* g++.dg/vect/pr95401.cc: Add linkonly to dg-additional-sources.
* g++.dg/vect/pr68762-1.cc: Likewise.
* g++.dg/vect/simd-clone-3.cc: Likewise.
* g++.dg/vect/simd-clone-5.cc: Likewise.
* gcc.dg/vect/vect-simd-clone-10.c: Likewise. Drop dg-do run.
* gcc.dg/vect/vect-simd-clone-12.c: Likewise. Likewise.
* lib/gcc-defs.exp (additional_sources_omit_on_compile): New.
(dg-additional-sources): Add to it on linkonly.
(dg-additional-files-options): Omit select sources on compile.
|
|
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so
we can make that code unconditional, and remove all the "else" cases.
The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS,
TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE, are similarly also
redundant and can be made unconditional.
The naming of the "gcc_version" attribute has been confusing since the "rdna"
attribute was added and this makes it worse, so it has been renamed to "cdna".
The add-with-carry assembler mnemonics no longer have two forms, so '%^' can be
removed.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_GCN5_PLUS): Delete.
(TARGET_GLOBAL_ADDRSPACE): Delete.
(TARGET_FLAT_OFFSETS): Delete.
(TARGET_EXPLICIT_CARRY): Delete.
(TARGET_MULTIPLY_IMMEDIATE): Delete.
* config/gcn/gcn-valu.md (*mov<mode>): Rename "gcn_version" to "cdna".
(*mov<mode>_4reg): Likewise.
(@mov<mode>_sgprbase): Likewise.
(gather<mode>_insn_1offset<exec>): Likewise.
(gather<mode>_insn_1offset_ds<exec>): Likewise.
(gather<mode>_insn_2offsets<exec>): Likewise.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise.
(scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
(gather<mode>_insn_1offset<exec>): Remove TARGET_FLAT_OFFSETS
conditionals.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(add<mode>3<exec_clobber>): Use "_co" instead of "%^".
(add<mode>3_dup<exec_clobber>): Likewise.
(add<mode>3_vcc<exec_vcc>): Likewise.
(add<mode>3_vcc_dup<exec_vcc>): Likewise.
(addc<mode>3<exec_vcc>): Likewise.
(sub<mode>3<exec_clobber>): Likewise.
(sub<mode>3_vcc<exec_vcc>): Likewise.
(subc<mode>3<exec_vcc>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_flat_address_p): Remove TARGET_FLAT_OFFSETS
conditionals.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(print_operand_address): Likewise, and TARGET_GLOBAL_ADDRSPACE also.
(print_operand): Remove "%^" operand code.
Remove TARGET_GLOBAL_ADDRSPACE assertion.
* config/gcn/gcn.h (STACK_ADDR_SPACE): Remove GCN5 conditional.
* config/gcn/gcn.md (gcn_version): Rename attribute ...
(cdna): ... to this, and remove the gcn3 and gcn5 values.
(enabled): Replace old "gcn_version" logic with new "cdna" logic.
(*mov<mode>_insn): Rename "gcn_version" to "cdna".
(*movti_insn): Likewise.
(addsi3): Use "_co" instead of "%^".
(addsi3_scalar_carry): Likewise.
(addsi3_scalar_carry_cst): Likewise.
(addcsi3_scalar): Likewise.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(subsi3): Likewise.
(<su>mulsi3_highpart): Remove TARGET_MULTIPLY_IMMEDIATE conditions.
(<su>mulsi3_highpart_reg): Remove "gcn_version" attribute.
(muldi3): Likewise.
(atomic_fetch_<bare_mnemonic><mode>): Likewise.
(atomic_<bare_mnemonic><mode>): Likewise.
(sync_compare_and_swap<mode>_insn): Likewise.
(atomic_load<mode>): Likewise.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
(<su>mulsi3_highpart_imm): Remove both TARGET_MULTIPLY_IMMEDIATE and
"gcn_version".
(<su>mulsidi3): Likewise.
(<su>mulsidi3_imm): Likewise.
|
|
The only GCN3 ISA device was removed (Fiji, gfx803), so all the GCN3-specific
code and features can be removed from the back-end.
gcc/ChangeLog:
* config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3.
(TARGET_GCN3): Delete.
(TARGET_GCN3_PLUS): Delete.
(TARGET_M0_LDS_LIMIT): Delete.
* config/gcn/gcn-valu.md
(gather<mode>_insn_1offset<exec>): Remove TARGET_GCN3 from conditions.
(*<reduc_op>_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (enum gcn_isa): Change default to ISA_GCN5.
(gcn_expand_prologue): Remove TARGET_M0_LDS_LIMIT feature.
(gcn_expand_reduc_scalar): Remove TARGET_GCN3 conditions.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Remove TARGET_GCN3.
|
|
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and
hasn't worked properly with the drivers since about ROCm 4.
This patch removes the device from GCC options and documentation, and removes
the direct mentions from the internals.
The TARGET_GCN3 support in the back-end is now unused and can be removed (in a
follow-up patch).
gcc/ChangeLog:
* config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks.
* config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative.
(NO_XNACK): Likewise.
(NO_SRAM_ECC): Likewise.
(ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC.
* config/gcn/gcn-opts.h (enum processor_type): Remove PROCESSOR_FIJI.
(TARGET_FIJI): Delete.
* config/gcn/gcn.cc (gcn_option_override): Remove Fiji.
(gcn_omp_device_kind_arch_isa): Likewise.
(output_file_start): Likewise.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise.
* config/gcn/gcn.opt (gpu_type): Likewise.
(march, mtune): Change default to PROCESSOR_VEGA10.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete.
(copy_early_debug_info): Remove elf_flags_actual.
Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally.
(get_arch): Remove Fiji.
(main): Remove gfx803.
* config/gcn/t-omp-device
(omp-device-properties-gcn): Remove fiji and gfx803.
* doc/install.texi (amdgcn*-*-*): Remove fiji and special instructions.
* doc/invoke.texi: Remove fiji.
libgomp/ChangeLog:
* libgomp.texi: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803.
* testsuite/libgomp.c/declare-variant-4-fiji.c: Removed.
* testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed.
|
|
This patch removes the physical address from all the header comments
in the m2 subdirectory. The physical address is replaced with the
text "You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3. If not see
<http://www.gnu.org/licenses/>." instead.
gcc/m2/ChangeLog:
PR modula2/116557
* gm2-lang.cc: Replace physical address with URL in GPL header.
* gm2-lang.h: Ditto.
* images/LICENSE.IMG: Ditto.
* m2-tree.def: Ditto.
* mc-boot/GIndexing.cc: Ditto.
* mc-boot/Gkeyc.cc: Ditto.
* mc-boot/Glists.cc: Ditto.
* mc-boot/GmcComp.cc: Ditto.
* mc-boot/GmcDebug.cc: Ditto.
* mc-boot/GmcFileName.cc: Ditto.
* mc-boot/GmcMetaError.cc: Ditto.
* mc-boot/GmcOptions.cc: Ditto.
* mc-boot/GmcPreprocess.cc: Ditto.
* mc-boot/GmcPretty.cc: Ditto.
* mc-boot/GmcPrintf.cc: Ditto.
* mc-boot/GmcQuiet.cc: Ditto.
* mc-boot/GmcReserved.cc: Ditto.
* mc-boot/GmcSearch.cc: Ditto.
* mc-boot/GmcStack.cc: Ditto.
* mc/Indexing.mod: Ditto.
* mc/keyc.mod: Ditto.
* mc/lists.mod: Ditto.
* mc/mcComp.mod: Ditto.
* mc/mcDebug.mod: Ditto.
* mc/mcFileName.mod: Ditto.
* mc/mcMetaError.mod: Ditto.
* mc/mcOptions.mod: Ditto.
* mc/mcPreprocess.mod: Ditto.
* mc/mcPretty.mod: Ditto.
* mc/mcPrintf.mod: Ditto.
* mc/mcQuiet.mod: Ditto.
* mc/mcReserved.mod: Ditto.
* mc/mcSearch.mod: Ditto.
* mc/mcStack.mod: Ditto.
* tools-src/buildpg: Ditto.
* tools-src/calcpath: Ditto.
* tools-src/checkmeta.py: Ditto.
* tools-src/def2doc.py: Ditto.
* tools-src/makeSystem: Ditto.
* tools-src/tidydates.py: Ditto.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
|
|
Since r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4
added scan-ltrans-rtl* variants to scanltranstree.exp, it no longer
makes sense to have "tree" in the name. This renames the file
accordingly and updates users.
libatomic/ChangeLog:
* testsuite/lib/libatomic.exp: Load scanltrans.exp instead of
scanltranstree.exp.
libgomp/ChangeLog:
* testsuite/lib/libgomp.exp: Load scanltrans.exp instead of
scanltranstree.exp.
libitm/ChangeLog:
* testsuite/lib/libitm.exp: Load scanltrans.exp instead of
scanltranstree.exp.
libphobos/ChangeLog:
* testsuite/lib/libphobos-dg.exp: Load scanltrans.exp instead of
scanltranstree.exp.
libvtv/ChangeLog:
* testsuite/lib/libvtv.exp: Load scanltrans.exp instead of
scanltranstree.exp.
gcc/testsuite/ChangeLog:
* gcc.dg-selftests/dg-final.exp: Load scanltrans.exp instead of
scanltranstree.exp.
* lib/gcc-dg.exp: Likewise.
* lib/scanltranstree.exp: Rename to ...
* lib/scanltrans.exp: ... this.
|
|
Following on from the earlier tree rename, this patch renames
gimple_asm_input_p to gimple_asm_basic_p, and similarly for
related names.
gcc/
* doc/gimple.texi (gimple_asm_basic_p): Document.
(gimple_asm_set_basic): Likewise.
* gimple.h (GF_ASM_INPUT): Rename to...
(GF_ASM_BASIC): ...this.
(gimple_asm_set_input): Rename to...
(gimple_asm_set_basic): ...this.
(gimple_asm_input_p): Rename to...
(gimple_asm_basic_p): ...this.
* cfgexpand.cc (expand_asm_stmt): Update after above renaming.
* gimple.cc (gimple_asm_clobbers_memory_p): Likewise.
* gimplify.cc (gimplify_asm_expr): Likewise.
* ipa-icf-gimple.cc (func_checker::compare_gimple_asm): Likewise.
* tree-cfg.cc (stmt_can_terminate_bb_p): Likewise.
|
|
ASM_INPUT_P is so named because it causes the eventual rtl insn
pattern to be a top-level ASM_INPUT rather than an ASM_OPERANDS.
However, this name has caused confusion, partly due to earlier
documentation. The name also sounds related to ASM_INPUTS but
is for a different piece of state.
This patch renames it to ASM_BASIC_P, with the inverse meaning
an extended asm. ("Basic asm" is the term used in extend.texi.)
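For reference, a small C illustration (mine, not from the patch) of the
source-level distinction the flag records:

void
example (int x)
{
  /* Basic asm: no operand lists, so ASM_BASIC_P is set.  */
  asm ("nop");

  /* Extended asm: operand lists are present (even with an empty
     template), so ASM_BASIC_P is clear.  */
  asm ("" : "+r" (x));
}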
gcc/
* doc/generic.texi (ASM_BASIC_P): Document.
* tree.h (ASM_INPUT_P): Rename to...
(ASM_BASIC_P): ...this.
(ASM_VOLATILE_P, ASM_INLINE_P): Reindent.
* gimplify.cc (gimplify_asm_expr): Update after above renaming.
* tree-core.h (tree_base): Likewise.
gcc/c/
* c-typeck.cc (build_asm_expr): Rename ASM_INPUT_P to ASM_BASIC_P.
gcc/cp/
* pt.cc (tsubst_stmt): Rename ASM_INPUT_P to ASM_BASIC_P.
* parser.cc (cp_parser_asm_definition): Likewise.
gcc/d/
* toir.cc (IRVisitor): Rename ASM_INPUT_P to ASM_BASIC_P.
gcc/jit/
* jit-playback.cc (playback::block::add_extended_asm): Rename
ASM_INPUT_P to ASM_BASIC_P.
gcc/m2/
* gm2-gcc/m2block.cc (flush_pending_note): Rename ASM_INPUT_P
to ASM_BASIC_P.
* gm2-gcc/m2statement.cc (m2statement_BuildAsm): Likewise.
|
|
gcc/lto/ChangeLog:
* lto.cc: Add missing HAVE_WORKING_FORK.
|
|
gcc/ChangeLog:
* lto-wrapper.cc (run_gcc): Honor -save-temps for
makefile name.
|
|
The problem is that the size clause changes the floating-point format used
for the type, but it must not when this format is the widest format that is
supported in hardware on the target. Instead a padding type must be built
and the associated warning given.
gcc/ada/
* gcc-interface/decl.cc (gnat_to_gnu_entity): Cap the Esize of a
floating-point type to the size of the widest format supported in
hardware if it is explicitly defined.
|
|
gcc/ada/
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Update
documentation for the -gnatw_l switch.
* usage.adb: Add -gnatw_l entry.
* gnat_ugn.texi: Regenerate.
|
|
Before this patch, the gnat command sent to standard error pieces of
information that are a better match for standard output. This patch
makes this information go to standard output.
gcc/ada/
* gnatcmd.adb (GNATCmd): Fix standard output stream.
|
|
Before this patch, the documentation of -gnaty0 used 0-based indexing
for column numbers while 1-based indexing is used everywhere else. This
patch makes this documentation use 1-based indexing, and also adds a
missing parenthesis.
gcc/ada/
* doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix
minor issues.
* gnat_ugn.texi: Regenerate.
|
|
...plus minor improvements to existing documentation.
gcc/ada/
* doc/gnat_rm/gnat_language_extensions.rst: I assume "extended set
of extensions" was a typo for "experimental set of extensions",
because "extended extensions" is repetitive and redundant. "in
addition" clarifies that the one subsumes the other. Add a
reminder at the start of each subsection about what switch/pragma
enables what extensions. Add new section about "Inference of
Dependent Types in Generic Instantiations".
* gnat_rm.texi: Regenerate.
|