Age  Commit message  Author  Files  Lines
2024-09-03  ada: Pass unaligned record components by copy in calls on all platforms  Eric Botcazou  1  -3/+2
This has historically been done only on platforms requiring the strict alignment of memory references, but this can arguably be considered as being mandated by the language on all of them. gcc/ada/ * gcc-interface/trans.cc (addressable_p) <COMPONENT_REF>: Take into account the alignment of the field on all platforms.
2024-09-03  ada: Fix internal error on pragma pack with discriminated record component  Eric Botcazou  1  -0/+2
When updating the size after making a packable type in gnat_to_gnu_field, we fail to clear it again when it is not constant. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_field): Clear again gnu_size after updating it if it is not constant.
2024-09-03  ada: Simplify Note_Uplevel_Bound procedure  Marc Poulhiès  1  -103/+66
The procedure Note_Uplevel_Bound was implemented as a custom expression tree walk. This change replaces this custom tree traversal by a more idiomatic use of Traverse_Proc. gcc/ada/ * exp_unst.adb (Check_Static_Type::Note_Uplevel_Bound): Refactor to use the generic Traverse_Proc. (Check_Static_Type): Adjust calls to Note_Uplevel_Bound as the previous second parameter was unused, so removed.
2024-09-03  ada: Transform Length attribute references for non-Strict overflow mode.  Steve Baird  1  -1/+68
The non-strict overflow checking code does a better job of eliminating overflow checks if given an expression consisting only of predefined operators (including relationals), literals, identifiers, and conditional expressions. If it is both feasible and useful, rewrite a Length attribute reference as such an expression. "Feasible" means "index type is same type as attribute reference type, so we can rewrite without using type conversions". "Useful" means "Overflow_Mode is something other than Strict, so there is value in making overflow check elimination easier". gcc/ada/ * exp_attr.adb (Expand_N_Attribute_Reference): If it makes sense to do so, then rewrite a Length attribute reference as an equivalent conditional expression.
2024-09-03  ada: Do not warn for partial access to Atomic Volatile_Full_Access objects  Eric Botcazou  1  -16/+30
The initial implementation of the GNAT aspect/pragma Volatile_Full_Access made it incompatible with Atomic, because it was not decided whether the read-modify-write sequences generated by Volatile_Full_Access would need to be implemented atomically when Atomic was also specified, which would have required a compare-and-swap primitive from the target architecture. But Ada 2022 introduced Full_Access_Only and retrofitted it into Atomic in the process, answering the above question by the negative, so the incompatibility between Volatile_Full_Access and Atomic was lifted in Ada 2012 as well, but the implementation was not entirely adjusted. In Ada 2012, it does not make sense to warn for the partial access to an Atomic object if the object is also declared Volatile_Full_Access, since the object will be accessed as a whole in this case (like in Ada 2022). gcc/ada/ * sem_res.adb (Is_Atomic_Ref_With_Address): Rename into... (Is_Atomic_Non_VFA_Ref_With_Address): ...this and adjust the implementation to exclude Volatile_Full_Access objects. (Resolve_Indexed_Component): Adjust to above renaming. (Resolve_Selected_Component): Likewise.
2024-09-03  ada: Reject illegal array aggregates as per AI22-0106.  Steve Baird  1  -17/+97
Implement the new legality rules of AI22-0106 which (as discussed in the AI) are needed to disallow constructs whose semantics would otherwise be poorly defined. gcc/ada/ * sem_aggr.adb (Resolve_Array_Aggregate): Implement the two new legality rules of AI22-0106. Add code to avoid cascading error messages.
2024-09-03  ada: Fix Finalize_Storage_Only bug in b-i-p calls  Bob Duff  1  -9/+5
Do not pass null for the Collection parameter when Finalize_Storage_Only is in effect. If the collection is null in that case, we will blow up later when we deallocate the object. gcc/ada/ * exp_ch6.adb (Add_Collection_Actual_To_Build_In_Place_Call): Remove Finalize_Storage_Only from the code that checks whether to pass null to the Collection parameter. Having done that, we don't need to check for Is_Library_Level_Entity, because No_Heap_Finalization requires that. And if we ever change No_Heap_Finalization to allow nested access types, we will still want to pass null. Note that the comment "Such a type lacks a collection." is incorrect in the case of Finalize_Storage_Only; such types have a collection.
2024-09-03  SVE intrinsics: Fold constant operands for svmul.  Jennifer Schmitz  2  -1/+316
This patch implements constant folding for svmul by calling gimple_folder::fold_const_binary with tree_code MULT_EXPR. Tests were added to check the produced assembly for different predicates, signed and unsigned integers, and the svmul_n_* case. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svmul_impl::fold): Try constant folding. gcc/testsuite/ * gcc.target/aarch64/sve/const_fold_mul_1.c: New test.
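As a rough illustration (not taken from the patch or its new test), a call like the following, where both operands are compile-time constant vectors and the predicate is all-true, is the kind of expression the new folding can evaluate at compile time:

    #include <arm_sve.h>

    svint64_t
    mul_const_example (void)
    {
      /* With the folding in place, -O2 should reduce this to a constant
         vector of 15 rather than emitting a MUL instruction.  */
      return svmul_x (svptrue_b64 (), svdup_s64 (5), svdup_s64 (3));
    }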
2024-09-03  SVE intrinsics: Fold constant operands for svdiv.  Jennifer Schmitz  4  -3/+410
This patch implements constant folding for svdiv: The new function aarch64_const_binop was created, which - in contrast to int_const_binop - does not treat operations as overflowing. This function is passed as callback to vector_const_binop from the new gimple_folder method fold_const_binary, if the predicate is ptrue or predication is _x. From svdiv_impl::fold, fold_const_binary is called with TRUNC_DIV_EXPR as tree_code. In aarch64_const_binop, a case was added for TRUNC_DIV_EXPR to return 0 for division by 0, as defined in the semantics for svdiv. Tests were added to check the produced assembly for different predicates, signed and unsigned integers, and the svdiv_n_* case. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl::fold): Try constant folding. * config/aarch64/aarch64-sve-builtins.h: Declare gimple_folder::fold_const_binary. * config/aarch64/aarch64-sve-builtins.cc (aarch64_const_binop): New function to fold binary SVE intrinsics without overflow. (gimple_folder::fold_const_binary): New helper function for constant folding of SVE intrinsics. gcc/testsuite/ * gcc.target/aarch64/sve/const_fold_div_1.c: New test.
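For illustration (a hypothetical example, not the added testcase), the division-by-zero case mentioned above means a constant division like this is expected to fold to a zero vector, matching the svdiv semantics:

    #include <arm_sve.h>

    svint32_t
    div_const_example (void)
    {
      /* All divisor elements are zero, so per the svdiv semantics the
         folded result should be a vector of zeros.  */
      return svdiv_x (svptrue_b32 (), svdup_s32 (7), svdup_s32 (0));
    }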
2024-09-03  SVE intrinsics: Refactor const_binop to allow constant folding of intrinsics.  Jennifer Schmitz  2  -89/+105
This patch sets the stage for constant folding of binary operations for SVE intrinsics: In fold-const.cc, the code for folding vector constants was moved from const_binop to a new function vector_const_binop. This function takes a function pointer as argument specifying how to fold the vector elements. The intention is to call vector_const_binop from the backend with an aarch64-specific callback function. The code in const_binop for folding operations where the first operand is a vector constant and the second argument is an integer constant was also moved into vector_const_binop to allow folding of binary SVE intrinsics where the second operand is an integer (_n). To allow calling poly_int_binop from the backend, the latter was made public. The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression. OK for mainline? Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com> gcc/ * fold-const.h: Declare vector_const_binop. * fold-const.cc (const_binop): Remove cases for vector constants. (vector_const_binop): New function that folds vector constants element-wise. (int_const_binop): Remove call to wide_int_binop. (poly_int_binop): Add call to wide_int_binop.
2024-09-03  Handle mixing REALPART/IMAGPART with other components in SLP groups  Richard Biener  1  -2/+4
The following makes sure we handle a SLP load/store group from a structure with complex and scalar members. This for example happens in gcc.target/i386/pr106010-9a.c. * tree-vect-slp.cc (vect_build_slp_tree_1): Handle mixing all of handled components besides ARRAY_RANGE_REF, drop handling of INDIRECT_REF.
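The referenced testcase is gcc.target/i386/pr106010-9a.c; as a hedged sketch (not that testcase itself), the shape of code involved looks roughly like this, where one store group mixes complex-part accesses with a plain scalar member:

    struct s { _Complex double c; double d; };

    void
    copy (struct s *restrict dst, struct s *restrict src, int n)
    {
      for (int i = 0; i < n; i++)
        {
          __real__ dst[i].c = __real__ src[i].c;  /* REALPART component */
          __imag__ dst[i].c = __imag__ src[i].c;  /* IMAGPART component */
          dst[i].d = src[i].d;                    /* ordinary COMPONENT_REF */
        }
    }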
2024-09-03  Correctly handle store IFNs in vect_get_vector_types_for_stmt  Richard Biener  1  -4/+4
Currently vect_get_vector_types_for_stmt only special-cases IFN_MASK_STORE but there are now very many variants and simply passing analysis without setting *VECTYPE will ICE during SLP discovery (noticed with IFN_SCATTER_STORE). The following properly uses internal_store_fn_p. I also noticed we're unnecessarily handling those again to determine the scalar type but there should always be a data reference for them. * tree-vect-stmts.cc (vect_get_vector_types_for_stmt): Handle all internal_store_fn_p the same. Remove special-casing for the scalar_type of IFN_MASK_STORE.
2024-09-03  i386: Support partial vectorized V2BF/V4BF smaxmin  Levy Hsu  2  -0/+55
This patch supports smaxmin for partial vectorized V2BF/V4BF. gcc/ChangeLog: * config/i386/mmx.md (<code><mode>3): New define_expand for V2BF/V4BF smaxmin. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-vector-smaxmin-1.c: New test.
2024-09-03  i386: Support partial vectorized V2BF/V4BF plus/minus/mult/div/sqrt  Levy Hsu  3  -0/+116
This patch introduces new mode iterators and expands for the i386 architecture to support partial vectorization of bf16 operations using AVX10.2 instructions. gcc/ChangeLog: * config/i386/mmx.md (VBF_32_64): New mode iterator for partial vectorized V2BF/V4BF. (<insn><mode>3): New define_expand for plusminusmultdiv. (sqrt<mode>2): New define_expand for sqrt. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-partial-bf-vector-fast-math-1.c: New test. * gcc.target/i386/avx10_2-partial-bf-vector-operations-1.c: New test.
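As a minimal sketch of what these expanders enable (an illustrative example, not one of the new tests), a 64-bit __bf16 vector operation such as the following can map to the new BF16 instructions instead of being scalarized, assuming an AVX10.2-enabling option such as -mavx10.2 (option name assumed here):

    typedef __bf16 v4bf __attribute__ ((vector_size (8)));

    v4bf
    add_v4bf (v4bf a, v4bf b)
    {
      /* Element-wise BF16 addition on a 4-element (64-bit) vector.  */
      return a + b;
    }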
2024-09-03  RISC-V: Support form 1 of integer scalar .SAT_ADD  Pan Li  14  -0/+417
This patch would like to support the scalar signed ssadd pattern for the RISC-V backend. Aka Form 1:

#define DEF_SAT_S_ADD_FMT_1(T, UT, MIN, MAX) \
T __attribute__((noinline))                  \
sat_s_add_##T##_fmt_1 (T x, T y)             \
{                                            \
  T sum = (UT)x + (UT)y;                     \
  return (x ^ y) < 0                         \
    ? sum                                    \
    : (sum ^ x) >= 0                         \
      ? sum                                  \
      : x < 0 ? MIN : MAX;                   \
}
DEF_SAT_S_ADD_FMT_1(int64_t, uint64_t, INT64_MIN, INT64_MAX)

Before this patch:
10 │ sat_s_add_int64_t_fmt_1:
11 │ 	mv	a5,a0
12 │ 	add	a0,a0,a1
13 │ 	xor	a1,a5,a1
14 │ 	not	a1,a1
15 │ 	xor	a4,a5,a0
16 │ 	and	a1,a1,a4
17 │ 	blt	a1,zero,.L5
18 │ 	ret
19 │ .L5:
20 │ 	srai	a5,a5,63
21 │ 	li	a0,-1
22 │ 	srli	a0,a0,1
23 │ 	xor	a0,a5,a0
24 │ 	ret

After this patch:
10 │ sat_s_add_int64_t_fmt_1:
11 │ 	add	a2,a0,a1
12 │ 	xor	a1,a0,a1
13 │ 	xor	a5,a0,a2
14 │ 	srli	a5,a5,63
15 │ 	srli	a1,a1,63
16 │ 	xori	a1,a1,1
17 │ 	and	a5,a5,a1
18 │ 	srai	a4,a0,63
19 │ 	li	a3,-1
20 │ 	srli	a3,a3,1
21 │ 	xor	a3,a3,a4
22 │ 	neg	a4,a5
23 │ 	and	a3,a3,a4
24 │ 	addi	a5,a5,-1
25 │ 	and	a0,a2,a5
26 │ 	or	a0,a0,a3
27 │ 	ret

The below test suites are passed for this patch: 1. The rv64gcv fully regression test.

gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_ssadd): Add new func decl for expanding ssadd.
* config/riscv/riscv.cc (riscv_gen_sign_max_cst): Add new func impl to gen the max int rtx.
(riscv_expand_ssadd): Add new func impl to expand the ssadd.
* config/riscv/riscv.md (ssadd<mode>3): Add new pattern for signed integer .SAT_ADD.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data.
* gcc.target/riscv/sat_s_add-1.c: New test.
* gcc.target/riscv/sat_s_add-2.c: New test.
* gcc.target/riscv/sat_s_add-3.c: New test.
* gcc.target/riscv/sat_s_add-4.c: New test.
* gcc.target/riscv/sat_s_add-run-1.c: New test.
* gcc.target/riscv/sat_s_add-run-2.c: New test.
* gcc.target/riscv/sat_s_add-run-3.c: New test.
* gcc.target/riscv/sat_s_add-run-4.c: New test.
* gcc.target/riscv/scalar_sat_binary_run_xxx.h: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
2024-09-03  Daily bump.  GCC Administrator  16  -1/+640
2024-09-03  MIPS: Support vector reduc for MSA  YunQiang Su  5  -0/+293
We have SHF.fmt and HADD_S/U.fmt with MSA, which can be used for vector reduc. For min/max for U8/S8, we can

SHF.B W1, W0, 0xb1   # swap byte inner every half
MIN.B W1, W1, W0
SHF.H W2, W1, 0xb1   # swap half inner every word
MIN.B W2, W2, W1
SHF.W W3, W2, 0xb1   # swap word inner every doubleword
MIN.B W4, W3, W2
SHF.W W4, W4, 0x4e   # swap the two doubleword
MIN.B W4, W4, W3

For plus of S8/U8, we can use HADD

HADD.H W0, W0, W0
HADD.W W0, W0, W0
HADD.D W0, W0, W0
SHF.W W1, W0, 0x4e   # swap the two doubleword
ADDV.D W1, W1, W0
COPY_S.B T0, W1      # COPY_U.B for U8

We can do similar for S16/U16/S32/U32/S64/U64/FLOAT/DOUBLE.

gcc
* config/mips/mips-msa.md: (MSA_NO_HADD): we have HADD for S8/U8/S16/U16/S32/U32 only.
(reduc_smin_scal_<mode>): New define pattern.
(reduc_smax_scal_<mode>): Ditto.
(reduc_umin_scal_<mode>): Ditto.
(reduc_umax_scal_<mode>): Ditto.
(reduc_plus_scal_<mode>): Ditto.
(reduc_plus_scal_v4si): Ditto.
(reduc_plus_scal_v8hi): Ditto.
(reduc_plus_scal_v16qi): Ditto.
(reduc_<optab>_scal_<mode>): Ditto.
* config/mips/mips-protos.h: New function mips_expand_msa_reduc.
* config/mips/mips.cc: New function mips_expand_msa_reduc.
* config/mips/mips.md: Define any_bitwise iterator.
gcc/testsuite:
* gcc.target/mips/msa-reduc.c: New tests.
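For reference, a reduction loop of the kind these new patterns target looks roughly like this (a hedged sketch, not the msa-reduc.c test itself); with MSA enabled the min reduction can be emitted as the SHF.B/MIN.B sequence shown above instead of a scalar loop:

    signed char
    min_reduc (const signed char *a)
    {
      signed char m = a[0];
      for (int i = 1; i < 16; i++)
        if (a[i] < m)
          m = a[i];   /* smin reduction over a 16-byte vector */
      return m;
    }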
2024-09-02  testsuite: Fix optimize_one.c FAIL on i686-linux  Jakub Jelinek  1  -1/+1
The test FAILs on i686-linux because -mfpmath=sse is used without -msse2 being enabled. 2024-09-02 Jakub Jelinek <jakub@redhat.com> * gcc.target/i386/optimize_one.c: Add -msse2 to dg-options.
2024-09-02  [libstdc++-v3] [testsuite] improve future/*/poll.cc calibration  Alexandre Oliva  1  -41/+70
30_threads/future/members/poll.cc has calibration code that, on systems with very low clock resolution, may spuriously fail to run. Even when it does run, low resolution and reasonable timeouts limit severely the viability of increasing the loop counts so as to reduce measurement noise, so we end up with very noisy results. On various vxworks targets, high iteration count (low-noise) measurements confirmed that some of the operations that we expected to be up to 100x slower than the fastest ones can run a little slower than that and, with significant noise, may seem to be even slower, comparatively. Bump the factors up to 200x, so that we have plenty of margin over measured results. for libstdc++-v3/ChangeLog * testsuite/30_threads/future/members/poll.cc: Factor out calibration, and run it unconditionally. Lower its strictness. Bump wait_until_*'s slowness factor.
2024-09-02  [libstdc++] [testsuite] avoid async.cc loss of precision [PR91486]  Alexandre Oliva  1  -3/+16
When we get to test_pr91486_wait_until(), we're about 10s past the float_steady_clock epoch. This is enough for the 1s delta for the timeout to come out slightly lower when the futex-less wait_until converts the deadline from float_steady_clock to __clock_t. So we may wake up a little too early, and end up looping one extra time to sleep for e.g. another 954ns until we hit the deadline. Each iteration calls float_steady_clock::now(), bumping the call_count that we VERIFY() at the end of the subtest. Since we expect at most 3 calls, and we're going to have at the very least 3 on futex-less targets (one in the test proper, one before wait_until_impl to compute the deadline, and one after wait_until_impl to check whether the deadline was hit), any such imprecision that causes an extra iteration will reach 5 and cause the test to fail. Initializing the epoch in the beginning of the test makes such spurious fails due to loss of precision far less likely. I don't suppose allowing for an extra couple of calls would be desirable. While at that, I'm annotating unused status variables as such. for libstdc++-v3/ChangeLog PR libstdc++/91486 * testsuite/30_threads/async/async.cc (test_pr91486_wait_for): Mark status as unused. (test_pr91486_wait_until): Likewise. Initialize epoch later.
2024-09-02  [testsuite] add linkonly to dg-additional-sources [PR115295]  Alexandre Oliva  8  -21/+39
The D testsuite shows it was a mistake to assume that dg-additional-sources are never to be used for compilation tests. Even if an output file is specified for compilation, extra module files can be named and used in the compilation without being flagged as errors. Introduce a 'linkonly' flag for dg-additional-sources, and use it in pr95401.cc and other vector tests that default to run, so that its additional sources get discarded when vector tests downgrade to compile-only. This reverts previous workarounds for this very circumstance, that relied on being able to run vector tests anyway, even after failing to detect runtime or hardware vector support. for gcc/ChangeLog PR d/115295 * doc/sourcebuild.texi (dg-additional-sources): Add linkonly. for gcc/testsuite/ChangeLog PR d/115295 * g++.dg/vect/pr95401.cc: Add linkonly to dg-additional-sources. * g++.dg/vect/pr68762-1.cc: Likewise. * g++.dg/vect/simd-clone-3.cc: Likewise. * g++.dg/vect/simd-clone-5.cc: Likewise. * gcc.dg/vect/vect-simd-clone-10.c: Likewise. Drop dg-do run. * gcc.dg/vect/vect-simd-clone-12.c: Likewise. Likewise. * lib/gcc-defs.exp (additional_sources_omit_on_compile): New. (dg-additional-sources): Add to it on linkonly. (dg-additional-files-options): Omit select sources on compile.
2024-09-02  amdgcn: Remove TARGET_GCN5_PLUS  Andrew Stubbs  5  -135/+59
Now that GCN3 support is gone, TARGET_GCN5_PLUS always evaluates to true, so we can make that code unconditional, and remove all the "else" cases. The ISA features TARGET_GLOBAL_ADDRSPACE, TARGET_FLAT_OFFSETS, TARGET_EXPLICIT_CARRY, and TARGET_MULTIPLY_IMMEDIATE are similarly also redundant and can be made unconditional. The naming of the "gcn_version" attribute has been confusing since the "rdna" attribute was added and this makes it worse, so it has been renamed to "cdna". The add-with-carry assembler mnemonics no longer have two forms, so '%^' can be removed.

gcc/ChangeLog:
* config/gcn/gcn-opts.h (TARGET_GCN5_PLUS): Delete.
(TARGET_GLOBAL_ADDRSPACE): Delete.
(TARGET_FLAT_OFFSETS): Delete.
(TARGET_EXPLICIT_CARRY): Delete.
(TARGET_MULTIPLY_IMMEDIATE): Delete.
* config/gcn/gcn-valu.md (*mov<mode>): Rename "gcn_version" to "cdna".
(*mov<mode>_4reg): Likewise.
(@mov<mode>_sgprbase): Likewise.
(gather<mode>_insn_1offset<exec>): Likewise.
(gather<mode>_insn_1offset_ds<exec>): Likewise.
(gather<mode>_insn_2offsets<exec>): Likewise.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset_ds<exec_scatter>): Likewise.
(scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
(gather<mode>_insn_1offset<exec>): Remove TARGET_FLAT_OFFSETS conditionals.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(scatter<mode>_insn_1offset<exec_scatter>): Likewise.
(add<mode>3<exec_clobber>): Use "_co" instead of "%^".
(add<mode>3_dup<exec_clobber>): Likewise.
(add<mode>3_vcc<exec_vcc>): Likewise.
(add<mode>3_vcc_dup<exec_vcc>): Likewise.
(addc<mode>3<exec_vcc>): Likewise.
(sub<mode>3<exec_clobber>): Likewise.
(sub<mode>3_vcc<exec_vcc>): Likewise.
(subc<mode>3<exec_vcc>): Likewise.
(*plus_carry_dpp_shr_<mode>): Likewise.
(*plus_carry_in_dpp_shr_<mode>): Likewise.
* config/gcn/gcn.cc (gcn_flat_address_p): Remove TARGET_FLAT_OFFSETS conditionals.
(gcn_addr_space_legitimate_address_p): Likewise.
(gcn_addr_space_legitimize_address): Likewise.
(gcn_expand_scalar_to_vector_address): Likewise.
(print_operand_address): Likewise, and TARGET_GLOBAL_ADDRSPACE also.
(print_operand): Remove "%^" operand code. Remove TARGET_GLOBAL_ADDRSPACE assertion.
* config/gcn/gcn.h (STACK_ADDR_SPACE): Remove GCN5 conditional.
* config/gcn/gcn.md (gcn_version): Rename attribute ...
(cdna): ... to this, and remove the gcn3 and gcn5 values.
(enabled): Replace old "gcn_version" logic with new "cdna" logic.
(*mov<mode>_insn): Rename "gcn_version" to "cdna".
(*movti_insn): Likewise.
(addsi3): Use "_co" instead of "%^".
(addsi3_scalar_carry): Likewise.
(addsi3_scalar_carry_cst): Likewise.
(addcsi3_scalar): Likewise.
(addcsi3_scalar_zero): Likewise.
(addptrdi3): Likewise.
(subsi3): Likewise.
(<su>mulsi3_highpart): Remove TARGET_MULTIPLY_IMMEDIATE conditions.
(<su>mulsi3_highpart_reg): Remove "gcn_version" attribute.
(muldi3): Likewise.
(atomic_fetch_<bare_mnemonic><mode>): Likewise.
(atomic_<bare_mnemonic><mode>): Likewise.
(sync_compare_and_swap<mode>_insn): Likewise.
(atomic_load<mode>): Likewise.
(atomic_store<mode>): Likewise.
(atomic_exchange<mode>): Likewise.
(<su>mulsi3_highpart_imm): Remove both TARGET_MULTIPLY_IMMEDIATE and "gcn_version".
(<su>mulsidi3): Likewise.
(<su>mulsidi3_imm): Likewise.
2024-09-02  amdgcn: Remove TARGET_GCN3  Andrew Stubbs  4  -31/+7
The only GCN3 ISA device was removed (Fiji, gfx803) so all the GCN3-specific code and features can be removed from the back-end. gcc/ChangeLog: * config/gcn/gcn-opts.h (enum gcn_isa): Delete ISA_GCN3. (TARGET_GCN3): Delete. (TARGET_GCN3_PLUS): Delete. (TARGET_M0_LDS_LIMIT): Delete. * config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>): Remove TARGET_GCN3 from conditions. (*<reduc_op>_dpp_shr_<mode>): Likewise. * config/gcn/gcn.cc (enum gcn_isa): Change default to ISA_GCN5. (gcn_expand_prologue): Remove TARGET_M0_LDS_LIMIT feature. (gcn_expand_reduc_scalar): Remove TARGET_GCN3 conditions. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Remove TARGET_GCN3.
2024-09-02  amdgcn: remove gfx803 "Fiji" support  Andrew Stubbs  14  -99/+19
The gfx803 "Fiji" device was deprecated in GCC 14, removed from LLVM 18, and hasn't worked properly with the drivers since about ROCm 4. This patch removes the device from GCC options and documentation, and removes the direct mentions from the internals. The TARGET_GCN3 support in the back-end is now unused and can be removed (in a follow-up patch). gcc/ChangeLog: * config.gcc (amdgcn-*-*): Remove "fiji" from with_arch checks. * config/gcn/gcn-hsa.h (ABI_VERSION_SPEC): Remove fiji alternative. (NO_XNACK): Likewise. (NO_SRAM_ECC): Likewise. (ASM_SPEC): Remove "%{}" around ABI_VERSION_SPEC. * config/gcn/gcn-opts.h (enum processor_type): Remove PROCESSOR_FIJI. (TARGET_FIJI): Delete. * config/gcn/gcn.cc (gcn_option_override): Remove Fiji. (gcn_omp_device_kind_arch_isa): Likewise. (output_file_start): Likewise. * config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS): Likewise. * config/gcn/gcn.opt (gpu_type): Likewise. (march, mtune): Change default to PROCESSOR_VEGA10. * config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX803): Delete. (copy_early_debug_info): Remove elf_flags_actual. Use ELFABIVERSION_AMDGPU_HSA_V4 unconditionally. (get_arch): Remove Fiji. (main): Remove gfx803. * config/gcn/t-omp-device (omp-device-properties-gcn): Remove fiji and gfx803. * doc/install.texi (amdgcn*-*-*): Remove fiji and special instructions. * doc/invoke.texi: Remove fiji. libgomp/ChangeLog: * libgomp.texi: Remove fiji and gfx803. * testsuite/libgomp.c/declare-variant-4.h: Remove fiji and gfx803. * testsuite/libgomp.c/declare-variant-4-fiji.c: Removed. * testsuite/libgomp.c/declare-variant-4-gfx803.c: Removed.
2024-09-02  PR modula2/116557 Remove physical address from the GPL header comment  Gaius Mulley  40  -137/+116
This patch removes the physical address from all the header comments in the m2 subdirectory. The physical address is replaced with the text "You should have received a copy of the GNU General Public License along with GCC; see the file COPYING3. If not see <http://www.gnu.org/licenses/>." instead. gcc/m2/ChangeLog: PR modula2/116557 * gm2-lang.cc: Replace physical address with URL in GPL header. * gm2-lang.h: Ditto. * images/LICENSE.IMG: Ditto. * m2-tree.def: Ditto. * mc-boot/GIndexing.cc: Ditto. * mc-boot/Gkeyc.cc: Ditto. * mc-boot/Glists.cc: Ditto. * mc-boot/GmcComp.cc: Ditto. * mc-boot/GmcDebug.cc: Ditto. * mc-boot/GmcFileName.cc: Ditto. * mc-boot/GmcMetaError.cc: Ditto. * mc-boot/GmcOptions.cc: Ditto. * mc-boot/GmcPreprocess.cc: Ditto. * mc-boot/GmcPretty.cc: Ditto. * mc-boot/GmcPrintf.cc: Ditto. * mc-boot/GmcQuiet.cc: Ditto. * mc-boot/GmcReserved.cc: Ditto. * mc-boot/GmcSearch.cc: Ditto. * mc-boot/GmcStack.cc: Ditto. * mc/Indexing.mod: Ditto. * mc/keyc.mod: Ditto. * mc/lists.mod: Ditto. * mc/mcComp.mod: Ditto. * mc/mcDebug.mod: Ditto. * mc/mcFileName.mod: Ditto. * mc/mcMetaError.mod: Ditto. * mc/mcOptions.mod: Ditto. * mc/mcPreprocess.mod: Ditto. * mc/mcPretty.mod: Ditto. * mc/mcPrintf.mod: Ditto. * mc/mcQuiet.mod: Ditto. * mc/mcReserved.mod: Ditto. * mc/mcSearch.mod: Ditto. * mc/mcStack.mod: Ditto. * tools-src/buildpg: Ditto. * tools-src/calcpath: Ditto. * tools-src/checkmeta.py: Ditto. * tools-src/def2doc.py: Ditto. * tools-src/makeSystem: Ditto. * tools-src/tidydates.py: Ditto. Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
2024-09-02  libsupc++: Fix handling of m68k extended real in <compare>  Andreas Schwab  1  -1/+6
PR libstdc++/116513 * libsupc++/compare (_S_fp_bits) [__fmt == _M68k_80bit]: Shift padding out of exponent word.
2024-09-02  testsuite: Rename scanltranstree.exp -> scanltrans.exp  Alex Coplan  8  -7/+7
Since r15-3254-g3f51f0dc88ec21c1ec79df694200f10ef85915f4 added scan-ltrans-rtl* variants to scanltranstree.exp, it no longer makes sense to have "tree" in the name. This renames the file accordingly and updates users. libatomic/ChangeLog: * testsuite/lib/libatomic.exp: Load scanltrans.exp instead of scanltranstree.exp. libgomp/ChangeLog: * testsuite/lib/libgomp.exp: Load scanltrans.exp instead of scanltranstree.exp. libitm/ChangeLog: * testsuite/lib/libitm.exp: Load scanltrans.exp instead of scanltranstree.exp. libphobos/ChangeLog: * testsuite/lib/libphobos-dg.exp: Load scanltrans.exp instead of scanltranstree.exp. libvtv/ChangeLog: * testsuite/lib/libvtv.exp: Load scanltrans.exp instead of scanltranstree.exp. gcc/testsuite/ChangeLog: * gcc.dg-selftests/dg-final.exp: Load scanltrans.exp instead of scanltranstree.exp. * lib/gcc-dg.exp: Likewise. * lib/scanltranstree.exp: Rename to ... * lib/scanltrans.exp: ... this.
2024-09-02  Rename gimple_asm_input_p to gimple_asm_basic_p  Richard Sandiford  7  -14/+24
Following on from the earlier tree rename, this patch renames gimple_asm_input_p to gimple_asm_basic_p, and similarly for related names. gcc/ * doc/gimple.texi (gimple_asm_basic_p): Document. (gimple_asm_set_basic): Likewise. * gimple.h (GF_ASM_INPUT): Rename to... (GF_ASM_BASIC): ...this. (gimple_asm_set_input): Rename to... (gimple_asm_set_basic): ...this. (gimple_asm_input_p): Rename to... (gimple_asm_basic_p): ...this. * cfgexpand.cc (expand_asm_stmt): Update after above renaming. * gimple.cc (gimple_asm_clobbers_memory_p): Likewise. * gimplify.cc (gimplify_asm_expr): Likewise. * ipa-icf-gimple.cc (func_checker::compare_gimple_asm): Likewise. * tree-cfg.cc (stmt_can_terminate_bb_p): Likewise.
2024-09-02  Rename ASM_INPUT_P to ASM_BASIC_P  Richard Sandiford  11  -19/+24
ASM_INPUT_P is so named because it causes the eventual rtl insn pattern to be a top-level ASM_INPUT rather than an ASM_OPERANDS. However, this name has caused confusion, partly due to earlier documentation. The name also sounds related to ASM_INPUTS but is for a different piece of state. This patch renames it to ASM_BASIC_P, with the inverse meaning an extended asm. ("Basic asm" is the term used in extend.texi.) gcc/ * doc/generic.texi (ASM_BASIC_P): Document. * tree.h (ASM_INPUT_P): Rename to... (ASM_BASIC_P): ...this. (ASM_VOLATILE_P, ASM_INLINE_P): Reindent. * gimplify.cc (gimplify_asm_expr): Update after above renaming. * tree-core.h (tree_base): Likewise. gcc/c/ * c-typeck.cc (build_asm_expr): Rename ASM_INPUT_P to ASM_BASIC_P. gcc/cp/ * pt.cc (tsubst_stmt): Rename ASM_INPUT_P to ASM_BASIC_P. * parser.cc (cp_parser_asm_definition): Likewise. gcc/d/ * toir.cc (IRVisitor): Rename ASM_INPUT_P to ASM_BASIC_P. gcc/jit/ * jit-playback.cc (playback::block::add_extended_asm): Rename ASM_INPUT_P to ASM_BASIC_P. gcc/m2/ * gm2-gcc/m2block.cc (flush_pending_note): Rename ASM_INPUT_P to ASM_BASIC_P. * gm2-gcc/m2statement.cc (m2statement_BuildAsm): Likewise.
2024-09-02  lto/lto.cc: Fix build with not HAVE_WORKING_FORK  Tobias Burnus  1  -0/+2
gcc/lto/ChangeLog: * lto.cc: Add missing HAVE_WORKING_FORK.
2024-09-02  lto-wrapper: Honor -save-temps for ltrans' makefile  Tobias Burnus  1  -1/+4
gcc/ChangeLog: * lto-wrapper.cc (run_gcc): Honor -save-temps for makefile name.
2024-09-02  ada: Diagnose too large size clause on floating-point type  Eric Botcazou  1  -0/+4
The problem is that the size clause changes the floating-point format used for the type, but it must not when this format is the widest format that is supported in hardware on the target. Instead a padding type must be built and the associated warning given. gcc/ada/ * gcc-interface/decl.cc (gnat_to_gnu_entity): Cap the Esize of a floating-point type to the size of the widest format supported in hardware if it is explicitly defined.
2024-09-02  ada: Create usage entry for -gnatw_l  Viljar Indus  3  -7/+9
gcc/ada/ * doc/gnat_ugn/building_executable_programs_with_gnat.rst: update documentation for the -gnatw_l switch. * usage.adb: Add -gnatw_l entry. * gnat_ugn.texi: Regenerate.
2024-09-02  ada: Fix standard output stream for gnatcmd output  Ronan Desplanques  1  -1/+4
Before this patch, the gnat command sent to standard error pieces of information that are a better match for standard output. This patch makes this information go to standard output. gcc/ada/ * gnatcmd.adb (GNATCmd): Fix standard output stream.
2024-09-02  ada: Fix minor issues in -gnaty0's documentation  Ronan Desplanques  2  -7/+7
Before this patch, the documentation of -gnaty0 used 0-based indexing for column numbers while 1-based indexing is used everywhere else. This patch makes this documentation use 1-based indexing, and also adds a missing parenthesis. gcc/ada/ * doc/gnat_ugn/building_executable_programs_with_gnat.rst: Fix minor issues. * gnat_ugn.texi: Regenerate.
2024-09-02  ada: Documentation for generic type inference  Bob Duff  2  -35/+202
...plus minor improvements to existing documentation. gcc/ada/ * doc/gnat_rm/gnat_language_extensions.rst: I assume "extended set of extensions" was a typo for "experimental set of extensions", because "extended extensions" is repetitive and redundant. "in addition" clarifies that the one subsumes the other. Add a reminder at the start of each subsection about what switch/pragma enables what extensions. Add new section about "Inference of Dependent Types in Generic Instantiations". * gnat_rm.texi: Regenerate.
2024-09-02  ada: Small fixes for FreeBSD  Patrick Bernardi  2  -5/+9
Sizes of pthread data types now need to be defined for FreeBSD ports. Traceback support for AArch64 FreeBSD is now defined. gcc/ada/ * s-oscons-tmplt.c: Define sizes of pthread data types on FreeBSD. * tracebak.c: Use GCC unwinder and adjust PC appropriately on aarch64-freebsd.
2024-09-02  ada: Also reset scope for some nested declaration  Marc Poulhiès  1  -1/+24
When changing the scope for entities found in the entry body that is mutated into a procedure, the compiler needs to look deeper than only the top level entities as expansion may produce object declarations whose scopes are also the entry. For example, the tree after expansion may look like:

procedure This_Is_An_Entry_Proc is
   ...
   O1 : Typ := do
                  TMP1 : OTyp := ...;
                  ...
               in TMP1;

O1's scope needs to be reset to This_Is_An_Entry_Proc, but so does TMP1's scope. This change also fixes a small oversight where N_Implicit_Label_Declaration scope must be reset and its content skipped. gcc/ada/ * exp_ch9.adb (Reset_Scopes_To): Adjust comment. (Reset_Scopes_To.Reset_Scope): Adjust the scope reset for object declaration. In particular, visit the children nodes if any. Also extend the handling of other declarations to N_Implicit_Label_Declaration.
2024-09-02  ada: Cleanup expansion of object declarations  Piotr Trojanek  1  -7/+3
Replace repeated calls to Sloc with uses of local constant Loc. Code cleanup; behavior is unaffected. gcc/ada/ * exp_ch3.adb (Expand_N_Object_Declaration): Replace calls to Sloc with uses of Loc; turn variable Prag into constant.
2024-09-02  ada: Remove repeated guards in validity checks  Piotr Trojanek  1  -6/+2
Routine Insert_Valid_Check only applies checks when Expr_Known_Valid query returns False; there is no need to call this query before inserting checks. Code cleanup; behavior is unaffected. gcc/ada/ * exp_imgv.adb (Expand_User_Defined_Enumeration_Image) (Expand_Image_Attribute): Remove redundant guards.
2024-09-02  ranger: Fix up range computation for CLZ [PR116486]  Jakub Jelinek  2  -2/+29
The initial CLZ gimple-range-op.cc implementation handled just the case where second argument to .CLZ is equal to prec, but in r15-1014 I've added also handling of the -1 case. As the following testcase shows, incorrectly though for the case where the first argument has [0,0] range. If the second argument is prec, then the result should be [prec,prec] and that was handled correctly, but when the second argument is -1, the result should be [-1,-1] but instead it was incorrectly computed as [prec-1,prec-1] (when second argument is prec, mini is 0 and maxi is prec, while when second argument is -1, mini is -1 and maxi is prec-1). Fixed thusly (the actual handling is then similar to the CTZ [0,0] case). 2024-09-02 Jakub Jelinek <jakub@redhat.com> PR middle-end/116486 * gimple-range-op.cc (cfn_clz::fold_range): If lh is [0,0] and mini is -1, return [-1,-1] range rather than [prec-1,prec-1]. * gcc.dg/bitint-109.c: New test.
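To illustrate (a hypothetical example, not the bitint-109.c test added by the patch), __builtin_clzg with a fallback value is one way a .CLZ call with a second argument of -1 arises; when the operand is known to be zero, the result range must be the fallback value:

    int
    clz_of_known_zero (unsigned int x)
    {
      if (x != 0)
        __builtin_unreachable ();
      /* x has range [0,0] here; the correct result range is [-1,-1],
         not [31,31] as was previously computed for a 32-bit operand.  */
      return __builtin_clzg (x, -1);
    }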
2024-09-02  load and store-lanes with SLP  Richard Biener  31  -172/+458
The following is a prototype for how to represent load/store-lanes within SLP. I've for now settled with having a single load node with multiple permute nodes acting as selection, one for each loaded lane and a single store node fed from all stored lanes. For

for (int i = 0; i < 1024; ++i)
  {
    a[2*i] = b[2*i] + 7;
    a[2*i+1] = b[2*i+1] * 3;
  }

you have the following SLP graph where I explain how things are set up and code-generated:

t.c:23:21: note: SLP graph after lowering permutations:
t.c:23:21: note: node 0x50dc8b0 (max_nunits=1, refcnt=1) vector(4) int
t.c:23:21: note: op template: *_6 = _7;
t.c:23:21: note: stmt 0 *_6 = _7;
t.c:23:21: note: stmt 1 *_12 = _13;
t.c:23:21: note: children 0x50dc488 0x50dc6e8

This is the store node, it's marked with ldst_lanes = true during SLP discovery. This node code-generates

vect_array.65[0] = vect__7.61_29;
vect_array.65[1] = vect__13.62_28;
MEM <int[8]> [(int *)vectp_a.63_27] = .STORE_LANES (vect_array.65);

...

t.c:23:21: note: node 0x50dc520 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note: op: VEC_PERM_EXPR
t.c:23:21: note: stmt 0 _5 = *_4;
t.c:23:21: note: lane permutation { 0[0] }
t.c:23:21: note: children 0x50dc948
t.c:23:21: note: node 0x50dc780 (max_nunits=4, refcnt=2) vector(4) int
t.c:23:21: note: op: VEC_PERM_EXPR
t.c:23:21: note: stmt 0 _11 = *_10;
t.c:23:21: note: lane permutation { 0[1] }
t.c:23:21: note: children 0x50dc948

These are the selection nodes, marked with ldst_lanes = true. They code generate nothing.

t.c:23:21: note: node 0x50dc948 (max_nunits=4, refcnt=3) vector(4) int
t.c:23:21: note: op template: _5 = *_4;
t.c:23:21: note: stmt 0 _5 = *_4;
t.c:23:21: note: stmt 1 _11 = *_10;
t.c:23:21: note: load permutation { 0 1 }

This is the load node, marked with ldst_lanes = true (the load permutation is only accurate when taking into account the lane permute in the selection nodes). It code generates

vect_array.58 = .LOAD_LANES (MEM <int[8]> [(int *)vectp_b.56_33]);
vect__5.59_31 = vect_array.58[0];
vect__5.60_30 = vect_array.58[1];

This scheme allows to leave code generation in vectorizable_load/store mostly as-is.

While this should support both load-lanes and (masked) store-lanes the decision to do either is done during SLP discovery time and cannot be reversed without altering the SLP tree - as-is the SLP tree is not usable for non-store-lanes on the store side, the load side is OK representation-wise but will very likely fail permute handling as the lowering to deal with the two input vector restriction isn't done - but of course since the permute node is marked as to be ignored that doesn't work out. So I've put restrictions in place that fail vectorization if a load/store-lane SLP tree is later classified differently by get_load_store_type.

I'll note that for example gcc.target/aarch64/sve/mask_struct_store_3.c will not get SLP store-lanes used because the full store SLPs just fine though we then fail to handle the "splat" load-permutation

t2.c:5:21: note: node 0x4db2630 (max_nunits=4, refcnt=2) vector([4,4]) int
t2.c:5:21: note: op template: _6 = *_5;
t2.c:5:21: note: stmt 0 _6 = *_5;
t2.c:5:21: note: stmt 1 _6 = *_5;
t2.c:5:21: note: stmt 2 _6 = *_5;
t2.c:5:21: note: stmt 3 _6 = *_5;
t2.c:5:21: note: load permutation { 0 0 0 0 }

the load permute lowering code currently doesn't consider it worth lowering single loads from a group (or in this case not grouped loads). The expectation is the target can handle this by two interleaves with itself.

So what we see here is that while the explicit SLP representation is helpful in some cases, in cases like this it would require changing it when we make decisions how to vectorize. My idea is that this all will change a lot when we re-do SLP discovery (for loops) and when we get rid of non-SLP as I think vectorizable_* should be allowed to alter the SLP graph during analysis.

The patch also removes the code cancelling SLP if we can use load/store-lanes from the main loop vector analysis code and re-implements it as re-discovering the SLP instance with forced single-lane splits so SLP load/store-lanes scheme can be used. This is now done after SLP discovery and SLP pattern recog are complete to not disturb the latter but per SLP instance instead of being a global decision on the whole loop. This is a behavioral change that for example shows in gcc.dg/vect/slp-perm-6.c on ARM where we formerly used SLP permutes but now a mix of SLP without permutes and load/store lanes. The previous flaky heuristic is now flaky in a different way.

Testing on RISC-V and aarch64 reveal several testcases that require adjustment as to now expect SLP even when load/store lanes are being used. If in doubt I've adjusted them to the final expectation which will lead to one or two new FAILs where we still do the SLP cancelling. I have a followup that implements that while remaining in SLP that's in final testing.

Note that gcc.dg/vect/slp-42.c and gcc.dg/vect/pr68445.c will FAIL on aarch64 with SVE because for some odd reason vect_stridedN is true for any N for check_effective_target_vect_fully_masked targets but SVE cannot do ld8 while risc-v can. I have not bothered to adjust target tests that now fail assembly-scan.

* tree-vectorizer.h (_slp_tree::ldst_lanes): New flag to mark load, store and permute nodes.
* tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize ldst_lanes.
(vect_build_slp_instance): For stores iff the target prefers store-lanes discover single-lane sub-groups, do not perform interleaving lowering but mark the node with ldst_lanes. Also allow i == 0 - fatal failure - for splitting up a store group when we're not doing single-lane discovery already.
(vect_lower_load_permutations): When the target supports load lanes and the loads all fit the pattern split out a single level of permutes only and mark the load and permute nodes with ldst_lanes.
(vectorizable_slp_permutation_1): Handle the load-lane permute forwarding of vector defs.
(vect_analyze_slp): After SLP pattern recog is finished see if there are any SLP instances that would benefit from using load/store-lanes and re-discover those with forced single lanes.
* tree-vect-stmts.cc (get_group_load_store_type): Support load/store-lanes for SLP.
(vectorizable_store): Support SLP code generation for store-lanes.
(vectorizable_load): Support SLP code generation for load-lanes.
* tree-vect-loop.cc (vect_analyze_loop_2): Do not cancel SLP when store-lanes can be used.
* gcc.dg/vect/slp-55.c: New testcase.
* gcc.dg/vect/slp-56.c: Likewise.
* gcc.dg/vect/slp-11c.c: Adjust.
* gcc.dg/vect/slp-53.c: Likewise.
* gcc.dg/vect/slp-cond-1.c: Likewise.
* gcc.dg/vect/vect-complex-5.c: Likewise.
* gcc.dg/vect/slp-1.c: Likewise.
* gcc.dg/vect/slp-54.c: Remove riscv XFAIL.
* gcc.dg/vect/slp-perm-5.c: Adjust.
* gcc.dg/vect/slp-perm-7.c: Likewise.
* gcc.dg/vect/slp-perm-8.c: Likewise.
* gcc.dg/vect/slp-multitypes-11.c: Likewise.
* gcc.dg/vect/slp-multitypes-11-big-array.c: Likewise.
* gcc.dg/vect/slp-perm-9.c: Remove expected SLP fail due to three-vector permute.
* gcc.dg/vect/slp-perm-6.c: Remove XFAIL.
* gcc.dg/vect/slp-perm-1.c: Adjust.
* gcc.dg/vect/slp-perm-2.c: Likewise.
* gcc.dg/vect/slp-perm-3.c: Likewise.
* gcc.dg/vect/slp-perm-4.c: Likewise.
* gcc.dg/vect/pr68445.c: Likewise.
* gcc.dg/vect/slp-11b.c: Likewise.
* gcc.dg/vect/slp-2.c: Likewise.
* gcc.dg/vect/slp-23.c: Likewise.
* gcc.dg/vect/slp-33.c: Likewise.
* gcc.dg/vect/slp-42.c: Likewise.
* gcc.dg/vect/slp-46.c: Likewise.
* gcc.dg/vect/slp-perm-10.c: Likewise.
2024-09-02  lower SLP load permutation to interleaving  Richard Biener  5  -4/+378
The following emulates classical interleaving for SLP load permutes that we are unlikely handling natively. This is to handle cases where interleaving (or load/store-lanes) is the optimal choice for vectorizing even when we are doing that within SLP. An example would be

void foo (int * __restrict a, int * b)
{
  for (int i = 0; i < 16; ++i)
    {
      a[4*i + 0] = b[4*i + 0] * 3;
      a[4*i + 1] = b[4*i + 1] + 3;
      a[4*i + 2] = (b[4*i + 2] * 3 + 3);
      a[4*i + 3] = b[4*i + 3] * 3;
    }
}

where currently the SLP store is merging four single-lane SLP sub-graphs but none of the loads in it can be code-generated with V4SImode vectors and a VF of four as the permutes would need three vectors.

The patch introduces a lowering phase after SLP discovery but before SLP pattern recognition or permute optimization that analyzes all loads from the same dataref group and creates an interleaving scheme starting from an unpermuted load. What can be handled is power-of-two group size and a group size of three. The possibility for doing the interleaving with a load-lanes like instruction is done as followup.

For a group-size of three this is done by using the non-interleaving fallback code which then creates at VF == 4 from { { a0, b0, c0 }, { a1, b1, c1 }, { a2, b2, c2 }, { a3, b3, c3 } } the intermediate vectors { c0, c0, c1, c1 } and { c2, c2, c3, c3 } to produce { c0, c1, c2, c3 }. This turns out to be more effective than the scheme implemented for non-SLP for SSE and only slightly worse for AVX512 and a bit more worse for AVX2. It seems to me that this would extend to other non-power-of-two group-sizes though (but the patch does not). Optimal schemes are likely difficult to lay out in VF agnostic form.

I'll note that while the lowering assumes even/odd extract is generally available for all vector element sizes (which is probably a good assumption), it doesn't in any way constrain the other permutes it generates based on target availability. Again difficult to do in a VF agnostic way (but at least currently the vector type is fixed).

I'll also note that the SLP store side merges lanes in a way producing three-vector permutes for store group-size of three, so the testcase uses a store group-size of four.

The patch has a fallback for when there are multi-lane groups and the resulting permutes to not fit interleaving. Code generation is not optimal when this triggers and might be worse than doing single-lane group interleaving.

The patch handles gaps by representing them with NULL entries in SLP_TREE_SCALAR_STMTS for the unpermuted load node. The SLP discovery changes could be elided if we manually build the load node instead.

SLP load nodes covering enough lanes to not need intermediate permutes are retained as having a load-permutation and do not use the single SLP load node for each dataref group. That's something we might want to change, making load-permutation something purely local to SLP discovery (but then SLP discovery could do part of the lowering).

The patch misses CSEing intermediate generated permutes and registering them with the bst_map which is possibly required for SLP pattern detection in some cases - this re-spin of the patch moves the lowering after SLP pattern detection.

* tree-vect-slp.cc (vect_build_slp_tree_1): Handle NULL stmt.
(vect_build_slp_tree_2): Likewise. Release load permutation when there's a NULL in SLP_TREE_SCALAR_STMTS and assert there's no actual permutation in that case.
(vllp_cmp): New function.
(vect_lower_load_permutations): Likewise.
(vect_analyze_slp): Call it.
* gcc.dg/vect/slp-11a.c: Expect SLP.
* gcc.dg/vect/slp-12a.c: Likewise.
* gcc.dg/vect/slp-51.c: New testcase.
* gcc.dg/vect/slp-52.c: New testcase.
2024-09-01  [PATCH] RISC-V: Optimize the cost of the DFmode register move for RV32.  Xianmiao Qu  2  -0/+18
Currently, in RV32, even with the D extension enabled, the cost of DFmode register moves is still set to 'COSTS_N_INSNS (2)'. This results in the 'lower-subreg' pass splitting DFmode register moves into two SImode SUBREG register moves, leading to the generation of many redundant instructions. As an example, consider the following test case:

double foo (int t, double a, double b)
{
  if (t > 0)
    return a;
  else
    return b;
}

When compiling with -march=rv32imafdc -mabi=ilp32d, the following code is generated:

	.cfi_startproc
	addi	sp,sp,-32
	.cfi_def_cfa_offset 32
	fsd	fa0,8(sp)
	fsd	fa1,16(sp)
	lw	a4,8(sp)
	lw	a5,12(sp)
	lw	a2,16(sp)
	lw	a3,20(sp)
	bgt	a0,zero,.L1
	mv	a4,a2
	mv	a5,a3
.L1:
	sw	a4,24(sp)
	sw	a5,28(sp)
	fld	fa0,24(sp)
	addi	sp,sp,32
	.cfi_def_cfa_offset 0
	jr	ra
	.cfi_endproc

After adjusting the DFmode register move's cost to 'COSTS_N_INSNS (1)', the generated code is as follows, with a significant reduction in the number of instructions.

	.cfi_startproc
	ble	a0,zero,.L5
	ret
.L5:
	fmv.d	fa0,fa1
	ret
	.cfi_endproc

gcc/
* config/riscv/riscv.cc (riscv_rtx_costs): Optimize the cost of the DFmode register move for RV32.
gcc/testsuite/
* gcc.target/riscv/rv32-movdf-cost.c: New test.
2024-09-01  [committed][PR rtl-optimization/116544] Fix test for promoted subregs  Jeff Law  2  -1/+23
This is a small bug in the ext-dce code's handling of promoted subregs. Essentially when we see a promoted subreg we need to make additional bit groups live as various parts of the RTL path know that an extension of a suitably promoted subreg can be trivially eliminated. When I added support for dealing with this quirk I failed to account for the larger modes properly and it ignored the case when the size of the inner object was > 32 bits. Oops. This does _not_ fix the outstanding x86 issue. That's caused by something completely different and more concerning ;( Bootstrapped and regression tested on x86. Obviously fixes the testcase on riscv as well. Pushing to the trunk. PR rtl-optimization/116544 gcc/ * ext-dce.cc (ext_dce_process_uses): Fix thinko in promoted subreg handling. gcc/testsuite/ * gcc.dg/torture/pr116544.c: New test.
2024-09-02  i386: Support vec_cmp for V8BF/V16BF/V32BF in AVX10.2  Levy Hsu  4  -0/+63
gcc/ChangeLog: * config/i386/i386-expand.cc (ix86_use_mask_cmp_p): Add BFmode for int mask cmp. * config/i386/sse.md (vec_cmp<mode><avx512fmaskmodelower>): New vec_cmp expand for VBF modes. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-cmpp-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-cmpp-1.c: Ditto.
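A rough sketch (not one of the new tests) of a loop that relies on the new vec_cmp expander, comparing BF16 elements element-wise under an AVX10.2-enabled option (option name assumed here):

    void
    cmp_bf16 (__bf16 *restrict a, __bf16 *restrict b, int *restrict r, int n)
    {
      for (int i = 0; i < n; i++)
        r[i] = a[i] > b[i];   /* BF16 compare, now vectorizable via vec_cmp */
    }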
2024-09-02  i386: Support vectorized BF16 sqrt with AVX10.2 instruction  Levy Hsu  1  -5/+8
gcc/ChangeLog: * config/i386/sse.md: Expand VF2H to VF2HB with VBF modes.
2024-09-02  i386: Support vectorized BF16 smaxmin with AVX10.2 instructions  Levy Hsu  3  -0/+63
gcc/ChangeLog: * config/i386/sse.md (<code><mode>3): New define expand pattern for BF smaxmin. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-smaxmin-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-smaxmin-1.c: New test.
2024-09-02  i386: Support vectorized BF16 FMA with AVX10.2 instructions  Levy Hsu  3  -1/+101
gcc/ChangeLog: * config/i386/sse.md: Add V8BF/V16BF/V32BF to mode iterator FMAMODEM. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-fma-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-fma-1.c: New test.
2024-09-02  i386: Support vectorized BF16 add/sub/mul/div with AVX10.2 instructions  Levy Hsu  3  -8/+162
AVX10.2 introduces several non-exception instructions for BF16 vectors. Enable vectorized BF16 add/sub/mul/div operations by supporting the standard optabs for them. gcc/ChangeLog: * config/i386/sse.md (div<mode>3): New expander for BFmode div. (VF_BHSD): New mode iterator with vector BFmodes. (<insn><mode>3<mask_name><round_name>): Change mode to VF_BHSD. (mul<mode>3<mask_name><round_name>): Likewise. gcc/testsuite/ChangeLog: * gcc.target/i386/avx10_2-512-bf-vector-operations-1.c: New test. * gcc.target/i386/avx10_2-bf-vector-operations-1.c: Ditto.
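As a minimal sketch (an illustrative example, not one of the added tests) of a loop these standard optabs let the vectorizer handle with the new BF16 instructions, assuming an AVX10.2-enabled target:

    void
    scale_bf16 (__bf16 *restrict dst, __bf16 *restrict src, __bf16 k, int n)
    {
      for (int i = 0; i < n; i++)
        dst[i] = (src[i] + k) * src[i] / k;   /* BF16 add, mul and div */
    }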