path: root/gcc/config
2025-07-28  gcn: Add more s_nop for MI300  (Tobias Burnus; 3 files, -38/+55)

Implement another case where the CDNA3 ISA documentation requires s_nop, and add a comment explaining why another case does not need to be handled. Also add one case where an s_nop is required by MI300A hardware but does not seem to be mentioned in the CDNA3 ISA documentation.

gcc/ChangeLog:

        * config/gcn/gcn.md (define_attr "vcmp"): Add with values vcmp/vcmpx/no.
        (*movbi, cstoredi4.., cstore<mode>4): Set it.
        * config/gcn/gcn-valu.md (vec_cmp<mode>...): Likewise.
        * config/gcn/gcn.cc (gcn_cmpx_insn_p): Remove.
        (gcn_md_reorg): Add two new conditions for MI300.
2025-07-28  gcn: Add 'nops' insn, extend comments  (Tobias Burnus; 3 files, -2/+18)

Use 's_nops' with a count instead of multiple 's_nop' instructions when manually adding 1 to 5 wait states. This helps with the instruction cache and helps a tiny bit with PR119367, where a two-byte variable overflows in the debugging location view handling.

Add a comment about 'sc0' to TARGET_GLC_NAME, as for atomics it is unrelated to the scope but to whether the result is stored; i.e. using e.g. 'sc1' instead of 'sc0' will have undesired consequences!

Update the comment above print_operand_address to document 'R' and 'V'; those are used below as a "temporary hack", but it makes sense to see them in the list.

gcc/ChangeLog:

        * config/gcn/gcn-opts.h (enum hsaco_attr_type): Add comment about 'sc0'.
        * config/gcn/gcn.cc (gcn_md_reorg): Use gen_nops instead of gen_nop.
        (print_operand_address): Document 'R' and 'V' in the pre-function comment as well.
        * config/gcn/gcn.md (nops): Add.
2025-07-28  Move STMT_VINFO_TYPE to SLP_TREE_TYPE  (Richard Biener; 4 files, -30/+24)

I am at a point where I want to store additional information from analysis (from loads and stores) to re-use at transform stage without repeating the analysis. I do not want to add to stmt_vec_info at this point, so this starts adding kind-specific sub-structures by moving the STMT_VINFO_TYPE field to the SLP tree and adding a (dummy for now) union tagged by it to receive such data. The change is largely mechanical after RISC-V has been prepared to have an SLP node around. I have settled on a union (supposed to get pointers to data).

As a followup this enables getting rid of SLP_TREE_CODE and making VEC_PERM therein a separate type, unifying its handling.

        * tree-vectorizer.h (_slp_tree::type): Add.
        (_slp_tree::u): Likewise.
        (_stmt_vec_info::type): Remove.
        (STMT_VINFO_TYPE): Likewise.
        (SLP_TREE_TYPE): New.
        * tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not initialize type.
        * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize type.
        (vect_slp_analyze_node_operations): Adjust.
        (vect_schedule_slp_node): Likewise.
        * tree-vect-patterns.cc (vect_init_pattern_stmt): Do not copy STMT_VINFO_TYPE.
        * tree-vect-loop.cc: Set SLP_TREE_TYPE instead of STMT_VINFO_TYPE everywhere.
        (vect_create_loop_vinfo): Do not set STMT_VINFO_TYPE on loop conditions.
        * tree-vect-stmts.cc: Set SLP_TREE_TYPE instead of STMT_VINFO_TYPE everywhere.
        (vect_analyze_stmt): Adjust.
        (vect_transform_stmt): Likewise.
        * config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Access SLP_TREE_TYPE instead of STMT_VINFO_TYPE.
        * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Remove non-SLP element-wise load/store matching.
        * config/rs6000/rs6000.cc (rs6000_cost_data::update_target_cost_per_stmt): Pass in the SLP node.  Use that to get at the memory access kind and type.
        (rs6000_cost_data::add_stmt_cost): Pass down SLP node.
        * config/riscv/riscv-vector-costs.cc (variable_vectorized_p): Use SLP_TREE_TYPE.
        (costs::need_additional_vector_vars_p): Likewise.
        (costs::update_local_live_ranges): Likewise.
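The kind-tagged union described above can be sketched in plain C++; the type and field names below are illustrative stand-ins, not GCC's actual internals:

```cpp
#include <cassert>

// Stand-ins for the vectorizer's analysis kinds.
enum class slp_kind { load, store, other };

// Kind-specific data gathered at analysis time, to be reused at
// transform time without repeating the analysis.
struct load_info  { int alignment; };
struct store_info { int group_size; };

// An SLP-tree-like node: the moved "type" field tags a union of
// pointers to kind-specific data, mirroring the commit's design.
struct slp_node
{
  slp_kind type = slp_kind::other;
  union { load_info *load; store_info *store; } u {};
};

int load_alignment (const slp_node &n)
{
  assert (n.type == slp_kind::load); // the tag guards the union member
  return n.u.load->alignment;
}

int demo ()
{
  static load_info li = { 16 };
  slp_node n;
  n.type = slp_kind::load;
  n.u.load = &li;
  return load_alignment (n);
}
```

The tag makes it safe to hand out the kind-specific pointer only where the analysis kind matches, which is the point of tagging the union by the moved type field.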
2025-07-28  aarch64: Add tuning model for Olympus core.  (Jennifer Schmitz; 3 files, -1/+212)

This patch adds a new tuning model for the NVIDIA Olympus core. The values used here are based on the Software Optimization Guide that will be published imminently.

Bootstrapped and tested on aarch64-linux-gnu, no regression.
OK for trunk? OK to backport to GCC 15?

Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
Co-Authored-By: Dhruv Chawla <dhruvc@nvidia.com>

gcc/ChangeLog:

        * config/aarch64/aarch64-cores.def (olympus): Use olympus tuning model.
        * config/aarch64/aarch64.cc: Include olympus.h.
        * config/aarch64/tuning_models/olympus.h: New file.
2025-07-28  LoongArch: Remove the definition of CASE_VECTOR_SHORTEN_MODE.  (Lulu Cheng; 1 file, -2/+0)

On LoongArch, the switch jump-table always stores absolute addresses, so there is no need to define the macro CASE_VECTOR_SHORTEN_MODE.

gcc/ChangeLog:

        * config/loongarch/loongarch.h (CASE_VECTOR_SHORTEN_MODE): Delete.
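For context: a dense switch statement is typically lowered to a jump table, i.e. an array of code addresses indexed by the selector. CASE_VECTOR_SHORTEN_MODE only pays off when a target can emit narrower relative table entries; with absolute addresses every entry has a fixed size, so there is nothing to shorten. A rough C++ analogy using a table of function pointers (illustrative only, not LoongArch codegen):

```cpp
// Each case body as a function; the table holds absolute addresses,
// like a jump table whose entries are full code addresses.
static int case0 () { return 10; }
static int case1 () { return 20; }
static int case2 () { return 30; }

static int (*const jump_table[3]) () = { case0, case1, case2 };

int dispatch (unsigned v)
{
  if (v >= 3)
    return -1;              // default label
  return jump_table[v] ();  // indexed indirect jump through the table
}
```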
2025-07-27  xtensa: Fix remaining inaccuracies in xtensa_is_insn_L32R_p()  (Takayuki 'January June' Suwa; 1 file, -11/+35)

The previous fix also had some flaws:

- The TARGET_CONST16 check was a bit premature
- It didn't take into account the possibility of the RTL expression "(set (reg:SF gpr) (const_int))", especially when TARGET_AUTOLITPOOLS is configured

This patch fixes the above.

gcc/ChangeLog:

        * config/xtensa/xtensa.cc (xtensa_is_insn_L32R_p): Re-rewrite to more accurately capture insns that could be L32R machine instructions wherever possible, and add comments that help understand the intent of the process.
2025-07-27  RISC-V: Combine vec_duplicate + vaadd.vv to vaadd.vx on GR2VR cost  (Pan Li; 3 files, -4/+7)

This patch would like to combine the vec_duplicate + vaadd.vv into vaadd.vx, as in the example code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. The late-combine pass will then take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero.

Assume we have example code like below, with a GR2VR cost of 0.

  #define DEF_AVG_FLOOR(NT, WT)           \
  NT                                      \
  test_##NT##_avg_floor(NT x, NT y)       \
  {                                       \
    return (NT)(((WT)x + (WT)y) >> 1);    \
  }

  #define AVG_FLOOR_FUNC(T) test_##T##_avg_floor

  DEF_AVG_FLOOR(int32_t, int64_t)
  DEF_VX_BINARY_CASE_2_WRAP(T, AVG_FLOOR_FUNC(T), avg_floor)

Before this patch:

        beq      a3,zero,.L8
        vsetvli  a5,zero,e32,m1,ta,ma
        vmv.v.x  v2,a2
        slli     a3,a3,32
        srli     a3,a3,32
  .L3:
        vsetvli  a5,a3,e32,m1,ta,ma
        vle32.v  v1,0(a1)
        slli     a4,a5,2
        sub      a3,a3,a5
        add      a1,a1,a4
        vaadd.vv v1,v1,v2
        vse32.v  v1,0(a0)
        add      a0,a0,a4
        bne      a3,zero,.L3

After this patch:

        beq      a3,zero,.L8
        slli     a3,a3,32
        srli     a3,a3,32
  .L3:
        vsetvli  a5,a3,e32,m1,ta,ma
        vle32.v  v1,0(a1)
        slli     a4,a5,2
        sub      a3,a3,a5
        add      a1,a1,a4
        vaadd.vx v1,v1,a2
        vse32.v  v1,0(a0)
        add      a0,a0,a4
        bne      a3,zero,.L3

gcc/ChangeLog:

        * config/riscv/riscv-v.cc (expand_vx_binary_vxrm_vec_vec_dup): Add new case UNSPEC_VAADD.
        (expand_vx_binary_vxrm_vec_dup_vec): Ditto.
        * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
        * config/riscv/vector-iterators.md: Add new case UNSPEC_VAADD to iterator.

Signed-off-by: Pan Li <pan2.li@intel.com>
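The avg_floor operation being matched computes a floor average: widen, add, then shift right arithmetically by one, which is what vaadd computes when VXRM is set to round-down (RDN). A scalar sketch of the semantics from the macro above:

```cpp
#include <cassert>
#include <cstdint>

// Floor-average with a widened intermediate so the addition cannot
// overflow; matches (NT)(((WT)x + (WT)y) >> 1) from the test macro.
int32_t avg_floor (int32_t x, int32_t y)
{
  return (int32_t) (((int64_t) x + (int64_t) y) >> 1);
}
```

The widening to int64_t is what makes the INT32_MAX case safe, and the arithmetic shift is what yields floor behavior for negative sums.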
2025-07-26  RISC-V: riscv-ext.def: Add allocated group IDs and group bit positions  (Christoph Müllner; 1 file, -15/+15)

The riscv-c-api-doc defines a group ID and a bit position for some extensions. Most of them are set in riscv-ext.def, but some are missing and one bit position (for Zilsd) is wrong.

This patch replaces the `BITMASK_NOT_YET_ALLOCATED` value with the actual allocated value wherever possible and fixes the bit position for Zilsd.

Currently, we don't have any infrastructure to utilize the information that is placed into riscv_ext_info_t::m_bitmask_group_id and riscv_ext_info_t::m_bitmask_group_bit_pos. This also means we can't test this change.

gcc/ChangeLog:

        * config/riscv/riscv-ext.def: Add allocated group IDs and group bit positions.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
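For illustration, the (group ID, bit position) pair locates an extension's feature bit inside the multi-word bitmask scheme from the riscv-c-api-doc. A hypothetical sketch of how a consumer could use those fields (names and layout here are assumptions, since, as the commit notes, GCC has no such infrastructure yet):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of riscv_ext_info_t's bitmask fields: the
// extension's feature bit lives in 64-bit word `group_id`, at
// position `bit_pos` within that word.
struct ext_bit { int group_id; int bit_pos; };

bool ext_present (const uint64_t *groups, ext_bit b)
{
  return (groups[b.group_id] >> b.bit_pos) & 1;
}

bool demo ()
{
  uint64_t groups[2] = { 0, UINT64_C (1) << 5 };
  return ext_present (groups, { 1, 5 })      // bit 5 of group 1 is set
         && !ext_present (groups, { 0, 5 }); // same bit in group 0 is not
}
```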
2025-07-25  diagnostics: convert diagnostic_t to enum class diagnostics::kind  (David Malcolm; 3 files, -7/+12)

No functional change intended.

gcc/ChangeLog:

        * Makefile.in: Replace diagnostic.def with diagnostics/kinds.def.
        * config/aarch64/aarch64.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * config/i386/i386-options.cc: Likewise.
        * config/s390/s390.cc: Likewise.
        * diagnostic-core.h: Replace typedef diagnostic_t with enum class diagnostics::kind in diagnostics/kinds.h and include it.
        * diagnostic-global-context.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * diagnostic.cc: Likewise.
        * diagnostic.h: Likewise.
        * diagnostics/buffering.cc: Likewise.
        * diagnostics/buffering.h: Likewise.
        * diagnostics/context.h: Likewise.
        * diagnostics/diagnostic-info.h: Likewise.
        * diagnostics/html-sink.cc: Likewise.
        * diagnostic.def: Move to...
        * diagnostics/kinds.def: ...here and update for diagnostic_t becoming enum class diagnostics::kind.
        * diagnostics/kinds.h: New file, based on material in diagnostic-core.h.
        * diagnostics/lazy-paths.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * diagnostics/option-classifier.cc: Likewise.
        * diagnostics/option-classifier.h: Likewise.
        * diagnostics/output-spec.h: Likewise.
        * diagnostics/paths-output.cc: Likewise.
        * diagnostics/sarif-sink.cc: Likewise.
        * diagnostics/selftest-context.cc: Likewise.
        * diagnostics/selftest-context.h: Likewise.
        * diagnostics/sink.h: Likewise.
        * diagnostics/source-printing.cc: Likewise.
        * diagnostics/text-sink.cc: Likewise.
        * diagnostics/text-sink.h: Likewise.
        * gcc.cc: Likewise.
        * libgdiagnostics.cc: Likewise.
        * lto-wrapper.cc: Likewise.
        * opts-common.cc: Likewise.
        * opts-diagnostic.h: Likewise.
        * opts.cc: Likewise.
        * rtl-error.cc: Likewise.
        * substring-locations.cc: Likewise.
        * toplev.cc: Likewise.

gcc/ada/ChangeLog:

        * gcc-interface/trans.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/analyzer/ChangeLog:

        * pending-diagnostic.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * program-point.cc: Likewise.

gcc/c-family/ChangeLog:

        * c-common.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * c-format.cc: Likewise.
        * c-lex.cc: Likewise.
        * c-opts.cc: Likewise.
        * c-pragma.cc: Likewise.
        * c-warn.cc: Likewise.

gcc/c/ChangeLog:

        * c-errors.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * c-parser.cc: Likewise.
        * c-typeck.cc: Likewise.

gcc/cobol/ChangeLog:

        * util.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/cp/ChangeLog:

        * call.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * constexpr.cc: Likewise.
        * cp-tree.h: Likewise.
        * decl.cc: Likewise.
        * error.cc: Likewise.
        * init.cc: Likewise.
        * method.cc: Likewise.
        * module.cc: Likewise.
        * parser.cc: Likewise.
        * pt.cc: Likewise.
        * semantics.cc: Likewise.
        * typeck.cc: Likewise.
        * typeck2.cc: Likewise.

gcc/d/ChangeLog:

        * d-diagnostic.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/fortran/ChangeLog:

        * cpp.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * error.cc: Likewise.
        * options.cc: Likewise.

gcc/jit/ChangeLog:

        * dummy-frontend.cc: Update for diagnostic_t becoming enum class diagnostics::kind.

gcc/m2/ChangeLog:

        * gm2-gcc/m2linemap.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * gm2-gcc/rtegraph.cc: Likewise.

gcc/rust/ChangeLog:

        * backend/rust-tree.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * backend/rust-tree.h: Likewise.
        * resolve/rust-ast-resolve-expr.cc: Likewise.
        * resolve/rust-ice-finalizer.cc: Likewise.
        * resolve/rust-ice-finalizer.h: Likewise.
        * resolve/rust-late-name-resolver-2.0.cc: Likewise.

gcc/testsuite/ChangeLog:

        * gcc.dg/plugin/diagnostic_plugin_test_show_locus.cc: Update for diagnostic_t becoming enum class diagnostics::kind.
        * gcc.dg/plugin/expensive_selftests_plugin.cc: Likewise.
        * gcc.dg/plugin/location_overflow_plugin.cc: Likewise.
        * lib/gcc-dg.exp: Likewise.

libcpp/ChangeLog:

        * internal.h: Update comment for diagnostic_t becoming enum class diagnostics::kind.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>
2025-07-25  RISC-V: Prepare dynamic LMUL heuristic for SLP.  (Robin Dapp; 2 files, -25/+62)

This patch prepares the dynamic LMUL vector costing to use the coming SLP_TREE_TYPE instead of the (to-be-removed) STMT_VINFO_TYPE.

Even though the whole approach should be reviewed and adjusted at some point, the patch chooses the path of least resistance and uses a hash map for the stmt_info -> slp node relationship. A node is mapped to the accompanying stmt_info during add_stmt_cost. In finish_cost we go through all statements as before, and obtain the corresponding slp nodes as well as their types. This allows us to operate largely as before.

We don't yet do the switch-over from STMT_VINFO_TYPE to SLP_TREE_TYPE, though, but only take care of the necessary refactoring upfront.

Regtested on rv64gcv_zvl512b with -mrvv-max-lmul=dynamic. There are a few regressions but nothing worse than what we already have. I'd rather accept these now and take it as an incentive to work on the heuristic later than block the SLP work until it is fixed.

gcc/ChangeLog:

        * config/riscv/riscv-vector-costs.cc (get_live_range): Move compute_local_program_points to cost class.
        (variable_vectorized_p): Add slp node parameter.
        (need_additional_vector_vars_p): Move from here...
        (costs::need_additional_vector_vars_p): ...to here and add slp parameter.
        (compute_estimated_lmul): Move update_local_live_ranges to cost class.
        (has_unexpected_spills_p): Move from here...
        (costs::has_unexpected_spills_p): ...to here.
        (costs::record_lmul_spills): New function.
        (costs::add_stmt_cost): Add stmt_info, slp mapping.
        (costs::finish_cost): Analyze loop.
        * config/riscv/riscv-vector-costs.h: Move declarations to class.
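The stmt_info -> SLP node bookkeeping described above can be sketched with an ordinary hash map (the patch uses GCC's hash_map; std::unordered_map and the placeholder types stand in here):

```cpp
#include <cassert>
#include <unordered_map>

struct stmt_info { int id; };    // placeholder for stmt_vec_info
struct slp_node  { int type; };  // placeholder for the SLP node

// Filled in during add_stmt_cost: remember each statement's node so
// finish_cost can later recover the node and, from it, its type.
static std::unordered_map<const stmt_info *, const slp_node *> stmt_to_node;

void record_stmt (const stmt_info *si, const slp_node *node)
{
  stmt_to_node[si] = node;
}

int node_type_of (const stmt_info *si)
{
  auto it = stmt_to_node.find (si);
  return it == stmt_to_node.end () ? -1 : it->second->type;
}

int demo ()
{
  static stmt_info si = { 1 };
  static slp_node  n  = { 42 };
  record_stmt (&si, &n);
  return node_type_of (&si);
}
```

This is the "path of least resistance" shape: the costing loop can keep iterating over statements while the map supplies the per-node data.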
2025-07-25  RISC-V: Remove user-level interrupts  (Christoph Müllner; 2 files, -19/+8)

There was once a RISC-V extension draft ("N") which introduced user-level interrupts. However, it was never ratified, and the specification draft has been removed from the RISC-V ISA manual in commit `b6cade07034` with the comment "it'll likely need to be redesigned".

Support for an N extension never made it into GCC, but we support function attributes for user-level interrupt handlers that use the URET instruction. The "user" interrupt attribute was documented in the RISC-V C API, but has been removed in PR #106 in May 2025 (driven by LLVM devs/maintainers and ack'ed by at least one GCC maintainer).

Let's drop URET support from GCC as well.

gcc/ChangeLog:

        * config/riscv/riscv.cc (enum riscv_privilege_levels): Remove USER_MODE.
        (riscv_handle_type_attribute): Remove "user" interrupts.
        (riscv_expand_epilogue): Likewise.
        (riscv_get_interrupt_type): Likewise.
        * config/riscv/riscv.md (riscv_uret): Remove URET pattern.
        * doc/extend.texi: Remove documentation of user interrupts.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/interrupt-conflict-mode.c: Remove "user" interrupts.
        * gcc.target/riscv/xtheadint-push-pop.c: Likewise.
        * gcc.target/riscv/interrupt-umode.c: Removed.

Reported-by: Sam Elliott <quic_aelliott@quicinc.com>
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2025-07-25  gcn: Add "s_nop"s for MI300  (Tobias Burnus; 4 files, -137/+312)

MI300 requires some additional s_nop instructions to be added between certain instructions.

* As 'v_readlane' and 'v_writelane' have to be distinguished, the 'laneselect' attribute was changed from no/yes to no/read/write.
* Add some missing 'laneselect' attributes for v_(read,write)lane.
* Replace 'delayeduse' by 'flatmemaccess', which is more explicit, especially as some uses have to be distinguished in more detail. (Alongside, one off-by-two delayeduse has been fixed.)

On the other hand, RDNA 2, 3, and 3.5 do not require any added s_nop; thus, there is no need to walk the instructions for them to insert pointless S_NOP. (RDNA 4, not yet in GCC, requires it in a few cases.)

gcc/ChangeLog:

        * config/gcn/gcn-opts.h (TARGET_NO_MANUAL_NOPS, TARGET_CDNA3_NOPS): Define.
        * config/gcn/gcn.md (define_attr "laneselect"): Change 'yes' to 'read' and 'write'.
        (define_attr "flatmemaccess"): Add with values store, storex34, load, atomic, atomicwait, cmpswapx2, and no.  Replacing ...
        (define_attr "delayeduse"): Remove.
        (define_attr "transop"): Add with values yes and no.
        (various insns): Update 'laneselect', add flatmemaccess and transop, remove delayeduse; fixing an issue for s_load_dwordx4 vs. flat_store_dwordx4 related to delayeduse (now: flatmemaccess).
        * config/gcn/gcn-valu.md: Update laneselect attribute and add flatmemaccess.
        * config/gcn/gcn.cc (gcn_cmpx_insn_p): New.
        (gcn_md_reorg): Update for MI300 to add additional s_nop.  Skip s_nop-insertion part for RDNA{2,3}; add "VALU writes EXEC followed by VALU DPP" unconditionally for CDNA2/CDNA3/GCN5.
2025-07-25  RISC-V: Add support for resumable non-maskable interrupt (RNMI) handlers  (Christoph Müllner; 2 files, -4/+21)

The Smrnmi extension introduces the nmret instruction to return from RNMI handlers. We already have basic Smrnmi support. This patch introduces support for the nmret instruction and the ability to set the function attribute `__attribute__ ((interrupt ("rnmi")))` to let the compiler generate RNMI handlers.

The attribute name is proposed in a PR for the RISC-V C API and approved by LLVM maintainers:
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/116

gcc/ChangeLog:

        * config/riscv/riscv.cc (enum riscv_privilege_levels): Add RNMI_MODE.
        (riscv_handle_type_attribute): Handle 'rnmi' interrupt attribute.
        (riscv_expand_epilogue): Generate nmret for RNMI handlers.
        (riscv_get_interrupt_type): Handle 'rnmi' interrupt attribute.
        * config/riscv/riscv.md (riscv_rnmi): Add nmret INSN.
        * doc/extend.texi: Add documentation for 'rnmi' interrupt attribute.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/interrupt-rnmi.c: New test.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
2025-07-24  vect: Add is_gather_scatter argument to misalignment hook.  (Robin Dapp; 8 files, -20/+72)

This patch adds an is_gather_scatter argument to the support_vector_misalignment hook. All targets except riscv do not care about alignment for gather/scatter, so they return true when is_gather_scatter is set.

gcc/ChangeLog:

        * config/aarch64/aarch64.cc (aarch64_builtin_support_vector_misalignment): Return true for gather/scatter.
        * config/arm/arm.cc (arm_builtin_support_vector_misalignment): Ditto.
        * config/epiphany/epiphany.cc (epiphany_support_vector_misalignment): Ditto.
        * config/gcn/gcn.cc (gcn_vectorize_support_vector_misalignment): Ditto.
        * config/loongarch/loongarch.cc (loongarch_builtin_support_vector_misalignment): Ditto.
        * config/riscv/riscv.cc (riscv_support_vector_misalignment): Add gather/scatter argument.
        * config/rs6000/rs6000.cc (rs6000_builtin_support_vector_misalignment): Return true for gather/scatter.
        * config/s390/s390.cc (s390_support_vector_misalignment): Ditto.
        * doc/tm.texi: Add argument.
        * target.def: Ditto.
        * targhooks.cc (default_builtin_support_vector_misalignment): Ditto.
        * targhooks.h (default_builtin_support_vector_misalignment): Ditto.
        * tree-vect-data-refs.cc (vect_supportable_dr_alignment): Ditto.
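The shape of the changed hook can be sketched as below. The signature is heavily simplified (the real hook also receives the mode, type, misalignment and packedness), and `target_policy` is a made-up stand-in for whatever each backend normally checks:

```cpp
// Simplified sketch: with the new argument, a target that does not
// care about alignment for gathers/scatters just reports "supported"
// whenever is_gather_scatter is set.
bool support_vector_misalignment (bool target_policy, bool is_gather_scatter)
{
  if (is_gather_scatter)
    return true;          // all targets except riscv take this path
  return target_policy;   // fall back to the target's usual decision
}
```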
2025-07-24  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP binary arithmetic  (Spencer Abson; 1 file, -49/+49)

Extend the binary op/UNSPEC_SEL combiner patterns from SVE_FULL_F/SVE_FULL_F_B16B16 to SVE_F/SVE_F_B16B16, where the strictness value is SVE_RELAXED_GP.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed): Extend from SVE_FULL_F_B16B16 to SVE_F_B16B16.
        (*cond_<optab><mode>_3_relaxed): Likewise.
        (*cond_<optab><mode>_any_relaxed): Likewise.
        (*cond_<optab><mode>_any_const_relaxed): Extend from SVE_FULL_F to SVE_F.
        (*cond_add<mode>_2_const_relaxed): Likewise.
        (*cond_add<mode>_any_const_relaxed): Likewise.
        (*cond_sub<mode>_3_const_relaxed): Likewise.
        (*cond_sub<mode>_const_relaxed): Likewise.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/sve/unpacked_cond_binary_bf16_1.C: New test.
        * gcc.target/aarch64/sve/unpacked_cond_builtin_fmax_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_builtin_fmin_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fadd_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fdiv_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fmaxnm_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fminnm_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fmul_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fsubr_1.c: Likewise.
2025-07-24  aarch64: Add support for unpacked SVE FDIV  (Spencer Abson; 2 files, -25/+33)

This patch extends the unpredicated FP division expander to support partial FP modes. It extends the existing patterns used to implement UNSPEC_COND_FDIV and its approximation as needed.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (@aarch64_sve_<optab><mode>): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand.
        (@aarch64_frecpe<mode>): Extend from SVE_FULL_F to SVE_F.
        (@aarch64_frecps<mode>): Likewise.
        (div<mode>3): Likewise, use aarch64_sve_fp_pred.
        * config/aarch64/iterators.md: Add warnings above SVE_FP_UNARY and SVE_FP_BINARY.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/unpacked_fdiv_1.c: New test.
        * gcc.target/aarch64/sve/unpacked_fdiv_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fdiv_3.c: Likewise.
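The "approximation" referred to is the reciprocal-estimate division sequence: FRECPE yields a rough 1/b and FRECPS supplies Newton-Raphson correction steps of the form x' = x * (2 - b*x). A scalar sketch under that reading (the crude initial guess stands in for the hardware estimate):

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson refinement of a reciprocal estimate -- the role
// FRECPS plays in the SVE division sequence.
double frecps_step (double b, double x)
{
  return x * (2.0 - b * x);
}

// a / b via reciprocal estimate plus refinements; the initial guess
// here is a crude stand-in for the FRECPE hardware estimate.
double approx_div (double a, double b)
{
  double x = 1.0 / (1.0 + b);  // valid start for b > 0: 0 < b*x < 1
  for (int i = 0; i < 6; ++i)  // error squares on every step
    x = frecps_step (b, x);
  return a * x;
}
```

Because the error squares on each step, a handful of FRECPS iterations is enough to reach near full double precision for well-scaled operands.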
2025-07-24  aarch64: Add support for unpacked SVE FP binary arithmetic  (Spencer Abson; 2 files, -44/+42)

This patch extends the expanders for unpredicated smax, smin, add, sub, mul, min, and max, so that they support partial SVE FP modes. The relevant insn and splitting patterns are also updated.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (<optab><mode>3): Extend from SVE_FULL_F to SVE_F, use aarch64_sve_fp_pred.
        (*post_ra_<sve_fp_op><mode>3): Extend from SVE_FULL_F to SVE_F.
        (@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F to SVE_F, use aarch64_predicate_operand (ADD/SUB/MUL/MAX/MIN).
        (split for using unpredicated insns): Move SVE_RELAXED_GP into the pattern, rather than testing for it in the condition.
        * config/aarch64/aarch64-sve2.md (@aarch64_pred_<optab><mode>): Extend from VNx8BF_ONLY to SVE_BF.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/sve/unpacked_binary_bf16_1.C: New test.
        * g++.target/aarch64/sve/unpacked_binary_bf16_2.C: Likewise.
        * gcc.target/aarch64/sve/unpacked_builtin_fmax_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_builtin_fmax_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_builtin_fmin_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_builtin_fmin_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fadd_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fadd_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fmaxnm_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fmaxnm_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fminnm_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fminnm_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fmul_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fmul_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fsubr_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_fsubr_2.c: Likewise.
2025-07-24  RISC-V: Avoid vaaddu.vx combine pattern pollute VXRM csr  (Pan Li; 4 files, -13/+122)

The vaaddu.vx combine almost always comes from avg_floor, which requires vxrm to be RDN. But not all vaaddu.vx uses should depend on RDN. The vaaddu.vx combine should leverage the VXRM value as-is instead of forcing them all to RDN. This patch fixes this and keeps the VXRM value as is.

gcc/ChangeLog:

        * config/riscv/autovec-opt.md (*uavg_floor_vx_<mode>): Rename from...
        (*<sat_op_v_vdup>_vx_<mode>): Rename to...
        (*<sat_op_vdup_v>_vx_<mode>): Rename to...
        * config/riscv/riscv-protos.h (enum insn_flags): Add vxrm RNE, ROD type.
        (enum insn_type): Add RNE_P, ROD_P type.
        (expand_vx_binary_vxrm_vec_vec_dup): Add new func decl.
        (expand_vx_binary_vxrm_vec_dup_vec): Ditto.
        * config/riscv/riscv-v.cc (get_insn_type_by_vxrm_val): Add helper to get insn type by vxrm value.
        (expand_vx_binary_vxrm_vec_vec_dup): Add new func impl to expand vec + vec_dup pattern.
        (expand_vx_binary_vxrm_vec_dup_vec): Ditto but for vec_dup + vec pattern.
        * config/riscv/vector-iterators.md: Add helper iterator for sat vx combine.

Signed-off-by: Pan Li <pan2.li@intel.com>
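VXRM decides how the bit shifted out by the averaging add is rounded. Two of the four modes, sketched in scalar form: RDN gives the floor average that avg_floor needs, while RNU rounds the halfway case up, so forcing every vaaddu to RDN would silently change the results of RNU users:

```cpp
#include <cstdint>

// Averaging add (x + y) >> 1 under two of the VXRM rounding modes.
int64_t avg_rdn (int64_t x, int64_t y)  // round-down (floor)
{
  return (x + y) >> 1;
}

int64_t avg_rnu (int64_t x, int64_t y)  // round-to-nearest-up
{
  return (x + y + 1) >> 1;
}
```

The two functions differ only when the sum is odd, which is exactly the case the VXRM rounding mode controls.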
2025-07-23  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP unary operations  (Spencer Abson; 1 file, -9/+9)

Extend the unary op/UNSPEC_SEL combiner patterns from SVE_FULL_F to SVE_F, where the strictness value is SVE_RELAXED_GP.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (*cond_<optab><mode>_2_relaxed): Extend from SVE_FULL_F to SVE_F.
        (*cond_<optab><mode>_any_relaxed): Likewise.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/unpacked_cond_fabs_1.c: New test.
        * gcc.target/aarch64/sve/unpacked_cond_fneg_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frinta_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frinta_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frinti_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frintm_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frintp_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frintx_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_frintz_1.c: Likewise.
2025-07-23  aarch64: Add support for unpacked SVE FP unary operations  (Spencer Abson; 2 files, -15/+31)

This patch extends the expander for unpredicated round, nearbyint, floor, ceil, rint, and trunc, so that it can handle partial SVE FP modes. We move fabs and fneg to a separate expander, since they are not trapping instructions.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (<optab><mode>2): Replace use of aarch64_ptrue_reg with aarch64_sve_fp_pred.
        (@aarch64_pred_<optab><mode>): Extend from SVE_FULL_F to SVE_F, and use aarch64_predicate_operand.
        * config/aarch64/iterators.md: Split FABS/FNEG out of SVE_COND_FP_UNARY (into new SVE_COND_FP_UNARY_BITWISE).

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/unpacked_fabs_1.c: New test.
        * gcc.target/aarch64/sve/unpacked_fneg_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frinta_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frinta_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frinti_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frinti_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintm_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintm_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintp_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintp_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintx_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintx_2.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintz_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_frintz_2.c: Likewise.
2025-07-23  aarch64: Relaxed SEL combiner patterns for unpacked SVE FP conversions  (Spencer Abson; 2 files, -0/+122)

Add UNSPEC_SEL combiner patterns for unpacked FP conversions, where the strictness value is SVE_RELAXED_GP.

gcc/ChangeLog:

        * config/aarch64/aarch64-sve.md (*cond_<optab>_nontrunc<SVE_PARTIAL_F:mode><SVE_HSDI:mode>_relaxed): New FCVT/SEL combiner pattern.
        (*cond_<optab>_trunc<VNx2DF_ONLY:mode><VNx2SI_ONLY:mode>_relaxed): New FCVTZ{S,U}/SEL combiner pattern.
        (*cond_<optab>_nonextend<SVE_HSDI:mode><SVE_PARTIAL_F:mode>_relaxed): New {S,U}CVTF/SEL combiner pattern.
        (*cond_<optab>_trunc<SVE_SDF:mode><SVE_PARTIAL_HSF:mode>): New FCVT/SEL combiner pattern.
        (*cond_<optab>_nontrunc<SVE_PARTIAL_HSF:mode><SVE_SDF:mode>_relaxed): New FCVTZ{S,U}/SEL combiner pattern.
        * config/aarch64/iterators.md: New mode iterator for VNx2SI.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/sve/unpacked_cond_cvtf_1.c: New test.
        * gcc.target/aarch64/sve/unpacked_cond_fcvt_1.c: Likewise.
        * gcc.target/aarch64/sve/unpacked_cond_fcvtz_1.c: Likewise.
2025-07-23  RISC-V: Rework broadcast handling [PR121073].  (Robin Dapp; 10 files, -237/+442)

During the last weeks it became clear that our current broadcast handling needs an overhaul in order to improve maintainability. PR121073 showed that my intermediate fix wasn't enough and caused regressions. This patch now goes a first step towards untangling broadcast (vmv.v.x), "set first" (vmv.s.x), and zero-strided load (vlse).

Also, can_be_broadcast_p is rewritten and strided_broadcast_p is introduced to make the distinction clear directly in the predicates. Due to the pervasiveness of the patterns I needed to touch a lot of places and tried to clear up some things while at it. The patch therefore also introduces new helpers: expand_broadcast for vmv.v.x, which dispatches to regular as well as strided broadcast, and expand_set_first, which does the same thing for vmv.s.x. The non-strided fallbacks are now implemented as splitters of the strided variants. This makes it easier to see where and when things happen.

The test cases I touched appeared wrong to me, so this patch sets a new baseline for some of the scalar_move tests.

There is still work to be done but IMHO that can be deferred: it would be clearer if the three broadcast-like variants differed not just in name but also in RTL pattern, so matching is not as confusing. Right now vmv.v.x and vmv.s.x only differ in the mask and are interchangeable by just changing it from "all ones" to a "single one".

As last time, I regtested on rv64 and rv32 with strided_broadcast turned on and off. Note there are regressions in cond_fma_fnma-[78].c. Those are due to the patch exposing more fwprop/late-combine opportunities. For fma/fnma we don't yet have proper costing for vv/vx in place, but I expect that to be addressed soon and figured we can live with those for the time being.

        PR target/121073

gcc/ChangeLog:

        * config/riscv/autovec-opt.md: Use new helpers.
        * config/riscv/autovec.md: Ditto.
        * config/riscv/predicates.md (strided_broadcast_mask_operand): New predicate.
        (strided_broadcast_operand): Ditto.
        (any_broadcast_operand): Ditto.
        * config/riscv/riscv-protos.h (expand_broadcast): Declare.
        (expand_set_first): Ditto.
        (expand_set_first_tu): Ditto.
        (strided_broadcast_p): Ditto.
        * config/riscv/riscv-string.cc (expand_vec_setmem): Use new helpers.
        * config/riscv/riscv-v.cc (expand_broadcast): New function.
        (expand_set_first): Ditto.
        (expand_set_first_tu): Ditto.
        (expand_const_vec_duplicate): Use new helpers.
        (expand_const_vector_duplicate_repeating): Ditto.
        (expand_const_vector_duplicate_default): Ditto.
        (sew64_scalar_helper): Ditto.
        (expand_vector_init_merge_repeating_sequence): Ditto.
        (expand_reduction): Ditto.
        (strided_broadcast_p): New function.
        (whole_reg_to_reg_move_p): Use new helpers.
        * config/riscv/riscv-vector-builtins-bases.cc: Use either broadcast or strided broadcast.
        * config/riscv/riscv-vector-builtins.cc (function_expander::use_ternop_insn): Ditto.
        (function_expander::use_widen_ternop_insn): Ditto.
        (function_expander::use_scalar_broadcast_insn): Ditto.
        * config/riscv/riscv-vector-builtins.h: Declare scalar broadcast.
        * config/riscv/vector.md (*pred_broadcast<mode>): Split into regular and strided broadcast.
        (*pred_broadcast<mode>_zvfh): Split.
        (pred_broadcast<mode>_zvfh): Ditto.
        (*pred_broadcast<mode>_zvfhmin): Ditto.
        (@pred_strided_broadcast<mode>): Ditto.
        (*pred_strided_broadcast<mode>): Ditto.
        (*pred_strided_broadcast<mode>_zvfhmin): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/vls-vlmax/repeat-6.c: Adjust test expectation.
        * gcc.target/riscv/rvv/base/scalar_move-5.c: Ditto.
        * gcc.target/riscv/rvv/base/scalar_move-6.c: Ditto.
        * gcc.target/riscv/rvv/base/scalar_move-7.c: Ditto.
        * gcc.target/riscv/rvv/base/scalar_move-8.c: Ditto.
        * gcc.target/riscv/rvv/base/scalar_move-9.c: Ditto.
        * gcc.target/riscv/rvv/pr121073.c: New test.
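At the element level, the three operations being untangled are simple: vmv.v.x broadcasts a scalar into every element, vmv.s.x writes it into element 0 only, and a zero-strided vlse loads the same memory location into every element. A scalar-array sketch of the first two:

```cpp
#include <cstdint>
#include <vector>

// vmv.v.x: broadcast the scalar into all elements.
void broadcast (std::vector<int32_t> &v, int32_t x)
{
  for (int32_t &e : v)
    e = x;
}

// vmv.s.x: set element 0 only; the tail is left untouched.
void set_first (std::vector<int32_t> &v, int32_t x)
{
  if (!v.empty ())
    v[0] = x;
}

bool demo ()
{
  std::vector<int32_t> v (4, 7);
  broadcast (v, 9);
  bool ok = v[0] == 9 && v[3] == 9;
  set_first (v, 1);
  return ok && v[0] == 1 && v[3] == 9;
}
```

The different tail behavior is why conflating the two patterns (as only a mask difference in RTL) makes matching confusing, which is the clean-up the commit works toward.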
2025-07-23  aarch64: Fix fma steering when rename fails [PR120119]  (Andrew Pinski; 1 file, -0/+5)

Regrename can fail in some cases, and `insn_rr[INSN_UID (insn)].op_info` will then be null. The FMA steering code was not expecting the failure to happen. This started to happen after early RA was added, but it has been a latent bug before that.

Built and tested for aarch64-linux-gnu.

        PR target/120119

gcc/ChangeLog:

        * config/aarch64/cortex-a57-fma-steering.cc (func_fma_steering::analyze): Skip if renaming fails.

gcc/testsuite/ChangeLog:

        * g++.dg/torture/pr120119-1.C: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-07-23  [aarch64] check for non-NULL vectype in aarch64_vector_costs::add_stmt_cost  (Richard Biener; 1 file, -0/+1)

With a patch still in development we get a NULL STMT_VINFO_VECTYPE. One side effect is that during scalar stmt costing we no longer pass a vectype. The following adjusts aarch64_vector_costs::add_stmt_cost to check for a non-NULL vectype before accessing it, like all the code surrounding it. The other fix possibility would have been to re-order the check with the vect_mem_access_type one, but that one is not going to exist during scalar code costing either in the future.

        * config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost): Check vectype is non-NULL before accessing it.
2025-07-22xtensa: Fix inaccuracy in xtensa_rtx_costs()Takayuki 'January June' Suwa2-22/+32
This patch fixes the following defects in the function: - The cost of move instructions larger than the natural word width, specifically "movd[if]_internal", cannot be estimated correctly - Floating-point or symbolic constant assignment insns cannot be identified as L32R instructions gcc/ChangeLog: * config/xtensa/xtensa.cc (xtensa_is_insn_L32R_p): Rewrite to capture insns that could be L32R machine instructions wherever possible. (xtensa_rtx_costs): Fix to consider that moves larger than a natural word can take multiple L32R machine instructions. (constantpool_address_p): Cosmetics. * config/xtensa/xtensa.md (movdi_internal, movdf_internal): Add missing insn attributes.
2025-07-22xtensa: Make relaxed MOVI instructions treated as "load" typeTakayuki 'January June' Suwa1-3/+3
The relaxed MOVI instructions in the Xtensa ISA are assignment instructions whose large integer, floating-point or symbolic constant operands would not normally be allowed as immediate values in assembly code. They are translated later by the assembler, rather than the compiler, into L32R instructions referencing literal pool entries that contain those values (see the '-mauto-litpools' Xtensa-specific option). This means that even though such instructions look like nothing more than constant value assignments in their RTL representation, they may perform better when treated as loads from memory (i.e. the actual behavior), also trying to avoid using the value immediately after the load, especially from an instruction scheduling perspective. gcc/ChangeLog: * config/xtensa/xtensa.md (movsi_internal, movhi_internal, movsf_internal): Change the value of the "type" attribute from "move" to "load" when the source operand constraint is "Y".
2025-07-22[RISC-V] Restrict generic-vector-ooo DFAJeff Law1-30/+55
So while debugging Austin's work to support the spacemit x60 in the BPI we found that even though his pipeline description had mappings for all the vector instructions, they were still getting matched by the generic-vector-ooo DFA. The core problem is that DFA never restricted itself to a tune option (oops). That's easily fixed, at which time everything using generic blows up because we don't have a generic in-order vector DFA. Everything using generic was indirectly also using generic-vector-ooo for the vector instructions. It may be better long term to define a generic-vector DFA, but to preserve behavior, I'm letting generic-vector-ooo match when the generic DFA is active. Tested in my tester, waiting on pre-commit CI before moving forward. gcc/ * config/riscv/generic-vector-ooo.md: Restrict insn reservations to generic_ooo and generic tuning models.
2025-07-21[RISC-V] Add missing insn types to xiangshan.md and mips-p8700.mdJeff Law2-2/+3
This is a trivial patch to add a few missing types to pipeline models that are mostly complete. In particular this adds the "ghost" type to mips-p8700.md and the "sf_vc" and "sf_vc_se" types to xiangshan.md. There are definitely some bigger issues to solve in this space. But this is a trivial fix that stands on its own. I've tested this in my tester, just waiting for pre-commit CI to do its thing. gcc/ * config/riscv/mips-p8700.md: Add support for "ghost" insn types. * config/riscv/xiangshan.md: Add support for "sf_vc" and "sf_vc_se" insn types.
2025-07-21[RISC-V] Fix wrong CFA during stack probeAndreas Schwab1-1/+1
temp1 is used by the probe loop for the step size, but we need the final address of the stack after the loop which resides in temp2. PR target/121121 * config/riscv/riscv.cc (riscv_allocate_and_probe_stack_space): Use temp2 instead of temp1 for the CFA note.
2025-07-21RISC-V: Allow VLS DImode for sat_op vx DImode patternPan Li1-15/+15
When trying to introduce the vaaddu.vx combine for DImode, we hit an ICE like the one below: 0x4889763 internal_error(char const*, ...) .../riscv-gnu-toolchain/gcc/__build__/../gcc/diagnostic-global-context.cc:517 0x4842f98 fancy_abort(char const*, int, char const*) .../riscv-gnu-toolchain/gcc/__build__/../gcc/diagnostic.cc:1818 0x2953461 code_for_pred_scalar(int, machine_mode) ./insn-opinit.h:1911 0x295f300 riscv_vector::sat_op<110>::expand(riscv_vector::function_expander&) const .../riscv-gnu-toolchain/gcc/__build__/../gcc/config/riscv/riscv-vector-builtins-bases.cc:667 0x294bce1 riscv_vector::function_expander::expand() We get code_for_nothing when emitting the vaadd.vx insn for the V2DI VLS mode. So allow VLS modes for the sat_op vx pattern to unblock it. gcc/ChangeLog: * config/riscv/vector.md: Allow VLS DImode for sat_op vx pattern. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21RISC-V: Combine vec_duplicate + vaaddu.vv to vaaddu.vx on GR2VR cost for HI, ↵Pan Li2-2/+89
QI and SI mode This patch would like to combine the vec_duplicate + vaaddu.vv to the vaaddu.vx. From example as below code. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if the GR2VR cost is greater than zero. Assume we have example code like below, GR2VR cost is 0. #define DEF_AVG_FLOOR(NT, WT) \ NT \ test_##NT##_avg_floor(NT x, NT y) \ { \ return (NT)(((WT)x + (WT)y) >> 1); \ } #define AVG_FLOOR_FUNC(T) test_##T##_avg_floor DEF_AVG_FLOOR(uint32_t, uint64_t) DEF_VX_BINARY_CASE_2_WRAP(T, AVG_FLOOR_FUNC(T), sat_add) Before this patch: 11 │ beq a3,zero,.L8 12 │ vsetvli a5,zero,e32,m1,ta,ma 13 │ vmv.v.x v2,a2 14 │ slli a3,a3,32 15 │ srli a3,a3,32 16 │ .L3: 17 │ vsetvli a5,a3,e32,m1,ta,ma 18 │ vle32.v v1,0(a1) 19 │ slli a4,a5,2 20 │ sub a3,a3,a5 21 │ add a1,a1,a4 22 │ vaaddu.vv v1,v1,v2 23 │ vse32.v v1,0(a0) 24 │ add a0,a0,a4 25 │ bne a3,zero,.L3 After this patch: 11 │ beq a3,zero,.L8 12 │ slli a3,a3,32 13 │ srli a3,a3,32 14 │ .L3: 15 │ vsetvli a5,a3,e32,m1,ta,ma 16 │ vle32.v v1,0(a1) 17 │ slli a4,a5,2 18 │ sub a3,a3,a5 19 │ add a1,a1,a4 20 │ vaaddu.vx v1,v1,a2 21 │ vse32.v v1,0(a0) 22 │ add a0,a0,a4 23 │ bne a3,zero,.L3 gcc/ChangeLog: * config/riscv/autovec-opt.md (*uavg_floor_vx_<mode>): Add pattern for vaaddu.vx combine. * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add UNSPEC handling for UNSPEC_VAADDU. (riscv_rtx_costs): Ditto. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-21aarch64: Avoid INS-(W|X)ZR instructions when optimising for speedKyrylo Tkachov1-1/+4
For inserting zero into a vector lane we usually use an instruction like: ins v0.h[2], wzr This, however, has not-so-great performance on some CPUs. On Grace, for example, it has a latency of 5 and throughput of 1. The alternative sequence: movi v31.8b, #0 ins v0.h[2], v31.h[0] is preferable because the MOVI-0 is often a zero-latency operation that is eliminated by the CPU frontend, and the lane-to-lane INS has a latency of 2 and throughput of 4. We can avoid the merging of the two instructions into the aarch64_simd_vec_set_zero<mode> by disabling that pattern when optimizing for speed. Thanks to wider benchmarking from Tamar, it makes sense to make this change for all tunings, so no RTX costs or tuning flags are introduced to control this in a more fine-grained manner. They can be easily added in the future if needed for a particular CPU. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64-simd.md (aarch64_simd_vec_set_zero<mode>): Enable only when optimizing for size. gcc/testsuite/ * gcc.target/aarch64/simd/mf8_data_1.c (test_set_lane4, test_setq_lane4): Relax allowed assembly. * gcc.target/aarch64/vec-set-zero.c: Use -Os in flags. * gcc.target/aarch64/inszero_split_1.c: New test.
2025-07-21aarch64: NFC - Make vec_* rtx costing logic consistentKyrylo Tkachov3-58/+65
The rtx costs logic for CONST_VECTOR, VEC_DUPLICATE and VEC_SELECT sets the cost unconditionally to the movi, dup or extract fields of extra_cost, when the normal practice in that function is to use extra_cost only when speed is set. When speed is false the function should estimate the size cost only. This patch makes the logic consistent by using the extra_cost fields to increment the cost when speed is set. This requires reducing the extra_cost values of the movi, dup and extract fields by COSTS_N_INSNS (1), as every insn being costed has a cost of COSTS_N_INSNS (1) at the start of the function. The cost tables for the CPUs are updated in line with this. With these changes the testsuite is unaffected so no different costing decisions are made and this patch is just a cleanup. Bootstrapped and tested on aarch64-none-linux-gnu. Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com> gcc/ * config/aarch64/aarch64.cc (aarch64_rtx_costs): Add extra_cost values only when speed is true for CONST_VECTOR, VEC_DUPLICATE, VEC_SELECT cases. * config/aarch64/aarch64-cost-tables.h (qdf24xx_extra_costs, thunderx_extra_costs, thunderx2t99_extra_costs, thunderx3t110_extra_costs, tsv110_extra_costs, a64fx_extra_costs, ampere1_extra_costs, ampere1a_extra_costs, ampere1b_extra_costs): Reduce cost of movi, dup, extract fields by COSTS_N_INSNS (1). * config/arm/aarch-cost-tables.h (generic_extra_costs, cortexa53_extra_costs, cortexa57_extra_costs, cortexa76_extra_costs, exynosm1_extra_costs, xgene1_extra_costs): Likewise.
2025-07-21amdgcn: add DImode offsets for gather/scatterAndrew Stubbs2-12/+103
Add new variants of the gather_load and scatter_store instructions that take the offsets in DImode. This is not the natural width for offsets in the instruction set, but we can use them to compute a vector of absolute addresses, which does work. This enables the autovectorizer to use gather/scatter in a number of additional scenarios (one of which shows up in the SPEC HPC lbm benchmark). gcc/ChangeLog: * config/gcn/gcn-valu.md (gather_load<mode><vndi>): New. (scatter_store<mode><vndi>): New. (mask_gather_load<mode><vndi>): New. (mask_scatter_store<mode><vndi>): New. * config/gcn/gcn.cc (gcn_expand_scaled_offsets): Support DImode.
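The idea the commit describes — turning DImode offsets into a vector of absolute addresses that each lane loads through — can be sketched as a scalar C model. This is an illustration of the concept only, not the amdgcn implementation; the function name and the fixed scale of 4 bytes are assumptions for the example.

```c
#include <stdint.h>
#include <stddef.h>

/* Scalar model of a masked gather: each lane computes an absolute
   address base + offset[i] (offsets already in bytes, 64-bit wide)
   and loads through it when its mask bit is set.  */
static void
gather_f32 (float *dst, const float *base, const int64_t *offset,
            const uint8_t *mask, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      dst[i] = *(const float *) ((const char *) base + offset[i]);
}

/* Tiny usage example: gather elements 3 and 1 of src (byte offsets).  */
static int
gather_demo (void)
{
  float src[4] = { 10.f, 11.f, 12.f, 13.f };
  float dst[2] = { 0.f, 0.f };
  int64_t off[2] = { 3 * 4, 1 * 4 };
  uint8_t mask[2] = { 1, 1 };
  gather_f32 (dst, src, off, mask, 2);
  return dst[0] == 13.f && dst[1] == 11.f;
}
```

Because the per-lane address is fully formed before the load, the offset width no longer has to match the instruction set's natural offset width — which is precisely what lets the autovectorizer use gather/scatter in more cases.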
2025-07-21amdgcn: Add ashlvNm, mulvNm macrosAndrew Stubbs1-27/+41
I need some extra shift varieties in the mode-independent code, but the macros don't permit insns that don't have QI/HI variants. This fixes the problem, and adds the new functions for the follow-up patch to use. gcc/ChangeLog: * config/gcn/gcn.cc (GEN_VNM_NOEXEC): Use USE_QHF. (GEN_VNM): Likewise, and call for new ashl and mul variants.
2025-07-21amdgcn: add more insn patterns using vec_duplicateAndrew Stubbs2-6/+179
These new insns allow more efficient use of scalar inputs to 64-bit vector add and mul. Also, the patch adjusts the existing mul.._dup because it was actually a dup2 (the vec_duplicate is on the second input), and that was inconveniently inconsistent. The patterns are generally useful, but will be used directly by a follow-up patch. gcc/ChangeLog: * config/gcn/gcn-valu.md (add<mode>3_dup): New. (add<mode>3_dup_exec): New. (<su>mul<mode>3_highpart_dup<exec>): New. (mul<mode>3_dup): Move the vec_duplicate to operand 1. (mul<mode>3_dup_exec): New. (vec_series<mode>): Adjust call to gen_mul<mode>3_dup. * config/gcn/gcn.cc (gcn_expand_vector_init): Likewise.
2025-07-21Error handling for hard register constraintsStefan Schulze Frielinghaus1-2/+3
This implements error handling for hard register constraints including potential conflicts with register asm operands. In contrast to register asm operands, hard register constraints allow more than just one register per operand. Even more than just one register per alternative. For example, a valid constraint for an operand is "{r0}{r1}m,{r2}". However, this also means that we have to make sure that each register is used at most once in each alternative over all outputs and likewise over all inputs. For asm statements this is done by this patch during gimplification. For hard register constraints used in machine description, error handling is still a todo and I haven't investigated this so far and consider this rather a low priority. gcc/ada/ChangeLog: * gcc-interface/trans.cc (gnat_to_gnu): Pass null pointer to parse_{input,output}_constraint(). gcc/analyzer/ChangeLog: * region-model-asm.cc (region_model::on_asm_stmt): Pass null pointer to parse_{input,output}_constraint(). gcc/c/ChangeLog: * c-typeck.cc (build_asm_expr): Pass null pointer to parse_{input,output}_constraint(). gcc/ChangeLog: * cfgexpand.cc (n_occurrences): Move this ... (check_operand_nalternatives): and this ... (expand_asm_stmt): and the call to gimplify.cc. * config/s390/s390.cc (s390_md_asm_adjust): Pass null pointer to parse_{input,output}_constraint(). * gimple-walk.cc (walk_gimple_asm): Pass null pointer to parse_{input,output}_constraint(). (walk_stmt_load_store_addr_ops): Ditto. * gimplify-me.cc (gimple_regimplify_operands): Ditto. * gimplify.cc (num_occurrences): Moved from cfgexpand.cc. (num_alternatives): Ditto. (gimplify_asm_expr): Deal with hard register constraints. * stmt.cc (eliminable_regno_p): New helper. (hardreg_ok_p): Perform a similar check as done in make_decl_rtl(). (parse_output_constraint): Add parameter for gimplify_reg_info and validate hard register constrained operands. (parse_input_constraint): Ditto. * stmt.h (class gimplify_reg_info): Forward declaration. 
(parse_output_constraint): Add parameter. (parse_input_constraint): Ditto. * tree-ssa-operands.cc (operands_scanner::get_asm_stmt_operands): Pass null pointer to parse_{input,output}_constraint(). * tree-ssa-structalias.cc (find_func_aliases): Pass null pointer to parse_{input,output}_constraint(). * varasm.cc (assemble_asm): Pass null pointer to parse_{input,output}_constraint(). * gimplify_reg_info.h: New file. gcc/cp/ChangeLog: * semantics.cc (finish_asm_stmt): Pass null pointer to parse_{input,output}_constraint(). gcc/d/ChangeLog: * toir.cc: Pass null pointer to parse_{input,output}_constraint(). gcc/testsuite/ChangeLog: * gcc.dg/pr87600-2.c: Split test into two files since errors for functions test{0,1} are thrown during expand, and for test{2,3} during gimplification. * lib/scanasm.exp: On s390, skip lines beginning with #. * gcc.dg/asm-hard-reg-error-1.c: New test. * gcc.dg/asm-hard-reg-error-2.c: New test. * gcc.dg/asm-hard-reg-error-3.c: New test. * gcc.dg/asm-hard-reg-error-4.c: New test. * gcc.dg/asm-hard-reg-error-5.c: New test. * gcc.dg/pr87600-3.c: New test. * gcc.target/aarch64/asm-hard-reg-2.c: New test. * gcc.target/s390/asm-hard-reg-7.c: New test.
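The uniqueness rule the commit enforces during gimplification — each register used at most once per alternative, over all outputs and likewise over all inputs — can be modeled with a small parser for constraint strings of the shape shown above, e.g. "{r0}{r1}m,{r2}". This is a hypothetical sketch, not the gimplify.cc code; the function name is invented for the example.

```c
#include <string.h>

/* Return 1 if register name REG (e.g. "r0") appears inside braces
   in alternative ALT of CONSTRAINT, where alternatives are
   comma-separated, e.g. "{r0}{r1}m,{r2}".  Checking every operand's
   constraint this way would detect a register used twice within one
   alternative.  */
static int
alt_uses_hard_reg (const char *constraint, int alt, const char *reg)
{
  const char *p = constraint;
  for (int a = 0; *p; p++)
    {
      if (*p == ',')
        {
          a++;                          /* next alternative */
          continue;
        }
      if (a == alt && *p == '{')
        {
          const char *end = strchr (p, '}');
          if (end
              && (size_t) (end - p - 1) == strlen (reg)
              && strncmp (p + 1, reg, end - p - 1) == 0)
            return 1;
        }
    }
  return 0;
}
```

With this helper, validating "{r0}{r1}m,{r2}" against "{r1}" for the same direction in alternative 0 would flag the conflict, while alternative 1 (which only uses r2) would not.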
2025-07-21Hard register constraintsStefan Schulze Frielinghaus3-4/+14
Implement hard register constraints of the form {regname} where regname must be a valid register name for the target. Such constraints may be used in asm statements as a replacement for register asm and in machine descriptions. A more verbose description is given in extend.texi. It is expected and desired that optimizations coalesce multiple pseudos into one whenever possible. However, in the case of hard register constraints we may have to undo this and introduce copies since otherwise we would constrain a single pseudo to multiple hard registers. This is done prior to RA during asmcons in match_asm_constraints_2(). While IRA tries to reduce live ranges, it also replaces some register-register moves. That in turn might undo those copies of a pseudo which we just introduced during asmcons. Thus, check in decrease_live_ranges_number() via valid_replacement_for_asm_input_p() whether it is valid to perform a replacement. The remainder of the patch mostly deals with parsing and decoding hard register constraints. The actual work is done by LRA in process_alt_operands() where a register filter, according to the constraint, is installed. For the sake of "reviewability" and in order to show the beauty of LRA, error handling (which gets pretty involved) is spread out into a subsequent patch. Limitation ---------- Currently, a fixed register cannot be used as a hard register constraint. For example, loading the stack pointer on x86_64 via void * foo (void) { void *y; __asm__ ("" : "={rsp}" (y)); return y; } leads to an error. Asm Adjust Hook --------------- The following targets implement TARGET_MD_ASM_ADJUST: - aarch64 - arm - avr - cris - i386 - mn10300 - nds32 - pdp11 - rs6000 - s390 - vax Most of them only add the CC register to the list of clobbered registers. However, cris, i386, and s390 need some minor adjustment. gcc/ChangeLog: * config/cris/cris.cc (cris_md_asm_adjust): Deal with hard register constraint. * config/i386/i386.cc (map_egpr_constraints): Ditto. 
* config/s390/s390.cc (f_constraint_p): Ditto. * doc/extend.texi: Document hard register constraints. * doc/md.texi: Ditto. * function.cc (match_asm_constraints_2): Have a unique pseudo for each operand with a hard register constraint. (pass_match_asm_constraints::execute): Calling into new helper match_asm_constraints_2(). * genoutput.cc (mdep_constraint_len): Return the length of a hard register constraint. * genpreds.cc (write_insn_constraint_len): Support hard register constraints for insn_constraint_len(). * ira.cc (valid_replacement_for_asm_input_p_1): New helper. (valid_replacement_for_asm_input_p): New helper. (decrease_live_ranges_number): Similar to match_asm_constraints_2() ensure that each operand has a unique pseudo if constrained by a hard register. * lra-constraints.cc (process_alt_operands): Install hard register filter according to constraint. * recog.cc (asm_operand_ok): Accept register type for hard register constrained asm operands. (constrain_operands): Validate hard register constraints. * stmt.cc (decode_hard_reg_constraint): Parse a hard register constraint into the corresponding register number or bail out. (parse_output_constraint): Parse hard register constraint and set *ALLOWS_REG. (parse_input_constraint): Ditto. * stmt.h (decode_hard_reg_constraint): Declaration of new function. gcc/testsuite/ChangeLog: * gcc.dg/asm-hard-reg-1.c: New test. * gcc.dg/asm-hard-reg-2.c: New test. * gcc.dg/asm-hard-reg-3.c: New test. * gcc.dg/asm-hard-reg-4.c: New test. * gcc.dg/asm-hard-reg-5.c: New test. * gcc.dg/asm-hard-reg-6.c: New test. * gcc.dg/asm-hard-reg-7.c: New test. * gcc.dg/asm-hard-reg-8.c: New test. * gcc.target/aarch64/asm-hard-reg-1.c: New test. * gcc.target/i386/asm-hard-reg-1.c: New test. * gcc.target/i386/asm-hard-reg-2.c: New test. * gcc.target/s390/asm-hard-reg-1.c: New test. * gcc.target/s390/asm-hard-reg-2.c: New test. * gcc.target/s390/asm-hard-reg-3.c: New test. * gcc.target/s390/asm-hard-reg-4.c: New test. 
* gcc.target/s390/asm-hard-reg-5.c: New test. * gcc.target/s390/asm-hard-reg-6.c: New test. * gcc.target/s390/asm-hard-reg-longdouble.h: New test.
2025-07-20RISC-V: Add ashiftrt operand 2 for vector avg_floor and avg_ceilPan Li1-2/+4
According to the semantics of avg_floor and avg_ceil as below: floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1); ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1); That is, (const_int 1) should be the second operand of the ashiftrt, but it was missing. Thus, add it back to match the definition. The below test suites are passed for this patch. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec.md: Add (const_int 1) as the op2 of ashiftrt. Signed-off-by: Pan Li <pan2.li@intel.com>
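The floor/ceil averaging semantics quoted above can be written out as plain C; this is an illustration of the optab semantics, not GCC code, and widening to a 64-bit type avoids the overflow a narrow add would suffer.

```c
#include <stdint.h>

/* avg_floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1);  */
static inline uint32_t
avg_floor_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a + (uint64_t) b) >> 1);
}

/* avg_ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1);
   the extra (const_int 1) inside the shift is what the pattern
   above was missing.  */
static inline uint32_t
avg_ceil_u32 (uint32_t a, uint32_t b)
{
  return (uint32_t) (((uint64_t) a + (uint64_t) b + 1) >> 1);
}
```

For an odd sum the two differ by one (floor rounds down, ceil rounds up), and the widening keeps even 0xFFFFFFFF + 0xFFFFFFFF exact.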
2025-07-19pru: Use signed HOST_WIDE_INT for handling ctable addressesDimitar Dimitrov3-13/+16
The ctable base address for SBCO/LBCO load/store patterns was incorrectly stored as an unsigned integer. That prevented matching addresses with bit 31 set, because a const_int RTL expression is expected to be sign-extended. Fix by using sign-extended 32-bit values for ctable base addresses. PR target/121124 gcc/ChangeLog: * config/pru/pru-pragma.cc (pru_pragma_ctable_entry): Handle the ctable base address as a signed 32-bit value, and sign-extend to HOST_WIDE_INT. * config/pru/pru-protos.h (struct pru_ctable_entry): Store the ctable base address as signed. (pru_get_ctable_exact_base_index): Pass base address as signed. (pru_get_ctable_base_index): Ditto. (pru_get_ctable_base_offset): Ditto. * config/pru/pru.cc (pru_get_ctable_exact_base_index): Ditto. (pru_get_ctable_base_index): Ditto. (pru_get_ctable_base_offset): Ditto. (pru_print_operand_address): Ditto. gcc/testsuite/ChangeLog: * gcc.target/pru/pragma-ctable_entry-2.c: New test.
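The canonical-form issue behind this fix can be shown in a few lines of C: a const_int carries its value sign-extended to the host wide type, so a 32-bit base address with bit 31 set must be widened the same way to compare equal. This is a minimal sketch; int64_t stands in for HOST_WIDE_INT.

```c
#include <stdint.h>

/* A const_int holds its value sign-extended to the host wide type,
   so a 32-bit ctable base must be widened the same way to match.  */
static inline int64_t
ctable_base_canonical (int32_t base)
{
  return (int64_t) base;        /* sign-extends */
}
```

Storing the base as unsigned and zero-extending instead yields 0x0000000080000000, which never matches the sign-extended 0xFFFFFFFF80000000 a const_int would hold, so addresses with bit 31 set failed to match.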
2025-07-19[PATCH] RISC-V: Vector-scalar widening negate-multiply-(subtract-)accumulate ↵Paul-Antoine Arras2-1/+55
[PR119100] This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a (possibly negated) minus-mult RTL instruction. Before this patch, we have six instructions, e.g.: vsetivli zero,4,e32,m1,ta,ma fcvt.s.h fa5,fa5 vfmv.v.f v4,fa5 vfwcvt.f.f.v v1,v3 vsetvli zero,zero,e32,m1,ta,ma vfnmadd.vv v1,v4,v2 After, we get only one: vfwnmacc.vf v1,fa5,v2 PR target/119100 gcc/ChangeLog: * config/riscv/autovec-opt.md (*vfwnmacc_vf_<mode>): New pattern. (*vfwnmsac_vf_<mode>): New pattern. * config/riscv/riscv.cc (get_vector_binary_rtx_cost): Add support for a vec_duplicate in a neg. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f16.c: Add vfwnmacc and vfwnmsac. * gcc.target/riscv/rvv/autovec/vx_vf/vf-1-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-2-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-3-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f16.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf-4-f32.c: Likewise. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmacc-run-1-f32.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f16.c: New test. * gcc.target/riscv/rvv/autovec/vx_vf/vf_vfwnmsac-run-1-f32.c: New test.
2025-07-19[PATCH] RISC-V: prevent NULL_RTX dereference in riscv_macro_fusion_pair_p ()Artemiy Volkov1-2/+2
> A number of folks have had their fingers in this code and it's going to take > a few submissions to do everything we want to do. > > This patch is primarily concerned with avoiding signaling that fusion can > occur in cases where it obviously should not be signaling fusion. Hi Jeff, With this change, we're liable to ICE whenever prev_set or curr_set are NULL_RTX. For a fix, how about something like the below? Thanks, Artemiy Introduced in r16-1984-g83d19b5d842dad, initializers for {prev,curr}_dest_regno can cause an ICE if the respective insn isn't a single set. Rectify this by inserting a NULL_RTX check before using {prev,curr}_set. Regtested on riscv32. gcc/ * config/riscv/riscv.cc (riscv_macro_fusion_pair_p): Protect from a NULL PREV_SET or CURR_SET.
2025-07-19AVR: Fuse get_insns with end_sequence.Georg-Johann Lay1-4/+2
gcc/ * config/avr/avr-passes.cc (avr_optimize_casesi): Fuse get_insns() with end_sequence().
2025-07-18RISC-V: Support RVVDImode for avg3_ceil auto vectPan Li1-0/+13
Like the avg3_floor pattern, avg3_ceil has a similar issue: it lacks RVV DImode support. Thus, this patch supports DImode via the standard name, with the iterator V_VLSI_D. The below test suites are passed for this patch series. * The rv64gcv fully regression test. gcc/ChangeLog: * config/riscv/autovec.md (avg<mode>3_ceil): Add new pattern of avg3_ceil for RVV DImode. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/avg_data.h: Adjust the test data. * gcc.target/riscv/rvv/autovec/avg_ceil-1-i64-from-i128.c: New test. * gcc.target/riscv/rvv/autovec/avg_ceil-run-1-i64-from-i128.c: New test. Signed-off-by: Pan Li <pan2.li@intel.com>
2025-07-17x86: Don't change mode for XOR in ix86_expand_ternlogH.J. Lu1-27/+0
There is no need to change mode for XOR in ix86_expand_ternlog now. Whatever reasons for it in the first place no longer exist. Tested on x86-64 with -m32. There are no regressions. * config/i386/i386-expand.cc (ix86_expand_ternlog): Don't change mode for XOR. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2025-07-17s390: Rework signbit optabStefan Schulze Frielinghaus1-1/+82
Currently for a signbit operation the instructions tc{f,d,x}b + ipm + srl are emitted. If the source operand is a MEM, then a load precedes the sequence. A faster implementation issues a load, from either a REG or a MEM, into a GPR followed by a shift. In the spirit of the signbit function of the C standard, the signbit optab only guarantees that the resulting value is nonzero if the signbit is set. The common code implementation computes a value where the signbit is stored in the most significant bit, i.e., all other bits are just masked out, whereas the current implementation of s390 results in a value where the signbit is stored in the least significant bit. Although there is no guarantee where the signbit is stored, keep the current behaviour and, therefore, implement the signbit optab manually. Since z10, instruction lgdr can be effectively used for a 64-bit FPR-to-GPR load. However, no 32-bit counterpart exists. Thus, for target z10 make use of post-reload splitters which emit either a 64-bit or a 32-bit load depending on whether the source operand is a REG or a MEM, and a corresponding 63-bit or 31-bit shift. We can do without post-reload splitters in the case of vector extensions, since there we also have a 32-bit VR-to-GPR load via instruction vlgvf. gcc/ChangeLog: * config/s390/s390.md (signbit_tdc): Rename expander. (signbit<mode>2): New expander. (signbit<mode>2_z10): New expander. gcc/testsuite/ChangeLog: * gcc.target/s390/isfinite-isinf-isnormal-signbit-2.c: Adapt scan assembler directives. * gcc.target/s390/isfinite-isinf-isnormal-signbit-3.c: Ditto. * gcc.target/s390/signbit-1.c: New test. * gcc.target/s390/signbit-2.c: New test. * gcc.target/s390/signbit-3.c: New test. * gcc.target/s390/signbit-4.c: New test. * gcc.target/s390/signbit.h: New test.
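The two encodings discussed above can be modeled in C: the common-code expansion keeps the sign in the most significant bit by masking, while the s390 expansion shifts it down into the least significant bit. Both satisfy the optab contract (nonzero iff the sign bit is set). Illustration only, reading the IEEE bit pattern via memcpy.

```c
#include <stdint.h>
#include <string.h>

/* Common-code style: mask everything but the MSB.  */
static uint64_t
signbit_msb (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits & 0x8000000000000000ull;
}

/* s390 style: shift the sign bit down into the LSB, modeling the
   GPR load followed by a 63-bit shift.  */
static uint64_t
signbit_lsb (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  return bits >> 63;
}
```

Note that -0.0 has the sign bit set in both encodings, matching the C standard's signbit.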
2025-07-17s390: Adapt GPR<->VR costsStefan Schulze Frielinghaus1-1/+15
Moving between GPRs and VRs in any mode with size less than or equal to 8 bytes becomes available with vector extensions. Without adapting costs for those loads, we typically go over memory. gcc/ChangeLog: * config/s390/s390.cc (s390_register_move_cost): Add costing for vlvg/vlgv.
2025-07-17s390: Add implicit zero extend for VLGVStefan Schulze Frielinghaus1-6/+54
Exploit the fact that instruction VLGV zeros excessive bits of a GPR. gcc/ChangeLog: * config/s390/vector.md (bhfgq): Add scalar modes. (*movdi<mode>_zero_extend_A): New insn. (*movsi<mode>_zero_extend_A): New insn. (*movdi<mode>_zero_extend_B): New insn. (*movsi<mode>_zero_extend_B): New insn. gcc/testsuite/ChangeLog: * gcc.target/s390/vector/vlgv-zero-extend-1.c: New test.
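The property the new patterns exploit can be stated in C: extracting a narrow element into a 64-bit GPR with VLGV leaves the high bits zero, exactly like an explicit zero-extension, so the separate extend can be folded away. This is a sketch of the semantics, not the md pattern; the function names are invented for the example.

```c
#include <stdint.h>

/* Model of vlgvh + implicit zero extension: the extracted 16-bit
   element fills the low bits and the rest of the GPR is zero.  */
static uint64_t
extract_u16_zext (const uint16_t *vec, unsigned idx)
{
  return (uint64_t) vec[idx];   /* high 48 bits are guaranteed zero */
}

/* Usage: even an all-ones element never sets the high bits.  */
static int
vlgv_demo (void)
{
  uint16_t v[2] = { 0xFFFF, 0x1234 };
  return extract_u16_zext (v, 0) == 0xFFFFull
         && extract_u16_zext (v, 1) == 0x1234ull;
}
```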
2025-07-17LoongArch: Fix wrong code generated by TARGET_VECTORIZE_VEC_PERM_CONST ↵Xi Ruoyao3-99/+35
[PR121064] When TARGET_VECTORIZE_VEC_PERM_CONST is called, target may be the same pseudo as op0 and/or op1. Loading the selector into target would clobber the input, producing wrong code like vld $vr0, $t0 vshuf.w $vr0, $vr0, $vr1 So don't load the selector into d->target, use a new pseudo to hold the selector instead. The reload pass will load the pseudo for selector and the pseudo for target into the same hard register (following our constraint '0' on the shuf instructions) anyway. gcc/ChangeLog: PR target/121064 * config/loongarch/lsx.md (lsx_vshuf_<lsxfmt_f>): Add '@' to generate a mode-aware helper. Use <VIMODE> as the mode of the operand 1 (selector). * config/loongarch/lasx.md (lasx_xvshuf_<lasxfmt_f>): Likewise. * config/loongarch/loongarch.cc (loongarch_try_expand_lsx_vshuf_const): Create a new pseudo for the selector. Use the mode-aware helper to simplify the code. (loongarch_expand_vec_perm_const): Likewise. gcc/testsuite/ChangeLog: PR target/121064 * gcc.target/loongarch/pr121064.c: New test.
2025-07-16x86: Convert MMX integer loads from constant vector poolUros Bizjak2-19/+45
For MMX 16-bit, 32-bit and 64-bit constant vector loads from constant vector pool: (insn 6 2 7 2 (set (reg:V1SI 5 di) (mem/u/c:V1SI (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S4 A32])) "pr121062-2.c":10:3 2036 {*movv1si_internal} (expr_list:REG_EQUAL (const_vector:V1SI [ (const_int -1 [0xffffffffffffffff]) ]) (nil))) we can convert it to (insn 12 2 7 2 (set (reg:SI 5 di) (const_int -1 [0xffffffffffffffff])) "pr121062-2.c":10:3 100 {*movsi_internal} (nil)) Co-Developed-by: H.J. Lu <hjl.tools@gmail.com> gcc/ PR target/121062 * config/i386/i386.cc (ix86_convert_const_vector_to_integer): Handle E_V1SImode and E_V1DImode. * config/i386/mmx.md (V_16_32_64): Add V1SI, V2BF and V1DI. (mmxinsnmode): Add V1DI and V1SI. Add V_16_32_64 splitter for constant vector loads from constant vector pool. (V_16_32_64:*mov<mode>_imm): Moved after V_16_32_64 splitter. Replace lowpart_subreg with adjust_address. gcc/testsuite/ PR target/121062 * gcc.target/i386/pr121062-1.c: New test. * gcc.target/i386/pr121062-2.c: Likewise. * gcc.target/i386/pr121062-3a.c: Likewise. * gcc.target/i386/pr121062-3b.c: Likewise. * gcc.target/i386/pr121062-3c.c: Likewise. * gcc.target/i386/pr121062-4.c: Likewise. * gcc.target/i386/pr121062-5.c: Likewise. * gcc.target/i386/pr121062-6.c: Likewise. * gcc.target/i386/pr121062-7.c: Likewise.
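The conversion is valid because a single-element vector constant and a scalar constant of the same width share one bit pattern, so the V1SI pool load can become a plain SImode immediate move. A C illustration of that reinterpretation (the RTL side lives in ix86_convert_const_vector_to_integer); the helper name is invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* A one-element V1SI constant { -1 } has the same bits as the
   scalar SImode constant -1, so the vector load from the pool
   can be replaced by a simple immediate move.  */
static int32_t
v1si_as_si (const int32_t vec[1])
{
  int32_t scalar;
  memcpy (&scalar, vec, sizeof scalar);
  return scalar;
}
```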