path: root/gcc/tree-vectorizer.h
Age | Commit message | Author | Files | Lines
2025-09-02 | tree-optimization/121754 - ICE with vect_reduc_type and nested cycle | Richard Biener | 1 | -7/+3
The reduction guard isn't correct: STMT_VINFO_REDUC_DEF also exists for nested cycles that are not part of reductions, but there's no reduction info for them. PR tree-optimization/121754 * tree-vectorizer.h (vect_reduc_type): Simplify to not ICE on nested cycles. * gcc.dg/vect/pr121754.c: New testcase. * gcc.target/aarch64/vect-pr121754.c: Likewise.
2025-09-02 | Pass vectype to vect_check_gather_scatter | Richard Biener | 1 | -2/+2
The strided-store path needs to have the SLP tree's vector type, so the following patch passes down the vector type to be used to vect_check_gather_scatter and adjusts all other callers. This removes one of the last pieces requiring STMT_VINFO_VECTYPE during SLP stmt analysis. * tree-vectorizer.h (vect_check_gather_scatter): Add vectype parameter. * tree-vect-data-refs.cc (vect_check_gather_scatter): Get vectype as parameter. (vect_analyze_data_refs): Adjust. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Likewise. * tree-vect-slp.cc (vect_get_and_check_slp_defs): Get vectype as parameter, pass down. (vect_build_slp_tree_2): Adjust. * tree-vect-stmts.cc (vect_mark_stmts_to_be_vectorized): Likewise. (vect_use_strided_gather_scatters_p): Likewise.
2025-09-01 | Introduce abstraction for vect reduction info, tracked from SLP nodes | Richard Biener | 1 | -58/+99
While we already have the accessor info_for_reduction, its result is a plain stmt_vec_info. The following turns that into a class for the purpose of changing accesses to reduction info to a new set of accessors prefixed with VECT_REDUC_INFO and removes the corresponding STMT_VINFO prefixed accessors where possible. There are a few reduction-related things that are used by scalar cycle detection and thus have to stay as-is for now, and as copies in the future. This also separates reduction info into one object per reduction and associates it with SLP nodes, splitting it out from stmt_vec_info, retaining (and duplicating) parts used by scalar cycle analysis. The data is then associated with SLP nodes forming reduction cycles and accessible via info_for_reduction. The data is created at SLP discovery time as we look at it even pre-vectorizable_reduction analysis, but most of the data is only populated by the latter. There is no reduction info for nested cycles that are not part of an outer reduction. In the process this adds cycle info to each SLP tree, notably the reduc-idx and a way to identify the reduction info. * tree-vectorizer.h (vect_reduc_info): New. (create_info_for_reduction): Likewise. (VECT_REDUC_INFO_TYPE): Likewise. (VECT_REDUC_INFO_CODE): Likewise. (VECT_REDUC_INFO_FN): Likewise. (VECT_REDUC_INFO_SCALAR_RESULTS): Likewise. (VECT_REDUC_INFO_INITIAL_VALUES): Likewise. (VECT_REDUC_INFO_REUSED_ACCUMULATOR): Likewise. (VECT_REDUC_INFO_INDUC_COND_INITIAL_VAL): Likewise. (VECT_REDUC_INFO_EPILOGUE_ADJUSTMENT): Likewise. (VECT_REDUC_INFO_FORCE_SINGLE_CYCLE): Likewise. (VECT_REDUC_INFO_RESULT_POS): Likewise. (VECT_REDUC_INFO_VECTYPE): Likewise. (STMT_VINFO_VEC_INDUC_COND_INITIAL_VAL): Remove. (STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT): Likewise. (STMT_VINFO_FORCE_SINGLE_CYCLE): Likewise. (STMT_VINFO_REDUC_FN): Likewise. (STMT_VINFO_REDUC_VECTYPE): Likewise. (vect_reusable_accumulator::reduc_info): Adjust. (vect_reduc_type): Adjust. (_slp_tree::cycle_info): New member. (SLP_TREE_REDUC_IDX): Likewise. (vect_reduc_info_s): Move/copy data from ... (_stmt_vec_info): ... here. (_loop_vec_info::reduc_infos): New member. (info_for_reduction): Adjust to take SLP node. (vect_reduc_type): Adjust. (vect_is_reduction): Add overload for SLP node. * tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not initialize removed members. (vec_info::free_stmt_vec_info): Do not release them. * tree-vect-stmts.cc (vectorizable_condition): Adjust. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize cycle info. (vect_build_slp_tree_2): Compute SLP reduc_idx and store it. Create, populate and propagate reduction info. (vect_print_slp_tree): Print cycle info. (vect_analyze_slp_reduc_chain): Set cycle info on the manually added conversion node. (vect_optimize_slp_pass::start_choosing_layouts): Adjust. * tree-vect-loop.cc (_loop_vec_info::~_loop_vec_info): Release reduction infos. (info_for_reduction): Get the reduction info from the vector in the loop_vinfo. (vect_create_epilog_for_reduction): Adjust. (vectorizable_reduction): Likewise. (vect_transform_reduction): Likewise. (vect_transform_cycle_phi): Likewise; deal with the fact that nested cycles not part of a double reduction have no reduction info. * config/aarch64/aarch64.cc (aarch64_force_single_cycle): Use VECT_REDUC_INFO_FORCE_SINGLE_CYCLE, get SLP node and use that. (aarch64_vector_costs::count_ops): Adjust.
2025-08-26 | Compute vect_reduc_type off SLP node instead of stmt-info | Richard Biener | 1 | -6/+10
The following changes the vect_reduc_type API to work on the SLP node. The API is only used from the aarch64 backend, so all changes are there. In particular I noticed aarch64_force_single_cycle is invoked even for scalar costing (where the flag tested isn't computed yet); I figured that in scalar costing all reductions are a single cycle. * tree-vectorizer.h (vect_reduc_type): Get SLP node as argument. * config/aarch64/aarch64.cc (aarch64_sve_in_loop_reduction_latency): Take SLP node as argument and adjust. (aarch64_in_loop_reduction_latency): Likewise. (aarch64_detect_vector_stmt_subtype): Adjust. (aarch64_vector_costs::count_ops): Likewise. Treat reductions during scalar costing as single-cycle.
2025-08-26 | Remove STMT_VINFO_REDUC_VECTYPE_IN | Richard Biener | 1 | -3/+0
This was added when invariants/externals outside of SLP didn't have an easily accessible vector type. Now it's redundant so the following removes it. * tree-vectorizer.h (stmt_vec_info_::reduc_vectype_in): Remove. (STMT_VINFO_REDUC_VECTYPE_IN): Likewise. * tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): Get at the input vectype via the SLP node child. (vectorizable_lane_reducing): Likewise. (vect_transform_reduction): Likewise. (vectorizable_reduction): Do not set STMT_VINFO_REDUC_VECTYPE_IN.
2025-08-21 | Merge BB and loop path in vect_analyze_stmt | Richard Biener | 1 | -2/+2
We have now common patterns for most of the vectorizable_* calls, so merge. This also avoids calling vectorizable_early_exit for BB vect and clarifies signatures of it and vectorizable_phi. * tree-vectorizer.h (vectorizable_phi): Take bb_vec_info. (vectorizable_early_exit): Take loop_vec_info. * tree-vect-loop.cc (vectorizable_phi): Adjust. * tree-vect-slp.cc (vect_slp_analyze_operations): Likewise. (vectorize_slp_instance_root_stmt): Likewise. * tree-vect-stmts.cc (vectorizable_early_exit): Likewise. (vect_transform_stmt): Likewise. (vect_analyze_stmt): Merge the sequences of vectorizable_* where common.
2025-08-20 | Record get_load_store_info results from analysis | Richard Biener | 1 | -5/+29
The following is a patch to make us record the get_load_store_info results from load/store analysis and re-use them during transform. In particular this moves where SLP_TREE_MEMORY_ACCESS_TYPE is stored. A major hassle was (and still is, to some extent) gather/scatter handling with its accompanying gather_scatter_info. As get_load_store_info no longer fully re-analyzes them, and parts of the information are recorded in the SLP tree during SLP build, the following eliminates the use of this data in vectorizable_load/store, instead recording the other relevant part in the load-store info (namely the IFN or decl chosen). Strided load handling keeps the re-analysis but populates the data back to the SLP tree and the load-store info. That's something for further improvement. This also shows that early classifying a SLP tree as load/store and allocating the load-store data might be a way to move back all of the gather/scatter auxiliary data into one place. Rather than mass-replacing references to variables I've kept the locals but made them read-only, only adjusting a few elsval setters and adding a FIXME to strided SLP handling of alignment (allowing local override there). The FIXME shows that while a lot of analysis is done in get_load_store_type, that's far from all of it. There's also a possibility that splitting up the transform phase into separate load/store def types, based on the VMAT chosen, will make the code more maintainable. * tree-vectorizer.h (vect_load_store_data): New. (_slp_tree::memory_access_type): Remove. (SLP_TREE_MEMORY_ACCESS_TYPE): Turn into inline function. * tree-vect-slp.cc (_slp_tree::_slp_tree): Do not initialize SLP_TREE_MEMORY_ACCESS_TYPE. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Remove gather_scatter_info pointer argument, instead get info from the SLP node. (vect_build_one_gather_load_call): Get SLP node and builtin decl as argument and remove uses of gather_scatter_info. (vect_build_one_scatter_store_call): Likewise. (vect_get_gather_scatter_ops): Remove uses of gather_scatter_info. (vect_get_strided_load_store_ops): Get SLP node and remove uses of gather_scatter_info. (get_load_store_type): Take pointer to vect_load_store_data instead of individual pointers. (vectorizable_store): Adjust. Re-use get_load_store_type result from analysis time. (vectorizable_load): Likewise.
2025-08-13 | Introduce SLP_TREE_PERMUTE_P | Richard Biener | 1 | -0/+1
The following wraps SLP_TREE_CODE checks against VEC_PERM_EXPR (the only relevant code) in a new SLP_TREE_PERMUTE_P predicate. Most places guard against SLP_TREE_REPRESENTATIVE being NULL. * tree-vectorizer.h (SLP_TREE_PERMUTE_P): New. * tree-vect-slp-patterns.cc (linear_loads_p): Adjust. (vect_detect_pair_op): Likewise. (addsub_pattern::recognize): Likewise. * tree-vect-slp.cc (vect_print_slp_tree): Likewise. (vect_gather_slp_loads): Likewise. (vect_is_slp_load_node): Likewise. (optimize_load_redistribution_1): Likewise. (vect_optimize_slp_pass::is_cfg_latch_edge): Likewise. (vect_optimize_slp_pass::internal_node_cost): Likewise. (vect_optimize_slp_pass::start_choosing_layouts): Likewise. (vect_optimize_slp_pass::backward_cost): Likewise. (vect_optimize_slp_pass::forward_pass): Likewise. (vect_optimize_slp_pass::get_result_with_layout): Likewise. (vect_optimize_slp_pass::materialize): Likewise. (vect_optimize_slp_pass::dump): Likewise. (vect_optimize_slp_pass::decide_masked_load_lanes): Likewise. (vect_update_slp_vf_for_node): Likewise. (vect_slp_analyze_node_operations_1): Likewise. (vect_schedule_slp_node): Likewise. (vect_schedule_scc): Likewise. * tree-vect-stmts.cc (vect_analyze_stmt): Likewise. (vect_transform_stmt): Likewise. (vect_is_simple_use): Likewise.
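For reference, a minimal sketch of what such a predicate wraps (illustrative only, not the exact definition added to tree-vectorizer.h):

  /* Sketch: predicate form of the open-coded SLP_TREE_CODE check.  */
  #define SLP_TREE_PERMUTE_P(NODE) (SLP_TREE_CODE (NODE) == VEC_PERM_EXPR)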
2025-08-13 | Fold GATHER_SCATTER_*_P into vect_memory_access_type | Richard Biener | 1 | -8/+13
The following splits up VMAT_GATHER_SCATTER into VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and VMAT_GATHER_SCATTER_EMULATED. The main motivation is to reduce the uses of (full) gs_info, but it also makes the kind representable by a single entry rather than the ifn and decl tristate. The strided load with gather case gets to use VMAT_GATHER_SCATTER_IFN, since that's what we end up checking. * tree-vectorizer.h (vect_memory_access_type): Replace VMAT_GATHER_SCATTER with three separate access types, VMAT_GATHER_SCATTER_LEGACY, VMAT_GATHER_SCATTER_IFN and VMAT_GATHER_SCATTER_EMULATED. (mat_gather_scatter_p): New predicate. (GATHER_SCATTER_LEGACY_P): Remove. (GATHER_SCATTER_IFN_P): Likewise. (GATHER_SCATTER_EMULATED_P): Likewise. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Adjust. (get_load_store_type): Likewise. (vect_get_loop_variant_data_ptr_increment): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Likewise. * config/riscv/riscv-vector-costs.cc (costs::need_additional_vector_vars_p): Likewise. * config/aarch64/aarch64.cc (aarch64_detect_vector_stmt_subtype): Likewise. (aarch64_vector_costs::count_ops): Likewise. (aarch64_vector_costs::add_stmt_cost): Likewise.
2025-08-13 | Simplify vect_supportable_dr_alignment API | Richard Biener | 1 | -1/+1
The gather_scatter_info pointer is only used as a flag, so pass down a flag. * tree-vectorizer.h (vect_supportable_dr_alignment): Pass a bool instead of a pointer to gather_scatter_info. * tree-vect-data-refs.cc (vect_supportable_dr_alignment): Likewise. * tree-vect-stmts.cc (get_load_store_type): Adjust.
2025-08-07 | vect: Extend peeling and versioning for alignment to VLA modes | Pengfei Li | 1 | -1/+8
This patch extends the support for peeling and versioning for alignment from VLS modes to VLA modes. The key change is allowing the DR target alignment to be set to a non-constant poly_int. Since the value must be a power-of-two, for variable VFs the power-of-two check is deferred to runtime through loop versioning. The vectorizable check for speculative loads is also refactored in this patch to handle both constant and variable target alignment values.

Additional changes for VLA modes include:

1) Peeling
In VLA modes, we use peeling with masking - using a partial vector in the first iteration of the vectorized loop to ensure aligned DRs in subsequent iterations. It was already enabled for VLS modes to avoid scalar peeling. This patch reuses most of the existing logic and just fixes a small issue of an incorrect IV offset in the VLA code path. This also removes a power-of-two rounding when computing the number of iterations to peel, as a power-of-two VF is now guaranteed by a new runtime check.

2) Versioning
The type of the mask for the runtime alignment check is updated to poly_int to support variable VFs. After this change, both standalone versioning and peeling with versioning are available in VLA modes. This patch also introduces another runtime check for the speculative read amount, to ensure that all speculative loads remain within the current valid memory page. We plan to remove these runtime checks in the future by introducing capped VF - using partial vectors to limit the actual VF value at runtime.

3) Speculative read flag
DRs whose scalar accesses are known to be in-bounds would be considered unaligned and unsupported with a variable target alignment. But in fact, speculative reads can be naturally avoided for in-bounds DRs as long as partial vectors are used. Therefore, this patch clears the speculative flags and sets the "must use partial vectors" flag for these cases.

This patch is bootstrapped and regression-tested on x86_64-linux-gnu, arm-linux-gnueabihf and aarch64-linux-gnu with bootstrap-O3.

gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment): Allow DR target alignment to be a poly_int.
(vect_enhance_data_refs_alignment): Support peeling and versioning for VLA modes.
* tree-vect-loop-manip.cc (get_misalign_in_elems): Remove power-of-two rounding in peeling.
(vect_create_cond_for_align_checks): Update alignment check logic for poly_int mask.
(vect_create_cond_for_vla_spec_read): New runtime checks.
(vect_loop_versioning): Support new runtime checks.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new loop_vinfo field.
(vectorizable_induction): Fix wrong IV offset issue.
* tree-vect-stmts.cc (get_load_store_type): Refactor vectorizable checks for speculative loads.
* tree-vectorizer.h (LOOP_VINFO_MAX_SPEC_READ_AMOUNT): New macro for new runtime checks.
(LOOP_REQUIRES_VERSIONING_FOR_SPEC_READ): Likewise.
(LOOP_REQUIRES_VERSIONING): Update macro for new runtime checks.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/peel_ind_11.c: New test.
* gcc.target/aarch64/sve/peel_ind_11_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_12.c: New test.
* gcc.target/aarch64/sve/peel_ind_12_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_13.c: New test.
* gcc.target/aarch64/sve/peel_ind_13_run.c: New test.
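As a conceptual illustration of the versioning condition mentioned above: for a variable VF the power-of-two property can only be tested at runtime, and such a test boils down to the classic bit trick below (a plain C++ sketch, not the GIMPLE the vectorizer actually emits):

  /* Sketch: runtime power-of-two test on a variable vectorization factor.  */
  static inline bool
  vf_is_power_of_two (unsigned long vf)
  {
    return vf != 0 && (vf & (vf - 1)) == 0;
  }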
2025-08-06 | Record gather/scatter scale and base in the SLP tree | Richard Biener | 1 | -0/+9
The main gather/scatter discovery happens at SLP discovery time, the base address and the offset scale are currently not explicitly represented in the SLP tree. This requires re-discovery of them during vectorizable_store/load. The following fixes this by recording this info into the SLP tree. This allows the main vect_check_gather_scatter call to be elided from get_load_store_type and replaced with target support checks for IFN/decl or fallback emulated mode. There's vect_check_gather_scatter left in the path using gather/scatter for strided load/store. I hope to deal with this later. * tree-vectorizer.h (_slp_tree::gs_scale): New. (_slp_tree::gs_base): Likewise. (SLP_TREE_GS_SCALE): Likewise. (SLP_TREE_GS_BASE): Likewise. (vect_describe_gather_scatter_call): Declare. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize new members. (vect_build_slp_tree_2): Record gather/scatter base and scale. (vect_get_and_check_slp_defs): For gather/scatter IFNs describe the call to first_gs_info. * tree-vect-data-refs.cc (vect_gather_scatter_fn_p): Add mode of operation with fixed offset vector type. (vect_describe_gather_scatter_call): Export. * tree-vect-stmts.cc (get_load_store_type): Do not call vect_check_gather_scatter to fill gs_info, instead populate from the SLP tree. Check which of, IFN, decl or fallback is supported and record that decision.
2025-08-05 | Remove hybrid SLP detection | Richard Biener | 1 | -1/+0
The following removes hybrid SLP detection - it existed as a sanity check that all stmts are covered by SLP, but it proved itself incomplete at that. Its job is taken by early terminating SLP build when SLP discovery fails for one root and the hope that we now do catch all of them. * tree-vectorizer.h (vect_relevant::hybrid): Remove. * tree-vect-loop.cc (vect_analyze_loop_2): Do not call vect_detect_hybrid_slp. * tree-vect-slp.cc (maybe_push_to_hybrid_worklist): Remove. (vect_detect_hybrid_slp): Likewise.
2025-08-05 | tree-optimization/121395 - SLP of SIMD calls w/o LHS | Richard Biener | 1 | -0/+5
The following records the alternate SLP instance entries coming from stmts with stores that have no SSA def, like OMP SIMD calls without LHS. There's a bit of fallout with having a SLP tree with a NULL vectype, but nothing too gross. PR tree-optimization/121395 * tree-vectorizer.h (_loop_vec_info::alternate_defs): New member. (LOOP_VINFO_ALTERNATE_DEFS): New. * tree-vect-stmts.cc (vect_stmt_relevant_p): Populate it. (vectorizable_simd_clone_call): Do not register a SLP def when there is none. * tree-vect-slp.cc (vect_build_slp_tree_1): Allow a NULL vectype when there's no LHS. Allow all calls w/o LHS. (vect_analyze_slp): Process LOOP_VINFO_ALTERNATE_DEFS as SLP graph entries. (vect_make_slp_decision): Handle a NULL SLP_TREE_VECTYPE. (vect_slp_analyze_node_operations_1): Likewise. (vect_schedule_slp_node): Likewise. * gcc.dg/vect/pr59984.c: Adjust.
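An illustrative (hypothetical, not the committed testcase) example of the kind of statement this is about - an OMP SIMD clone call whose result is unused, so there is no SSA def to serve as an SLP graph entry:

  #pragma omp declare simd
  extern int g (int);   /* assume g has observable side effects */

  void
  f (int *restrict a, int n)
  {
  #pragma omp simd
    for (int i = 0; i < n; i++)
      g (a[i]);          /* SIMD clone call without a LHS */
  }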
2025-08-05 | Rename loop_vect SLP_TYPE and clarify docs | Richard Biener | 1 | -18/+4
The following renames loop_vect to not_vect, removes the unused HYBRID_SLP_STMT macro and rewords the slp_vect_type docs to clarify STMT_SLP_TYPE is mainly used for BB vectorization, tracking what is vectorized and what not. * tree-vectorizer.h (enum slp_vect_type): Rename loop_vect to not_vect, clarify docs. (HYBRID_SLP_STMT): Remove. * tree-vectorizer.cc (vec_info::new_stmt_vec_info): Adjust. * tree-vect-loop.cc (vect_analyze_loop_2): Likewise.
2025-08-01 | Add VMAT_UNINITIALIZED | Richard Biener | 1 | -0/+2
We're using VMAT_INVARIANT as default, but we should simply have an uninitialized state. * tree-vectorizer.h (VMAT_UNINITIALIZED): New vect_memory_access_type. * tree-vect-slp.cc (_slp_tree::_slp_tree): Use it.
2025-08-01 | Put SLP_TREE_SIMD_CLONE_INFO into type specific data | Richard Biener | 1 | -6/+13
The following adds vect_simd_clone_data as a container for vect type specific data for vectorizable_simd_clone_call and moves SLP_TREE_SIMD_CLONE_INFO there. * tree-vectorizer.h (vect_simd_clone_data): New. (_slp_tree::simd_clone_info): Remove. (SLP_TREE_SIMD_CLONE_INFO): Likewise. * tree-vect-slp.cc (_slp_tree::_slp_tree): Adjust. (_slp_tree::~_slp_tree): Likewise. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Use type specific data to store SLP_TREE_SIMD_CLONE_INFO.
2025-08-01 | Use a class hierarchy for vect specific data | Richard Biener | 1 | -5/+10
The following turns the union into a class hierarchy. Once completed, SLP_TREE_TYPE could move into the base class. * tree-vect-slp.cc (_slp_tree::_slp_tree): Adjust. (_slp_tree::~_slp_tree): Likewise. * tree-vectorizer.h (vect_data): New base class. (_slp_tree::u): Remove. (_slp_tree::data): Add pointer to vect_data. (_slp_tree::get_data): New helper template.
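A minimal, simplified sketch of the pattern being introduced (hypothetical names, not the exact GCC declarations): a common base class for kind-specific vectorization data, a pointer to it in the SLP tree, and a typed accessor template.

  struct vect_data
  {
    virtual ~vect_data () {}
  };

  /* One kind-specific payload, e.g. for SIMD clone calls.  */
  struct vect_simd_clone_data : vect_data
  {
    /* ... members specific to vectorizable_simd_clone_call ... */
  };

  struct slp_tree_sketch
  {
    vect_data *data = nullptr;    /* replaces the old union */

    template <typename T>
    T *get_data () { return static_cast<T *> (data); }
  };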
2025-07-31 | Remove STMT_VINFO_MEMORY_ACCESS_TYPE | Richard Biener | 1 | -17/+0
This should be present only on SLP nodes now. The RISC-V changes are mechanical along the line of the SLP_TREE_TYPE changes. * tree-vectorizer.h (_stmt_vec_info::memory_access_type): Remove. (STMT_VINFO_MEMORY_ACCESS_TYPE): Likewise. (vect_mem_access_type): Likewise. * tree-vect-stmts.cc (vectorizable_store): Do not set STMT_VINFO_MEMORY_ACCESS_TYPE. Fix SLP_TREE_MEMORY_ACCESS_TYPE usage. * tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove checking of memory access type. * config/riscv/riscv-vector-costs.cc (costs::compute_local_live_ranges): Use SLP_TREE_MEMORY_ACCESS_TYPE. (costs::need_additional_vector_vars_p): Likewise. (segment_loadstore_group_size): Get SLP node as argument, use SLP_TREE_MEMORY_ACCESS_TYPE. (costs::adjust_stmt_cost): Pass down SLP node. * config/aarch64/aarch64.cc (aarch64_ld234_st234_vectors): Use SLP_TREE_MEMORY_ACCESS_TYPE instead of vect_mem_access_type. (aarch64_detect_vector_stmt_subtype): Likewise. (aarch64_vector_costs::count_ops): Likewise. (aarch64_vector_costs::add_stmt_cost): Likewise.
2025-07-31 | Avoid passing vectype != NULL when costing scalar IL | Richard Biener | 1 | -0/+8
The following makes sure to not leak a set vectype on a stmt when doing scalar IL costing as this can confuse vector cost models which do not look at m_costing_for_scalar most of the time. * tree-vectorizer.h (vector_costs::costing_for_scalar): New accessor. (add_stmt_cost): For scalar costing force vectype to NULL. Verify we do not pass in a SLP node.
2025-07-30 | vect: Add missing skip-vector check for peeling with versioning [PR121020] | Pengfei Li | 1 | -0/+4
This fixes a miscompilation issue introduced by the enablement of combined loop peeling and versioning. A test case that reproduces the issue is included in the patch. When performing loop peeling, GCC usually inserts a skip-vector check. This ensures that after peeling, there are enough remaining iterations to enter the main vectorized loop. Previously, the check was omitted if loop versioning for alignment was applied. It was safe before because versioning and peeling for alignment were mutually exclusive. However, with combined peeling and versioning enabled, this is not safe any more. A loop may be peeled and versioned at the same time. Without the skip-vector check, the main vectorized loop can be entered even if its iteration count is zero. This can cause the loop to run many more iterations than needed, resulting in incorrect results. To fix this, the patch updates the condition of omitting the skip-vector check to when versioning is performed alone without peeling. gcc/ChangeLog: PR tree-optimization/121020 * tree-vect-loop-manip.cc (vect_do_peeling): Update the condition of omitting the skip-vector check. * tree-vectorizer.h (LOOP_VINFO_USE_VERSIONING_WITHOUT_PEELING): Add a helper macro. gcc/testsuite/ChangeLog: PR tree-optimization/121020 * gcc.dg/vect/vect-early-break_138-pr121020.c: New test.
2025-07-29 | Eliminate gather-scatter-info offset_dt member | Richard Biener | 1 | -3/+0
The following removes this member, which was only ever set. This is slightly complicated by the hoops get_group_load_store_type jumps through. I've simplified that, noting that the relevant offset vector type is that of the actual offset SLP node, not the one vect_check_gather_scatter (re-)computes. * tree-vectorizer.h (gather_scatter_info::offset_dt): Remove. * tree-vect-data-refs.cc (vect_describe_gather_scatter_call): Do not set it. (vect_check_gather_scatter): Likewise. * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Likewise. (get_group_load_store_type): Use the vector type of the offset SLP child. Do not re-check vect_is_simple_use validated by SLP build.
2025-07-28 | Move STMT_VINFO_TYPE to SLP_TREE_TYPE | Richard Biener | 1 | -29/+34
I am at a point where I want to store additional information from analysis (from loads and stores) to re-use it at transform stage without repeating the analysis. I do not want to add to stmt_vec_info at this point, so this starts adding kind specific sub-structures by moving the STMT_VINFO_TYPE field to the SLP tree and adding a (dummy for now) union tagged by it to receive such data. The change is largely mechanical after RISC-V has been prepared to have a SLP node around. I have settled for a union (supposed to get pointers to data). As a followup this enables getting rid of SLP_TREE_CODE and making VEC_PERM therein a separate type, unifying its handling. * tree-vectorizer.h (_slp_tree::type): Add. (_slp_tree::u): Likewise. (_stmt_vec_info::type): Remove. (STMT_VINFO_TYPE): Likewise. (SLP_TREE_TYPE): New. * tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not initialize type. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize type. (vect_slp_analyze_node_operations): Adjust. (vect_schedule_slp_node): Likewise. * tree-vect-patterns.cc (vect_init_pattern_stmt): Do not copy STMT_VINFO_TYPE. * tree-vect-loop.cc: Set SLP_TREE_TYPE instead of STMT_VINFO_TYPE everywhere. (vect_create_loop_vinfo): Do not set STMT_VINFO_TYPE on loop conditions. * tree-vect-stmts.cc: Set SLP_TREE_TYPE instead of STMT_VINFO_TYPE everywhere. (vect_analyze_stmt): Adjust. (vect_transform_stmt): Likewise. * config/aarch64/aarch64.cc (aarch64_vector_costs::count_ops): Access SLP_TREE_TYPE instead of STMT_VINFO_TYPE. * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost): Remove non-SLP element-wise load/store matching. * config/rs6000/rs6000.cc (rs6000_cost_data::update_target_cost_per_stmt): Pass in the SLP node. Use that to get at the memory access kind and type. (rs6000_cost_data::add_stmt_cost): Pass down SLP node. * config/riscv/riscv-vector-costs.cc (variable_vectorized_p): Use SLP_TREE_TYPE. (costs::need_additional_vector_vars_p): Likewise. (costs::update_local_live_ranges): Likewise.
2025-07-25 | Remove now redundant vect_get_vec_defs overload | Richard Biener | 1 | -6/+1
The following removes the vect_get_vec_defs overload receiving a vector type to be used for the possibly constant/invariant operand. This was used for non-SLP code generation as there constants/invariants are generated on the fly. It also elides the stmt_vec_info and ncopies argument which are not required for SLP. * tree-vectorizer.h (vect_get_vec_defs): Remove overload with operand vector type. Remove stmt_vec_info and ncopies argument. * tree-vect-stmts.cc (vect_get_vec_defs): Likewise. (vectorizable_conversion): Adjust by not passing in vector types, stmt_vec_info and ncopies. (vectorizable_bswap): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_scan_store): Likewise. (vectorizable_store): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison_1): Likewise. * tree-vect-loop.cc (vect_transform_reduction): Likewise. (vect_transform_lc_phi): Likewise.
2025-07-25 | Tidy vect_is_simple_use API for SLP only | Richard Biener | 1 | -4/+1
The following removes one vect_is_simple_use overload that shouldn't be used anymore after removing the single remaining use related to gather handling in get_group_load_store_type. It also removes the dual-purpose of the overload getting both SLP node and stmt_vec_info and removes the latter argument. That leaves us with a SLP overload handling vector code and the stmt_info overload handling scalar code. In theory the former is only convenience and it should never fail given SLP build checks the constraint already, but there's the 'op' argument we have to get rid of first. * tree-vectorizer.h (vect_is_simple_use): Remove stmt-info with vectype output overload and remove stmt-info argument from SLP based API. * tree-vect-loop.cc (vectorizable_lane_reducing): Remove unused def_stmt_info output argument to vect_is_simple_use. Adjust. * tree-vect-stmts.cc (get_group_load_store_type): Get the gather/scatter offset vector type from the SLP child. (vect_check_scalar_mask): Remove stmt_info argument. Adjust. (vect_check_store_rhs): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_load): Likewise. (vect_is_simple_cond): Remove stmt_info argument. Adjust. (vectorizable_condition): Likewise. (vectorizable_comparison_1): Likewise. (vectorizable_store): Likewise. (vect_is_simple_use): Remove overload and non-SLP path.
2025-07-25 | Remove STMT_VINFO_VEC_STMTS | Richard Biener | 1 | -4/+0
The following removes the last uses of STMT_VINFO_VEC_STMTS and the vector itself. Vector stmts are recorded in SLP nodes now. The last use is a bit strange - it was introduced by Richard S. in r8-6064-ga57776a1136962 and affects only power7 and below (the re-align optimized load path). The check should have never been true since vect_vfa_access_size is only ever invoked before stmt transform. I have done the "conservative" change of making it always true now (so the code is now entered). I can as well remove it, but I wonder if you remember anything about this ... * tree-vectorizer.h (_stmt_vec_info::vec_stmts): Remove. (STMT_VINFO_VEC_STMTS): Likewise. * tree-vectorizer.cc (vec_info::new_stmt_vec_info): Do not initialize it. (vec_info::free_stmt_vec_info): Nor free it. * tree-vect-data-refs.cc (vect_vfa_access_size): Remove check on STMT_VINFO_VEC_STMTS.
2025-07-25 | Remove load interleaving code | Richard Biener | 1 | -4/+0
The following removes the non-SLP load interleaving code which was almost unused. * tree-vectorizer.h (vect_transform_grouped_load): Remove. (vect_record_grouped_load_vectors): Likewise. * tree-vect-data-refs.cc (vect_permute_load_chain): Likewise. (vect_shift_permute_load_chain): Likewise. (vect_transform_grouped_load): Likewise. (vect_record_grouped_load_vectors): Likewise. * tree-vect-stmts.cc (vectorizable_load): Remove comments about load interleaving.
2025-07-25 | Remove store interleaving support | Richard Biener | 1 | -3/+0
The following removes the non-SLP store interleaving support which was already almost unused. * tree-vectorizer.h (vect_permute_store_chain): Remove. * tree-vect-data-refs.cc (vect_permute_store_chain): Likewise. * tree-vect-stmts.cc (vectorizable_store): Remove comment about store interleaving.
2025-07-25 | Remove vect_get_vec_defs_for_operand | Richard Biener | 1 | -2/+0
This removes vect_get_vec_defs_for_operand and its remaining uses. It also removes some remaining non-SLP paths in preparation to elide STMT_VINFO_VEC_STMTS. * tree-vectorizer.h (vect_get_vec_defs_for_operand): Remove. * tree-vect-stmts.cc (vect_get_vec_defs_for_operand): Likewise. (vect_get_vec_defs): Remove non-SLP path. (check_load_store_for_partial_vectors): We always have an SLP node. (vect_check_store_rhs): Likewise. (vect_get_gather_scatter_ops): Likewise. (vect_create_vectorized_demotion_stmts): Likewise. (vectorizable_store): Adjust. (vectorizable_load): Likewise.
2025-07-25 | Remove VMAT_CONTIGUOUS_PERMUTE | Richard Biener | 1 | -5/+0
This VMAT was used for interleaving which was non-SLP only. The following removes code gated by it (code selecting it is already gone). * tree-vectorizer.h (VMAT_CONTIGUOUS_PERMUTE): Remove. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Remove checks on VMAT_CONTIGUOUS_PERMUTE. (vectorizable_load): Likewise. (vectorizable_store): Likewise. Prune dead code.
2025-07-24 | Remove vec_stmt from vectorizable_* API | Richard Biener | 1 | -5/+4
The following removes the non-SLP gimple **vec_stmt argument from the vectorizable_* functions API. Checks on it can be replaced by an inverted check on the passed cost_vec vector pointer. * tree-vectorizer.h (vectorizable_induction): Remove gimple **vec_stmt argument. (vectorizable_phi): Likewise. (vectorizable_recurr): Likewise. (vectorizable_early_exit): Likewise. * tree-vect-loop.cc (vectorizable_phi): Likewise and adjust. (vectorizable_recurr): Likewise. (vectorizable_nonlinear_induction): Likewise. (vectorizable_induction): Likewise. * tree-vect-stmts.cc (vectorizable_bswap): Likewise. (vectorizable_call): Likewise. (vectorizable_simd_clone_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_store): Likewise. (vectorizable_load): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison_1): Likewise. (vectorizable_comparison): Likewise. (vectorizable_early_exit): Likewise. (vect_analyze_stmt): Adjust. (vect_transform_stmt): Likewise. * tree-vect-slp.cc (vect_slp_analyze_operations): Adjust. (vectorize_slp_instance_root_stmt): Likewise.
2025-07-24 | Remove non-SLP path from vectorizable_simd_clone_call | Richard Biener | 1 | -6/+0
This removes the non-SLP path from vectorizable_simd_clone_call and the then unused simd_clone_info from the stmt_vec_info structure. * tree-vectorizer.h (_stmt_vec_info::simd_clone_info): Remove. (STMT_VINFO_SIMD_CLONE_INFO): Likewise. * tree-vectorizer.cc (vec_info::free_stmt_vec_info): Do not release it. * tree-vect-stmts.cc (vectorizable_simd_clone_call): Remove non-SLP path.
2025-07-24 | vect: Misalign checks for gather/scatter. | Robin Dapp | 1 | -1/+6
This patch adds simple misalignment checks for gather/scatter operations. Previously, we assumed that those perform element accesses internally so alignment does not matter. The riscv vector spec however explicitly states that vector operations are allowed to fault on element-misaligned accesses. Reasonable uarchs won't, but... For gather/scatter we have two paths in the vectorizer: (1) Regular analysis based on datarefs. Here we can also create strided loads. (2) Non-affine access where each gather index is relative to the initial address. The assumption this patch works on is that once the alignment for the first scalar is correct, all others will fall in line, as the index is always a multiple of the first element's size. For (1) we have a dataref and can check it for alignment as in other cases. For (2) this patch checks the object alignment of BASE and compares it against the natural alignment of the current vectype's unit. The patch also adds a pointer argument to the gather/scatter IFNs that contains the necessary alignment. Most of the patch is thus mechanical in that it merely adjusts indices. I tested the riscv version with a custom qemu version that faults on element-misaligned vector accesses. With this patch applied, there is just a single fault left, which is due to PR120782 and which will be addressed separately. Bootstrapped and regtested on x86 and aarch64. Regtested on rv64gcv_zvl512b with and without unaligned vector support. gcc/ChangeLog: * internal-fn.cc (internal_fn_len_index): Adjust indices for new alias_ptr param. (internal_fn_else_index): Ditto. (internal_fn_mask_index): Ditto. (internal_fn_stored_value_index): Ditto. (internal_fn_alias_ptr_index): Ditto. (internal_fn_offset_index): Ditto. (internal_fn_scale_index): Ditto. (internal_gather_scatter_fn_supported_p): Ditto. * internal-fn.h (internal_fn_alias_ptr_index): Ditto. * optabs-query.cc (supports_vec_gather_load_p): Ditto. * tree-vect-data-refs.cc (vect_check_gather_scatter): Add alias pointer. * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Add alias pointer. * tree-vect-slp.cc (vect_get_operand_map): Adjust for alias pointer. * tree-vect-stmts.cc (vect_truncate_gather_scatter_offset): Add alias pointer and misalignment handling. (get_load_store_type): Move from here... (get_group_load_store_type): ...To here. (vectorizable_store): Add alias pointer. (vectorizable_load): Ditto. * tree-vectorizer.h (struct gather_scatter_info): Ditto.
2025-07-24 | vect: Add helper macros for gather/scatter. | Robin Dapp | 1 | -0/+8
This encapsulates the IFN and the builtin-function way of handling gather/scatter via three defines: GATHER_SCATTER_IFN_P GATHER_SCATTER_LEGACY_P GATHER_SCATTER_EMULATED_P and introduces a helper define for SLP operand handling as well. gcc/ChangeLog: * tree-vect-slp.cc (GATHER_SCATTER_OFFSET): New define. (vect_get_and_check_slp_defs): Use. * tree-vectorizer.h (GATHER_SCATTER_LEGACY_P): New define. (GATHER_SCATTER_IFN_P): Ditto. (GATHER_SCATTER_EMULATED_P): Ditto. * tree-vect-stmts.cc (vectorizable_store): Use. (vectorizable_load): Use.
2025-07-21 | Remove bogus minimum VF compute | Richard Biener | 1 | -1/+1
The following removes the minimum VF compute from dataref analysis which does not take into account SLP at all, leaving the testcase vectorized with V2SImode instead of V4SImode on x86. With SLP the only minimum VF we can compute this early is 1. * tree-vectorizer.h (vect_analyze_data_refs): Remove min_vf output. * tree-vect-data-refs.cc (vect_analyze_data_refs): Likewise. * tree-vect-loop.cc (vect_analyze_loop_2): Remove early out based on bogus min_vf. * tree-vect-slp.cc (vect_slp_analyze_bb_1): Adjust. * gcc.dg/vect/vect-127.c: New testcase.
2025-07-10 | Remove dead code dealing with non-SLP | Richard Biener | 1 | -2/+1
After vect_analyze_loop_operations is gone we can clean up vect_analyze_stmt as it is no longer called out of SLP context. * tree-vectorizer.h (vect_analyze_stmt): Remove stmt-info and need_to_vectorize arguments. * tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Adjust. * tree-vect-stmts.cc (can_vectorize_live_stmts): Remove stmt_info argument and remove non-SLP path. (vect_analyze_stmt): Remove stmt_info and need_to_vectorize argument and prune paths no longer reachable. (vect_transform_stmt): Adjust.
2025-07-10 | Remove non-SLP vectorization factor determining | Richard Biener | 1 | -1/+1
The following removes the VF determining step from non-SLP stmts. For now we keep setting STMT_VINFO_VECTYPE for all stmts; there are too many places to fix, including some more complicated ones, so this is deferred for a followup. Along the way this removes vect_update_vf_for_slp, merging the check for present hybrid SLP stmts into vect_detect_hybrid_slp and failing analysis early. This also removes the essentially duplicate check in the stmt walk of vect_analyze_loop_operations. Getting rid of that, and performing some other checks earlier, is also deferred to a followup. * tree-vect-loop.cc (vect_determine_vf_for_stmt_1): Rename to ... (vect_determine_vectype_for_stmt_1): ... this and only set STMT_VINFO_VECTYPE. Fail for single-element vector types. (vect_determine_vf_for_stmt): Rename to ... (vect_determine_vectype_for_stmt): ... this and only set STMT_VINFO_VECTYPE. Fail for single-element vector types. (vect_determine_vectorization_factor): Rename to ... (vect_set_stmts_vectype): ... this and only set STMT_VINFO_VECTYPE. (vect_update_vf_for_slp): Remove. (vect_analyze_loop_operations): Remove walk over stmts. (vect_analyze_loop_2): Call vect_set_stmts_vectype instead of vect_determine_vectorization_factor. Set vectorization factor from LOOP_VINFO_SLP_UNROLLING_FACTOR. Fail if vect_detect_hybrid_slp detects hybrid stmts or when vect_make_slp_decision finds nothing to SLP. * tree-vect-slp.cc (vect_detect_hybrid_slp): Move check whether we have any hybrid stmts here from vect_update_vf_for_slp. * tree-vect-stmts.cc (vect_analyze_stmt): Remove loop over stmts. * tree-vectorizer.h (vect_detect_hybrid_slp): Update.
2025-07-08 | Allow the target to request a masked vector epilogue | Richard Biener | 1 | -3/+10
Targets recently got the ability to request the vector mode to be used for a vector epilogue (or the epilogue of a vector epilogue). The following adds the ability for them to indicate the epilogue should use loop masking, irrespective of the --param vect-partial-vector-usage default setting. The patch below uses a separate flag from the epilogue mode, not addressing the issue that on x86 the vector_modes mode iteration hook would not allow for both masked and unmasked variants to be tried and costed given this doesn't naturally map to modes on that target. That's left for a future exercise - turning on cost comparison for the x86 backend would be a prerequisite there. * tree-vectorizer.h (vector_costs::suggested_epilogue_mode): Add masked output parameter and return m_masked_epilogue. (vector_costs::m_masked_epilogue): New tristate flag. (vector_costs::vector_costs): Initialize m_masked_epilogue. * tree-vect-loop.cc (vect_analyze_loop_1): Pass in masked flag to optionally initialize can_use_partial_vectors_p. (vect_analyze_loop): For epilogues also get whether to use a masked epilogue for this loop from the target and use that for the first epilogue mode we try.
2025-06-27 | Fixup vector epilog analysis skipping when not using partial vectors | Richard Biener | 1 | -0/+1
The following avoids re-analyzing the loop as epilogue when we are not using partial vectors, the mode is the same as the autodetected vector mode, and that mode has a too high VF for a non-predicated loop. This situation occurs almost always on x86 and saves us one re-analysis unless --param vect-partial-vector-usage is non-default. * tree-vectorizer.h (vect_chooses_same_modes_p): New overload. * tree-vect-stmts.cc (vect_chooses_same_modes_p): Likewise. * tree-vect-loop.cc (vect_analyze_loop): Prune epilogue analysis further when not using partial vectors.
2025-06-25 | tree-optimization/120808 - SLP build with mixed .FMA/.FMS | Richard Biener | 1 | -1/+1
The following allows SLP build to succeed when mixing .FMA/.FMS in different lanes, like we handle mixed plus/minus. This does not yet address SLP pattern matching not being able to form a FMADDSUB from this. PR tree-optimization/120808 * tree-vectorizer.h (compatible_calls_p): Add flag to indicate a FMA/FMS pair is allowed. * tree-vect-slp.cc (compatible_calls_p): Likewise. (vect_build_slp_tree_1): Allow mixed .FMA/.FMS as two-operator. (vect_build_slp_tree_2): Handle calls in two-operator SLP build. * tree-vect-slp-patterns.cc (compatible_complex_nodes_p): Adjust. * gcc.dg/vect/bb-slp-pr120808.c: New testcase.
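For illustration, a hypothetical testcase shape (not the committed bb-slp-pr120808.c) where the two lanes contract to an .FMA and an .FMS call and can now form a two-operator SLP node, analogous to mixed plus/minus:

  /* Compile with FMA contraction enabled, e.g. -O2 -ffp-contract=fast
     on an FMA-capable target.  */
  void
  f (double *restrict a, double *restrict b, double *restrict c,
     double *restrict d)
  {
    a[0] = b[0] * c[0] + d[0];   /* lane 0: .FMA */
    a[1] = b[1] * c[1] - d[1];   /* lane 1: .FMS */
  }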
2025-06-24 | middle-end: Apply loop->unroll directly in vectorizer | Tamar Christina | 1 | -0/+5
Consider the loop

  void f1 (int *restrict a, int n)
  {
  #pragma GCC unroll 4 requested
    for (int i = 0; i < n; i++)
      a[i] *= 2;
  }

which today is vectorized and then unrolled 3x by the RTL unroller due to the use of the pragma. This is unfortunate because the pragma was intended for the scalar loop, but we end up with an unrolled vector loop and a longer path to the entry which has a low enough VF requirement to enter. This patch instead seeds the suggested_unroll_factor with the value the user requested and uses it to maintain the total VF that the user wanted the scalar loop to maintain. In effect it applies the unrolling inside the vector loop itself. This has benefits for things like reductions, as it allows us to split the accumulator and so the unrolled loop is more efficient. For early-break it allows the cbranch call to be shared between the unrolled elements, giving you more effective unrolling because it doesn't need the repeated cbranch which can be expensive. The target can then choose to create multiple epilogues to deal with the "rest". The example above now generates:

  .L4:
        ldr     q31, [x2]
        add     v31.4s, v31.4s, v31.4s
        str     q31, [x2], 16
        cmp     x2, x3
        bne     .L4

as V4SI maintains the requested VF, but e.g. pragma unroll 8 generates:

  .L4:
        ldp     q30, q31, [x2]
        add     v30.4s, v30.4s, v30.4s
        add     v31.4s, v31.4s, v31.4s
        stp     q30, q31, [x2], 32
        cmp     x3, x2
        bne     .L4

gcc/ChangeLog:
* doc/extend.texi: Document pragma unroll interaction with vectorizer.
* tree-vectorizer.h (LOOP_VINFO_USER_UNROLL): New.
(class _loop_vec_info): Add user_unroll.
* tree-vect-loop.cc (vect_analyze_loop_1): Set suggested_unroll_factor and retry.
(_loop_vec_info::_loop_vec_info): Initialize user_unroll.
(vect_transform_loop): Clear the loop->unroll value if the pragma was used.

gcc/testsuite/ChangeLog:
* gcc.target/aarch64/unroll-vect.c: New test.
2025-06-23 | vect: Use combined peeling and versioning for mutually aligned DRs | Pengfei Li | 1 | -0/+16
Current GCC uses either peeling or versioning, but not in combination, to handle unaligned data references (DRs) during vectorization. This limitation causes some loops with early break to fall back to scalar code at runtime. Consider the following loop with DRs in its early break condition: for (int i = start; i < end; i++) { if (a[i] == b[i]) break; count++; } In the loop, references to a[] and b[] need to be strictly aligned for vectorization because speculative reads that may cross page boundaries are not allowed. Current GCC does versioning for this loop by creating a runtime check like: ((&a[start] | &b[start]) & mask) == 0 to see if two initial addresses both have lower bits zeros. If above runtime check fails, the loop will fall back to scalar code. However, it's often possible that DRs are all unaligned at the beginning but they become all aligned after a few loop iterations. We call this situation DRs being "mutually aligned". This patch enables combined peeling and versioning to avoid loops with mutually aligned DRs falling back to scalar code. Specifically, the function vect_peeling_supportable is updated in this patch to return a three-state enum indicating how peeling can make all unsupportable DRs aligned. In addition to previous true/false return values, a new state peeling_maybe_supported is used to indicate that peeling may be able to make these DRs aligned but we are not sure about it at compile time. In this case, peeling should be combined with versioning so that a runtime check will be generated to guard the peeled vectorized loop. A new type of runtime check is also introduced for combined peeling and versioning. It's enabled when LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT is true. The new check tests if all DRs recorded in LOOP_VINFO_MAY_MISALIGN_STMTS have the same lower address bits. For above loop case, the new test will generate an XOR between two addresses, like: ((&a[start] ^ &b[start]) & mask) == 0 Therefore, if a and b have the same alignment step (element size) and the same offset from an alignment boundary, a peeled vectorized loop will run. This new runtime check also works for >2 DRs, with the LHS expression being: ((a1 ^ a2) | (a2 ^ a3) | (a3 ^ a4) | ... | (an-1 ^ an)) & mask where ai is the address of i'th DR. This patch is bootstrapped and regression tested on x86_64-linux-gnu, arm-linux-gnueabihf and aarch64-linux-gnu. gcc/ChangeLog: * tree-vect-data-refs.cc (vect_peeling_supportable): Return new enum values to indicate if combined peeling and versioning can potentially support vectorization. (vect_enhance_data_refs_alignment): Support combined peeling and versioning in vectorization analysis. * tree-vect-loop-manip.cc (vect_create_cond_for_align_checks): Add a new type of runtime check for mutually aligned DRs. * tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Set default value of allow_mutual_alignment in the initializer list. * tree-vectorizer.h (enum peeling_support): Define type of peeling support for function vect_peeling_supportable. (LOOP_VINFO_ALLOW_MUTUAL_ALIGNMENT): New access macro. gcc/testsuite/ChangeLog: * gcc.dg/vect/vect-early-break_133_pfa6.c: Adjust test.
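To make the two versioning conditions concrete, here is a small illustrative sketch in plain C++ of the checks described above for the two-DR case (the vectorizer emits these as GIMPLE in the loop version guard; 'mask' would be derived from the target alignment):

  #include <cstdint>

  /* Old check: both addresses individually aligned.  */
  static bool
  both_aligned (const int *a, const int *b, uintptr_t mask)
  {
    return (((uintptr_t) a | (uintptr_t) b) & mask) == 0;
  }

  /* New check: both addresses have the same low bits, i.e. the DRs are
     mutually aligned and peeling can align them together.  */
  static bool
  mutually_aligned (const int *a, const int *b, uintptr_t mask)
  {
    return (((uintptr_t) a ^ (uintptr_t) b) & mask) == 0;
  }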
2025-05-14 | Remove the mixed stmt_vec_info/SLP node record_stmt_cost overload | Richard Biener | 1 | -13/+0
The following changes the record_stmt_cost calls in vectorizable_load/store to only pass the SLP node when costing vector stmts. For now we'll still pass the stmt_vec_info, determined from SLP_TREE_REPRESENTATIVE, so this merely cleans up the API. * tree-vectorizer.h (record_stmt_cost): Remove mixed stmt_vec_info/SLP node inline overload. * tree-vect-stmts.cc (vectorizable_store): For costing vector stmts only pass SLP node to record_stmt_cost. (vectorizable_load): Likewise.
2025-05-14 | This transitions vect_model_simple_cost to SLP only | Richard Biener | 1 | -0/+11
As part of the vector cost API cleanup this transitions vect_model_simple_cost to only record costs with SLP node. For this to work the patch adds an overload to record_stmt_cost only passing in the SLP node. The vect_prologue_cost_for_slp adjustment is one spot that needs an eye with regard to re-doing the whole thing. * tree-vectorizer.h (record_stmt_cost): Add overload with only SLP node and no vector type. * tree-vect-stmts.cc (record_stmt_cost): Use SLP_TREE_REPRESENTATIVE for stmt_vec_info. (vect_model_simple_cost): Do not get stmt_vec_info argument and adjust. (vectorizable_call): Adjust. (vectorizable_simd_clone_call): Likewise. (vectorizable_conversion): Likewise. (vectorizable_assignment): Likewise. (vectorizable_shift): Likewise. (vectorizable_operation): Likewise. (vectorizable_condition): Likewise. (vectorizable_comparison_1): Likewise. * tree-vect-slp.cc (vect_prologue_cost_for_slp): Use full-blown record_stmt_cost.
2025-05-08 | vect: Remove non-SLP path from vectorizable_reduction | Andre Vieira | 1 | -4/+3
Fold slp_node to TRUE and clean up vectorizable_reduction and related functions. Also split up vectorizable_lc_phi and create vect_transform_lc_phi. gcc/ChangeLog: * tree-vect-loop.cc (get_initial_def_for_reduction): Remove. (vect_create_epilog_for_reduction): Remove non-SLP path. (vectorize_fold_left_reduction): Likewise. (vectorizable_lane_reducing): Likewise. (vectorizable_reduction): Likewise. (vect_transform_reduction): Likewise. (vect_transform_cycle_phi): Likewise. (vectorizable_lc_phi): Remove non-SLP path and split into... (vect_transform_lc_phi): ... this. (update_epilogue_loop_vinfo): Update comment. * tree-vect-stmts.cc (vect_analyze_stmt): Update call to vectorizable_lc_phi. (vect_transform_stmt): Update calls to vect_transform_reduction and vect_transform_cycle_phi. Rename call from vectorizable_lc_phi to vect_transform_lc_phi. * tree-vectorizer.h (vect_transform_reduction): Update declaration. (vect_transform_cycle_phi): Likewise. (vectorizable_lc_phi): Likewise. (vect_transform_lc_phi): New.
2025-05-06 | tree-optimization/115777 - STLF fails with BB vectorization of loop | Richard Biener | 1 | -0/+3
The following tries to address the case of us BB vectorizing a loop body that swaps consecutive elements of an array, like for bubble-sort. This causes the vector store in the previous iteration to fail to forward to the vector load in the current iteration since there's a partial overlap. We try to detect this situation by looking for a load to store data dependence and analyze this with respect to the containing loop for a proven problematic access. Currently the search for a problematic pair is limited to loads and stores in the same SLP instance which means the problematic load happens in the next loop iteration and larger dependence distances are not considered. On x86 with generic costing this avoids vectorizing the loop body, but once you do core-specific tuning the saved cost for the vector store vs. the scalar stores makes vectorization still profitable, but at least the STLF issue is avoided. For example on my Zen4 machine with -O2 -march=znver4 the testcase in the PR is improving from insertion_sort => 2327 to insertion_sort => 997, but plain -O2 (or -fno-tree-slp-vectorize) gives insertion_sort => 183. In the end a better target-side cost model for small vector vectorization is needed to reject this vectorization from this side. I'll note this is a machine independent heuristic (similar to the avoid-store-forwarding RTL optimization pass); I expect that uarchs implementing vectors will suffer from this kind of issue. I know some aarch64 uarchs can forward from upper/lower part stores; this isn't considered at the moment. The actual vector size/overlap distance check could be moved to a target hook if it turns out necessary. There might be the chance to use a smaller vector size for the loads avoiding the penalty rather than falling back to elementwise accesses; that's not implemented either. PR tree-optimization/115777 * tree-vectorizer.h (_slp_tree::avoid_stlf_fail): New member. * tree-vect-slp.cc (_slp_tree::_slp_tree): Initialize it. (vect_print_slp_tree): Dump it. * tree-vect-data-refs.cc (vect_slp_analyze_instance_dependence): For dataflow dependent loads of a store check whether there's a cross-iteration data dependence that for sure prohibits store-to-load forwarding and mark involved loads. * tree-vect-stmts.cc (get_group_load_store_type): For avoid_stlf_fail marked loads use VMAT_ELEMENTWISE. * gcc.dg/vect/bb-slp-pr115777.c: New testcase.
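As an illustration of the loop shape this heuristic targets (the PR's testcase is an insertion sort; this is a simplified, hypothetical bubble-sort pass, not the committed testcase): each iteration stores a[i] and a[i+1], and the next iteration loads a[i+1] again, so a vector store partially overlaps the following vector load and store-to-load forwarding fails.

  void
  bubble_pass (int *a, int n)
  {
    for (int i = 0; i < n - 1; i++)
      if (a[i] > a[i + 1])
        {
          int tmp = a[i];
          a[i] = a[i + 1];       /* swap consecutive elements */
          a[i + 1] = tmp;
        }
  }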
2025-04-30 | vectorizer: Fix riscv build [PR120042] | Andrew Pinski | 1 | -0/+1
r15-9859-ga6cfde60d8c added a call to dominated_by_p to tree-vectorizer.h, but dominance.h is not always included, so you get a build failure on riscv when building riscv-vector-costs.cc. Let's add the include of dominance.h to tree-vectorizer.h. Pushed as obvious after builds for riscv and x86_64. gcc/ChangeLog: PR target/120042 * tree-vectorizer.h: Include dominance.h. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
2025-04-30 | tree-optimization/119960 - fix and guard get_later_stmt | Richard Biener | 1 | -3/+17
The following makes get_later_stmt handle stmts from different basic-blocks in case they are ordered, and otherwise asserts. * tree-vectorizer.h (get_later_stmt): Robustify against stmts in different BBs, assert when they are unordered.
2025-04-16 | middle-end: Fix incorrect codegen with PFA and VLS [PR119351] | Tamar Christina | 1 | -1/+17
The following example:

  #define N 512
  #define START 2
  #define END 505
  int x[N] __attribute__((aligned(32)));

  int __attribute__((noipa))
  foo (void)
  {
    for (signed int i = START; i < END; ++i)
      {
        if (x[i] == 0)
          return i;
      }
    return -1;
  }

generates incorrect code with fixed-length SVE because for early break we need to know which value to start the scalar loop with if we take an early exit. Historically this means that we take the first element of every induction. This is because there's an assumption in place that even with masked loops the masks come from a whilel* instruction. As such we reduce using a BIT_FIELD_REF <, 0>. When PFA was added this assumption was correct for non-masked loops; however we assumed that PFA for VLA wouldn't work for now, and disabled it using the alignment requirement checks. We also expected VLS to PFA using scalar loops. However, as this PR shows, for VLS the vectorizer can, and does in some circumstances, choose to peel using masks by masking the first iteration of the loop with an additional alignment mask. When this is done, the first elements of the predicate can be inactive. In this example element 1 is inactive based on the calculated misalignment, hence the -1 value in the first vector IV element. When we reduce using BIT_FIELD_REF we get the wrong value. This patch updates it by creating a new scalar PHI that keeps track of whether we are in the first iteration of the loop (with the additional masking) or whether we have taken a loop iteration already. The generated sequence:

  pre-header:
  bb1:
    i_1 = <number of leading inactive elements>

  header:
  bb2:
    i_2 = PHI <i_1(bb1), 0(latch)>
    …

  early-exit:
  bb3:
    i_3 = iv_step * i_2 + PHI<vector-iv>

This eliminates the need to do an expensive mask-based reduction. This fixes gromacs with one OpenMP thread. But with > 1 there is still an issue.

gcc/ChangeLog:
PR tree-optimization/119351
* tree-vectorizer.h (LOOP_VINFO_MASK_NITERS_PFA_OFFSET, LOOP_VINFO_NON_LINEAR_IV): New.
(class _loop_vec_info): Add mask_skip_niters_pfa_offset and nonlinear_iv.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize them.
(vect_analyze_scalar_cycles_1): Record non-linear inductions.
(vectorizable_induction): If early break and PFA using masking create a new phi which tracks where the scalar code needs to start...
(vectorizable_live_operation): ...and generate the adjustments here.
(vect_use_loop_mask_for_alignment_p): Reject non-linear inductions and early break needing peeling.

gcc/testsuite/ChangeLog:
PR tree-optimization/119351
* gcc.target/aarch64/sve/peel_ind_10.c: New test.
* gcc.target/aarch64/sve/peel_ind_10_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_5.c: New test.
* gcc.target/aarch64/sve/peel_ind_5_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_6.c: New test.
* gcc.target/aarch64/sve/peel_ind_6_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_7.c: New test.
* gcc.target/aarch64/sve/peel_ind_7_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_8.c: New test.
* gcc.target/aarch64/sve/peel_ind_8_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_9.c: New test.
* gcc.target/aarch64/sve/peel_ind_9_run.c: New test.
2025-03-07 | middle-end: delay checking for alignment to load [PR118464] | Tamar Christina | 1 | -1/+34
This fixes two PRs on early break vectorization by delaying the safety checks to vectorizable_load when the VF, VMAT and vectype are all known.

This patch does add two new restrictions:

1. On LOAD_LANES targets, where the buffer size is known, we reject non-power-of-two group sizes, as they are unaligned every other iteration and so may cross a page unwittingly. For those cases require partial masking support.

2. On LOAD_LANES targets when the buffer size is unknown, we reject vectorization if we cannot peel for alignment, as the alignment requirement is quite large at GROUP_SIZE * vectype_size. This is unlikely to ever be beneficial so we don't support it for now.

There are other steps documented inside the code itself so that the reasoning is next to the code. As a fall-back, when the alignment fails we require partial vector support. For VLA targets like SVE we return element alignment as the desired vector alignment. This means that the loads are never misaligned and so, annoyingly, it won't ever need to peel.

So what I think needs to happen in GCC 16 is this:

1. During vect_compute_data_ref_alignment we need to take the max of POLY_VALUE_MIN and vector_alignment.
2. In vect_do_peeling define skip_vector when PFA for VLA, and in the guard add a check that ncopies * vectype does not exceed POLY_VALUE_MAX, which we use as a proxy for pagesize.
3. Force LOOP_VINFO_USING_PARTIAL_VECTORS_P to be true in vect_determine_partial_vectors_and_peeling since the first iteration has to be partial. Require LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P, otherwise we have to fail to vectorize.
4. Create a default mask to be used, so that vect_use_loop_mask_for_alignment_p becomes true and we generate the peeled check through loop control for partial loops.

From what I can tell this won't work for LOOP_VINFO_FULLY_WITH_LENGTH_P since they don't have any peeling support at all in the compiler. That would need to be done independently from the above. In any case, not GCC 15 material so I've kept the WIP patches I have downstream.

Bootstrapped and regtested on aarch64-none-linux-gnu, arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no issues.

gcc/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* doc/invoke.texi (min-pagesize): Update docs with vectorizer use.
* tree-vect-data-refs.cc (vect_analyze_early_break_dependences): Delay checks.
(vect_compute_data_ref_alignment): Remove alignment checks and move to get_load_store_type, increase group access alignment.
(vect_enhance_data_refs_alignment): Add note to comment needing investigating.
(vect_analyze_data_refs_alignment): Likewise.
(vect_supportable_dr_alignment): For group loads look at first DR.
* tree-vect-stmts.cc (get_load_store_type): Perform safety checks for early break pfa.
* tree-vectorizer.h (dr_set_safe_speculative_read_required, dr_safe_speculative_read_required, DR_SCALAR_KNOWN_BOUNDS): New.
(need_peeling_for_alignment): Renamed to...
(safe_speculative_read_required): ...this.
(class dr_vec_info): Add scalar_access_known_in_bounds.

gcc/testsuite/ChangeLog:
PR tree-optimization/118464
PR tree-optimization/116855
* gcc.dg/vect/bb-slp-pr65935.c: Update, it now vectorizes because the load type is relaxed later.
* gcc.dg/vect/vect-early-break_121-pr114081.c: Update.
* gcc.dg/vect/vect-early-break_22.c: Require partial vectors.
* gcc.dg/vect/vect-early-break_128.c: Likewise.
* gcc.dg/vect/vect-early-break_26.c: Likewise.
* gcc.dg/vect/vect-early-break_43.c: Likewise.
* gcc.dg/vect/vect-early-break_44.c: Likewise.
* gcc.dg/vect/vect-early-break_2.c: Require load_lanes.
* gcc.dg/vect/vect-early-break_7.c: Likewise.
* gcc.dg/vect/vect-early-break_132-pr118464.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa1.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa11.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa10.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa2.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa3.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa4.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa5.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa6.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa7.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa8.c: New test.
* gcc.dg/vect/vect-early-break_133_pfa9.c: New test.
* gcc.dg/vect/vect-early-break_39.c: Update testcase for misalignment.
* gcc.dg/vect/vect-early-break_18.c: Likewise.
* gcc.dg/vect/vect-early-break_20.c: Likewise.
* gcc.dg/vect/vect-early-break_21.c: Likewise.
* gcc.dg/vect/vect-early-break_38.c: Likewise.
* gcc.dg/vect/vect-early-break_6.c: Likewise.
* gcc.dg/vect/vect-early-break_53.c: Likewise.
* gcc.dg/vect/vect-early-break_56.c: Likewise.
* gcc.dg/vect/vect-early-break_57.c: Likewise.
* gcc.dg/vect/vect-early-break_81.c: Likewise.