author | Pengfei Li <Pengfei.Li2@arm.com> | 2025-08-07 11:08:35 +0000 |
---|---|---|
committer | Pengfei Li <Pengfei.Li2@arm.com> | 2025-08-07 11:10:10 +0000 |
commit | eee51f9a4b6e584230f75e4616438bb5ad5935a9 (patch) | |
tree | 5b19bca0a9b0b84b5eaeced1ca649a8023cf53cb /gcc/tree-vect-data-refs.cc | |
parent | b7fd1fe6cf3df1eed2a80a9cf8b812522ecba8d8 (diff) | |
download | gcc-eee51f9a4b6e584230f75e4616438bb5ad5935a9.zip gcc-eee51f9a4b6e584230f75e4616438bb5ad5935a9.tar.gz gcc-eee51f9a4b6e584230f75e4616438bb5ad5935a9.tar.bz2 |
vect: Extend peeling and versioning for alignment to VLA modes
This patch extends the support for peeling and versioning for alignment
from VLS modes to VLA modes. The key change is allowing the DR target
alignment to be set to a non-constant poly_int. Since that value must be
a power of two, the power-of-two check for variable VFs is deferred to
runtime through loop versioning. The vectorizable check for speculative
loads is also refactored in this patch to handle both constant and
variable target alignment values.
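As a rough illustration only (plain C, hypothetical names, not the
vectorizer's emitted code): with a variable VF the target alignment is
VF times a compile-time constant factor, so whether it is a power of two
can only be decided at runtime, essentially by a check like this:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sketch: the DR target alignment is runtime_vf *
   const_factor bytes, where const_factor (element size times group
   size) is a compile-time power of two.  The product is a power of two
   iff the runtime VF is, which is what the versioning check decides.  */
static bool
target_alignment_is_pow2 (uint64_t runtime_vf, uint64_t const_factor)
{
  uint64_t align = runtime_vf * const_factor;
  return align != 0 && (align & (align - 1)) == 0;
}
```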
Additional changes for VLA modes include:
1) Peeling
In VLA modes, we use peeling with masking - using a partial vector in
the first iteration of the vectorized loop to ensure aligned DRs in
subsequent iterations. It was already enabled for VLS modes to avoid
scalar peeling. This patch reuses most of the existing logic and just
fixes a small issue of an incorrect IV offset in the VLA code path. It
also removes a power-of-two rounding when computing the number of
iterations to peel, as a power-of-two VF is now guaranteed by a new
runtime check.
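A conceptual model of peeling with masking is sketched below (scalar C
with illustrative names, assuming the pointer is at least
element-aligned; the inner loop stands in for one masked vector
operation rather than the vectorizer's actual output):

```c
#include <stddef.h>
#include <stdint.h>

/* The first vector iteration is a partial vector covering only the
   elements up to the next target-aligned boundary, so every following
   iteration loads from an aligned address.  */
static void
add_arrays (float *x, const float *y, size_t n,
            size_t vf, size_t target_align)
{
  uintptr_t misalign = (uintptr_t) x & (target_align - 1);
  /* Elements handled by the first, partial iteration.  */
  size_t step = misalign ? (target_align - misalign) / sizeof (float) : vf;

  for (size_t i = 0; i < n; )
    {
      size_t limit = (i + step < n) ? i + step : n;
      for (size_t j = i; j < limit; j++)   /* One masked vector add.  */
        x[j] += y[j];
      i = limit;
      step = vf;                           /* Full vectors afterwards.  */
    }
}
```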
2) Versioning
The type of the mask for the runtime alignment check is updated to
poly_int to support variable VFs. After this change, both standalone
versioning and peeling with versioning are available in VLA modes. This
patch also introduces another runtime check on the speculative read
amount, to ensure that all speculative loads remain within the current
valid memory page. We
plan to remove these runtime checks in the future by introducing capped
VF - using partial vectors to limit the actual VF value at runtime.
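The guard that versioning establishes can be pictured roughly as
follows; the 4096-byte page size and the exact combination of
conditions are assumptions for illustration, not the emitted GIMPLE:

```c
#include <stdbool.h>
#include <stdint.h>

/* The vector loop is only entered when the runtime target alignment is
   a power of two, the pointer is aligned to it, and the per-iteration
   speculative read amount cannot cross into the next (possibly
   unmapped) page.  */
static bool
take_vectorized_loop (uintptr_t addr, uint64_t target_align,
                      uint64_t max_spec_read_bytes)
{
  bool pow2_ok  = (target_align & (target_align - 1)) == 0;
  bool align_ok = (addr & (target_align - 1)) == 0;
  bool spec_ok  = max_spec_read_bytes <= 4096;
  return pow2_ok && align_ok && spec_ok;
}
```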
3) Speculative read flag
With a variable target alignment, DRs whose scalar accesses are known
to be in bounds would otherwise be treated as unaligned and unsupported.
In fact,
speculative reads can be naturally avoided for in-bounds DRs as long as
partial vectors are used. Therefore, this patch clears the speculative
flags and sets the "must use partial vectors" flag for these cases.
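A hand-written AArch64 SVE (ACLE) sketch of why partial vectors make
this safe is shown below; it is not the vectorizer's output, just an
equivalent loop using intrinsics (build with an SVE-enabled -march):

```c
#include <arm_sve.h>
#include <stdint.h>

/* Inactive lanes of a predicated load are never accessed, so a loop
   whose scalar accesses are in bounds performs no speculative reads:
   no element beyond a[n-1] is ever touched.  */
static int64_t
sum_in_bounds (const int32_t *a, int64_t n)
{
  int64_t total = 0;
  for (int64_t i = 0; i < n; i += svcntw ())
    {
      svbool_t pg = svwhilelt_b32 (i, n);   /* Lanes active only while i < n.  */
      svint32_t v = svld1_s32 (pg, a + i);  /* Inactive lanes: no memory access.  */
      total += svaddv_s32 (pg, v);          /* Sum the active lanes.  */
    }
  return total;
}
```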
This patch is bootstrapped and regression-tested on x86_64-linux-gnu,
arm-linux-gnueabihf and aarch64-linux-gnu with bootstrap-O3.
gcc/ChangeLog:
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Allow DR target alignment to be a poly_int.
(vect_enhance_data_refs_alignment): Support peeling and
versioning for VLA modes.
* tree-vect-loop-manip.cc (get_misalign_in_elems): Remove
power-of-two rounding in peeling.
(vect_create_cond_for_align_checks): Update alignment check
logic for poly_int mask.
(vect_create_cond_for_vla_spec_read): New runtime checks.
(vect_loop_versioning): Support new runtime checks.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Add a new
loop_vinfo field.
(vectorizable_induction): Fix wrong IV offset issue.
* tree-vect-stmts.cc (get_load_store_type): Refactor
vectorizable checks for speculative loads.
* tree-vectorizer.h (LOOP_VINFO_MAX_SPEC_READ_AMOUNT): New
macro for new runtime checks.
(LOOP_REQUIRES_VERSIONING_FOR_SPEC_READ): Likewise.
(LOOP_REQUIRES_VERSIONING): Update macro for new runtime checks.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/peel_ind_11.c: New test.
* gcc.target/aarch64/sve/peel_ind_11_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_12.c: New test.
* gcc.target/aarch64/sve/peel_ind_12_run.c: New test.
* gcc.target/aarch64/sve/peel_ind_13.c: New test.
* gcc.target/aarch64/sve/peel_ind_13_run.c: New test.
Diffstat (limited to 'gcc/tree-vect-data-refs.cc')
-rw-r--r-- | gcc/tree-vect-data-refs.cc | 61 |
1 file changed, 34 insertions, 27 deletions
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index a9d4aae..a3d3b3e 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -1448,17 +1448,20 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
   if (loop_vinfo
       && dr_safe_speculative_read_required (stmt_info))
     {
-      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-      auto vectype_size
+      /* The required target alignment must be a power-of-2 value and is
+         computed as the product of vector element size, VF and group size.
+         We compute the constant part first as VF may be a variable.  For
+         variable VF, the power-of-2 check of VF is deferred to runtime.  */
+      auto align_factor_c
         = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
-      poly_uint64 new_alignment = vf * vectype_size;
-      /* If we have a grouped access we require that the alignment be N * elem.  */
       if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
-        new_alignment *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
+        align_factor_c *= DR_GROUP_SIZE (DR_GROUP_FIRST_ELEMENT (stmt_info));
+
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      poly_uint64 new_alignment = vf * align_factor_c;
 
-      unsigned HOST_WIDE_INT target_alignment;
-      if (new_alignment.is_constant (&target_alignment)
-          && pow2p_hwi (target_alignment))
+      if ((vf.is_constant () && pow2p_hwi (new_alignment.to_constant ()))
+          || (!vf.is_constant () && pow2p_hwi (align_factor_c)))
         {
           if (dump_enabled_p ())
             {
@@ -1467,7 +1470,7 @@ vect_compute_data_ref_alignment (vec_info *vinfo, dr_vec_info *dr_info,
               dump_dec (MSG_NOTE, new_alignment);
               dump_printf (MSG_NOTE, " bytes.\n");
             }
-          vector_alignment = target_alignment;
+          vector_alignment = new_alignment;
         }
     }
 
@@ -2438,6 +2441,7 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
      - The cost of peeling (the extra runtime checks, the increase
        in code size).  */
 
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       dr_vec_info *dr_info = loop_vinfo->lookup_dr (dr);
@@ -2446,9 +2450,18 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
       stmt_vec_info stmt_info = dr_info->stmt;
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-      do_peeling
-        = vector_alignment_reachable_p (dr_info,
-                                        LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+
+      /* With variable VF, unsafe speculative read can be avoided for known
+         inbounds DRs as long as partial vectors are used.  */
+      if (!vf.is_constant ()
+          && dr_safe_speculative_read_required (stmt_info)
+          && DR_SCALAR_KNOWN_BOUNDS (dr_info))
+        {
+          dr_set_safe_speculative_read_required (stmt_info, false);
+          LOOP_VINFO_MUST_USE_PARTIAL_VECTORS_P (loop_vinfo) = true;
+        }
+
+      do_peeling = vector_alignment_reachable_p (dr_info, vf);
       if (do_peeling)
         {
           if (known_alignment_for_access_p (dr_info, vectype))
@@ -2488,7 +2501,6 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
           poly_uint64 nscalars = npeel_tmp;
           if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
             {
-              poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
               unsigned group_size = 1;
               if (STMT_SLP_TYPE (stmt_info)
                   && STMT_VINFO_GROUPED_ACCESS (stmt_info))
@@ -2911,14 +2923,12 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
      2) there is at least one unsupported misaligned data ref with an unknown
         misalignment, and
      3) all misaligned data refs with a known misalignment are supported, and
-     4) the number of runtime alignment checks is within reason.
-     5) the vectorization factor is a constant.  */
+     4) the number of runtime alignment checks is within reason.  */
 
   do_versioning
     = (optimize_loop_nest_for_speed_p (loop)
       && !loop->inner /* FORNOW */
-      && loop_cost_model (loop) > VECT_COST_MODEL_CHEAP)
-      && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ();
+      && loop_cost_model (loop) > VECT_COST_MODEL_CHEAP);
 
   if (do_versioning)
     {
@@ -2965,25 +2975,22 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
             ?? We could actually unroll the loop to achieve the required
             overall step alignment, and forcing the alignment could be
             done by doing some iterations of the non-vectorized loop.  */
-          if (!multiple_p (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-                           * DR_STEP_ALIGNMENT (dr),
+          if (!multiple_p (vf * DR_STEP_ALIGNMENT (dr),
                            DR_TARGET_ALIGNMENT (dr_info)))
             {
              do_versioning = false;
              break;
            }
 
-          /* The rightmost bits of an aligned address must be zeros.
-             Construct the mask needed for this test.  For example,
-             GET_MODE_SIZE for the vector mode V4SI is 16 bytes so the
-             mask must be 15 = 0xf.  */
-          gcc_assert (DR_TARGET_ALIGNMENT (dr_info).is_constant ());
-          int mask = DR_TARGET_ALIGNMENT (dr_info).to_constant () - 1;
+          /* Use "mask = DR_TARGET_ALIGNMENT - 1" to test rightmost address
+             bits for runtime alignment check.  For example, for 16 bytes
+             target alignment the mask is 15 = 0xf.  */
+          poly_uint64 mask = DR_TARGET_ALIGNMENT (dr_info) - 1;
 
           /* FORNOW: use the same mask to test all potentially unaligned
              references in the loop.  */
-          if (LOOP_VINFO_PTR_MASK (loop_vinfo)
-              && LOOP_VINFO_PTR_MASK (loop_vinfo) != mask)
+          if (maybe_ne (LOOP_VINFO_PTR_MASK (loop_vinfo), 0U)
+              && maybe_ne (LOOP_VINFO_PTR_MASK (loop_vinfo), mask))
            {
              do_versioning = false;
              break;