author     Richard Biener <rguenther@suse.de>  2024-06-14 07:54:15 +0200
committer  Richard Biener <rguenther@suse.de>  2024-06-14 09:04:18 +0200
commit     e575b5c56137b12d402d9fb39291fe20985067b7 (patch)
tree       45d3739620f56b1635da7e0b20cc2bfd8e933d1c
parent     d3fae2bea034edb001cd45d1d86c5ceef146899b (diff)
Fix fallout of peeling for gap improvements
The following hopefully addresses an observed bootstrap issue on
aarch64 where maybe-uninit diagnostics occur.  It also fixes bogus
napkin math of mine, where I confused the rounded-up size of a single
access with the rounded-up size of the group accessed in a single
scalar iteration.  So the following puts in a correctness check,
leaving a set of cases where peeling for gaps is judged insufficient.
That could be rectified by splitting the last load into multiple ones,
but I'm leaving this for a followup; better to quickly fix the
reported wrong-code.
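To make the corrected math concrete, here is a minimal standalone
sketch; the function name, the plain `unsigned` parameters, and the
omission of the vector_vector_composition_type check are illustrative
simplifications, not the vectorizer's actual interface:

```c++
#include <cstdio>

/* Smallest power of two >= x; a stand-in for GCC's 1 << ceil_log2 (x).  */
static unsigned
next_pow2 (unsigned x)
{
  unsigned p = 1;
  while (p < x)
    p <<= 1;
  return p;
}

/* Hypothetical model of the fixed check.  Recompute the number of
   elements remaining in the last vector from non-poly values instead
   of re-using the poly-int 'remain', then accept a shortened
   power-of-two access only if it stays within what one peeled scalar
   iteration (group_size further elements) can cover.  */
static bool
partial_access_ok (unsigned group_size, unsigned cvf,
		   unsigned gap, unsigned cnunits)
{
  unsigned cremain = (group_size * cvf - gap) % cnunits;
  unsigned cpart_size = next_pow2 (cremain);
  if (cpart_size == cnunits)
    return true;   /* Full-vector access, nothing is shortened.  */
  /* The shortened load still reads cpart_size elements; reject it when
     the cremain valid elements plus one scalar iteration's group_size
     elements do not cover that many.  */
  return cremain + group_size >= cpart_size;
}
```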
* tree-vect-stmts.cc (get_group_load_store_type): Do not
re-use the poly-int remain but re-compute it with non-poly
values.  Verify the shortened load is small enough to be
covered by a single scalar gap iteration before accepting it.
* gcc.dg/vect/pr115385.c: Enable AVX2 if available.
-rw-r--r--  gcc/testsuite/gcc.dg/vect/pr115385.c |  1
-rw-r--r--  gcc/tree-vect-stmts.cc               | 12
2 files changed, 8 insertions, 5 deletions
diff --git a/gcc/testsuite/gcc.dg/vect/pr115385.c b/gcc/testsuite/gcc.dg/vect/pr115385.c
index a18cd66..baea0b2 100644
--- a/gcc/testsuite/gcc.dg/vect/pr115385.c
+++ b/gcc/testsuite/gcc.dg/vect/pr115385.c
@@ -1,4 +1,5 @@
 /* { dg-require-effective-target mmap } */
+/* { dg-additional-options "-mavx2" { target avx2_runtime } } */
 
 #include <sys/mman.h>
 #include <stdio.h>
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e32d440..ca60526 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2148,15 +2148,17 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	    {
 	      /* But peeling a single scalar iteration is enough if
 		 we can use the next power-of-two sized partial
-		 access.  */
+		 access and that is sufficiently small to be covered
+		 by the single scalar iteration.  */
 	      unsigned HOST_WIDE_INT cnunits, cvf, cremain, cpart_size;
 	      if (!nunits.is_constant (&cnunits)
 		  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&cvf)
-		  || ((cremain = remain.to_constant (), true)
+		  || (((cremain = (group_size * cvf - gap) % cnunits), true)
 		      && ((cpart_size = (1 << ceil_log2 (cremain))) != cnunits)
-		      && vector_vector_composition_type
-			   (vectype, cnunits / cpart_size,
-			    &half_vtype) == NULL_TREE))
+		      && (cremain + group_size < cpart_size
+			  || vector_vector_composition_type
+			       (vectype, cnunits / cpart_size,
+				&half_vtype) == NULL_TREE)))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
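Continuing the sketch above with made-up parameters (not values from
the PR testcase), a driver showing one accepted and one rejected
configuration:

```c++
int
main ()
{
  /* cremain = (8*1 - 6) % 8 = 2, cpart_size = 2, and 2 + 8 >= 2:
     the shortened access would be acceptable (modulo the vector
     composition check the sketch leaves out).  */
  printf ("%d\n", partial_access_ok (8, 1, 6, 8));    /* prints 1 */

  /* cremain = (7*4 - 6) % 64 = 22, cpart_size = 32, and 22 + 7 < 32:
     the power-of-two access would read past what a single peeled
     scalar iteration covers, so it has to be rejected.  */
  printf ("%d\n", partial_access_ok (7, 4, 6, 64));   /* prints 0 */
  return 0;
}
```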