author     Tamar Christina <tamar.christina@arm.com>  2025-04-17 10:25:43 +0100
committer  Tamar Christina <tamar.christina@arm.com>  2025-04-17 10:41:10 +0100
commit     7cf5503e0af52f5b726da4274a148590c57a458a
tree       789ff7dd18cac5d8a465f3211ac6d5b0eb98dc63
parent     0be3dff1aadcc3e879f3d1ffd45d842ab0e0c0bf
middle-end: fix masking for partial vectors and early break [PR119351]
The following testcase shows incorrect masked codegen:
#define N 512
#define START 1
#define END 505

int x[N] __attribute__((aligned(32)));

int __attribute__((noipa))
foo (void)
{
  int z = 0;
  for (unsigned int i = START; i < END; ++i)
    {
      z++;
      if (x[i] > 0)
        continue;
      return z;
    }
  return -1;
}
Notice that there is a continue there instead of a break. This means we
generate control flow where the success case stays within the loop iteration:
  mask_patt_9.12_46 = vect__1.11_45 > { 0, 0, 0, 0 };
  vec_mask_and_47 = mask_patt_9.12_46 & loop_mask_41;
  if (vec_mask_and_47 == { -1, -1, -1, -1 })
    goto <bb 4>; [41.48%]
  else
    goto <bb 15>; [58.52%]
However, when loop_mask_41 is a partial mask, this comparison can lead to an
incorrect match. In this case the mask is:
# loop_mask_41 = PHI <next_mask_63(6), { 0, -1, -1, -1 }(2)>
due to peeling for alignment with masking and compiling with
-msve-vector-bits=128 (with 128-bit vectors there are four 32-bit lanes; the
first, aligned iteration covers x[0..3], but x[0] is below START, so lane 0
is inactive).
At codegen time we generate:
        ptrue   p15.s, vl4
        ptrue   p7.b, vl1
        not     p7.b, p15/z, p7.b
.L5:
        ld1w    z29.s, p7/z, [x1, x0, lsl 2]
        cmpgt   p7.s, p7/z, z29.s, #0
        not     p7.b, p15/z, p7.b
        ptest   p15, p7.b
        b.none  .L2
        ...<early exit>...
Here the basic blocks are rotated and a not is generated. However, the
generated not is unmasked (or, in this case, predicated over an all-true
mask). This has the unintended side effect of flipping the results of the
inactive lanes (which were zeroed by the cmpgt) to -1, which then incorrectly
causes us not to take the branch to .L2.
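To make the failure mode concrete, here is a minimal lane-wise sketch in
plain C (the lane values, array contents and variable names are illustrative
assumptions, not the actual predicate registers):

#include <stdio.h>

int
main (void)
{
  /* Assumed state: lane 0 is inactive due to peeling, and every active
     lane of x[] is positive, so the loop should keep iterating.  */
  int loop_mask[4] = { 0, -1, -1, -1 };
  int x[4] = { 7, 1, 2, 3 };
  int buggy_any = 0, fixed_any = 0;

  for (int i = 0; i < 4; i++)
    {
      /* cmpgt is predicated, so inactive lanes stay zero.  */
      int cmp = loop_mask[i] ? (x[i] > 0 ? -1 : 0) : 0;
      /* Buggy: the not is predicated over an all-true mask, so the
         inactive lane 0 flips from 0 to -1.  */
      int buggy_not = cmp ? 0 : -1;
      /* Fixed: the inverted compare stays predicated by loop_mask.  */
      int fixed = loop_mask[i] ? (x[i] <= 0 ? -1 : 0) : 0;
      buggy_any |= buggy_not;
      fixed_any |= fixed;
    }

  /* ptest/b.none takes the branch to .L2 only when no lane is set.  */
  printf ("buggy: lane set = %d (branch to .L2 wrongly not taken)\n",
          buggy_any != 0);
  printf ("fixed: lane set = %d (branch to .L2 taken, loop continues)\n",
          fixed_any != 0);
  return 0;
}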
This is happening because we are not comparing against the right value for
the forall case. This patch gets rid of the forall case by rewriting
if (all(mask)) into if (!all(mask)), which is the same as if (any(~mask)),
by negating the masks and flipping the branches.
1. For unmasked loops we simply reduce ~mask.
2. For masked loops we reduce (~mask & loop_mask), which is the same as
   doing (mask & loop_mask) ^ loop_mask (see the sketch after this list).
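As a quick sanity check of the identity in point 2 (a minimal standalone
sketch, not part of the patch; lane masks are modelled as the low bits of an
unsigned int):

#include <assert.h>
#include <stdio.h>

int
main (void)
{
  /* Exhaustively check all 4-lane masks: reducing (~mask & loop_mask)
     detects the same lanes as reducing (mask & loop_mask) ^ loop_mask.  */
  for (unsigned mask = 0; mask < 16; ++mask)
    for (unsigned loop_mask = 0; loop_mask < 16; ++loop_mask)
      assert ((~mask & loop_mask) == ((mask & loop_mask) ^ loop_mask));
  printf ("identity holds for all 4-lane masks\n");
  return 0;
}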
With this, for the testcase above we now generate:
.L5:
        ld1w    z28.s, p7/z, [x1, x0, lsl 2]
        cmple   p7.s, p7/z, z28.s, #0
        ptest   p15, p7.b
        b.none  .L2
This fixes gromacs with more than one OpenMP thread and improves performance.
gcc/ChangeLog:

	PR tree-optimization/119351
	* tree-vect-stmts.cc (vectorizable_early_exit): Mask both operands of
	the gcond for partial masking support.

gcc/testsuite/ChangeLog:

	PR tree-optimization/119351
	* gcc.target/aarch64/sve/pr119351.c: New test.
	* gcc.target/aarch64/sve/pr119351_run.c: New test.