author | Tamar Christina <tamar.christina@arm.com> | 2024-07-22 10:26:14 +0100 |
---|---|---|
committer | Thomas Koenig <tkoenig@gcc.gnu.org> | 2024-07-28 19:05:43 +0200 |
commit | f60e18d0c8a64f26dd87cbf3c91975f19953379d | |
tree | 3db99e567991156a67dd324dc78d22f2669077a8 /gcc/doc | |
parent | 4eb0e778e3b3a4096df3deaca625709cb209c7b4 | |
middle-end: Implement conditional store vectorizer pattern [PR115531]
This adds a conditional store optimization to the vectorizer, implemented as a
pattern. The vectorizer can already rewrite memory accesses in patterns, as the
pattern-based gather/scatter recognition does.
Doing this in the vectorizer lets us keep vectorizing such loops on
architectures that lack MASK_STORE support, whereas doing it in if-conversion
(ifcvt) would commit us to MASK_STORE.
Concretely, for this loop:
void foo1 (char *restrict a, int *restrict b, int *restrict c, int n, int stride)
{
  if (stride <= 1)
    return;
  for (int i = 0; i < n; i++)
    {
      int res = c[i];
      int t = b[i+stride];
      if (a[i] != 0)
        res = t;
      c[i] = res;
    }
}
Today we generate:
.L3:
	ld1b z29.s, p7/z, [x0, x5]
	ld1w z31.s, p7/z, [x2, x5, lsl 2]
	ld1w z30.s, p7/z, [x1, x5, lsl 2]
	cmpne p15.b, p6/z, z29.b, #0
	sel z30.s, p15, z30.s, z31.s
	st1w z30.s, p7, [x2, x5, lsl 2]
	add x5, x5, x4
	whilelo p7.s, w5, w3
	b.any .L3
which in GIMPLE is:
vect_res_18.9_68 = .MASK_LOAD (vectp_c.7_65, 32B, loop_mask_67);
vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
mask__34.16_79 = vect__9.15_77 != { 0, ... };
vect_res_11.17_80 = VEC_COND_EXPR <mask__34.16_79, vect_t_20.12_74, vect_res_18.9_68>;
.MASK_STORE (vectp_c.18_81, 32B, loop_mask_67, vect_res_11.17_80);
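The loop control in the listing above follows the usual SVE shape: `whilelo` builds a predicate with one active lane per remaining element, and `b.any` leaves the loop once no lane is active, so the final partial vector is handled by the predicate rather than by a separate scalar loop. The following is a minimal C model of that skeleton, assuming a fixed vector length of 4 lanes (real SVE vector lengths vary) and assuming `x4` holds the per-iteration element count. `whilelo`, `any_active` and `count_iterations` are hypothetical helpers, not compiler or intrinsic APIs.

```c
#include <assert.h>
#include <stdbool.h>

#define VL 4 /* assumed lanes per vector; real SVE vector lengths vary */

/* Model of WHILELO: lane k is active iff base + k < limit.  */
static void whilelo (bool pred[VL], unsigned base, unsigned limit)
{
  for (unsigned k = 0; k < VL; k++)
    pred[k] = base + k < limit;
}

/* Model of the B.ANY condition: true if at least one lane is active.  */
static bool any_active (const bool pred[VL])
{
  for (unsigned k = 0; k < VL; k++)
    if (pred[k])
      return true;
  return false;
}

/* Runs the loop skeleton for N elements.  Returns the number of vector
   iterations and stores the active-lane count of the last one.  */
static unsigned count_iterations (unsigned n, unsigned *last_active)
{
  bool p[VL];
  unsigned iters = 0, base = 0;
  assert (last_active);
  *last_active = 0;
  whilelo (p, base, n);       /* initial predicate, computed before the loop */
  while (any_active (p))      /* b.any .L3 */
    {
      unsigned active = 0;
      for (unsigned k = 0; k < VL; k++)
        active += p[k];       /* lanes the masked loads/stores would touch */
      *last_active = active;
      iters++;
      base += VL;             /* add x5, x5, x4 (assumed: x4 == VL) */
      whilelo (p, base, n);   /* whilelo p7.s, w5, w3 */
    }
  return iters;
}
```

With 4 lanes and n = 10, this model runs 3 vector iterations and the last one has 2 active lanes.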
A MASK_STORE is already conditional, so there is no need to load the old values
of c[i] and blend them in with a VEC_COND_EXPR; the comparison result can serve
directly as the store mask. With this patch we instead generate:
vect_t_20.12_74 = .MASK_LOAD (vectp.10_72, 32B, loop_mask_67);
vect__9.15_77 = .MASK_LOAD (vectp_a.13_75, 8B, loop_mask_67);
mask__34.16_79 = vect__9.15_77 != { 0, ... };
.MASK_STORE (vectp_c.18_81, 32B, mask__34.16_79, vect_t_20.12_74);
which generates:
.L3:
	ld1b z30.s, p7/z, [x0, x5]
	ld1w z31.s, p7/z, [x1, x5, lsl 2]
	cmpne p7.b, p7/z, z30.b, #0
	st1w z31.s, p7, [x2, x5, lsl 2]
	add x5, x5, x4
	whilelo p7.s, w5, w3
	b.any .L3
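To see why dropping the reload of c[] is safe, the two lowerings can be modelled lane by lane. The sketch below is a hypothetical C emulation, not GCC code: it assumes 4 lanes, treats inactive lanes of a masked load as zero, and checks both lowerings against a scalar version of foo1 while counting vector loads and lanes written. All helper names (`mask_load`, `old_body`, `new_body`, `lowerings_agree`, ...) are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define VL 4 /* assumed lanes per vector; real SVE vector lengths vary */

struct stats { unsigned loads, lanes_stored; }; /* vector loads, lanes written */

/* Scalar reference: the foo1 loop body for stride > 1.  */
static void foo1_ref (const char *a, const int *b, int *c, int n, int stride)
{
  for (int i = 0; i < n; i++)
    c[i] = a[i] != 0 ? b[i + stride] : c[i];
}

/* .MASK_LOAD models: active lanes read memory, inactive lanes read as 0.  */
static void mask_load (int dst[VL], const int *src, const bool m[VL], struct stats *s)
{
  s->loads++;
  for (int k = 0; k < VL; k++)
    dst[k] = m[k] ? src[k] : 0;
}
static void mask_load_char (int dst[VL], const char *src, const bool m[VL], struct stats *s)
{
  s->loads++;
  for (int k = 0; k < VL; k++)
    dst[k] = m[k] ? src[k] : 0;
}

/* .MASK_STORE model: only active lanes are written.  */
static void mask_store (int *dst, const int src[VL], const bool m[VL], struct stats *s)
{
  for (int k = 0; k < VL; k++)
    if (m[k])
      {
        dst[k] = src[k];
        s->lanes_stored++;
      }
}

/* Old lowering: reload c[], VEC_COND_EXPR, store under the loop mask LM.  */
static void old_body (const char *a, const int *b, int *c, int stride, int i,
                      const bool lm[VL], struct stats *s)
{
  int res[VL], t[VL], av[VL], out[VL];
  mask_load (res, c + i, lm, s);
  mask_load (t, b + i + stride, lm, s);
  mask_load_char (av, a + i, lm, s);
  for (int k = 0; k < VL; k++)
    out[k] = av[k] != 0 ? t[k] : res[k];
  mask_store (c + i, out, lm, s);
}

/* New lowering: fold the condition into the store mask, no reload of c[].  */
static void new_body (const char *a, const int *b, int *c, int stride, int i,
                      const bool lm[VL], struct stats *s)
{
  int t[VL], av[VL];
  bool sm[VL];
  mask_load (t, b + i + stride, lm, s);
  mask_load_char (av, a + i, lm, s);
  for (int k = 0; k < VL; k++)
    sm[k] = lm[k] && av[k] != 0;
  mask_store (c + i, t, sm, s);
}

/* Vector loop with WHILELO-style loop masks: lane k active iff i + k < n.  */
static struct stats run (bool cond_store, const char *a, const int *b, int *c,
                         int n, int stride)
{
  struct stats s = { 0, 0 };
  for (int i = 0; i < n; i += VL)
    {
      bool lm[VL];
      for (int k = 0; k < VL; k++)
        lm[k] = i + k < n;
      (cond_store ? new_body : old_body) (a, b, c, stride, i, lm, &s);
    }
  return s;
}

/* Every third a[i] is nonzero.  True if both lowerings match the reference.  */
static bool lowerings_agree (int n, int stride, struct stats *old_s,
                             struct stats *new_s)
{
  char a[64];
  int b[128], c_ref[64], c_old[64], c_new[64];
  assert (n >= 0 && n <= 64 && stride > 1 && stride < 64);
  for (int i = 0; i < n + stride; i++)
    b[i] = 100 + i;
  for (int i = 0; i < n; i++)
    a[i] = (char) (i % 3 == 0), c_ref[i] = c_old[i] = c_new[i] = -i;
  foo1_ref (a, b, c_ref, n, stride);
  *old_s = run (false, a, b, c_old, n, stride);
  *new_s = run (true, a, b, c_new, n, stride);
  return memcmp (c_ref, c_old, n * sizeof (int)) == 0
         && memcmp (c_ref, c_new, n * sizeof (int)) == 0;
}
```

With n = 10 and stride = 2, both lowerings leave c[] identical to the scalar loop; in this model the old form issues 9 vector loads and writes all 10 elements, while the conditional-store form issues 6 loads and writes only the 4 elements whose a[i] is nonzero.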
gcc/ChangeLog:

	PR tree-optimization/115531
	* tree-vect-patterns.cc (vect_cond_store_pattern_same_ref): New.
	(vect_recog_cond_store_pattern): New.
	(vect_vect_recog_func_ptrs): Use it.
	* target.def (conditional_operation_is_expensive): New.
	* doc/tm.texi: Regenerate.
	* doc/tm.texi.in: Document it.
	* targhooks.cc (default_conditional_operation_is_expensive): New.
	* targhooks.h (default_conditional_operation_is_expensive): New.
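The ChangeLog names a new helper, vect_cond_store_pattern_same_ref; judging from the description, the pattern has to confirm that one arm of the select is a reload of the very location being stored to. The toy matcher below illustrates that idea on a made-up expression tree. All types and names are hypothetical, references are compared by name rather than by real data-reference analysis, and it is not the GCC implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy IR for illustration only; not GIMPLE, not GCC names.  */
enum expr_kind { EXPR_LOAD, EXPR_VALUE, EXPR_COND };

struct expr
{
  enum expr_kind kind;
  const char *ref;                            /* EXPR_LOAD: location read */
  const struct expr *cond, *then_v, *else_v;  /* EXPR_COND operands */
};

struct store { const char *ref; const struct expr *rhs; };  /* ref = rhs */

struct cond_store
{
  bool matched;
  bool invert;                 /* store where COND is false, not true */
  const struct expr *cond, *value;
};

/* True if E reloads the location REF (compared by name in this toy).  */
static bool same_ref (const struct expr *e, const char *ref)
{
  return e->kind == EXPR_LOAD && strcmp (e->ref, ref) == 0;
}

/* Recognizes "REF = COND ? X : REF" (or "REF = COND ? REF : X") and
   describes it as "store X to REF where COND holds (or does not hold)".  */
static struct cond_store match_cond_store (const struct store *st)
{
  struct cond_store r = { false, false, NULL, NULL };
  const struct expr *sel;
  assert (st && st->rhs);
  sel = st->rhs;
  if (sel->kind != EXPR_COND)
    return r;
  if (same_ref (sel->else_v, st->ref))
    r.value = sel->then_v;
  else if (same_ref (sel->then_v, st->ref))
    {
      r.value = sel->else_v;
      r.invert = true;
    }
  else
    return r;
  r.matched = true;
  r.cond = sel->cond;
  return r;
}
```

For the foo1 loop, the store `c[i] = a[i] != 0 ? t : c[i]` matches with the select's "then" value as the data to store; a store to a different location, or one whose right-hand side is not a select, is left alone.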
Diffstat (limited to 'gcc/doc')
-rw-r--r-- gcc/doc/tm.texi    | 7
-rw-r--r-- gcc/doc/tm.texi.in | 2
2 files changed, 9 insertions, 0 deletions
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f10d9a5..c7535d0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6449,6 +6449,13 @@ The default implementation returns a @code{MODE_VECTOR_INT} with the
 same size and number of elements as @var{mode}, if such a mode exists.
 @end deftypefn
 
+@deftypefn {Target Hook} bool TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE (unsigned @var{ifn})
+This hook returns true if masked operation @var{ifn} (really of
+type @code{internal_fn}) should be considered more expensive to use than
+implementing the same operation without masking. GCC can then try to use
+unconditional operations instead with extra selects.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE (unsigned @var{ifn})
 This hook returns true if masked internal function @var{ifn} (really of
 type @code{internal_fn}) should be considered expensive when the mask is
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index 24596eb..64cea3b 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4290,6 +4290,8 @@ address; but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_GET_MASK_MODE
 
+@hook TARGET_VECTORIZE_CONDITIONAL_OPERATION_IS_EXPENSIVE
+
 @hook TARGET_VECTORIZE_EMPTY_MASK_IS_EXPENSIVE
 
 @hook TARGET_VECTORIZE_CREATE_COSTS
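The new hook only reports a cost preference; the caller decides what to do with it. Below is a hypothetical sketch of how a vectorizer-like caller might consult such a hook when choosing between a masked store and a reload-plus-select sequence. `toy_target`, `toy_ifn` and `choose_lowering` are invented for illustration and do not reflect GCC's real `targetm` interface or the default hook's behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: not GCC's internal_fn enum or its targetm
   structure, only enough to show a cost hook steering a lowering choice.  */
enum toy_ifn { TOY_IFN_MASK_LOAD, TOY_IFN_MASK_STORE };

struct toy_target
{
  /* Same contract as the documented hook: true if the masked form of IFN
     should be considered more expensive than an unmasked form plus selects.  */
  bool (*conditional_operation_is_expensive) (unsigned ifn);
};

enum lowering { LOWER_MASKED_STORE, LOWER_LOAD_SELECT_STORE };

/* Chooses how to vectorize "c[i] = cond ? t : c[i]".  */
static enum lowering choose_lowering (const struct toy_target *tgt,
                                      bool have_mask_store)
{
  assert (tgt && tgt->conditional_operation_is_expensive);
  if (have_mask_store
      && !tgt->conditional_operation_is_expensive (TOY_IFN_MASK_STORE))
    return LOWER_MASKED_STORE;      /* condition folded into the store mask */
  return LOWER_LOAD_SELECT_STORE;   /* reload the old value and select */
}

/* Two example hook implementations.  */
static bool masking_is_cheap (unsigned ifn)
{
  (void) ifn;
  return false;
}

static bool mask_store_is_expensive (unsigned ifn)
{
  return ifn == TOY_IFN_MASK_STORE;
}
```

Under this sketch, a target whose hook returns true for masked stores keeps the load/select/store sequence, and a target without masked stores never takes the conditional-store path at all.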