author     Tamar Christina <tamar.christina@arm.com>  2021-12-03 15:25:44 +0000
committer  Tamar Christina <tamar.christina@arm.com>  2021-12-03 15:25:44 +0000
commit     06f2e7d49fc6341ea0128ccd83fd13705dd2c523
tree       b1f63d8af9c2ef415a57ec8f5981e0675e29100d /gcc/tree-vectorizer.h
parent     f7854b908977adce4ff669c4e0332ef868568b7c
sve: combine nested if predicates
The following example
    void f5(float * restrict z0, float * restrict z1, float * restrict x,
            float * restrict y, float c, int n)
    {
      for (int i = 0; i < n; i++) {
        float a = x[i];
        float b = y[i];
        if (a > b) {
          z0[i] = a + b;
          if (a > c) {
            z1[i] = a - b;
          }
        }
      }
    }
currently generates:
        ptrue   p3.b, all
        ld1w    z1.s, p1/z, [x2, x5, lsl 2]
        ld1w    z2.s, p1/z, [x3, x5, lsl 2]
        fcmgt   p0.s, p3/z, z1.s, z0.s
        fcmgt   p2.s, p1/z, z1.s, z2.s
        fcmgt   p0.s, p0/z, z1.s, z2.s
        and     p0.b, p0/z, p1.b, p1.b
The conditions for a > b and a > c become separate comparisons.
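At the source level the two stores already form a mask chain: the z1 store runs only where a > b and a > c both hold. The sketch below is plain C++ (so `restrict` is dropped), and `f5_nested` and `f5_flat` are hypothetical names. It compares the nested-if loop with a flattened form in which each store is guarded by its own boolean mask, and the inner mask reuses the outer one. This is only an assumption about the shape of the loop after if-conversion, not a dump of GCC's IL.

```cpp
#include <cassert>

// Reference version: the nested ifs from f5.
void f5_nested(float *z0, float *z1, const float *x, const float *y,
               float c, int n) {
  for (int i = 0; i < n; i++) {
    float a = x[i];
    float b = y[i];
    if (a > b) {
      z0[i] = a + b;
      if (a > c)
        z1[i] = a - b;
    }
  }
}

// Flattened version: each store is guarded by its own mask. The inner mask
// reuses the outer one, i.e. it is (a > b) && (a > c).
void f5_flat(float *z0, float *z1, const float *x, const float *y,
             float c, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; i++) {
    float a = x[i];
    float b = y[i];
    bool m_outer = a > b;             // guards the z0 store
    bool m_inner = m_outer && a > c;  // guards the z1 store
    if (m_outer)
      z0[i] = a + b;  // stands in for a masked store
    if (m_inner)
      z1[i] = a - b;
  }
}
```

Because `m_inner` is derived from `m_outer`, the result of the first comparison can serve as the starting point for the second, which is exactly the opportunity the patch exploits.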
After this patch we generate:
        ld1w    z1.s, p0/z, [x2, x5, lsl 2]
        ld1w    z2.s, p0/z, [x3, x5, lsl 2]
        fcmgt   p1.s, p0/z, z1.s, z2.s
        fcmgt   p1.s, p1/z, z1.s, z0.s
Here the conditions a > b && a > c are folded: the predicate produced by the
first compare is used as the governing predicate of the second, which allows
one of the compares to be removed.
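The folding relies on zeroing predication (the `/z` suffix): a lane that is inactive in the governing predicate is false in the result. The following lane-level model is a sketch under two assumptions, an 8-lane vector and zeroing semantics for every operation. `fcmgt_z`, `and_z`, `chained` and `separate` are hypothetical helpers, not SVE intrinsics. It checks that chaining two predicated compares gives the same mask as two compares combined with an explicit predicate AND.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // assumed vector length for illustration
using Pred = std::array<bool, kLanes>;
using Vec = std::array<float, kLanes>;

// Models "fcmgt pd.s, pg/z, zn.s, zm.s": lanes inactive in pg become false.
Pred fcmgt_z(const Pred &pg, const Vec &n, const Vec &m) {
  Pred pd{};
  for (std::size_t i = 0; i < kLanes; ++i)
    pd[i] = pg[i] && (n[i] > m[i]);
  return pd;
}

// Models "and pd.b, pg/z, pn.b, pm.b".
Pred and_z(const Pred &pg, const Pred &pn, const Pred &pm) {
  Pred pd{};
  for (std::size_t i = 0; i < kLanes; ++i)
    pd[i] = pg[i] && pn[i] && pm[i];
  return pd;
}

// Chained form: the a > b result governs the a > c compare.
Pred chained(const Pred &loop, const Vec &a, const Vec &b, const Vec &c) {
  Pred p1 = fcmgt_z(loop, a, b);  // loop && a > b
  return fcmgt_z(p1, a, c);       // loop && a > b && a > c
}

// Two independent compares combined with an explicit predicate AND.
Pred separate(const Pred &all, const Pred &loop, const Vec &a, const Vec &b,
              const Vec &c) {
  Pred pc = fcmgt_z(all, a, c);   // a > c on every lane
  Pred pb = fcmgt_z(loop, a, b);  // loop && a > b
  return and_z(loop, pb, pc);
}
```

The chained version needs one compare per condition and no separate AND, which matches the shorter sequence above.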
Whenever a mask is generated from a BIT_AND, we mask the operands of the AND
instead and then just AND the results.
This allows us to CSE the masks and generate the right combination.
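In f5 the z0 store needs loop & (a > b). With operand masking, the z1 store's mask becomes (loop & (a > b)) & (loop & (a > c)), and its first operand is the same value as the z0 mask, so it can be shared. The sketch below models this with one bit per lane of a hypothetical 8-lane vector. `z0_mask`, `z1_mask` and `operand_masking_matches` are illustrative names, not GCC functions.

```cpp
#include <cassert>
#include <cstdint>

// One bit per lane of a hypothetical 8-lane vector.
using Mask = std::uint8_t;

// Mask for the z0 store: the loop mask applied to (a > b).
Mask z0_mask(Mask loop, Mask gt_ab) { return loop & gt_ab; }

// Mask for the z1 store with the operands of the AND masked first.
Mask z1_mask(Mask loop, Mask gt_ab, Mask gt_ac) {
  Mask lhs = loop & gt_ab;  // same value as z0_mask(loop, gt_ab): CSE candidate
  Mask rhs = loop & gt_ac;
  return lhs & rhs;
}

// AND acts on each bit independently, so checking every single-lane
// combination shows that masking the operands equals masking the
// combined condition once.
bool operand_masking_matches() {
  for (Mask l = 0; l <= 1; ++l)
    for (Mask p = 0; p <= 1; ++p)
      for (Mask q = 0; q <= 1; ++q)
        if (z1_mask(l, p, q) != (l & (p & q)))
          return false;
  return true;
}
```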
However, because reassoc will try to reorder the masks in the &, we now have to
perform a small local CSE on the vectorized loop if vectorization is successful.
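A minimal sketch of such a local CSE over a straight-line block. It assumes SSA form, so each name is defined once, and uses string names in place of GCC trees. The `Stmt` type and `local_cse` are hypothetical, and this is not GCC's implementation. The key step is putting the operands of the commutative `&` in a canonical order, so a mask whose operands reassoc has swapped still produces the same key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// A hypothetical three-address statement: dest = lhs OP rhs.
struct Stmt {
  std::string dest, op, lhs, rhs;
};

// Removes statements that recompute an earlier value and rewrites later
// uses to refer to the surviving definition.
std::vector<Stmt> local_cse(const std::vector<Stmt> &block) {
  std::map<std::tuple<std::string, std::string, std::string>, std::string> seen;
  std::map<std::string, std::string> replace;  // eliminated dest -> survivor
  std::vector<Stmt> out;
  auto resolve = [&](const std::string &name) {
    auto it = replace.find(name);
    return it == replace.end() ? name : it->second;
  };
  for (Stmt s : block) {
    s.lhs = resolve(s.lhs);
    s.rhs = resolve(s.rhs);
    if (s.op == "&" && s.rhs < s.lhs)  // '&' is commutative: canonical order
      std::swap(s.lhs, s.rhs);
    auto key = std::make_tuple(s.op, s.lhs, s.rhs);
    auto it = seen.find(key);
    if (it != seen.end()) {
      replace[s.dest] = it->second;  // reuse the earlier result
      continue;
    }
    seen.emplace(key, s.dest);
    out.push_back(s);
  }
  return out;
}
```

For example, given `m0 = loop & gt_ab`, `m1 = gt_ab & loop`, `m2 = loop & gt_ac` and `m3 = m2 & m1`, the pass drops `m1` and rewrites `m3` to use `m0`. A purely syntactic CSE would miss this because of the swapped operands.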
Note: This patch series is working incrementally towards generating the most
efficient code for this and other loops in small steps.
gcc/ChangeLog:

	* tree-vect-stmts.c (prepare_load_store_mask): Rename to...
	(prepare_vec_mask): ...This and record operations that have already been
	masked.
	(vectorizable_call): Use it.
	(vectorizable_operation): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	* tree-vectorizer.h (class _loop_vec_info): Add vec_cond_masked_set.
	(vec_cond_masked_set_type, tree_cond_mask_hash): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve/pred-combine-and.c: New test.
Diffstat (limited to 'gcc/tree-vectorizer.h')
-rw-r--r--  gcc/tree-vectorizer.h  9
1 files changed, 9 insertions, 0 deletions
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 76e81ea..2f6e1e2 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -328,6 +328,12 @@ struct default_hash_traits<scalar_cond_masked_key>
 
 typedef hash_set<scalar_cond_masked_key> scalar_cond_masked_set_type;
 
+/* Key and map that records association between vector conditions and
+   corresponding loop mask, and is populated by prepare_vec_mask.  */
+
+typedef pair_hash<tree_operand_hash, tree_operand_hash> tree_cond_mask_hash;
+typedef hash_set<tree_cond_mask_hash> vec_cond_masked_set_type;
+
 /* Describes two objects whose addresses must be unequal for the vectorized
    loop to be valid.  */
 typedef std::pair<tree, tree> vec_object_pair;
@@ -647,6 +653,9 @@ public:
   /* Set of scalar conditions that have loop mask applied.  */
   scalar_cond_masked_set_type scalar_cond_masked_set;
 
+  /* Set of vector conditions that have loop mask applied.  */
+  vec_cond_masked_set_type vec_cond_masked_set;
+
   /* If we are using a loop mask to align memory addresses, this variable
      contains the number of vector elements that we should skip in the
      first iteration of the vector loop (i.e. the number of leading
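For readers unfamiliar with GCC's `hash_set` and `pair_hash`, the standard C++ sketch below shows the same idea. It uses a set keyed by (vector condition, loop mask) pairs whose hash combines the hashes of both components. It assumes string names stand in for trees, whereas GCC's `tree_operand_hash` compares entries with `operand_equal_p` rather than by name. `apply_loop_mask` is a hypothetical helper modelled loosely on the ChangeLog's description of `prepare_vec_mask` ("record operations that have already been masked"). It is not that function's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>

// Stand-in for pair_hash<tree_operand_hash, tree_operand_hash>: hash each
// component, then mix the two values.
struct PairHash {
  std::size_t operator()(const std::pair<std::string, std::string> &p) const {
    std::size_t h1 = std::hash<std::string>{}(p.first);
    std::size_t h2 = std::hash<std::string>{}(p.second);
    return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
  }
};

// Stand-in for vec_cond_masked_set_type.
using CondMaskSet =
    std::unordered_set<std::pair<std::string, std::string>, PairHash>;

// Hypothetical helper: returns a name for vec_mask with loop_mask applied.
// If the pair is already recorded as masked, no new AND is emitted.
std::string apply_loop_mask(CondMaskSet &masked, const std::string &vec_mask,
                            const std::string &loop_mask, int &ands_emitted) {
  if (masked.count({vec_mask, loop_mask}))
    return vec_mask;
  ++ands_emitted;  // stands in for emitting "vec_mask & loop_mask"
  std::string result = vec_mask + "&" + loop_mask;
  // Assumption for this sketch: record that the result carries the loop mask.
  masked.insert({result, loop_mask});
  return result;
}
```

Requesting the loop mask twice for the same condition therefore emits only one AND. A condition that was already recorded as masked, such as a compare performed under the loop predicate, is returned unchanged.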