author    Tamar Christina <tamar.christina@arm.com>  2021-11-10 15:59:26 +0000
committer Tamar Christina <tamar.christina@arm.com>  2021-11-10 16:03:18 +0000
commit    86ffc845b2d0bff59832dcf3cf6518f1358e30ac (patch)
tree      91fa2fc59feb6dbd0f95dfebe316595ce4d6f7b8 /gcc/tree-vectorizer.c
parent    8ed62c929c7c44627f41627e085e15d77b2e6ed4 (diff)
AArch64: do not keep negated mask and inverse mask live at the same time
The following example:
void f11(double * restrict z, double * restrict w, double * restrict x,
         double * restrict y, int n)
{
    for (int i = 0; i < n; i++) {
        z[i] = (w[i] > 0) ? w[i] : y[i];
    }
}
currently generates:
        ptrue   p2.b, all
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p2/z, z0.d, #0.0
        bic     p3.b, p2/z, p0.b, p1.b
        ld1d    z1.d, p3/z, [x3, x2, lsl 3]
and after the previous patches generates:
        ptrue   p3.b, all
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p0/z, z0.d, #0.0
        fcmgt   p2.d, p3/z, z0.d, #0.0
        not     p1.b, p0/z, p1.b
        ld1d    z1.d, p1/z, [x3, x2, lsl 3]
where a duplicate comparison is performed for w[i] > 0.  This happens because
the vectorizer emits a comparison for both a and ~a, where we need only emit
one of them and invert the other.  After this patch we generate:
        ld1d    z0.d, p0/z, [x1, x2, lsl 3]
        fcmgt   p1.d, p0/z, z0.d, #0.0
        mov     p2.b, p1.b
        not     p1.b, p0/z, p1.b
        ld1d    z1.d, p1/z, [x3, x2, lsl 3]
In order to perform the check I have to fully expand the NOT stmts when
recording them, since the SSA names of the top-level expressions differ but
their arguments don't.  e.g. in _31 = ~_34 the SSA name _34 differs between
statements, but the operands of the statement defining _34 do not.
But we only do this when the comparison is an ordered one, because mixing
ordered and unordered expressions can lead to de-optimized code.
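The ordered/unordered distinction matters because, for floating point, the
bitwise inverse of an ordered comparison is not itself an ordered comparison:
with a NaN input both w > 0 and w <= 0 are false.  A minimal standalone
illustration (not GCC code):

```c
#include <math.h>
#include <stdbool.h>

/* For a NaN lane the ordered compare w > 0.0 yields false, so its
   bitwise inverse is true; yet the opposite ordered compare w <= 0.0
   is also false.  Inverting an ordered mask therefore produces an
   unordered predicate, which is why only ordered comparisons are
   registered together with their inverses.  */
static bool ordered_gt_zero (double w) { return w > 0.0; }
static bool ordered_le_zero (double w) { return w <= 0.0; }
```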
Note: This patch series is working incrementally towards generating the most
efficient code for this and other loops in small steps. The mov is
created by postreload when it does a late CSE.
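The recording scheme can be sketched outside of GCC as follows: a NOT of a
comparison is stored as the underlying comparison plus an inverted_p flag, so
a lookup for ~(a > b) hits an entry recorded for a > b with the flag flipped.
Everything below is an illustrative model, not GCC internals (only the
inverted_p field name comes from the patch):

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative stand-in for scalar_cond_masked_key: a comparison code,
   its two operands, and whether the mask is the inverse of that
   comparison.  The string operands model SSA names.  */
struct masked_key
{
  const char *code;        /* e.g. "gt" for a > b */
  const char *op0, *op1;
  bool inverted_p;
};

/* Recording ~(op0 code op1) expands through the NOT down to the
   underlying comparison and flips inverted_p, mirroring the
   BIT_NOT_EXPR case added to get_cond_ops_from_tree.  */
static struct masked_key
key_for_not (struct masked_key cmp)
{
  cmp.inverted_p = !cmp.inverted_p;
  return cmp;
}

static bool
key_equal (struct masked_key a, struct masked_key b)
{
  return strcmp (a.code, b.code) == 0
         && strcmp (a.op0, b.op0) == 0
         && strcmp (a.op1, b.op1) == 0
         && a.inverted_p == b.inverted_p;
}
```

With this expansion, the key for ~(w > 0) compares equal to the key for
w > 0 with inverted_p set, so the vectorizer can see that the inverse of an
already-live mask is available instead of emitting a second comparison.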
gcc/ChangeLog:
* tree-vectorizer.h (struct scalar_cond_masked_key): Add inverted_p.
(default_hash_traits<scalar_cond_masked_key>): Likewise.
* tree-vect-stmts.c (vectorizable_condition): Check if inverse of mask
is live.
* tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree):
Register mask inverses.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/pred-not-gen-1.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-2.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-3.c: Update testcase.
* gcc.target/aarch64/sve/pred-not-gen-4.c: Update testcase.
Diffstat (limited to 'gcc/tree-vectorizer.c')
 gcc/tree-vectorizer.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+), 0 deletions(-)
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 3247c9a..f493d63 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -1678,6 +1678,7 @@ scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
       this->code = TREE_CODE (t);
       this->op0 = TREE_OPERAND (t, 0);
       this->op1 = TREE_OPERAND (t, 1);
+      this->inverted_p = false;
       return;
     }
 
@@ -1690,13 +1691,31 @@ scalar_cond_masked_key::get_cond_ops_from_tree (tree t)
 	  this->code = code;
 	  this->op0 = gimple_assign_rhs1 (stmt);
 	  this->op1 = gimple_assign_rhs2 (stmt);
+	  this->inverted_p = false;
 	  return;
 	}
+      else if (code == BIT_NOT_EXPR)
+	{
+	  tree n_op = gimple_assign_rhs1 (stmt);
+	  if ((stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (n_op))))
+	    {
+	      code = gimple_assign_rhs_code (stmt);
+	      if (TREE_CODE_CLASS (code) == tcc_comparison)
+		{
+		  this->code = code;
+		  this->op0 = gimple_assign_rhs1 (stmt);
+		  this->op1 = gimple_assign_rhs2 (stmt);
+		  this->inverted_p = true;
+		  return;
+		}
+	    }
+	}
     }
 
   this->code = NE_EXPR;
   this->op0 = t;
   this->op1 = build_zero_cst (TREE_TYPE (t));
+  this->inverted_p = false;
 }
 
 /* See the comment above the declaration for details.  */