author | Richard Sandiford <richard.sandiford@arm.com> | 2023-01-27 17:03:51 +0000 |
---|---|---|
committer | Richard Sandiford <richard.sandiford@arm.com> | 2023-01-27 17:03:51 +0000 |
commit | 7486fe153adaa868f36248b72f3e78d18b1b3ba1 (patch) | |
tree | a032a0741ea03f90921b41e02085a23d71dfde48 /gcc/tree.cc | |
parent | 553f8003ba5ecfdf0574a171692843ef838226b4 (diff) | |
Add support for conditional xorsign [PR96373]
This patch is an optimisation, but it's also a prerequisite for
fixing PR96373 without regressing vect-xorsign_exec.c.
Currently the vectoriser vectorises:

  for (i = 0; i < N; i++)
    r[i] = a[i] * __builtin_copysignf (1.0f, b[i]);

as two unconditional operations (copysign and mult).
tree-ssa-math-opts.cc later combines them into an "xorsign" function.
This works for both Advanced SIMD and SVE.
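
As a scalar illustration of why the copysign and mult pair can be combined, the sketch below compares the multiply form with an xorsign form that flips the sign bit of a whenever the sign bit of b is set. It assumes a 32-bit IEEE float; the helper names mul_copysign, xorsign and same_bits are hypothetical and are not part of the patch.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Reference form from the loop above: multiply by +/-1.0f carrying b's sign.
inline float mul_copysign(float a, float b) {
    return a * std::copysign(1.0f, b);
}

// "xorsign" form (hypothetical helper): XOR b's sign bit into a.
inline float xorsign(float a, float b) {
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    ua ^= ub & 0x80000000u;  // keep only b's sign bit, then flip a's sign with it
    float r;
    std::memcpy(&r, &ua, sizeof r);
    return r;
}

// Bitwise comparison, so that +0.0f and -0.0f are told apart.
inline bool same_bits(float x, float y) {
    std::uint32_t ux, uy;
    std::memcpy(&ux, &x, sizeof ux);
    std::memcpy(&uy, &y, sizeof uy);
    return ux == uy;
}
```

For finite inputs the two forms should agree bit for bit, including for signed zeros; NaN inputs are deliberately left out of this sketch.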
However, with the fix for PR96373, the vectoriser will instead
generate a conditional multiplication (IFN_COND_MUL). Something then
needs to fold copysign & IFN_COND_MUL to the equivalent of a conditional
xorsign. Three obvious options were:
(1) Extend tree-ssa-math-opts.cc.
(2) Do the fold in match.pd.
(3) Leave it to rtl combine.
I'm against (3), because this isn't a target-specific optimisation.
(1) would be possible, but would involve open-coding a lot of what
match.pd does for us. And, in contrast to doing the current
tree-ssa-math-opts.cc optimisation in match.pd, there should be
no danger of (2) happening too early. If we have an IFN_COND_MUL
then we're already past the stage of simplifying the original
source code.
There was also a choice between adding a conditional xorsign ifn
and simply open-coding the xorsign. The latter seems simpler,
and means less boiler-plate for target-specific code.
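
A per-lane model of the open-coded form may help here. The sketch below is not the match.pd pattern itself: it only assumes that a conditional operation yields its result in active lanes and a fallback value in inactive lanes, and that floats are 32-bit IEEE. The lane count, the types VecF and VecB, and all function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

constexpr std::size_t kLanes = 4;        // hypothetical vector length
using VecF = std::array<float, kLanes>;
using VecB = std::array<bool, kLanes>;   // hypothetical per-lane predicate

inline std::uint32_t to_bits(float x) {
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

inline float from_bits(std::uint32_t u) {
    float x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Before the fold: a conditional multiply by copysign (1.0f, b).
inline VecF cond_mul_copysign(const VecB &cond, const VecF &a,
                              const VecF &b, const VecF &fallback) {
    VecF r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = cond[i] ? a[i] * std::copysign(1.0f, b[i]) : fallback[i];
    return r;
}

// After the fold: a conditional XOR of a with (b & sign mask), done on
// the integer view of each lane.
inline VecF cond_xor_and(const VecB &cond, const VecF &a,
                         const VecF &b, const VecF &fallback) {
    const std::uint32_t sign_mask = 0x80000000u;
    VecF r{};
    for (std::size_t i = 0; i < kLanes; ++i)
        r[i] = cond[i]
                   ? from_bits(to_bits(a[i]) ^ (to_bits(b[i]) & sign_mask))
                   : fallback[i];
    return r;
}

// Bitwise lane comparison, so that signed zeros are told apart.
inline bool same_lanes(const VecF &x, const VecF &y) {
    for (std::size_t i = 0; i < kLanes; ++i)
        if (to_bits(x[i]) != to_bits(y[i]))
            return false;
    return true;
}
```

In this model no dedicated conditional xorsign operation is needed: an AND with a constant mask feeding a conditional XOR covers it, which is the trade-off the paragraph above describes.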
The signed_or_unsigned_type_for change is needed to make sure
that we stay in "SVE space" when doing the optimisation on 128-bit
fixed-length SVE.
gcc/
	PR tree-optimization/96373
	* tree.h (sign_mask_for): Declare.
	* tree.cc (sign_mask_for): New function.
	(signed_or_unsigned_type_for): For vector types, try to use the
	related_int_vector_mode.
	* genmatch.cc (commutative_op): Handle conditional internal functions.
	* match.pd: Fold an IFN_COND_MUL+copysign into an IFN_COND_XOR+and.

gcc/testsuite/
	PR tree-optimization/96373
	* gcc.target/aarch64/sve/cond_xorsign_1.c: New test.
	* gcc.target/aarch64/sve/cond_xorsign_2.c: Likewise.
Diffstat (limited to 'gcc/tree.cc')
-rw-r--r-- | gcc/tree.cc | 33 |
1 files changed, 33 insertions, 0 deletions
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 952bbec..80c0967 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -2695,6 +2695,35 @@ build_zero_cst (tree type)
     }
 }
 
+/* If floating-point type TYPE has an IEEE-style sign bit, return an
+   unsigned constant in which only the sign bit is set.  Return null
+   otherwise.  */
+
+tree
+sign_mask_for (tree type)
+{
+  /* Avoid having to choose between a real-only sign and a pair of signs.
+     This could be relaxed if the choice becomes obvious later.  */
+  if (TREE_CODE (type) == COMPLEX_TYPE)
+    return NULL_TREE;
+
+  auto eltmode = as_a<scalar_float_mode> (element_mode (type));
+  auto bits = REAL_MODE_FORMAT (eltmode)->ieee_bits;
+  if (!bits || !pow2p_hwi (bits))
+    return NULL_TREE;
+
+  tree inttype = unsigned_type_for (type);
+  if (!inttype)
+    return NULL_TREE;
+
+  auto mask = wi::set_bit_in_zero (bits - 1, bits);
+  if (TREE_CODE (inttype) == VECTOR_TYPE)
+    {
+      tree elt = wide_int_to_tree (TREE_TYPE (inttype), mask);
+      return build_vector_from_val (inttype, elt);
+    }
+  return wide_int_to_tree (inttype, mask);
+}
 
 /* Build a BINFO with LEN language slots.  */
 
@@ -10987,6 +11016,10 @@ signed_or_unsigned_type_for (int unsignedp, tree type)
         return NULL_TREE;
       if (inner == inner2)
         return type;
+      machine_mode new_mode;
+      if (VECTOR_MODE_P (TYPE_MODE (type))
+          && related_int_vector_mode (TYPE_MODE (type)).exists (&new_mode))
+        return build_vector_type_for_mode (inner2, new_mode);
       return build_vector_type (inner2, TYPE_VECTOR_SUBPARTS (type));
     }
 
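
For readers unfamiliar with the wide-int helpers, the mask computation in sign_mask_for can be modelled on plain integers. The function below is a hypothetical scalar analogue, not GCC code: it takes the element width in bits directly, returns 0 where the patch returns NULL_TREE, and additionally rejects widths above 64 only because the sketch stores the mask in a uint64_t.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical analogue of sign_mask_for for a BITS-wide IEEE-style element.
// Returns a value with only bit BITS - 1 set, or 0 if no mask is produced.
inline std::uint64_t sign_mask_for_bits(unsigned bits) {
    // Mirrors the "!bits || !pow2p_hwi (bits)" rejection in the patch.
    if (bits == 0 || (bits & (bits - 1)) != 0)
        return 0;
    // Limitation of this sketch only: the mask must fit in 64 bits.
    if (bits > 64)
        return 0;
    // Counterpart of wi::set_bit_in_zero (bits - 1, bits).
    return std::uint64_t{1} << (bits - 1);
}
```

With this model, a 32-bit element gives 0x80000000 and a 64-bit element gives 0x8000000000000000, while a width that is not a power of two gives no mask.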