author | Jakub Jelinek <jakub@redhat.com> | 2023-04-19 11:14:23 +0200 |
---|---|---|
committer | Jakub Jelinek <jakub@redhat.com> | 2023-04-19 11:14:23 +0200 |
commit | ade0a1ee5c6707b950ba284adcfed0514866c12d (patch) | |
tree | b91966ae44d14742186f1defc51962e9c0d5f83b /gcc/gimple.cc | |
parent | 76f44fbfea1f11e53d4b7e83f0debd029c94a1b3 (diff) | |
tree-vect-patterns: Improve __builtin_{clz,ctz,ffs}ll vectorization [PR109011]
For __builtin_popcountll, tree-vect-patterns.cc has
vect_recog_popcount_pattern, which improves the vectorized code.
Without that, the vectorization is always multi-type vectorization
in the loop (at least int and long long types), where we emit two
.POPCOUNT calls with long long arguments and int return value and then
widen to long long, so effectively, after vectorization, we do the
V?DImode -> V?DImode popcount twice, then pack the result into V?SImode
and immediately unpack.
The following patch extends that handling to __builtin_{clz,ctz,ffs}ll
builtins as well (as long as there is an optab for them; more to come
later).
x86 can do __builtin_popcountll with -mavx512vpopcntdq, __builtin_clzll
with -mavx512cd, ppc can do __builtin_popcountll and __builtin_clzll
with -mpower8-vector and __builtin_ctzll with -mpower9-vector, s390
can do __builtin_{popcount,clz,ctz}ll with -march=z13 -mzarch (i.e. VX).
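For illustration, loops of roughly the following shape are what the pattern is meant to catch. This is only a sketch: the function names and signatures are hypothetical and are not taken from the new pr109011-1.c test. In each loop the builtin returns int and the result is stored into a long long array, which is the int/long long mix described above.

```c
/* Illustrative only: function names are hypothetical, not from the
   GCC testsuite.  Assumes a 64-bit long long.  */
#include <stddef.h>

/* popcount of each element; the int result is widened to long long.  */
void
count_bits (long long *restrict p, const long long *restrict q, size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] = __builtin_popcountll (q[i]);
}

/* Leading zero count; the builtin is undefined for a zero argument.  */
void
count_leading_zeros (long long *restrict p, const long long *restrict q,
                     size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] = __builtin_clzll (q[i]);
}

/* Trailing zero count; the builtin is undefined for a zero argument.  */
void
count_trailing_zeros (long long *restrict p, const long long *restrict q,
                      size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] = __builtin_ctzll (q[i]);
}

/* One plus the index of the least significant set bit, or 0 for zero.  */
void
find_first_set (long long *restrict p, const long long *restrict q, size_t n)
{
  for (size_t i = 0; i < n; i++)
    p[i] = __builtin_ffsll (q[i]);
}
```

Whether a given loop is actually vectorized depends on the target and on the options listed above, e.g. -O3 together with -mavx512cd on x86 for the clz case; further options may be needed depending on the vector size.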
2023-04-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/109011
* tree-vect-patterns.cc (vect_recog_popcount_pattern): Rename to ...
(vect_recog_popcount_clz_ctz_ffs_pattern): ... this. Handle also
CLZ, CTZ and FFS. Remove vargs variable, use
gimple_build_call_internal rather than gimple_build_call_internal_vec.
(vect_vect_recog_func_ptrs): Adjust popcount entry.
* gcc.dg/vect/pr109011-1.c: New test.