author | Wilco Dijkstra <wdijkstr@arm.com> | 2020-02-12 18:19:25 +0000 |
---|---|---|
committer | Wilco Dijkstra <wdijkstr@arm.com> | 2020-02-12 18:19:25 +0000 |
commit | 9921bbf9b2e27568d952fe6ee5bc083c93bbf7fd (patch) | |
tree | c475f27f25b512752a4c5904a3aaf3c584047a2f /gcc/fold-const.c | |
parent | e5cc04a73a3e212114ca9725911eaaa66d32303c (diff) | |
[AArch64] Improve popcount expansion

The popcount expansion uses umov to extend the result and move it back
to the integer register file.  If we model ADDV as a zero-extending
operation, fmov can be used to move back to the integer side.  This
results in a ~0.5% speedup on deepsjeng on Cortex-A57.

A typical __builtin_popcount expansion is now:

    fmov    s0, w0
    cnt     v0.8b, v0.8b
    addv    b0, v0.8b
    fmov    w0, s0

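For readers less familiar with the SIMD instructions, the four-instruction
sequence can be modelled in portable C roughly as follows. This is only an
illustrative sketch, not code from the patch: the helper names cnt_8b,
addv_8b and popcount_model are hypothetical, and the model assumes that a
scalar write to b0 clears the remaining bits of the vector register, which
is the property that lets ADDV be treated as zero-extending.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "cnt v0.8b, v0.8b": each of the eight byte
   lanes is replaced by the number of set bits in that byte.  */
static uint64_t cnt_8b(uint64_t v)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(v >> (8 * i));
        uint8_t bits = 0;
        while (byte) {
            bits = (uint8_t)(bits + (byte & 1u));
            byte = (uint8_t)(byte >> 1);
        }
        out |= (uint64_t)bits << (8 * i);
    }
    return out;
}

/* Hypothetical model of "addv b0, v0.8b": the eight byte lanes are
   summed into an 8-bit result.  Returning it as uint64_t mirrors the
   assumption that the rest of the register is zero afterwards.  */
static uint64_t addv_8b(uint64_t v)
{
    uint8_t sum = 0;
    for (int i = 0; i < 8; i++)
        sum = (uint8_t)(sum + (uint8_t)(v >> (8 * i)));
    return sum;
}

/* Model of the whole expansion for a 32-bit input.  */
static uint32_t popcount_model(uint32_t x)
{
    uint64_t v = x;        /* fmov s0, w0: upper bits are zero */
    v = cnt_8b(v);         /* cnt  v0.8b, v0.8b */
    v = addv_8b(v);        /* addv b0, v0.8b */
    return (uint32_t)v;    /* fmov w0, s0: no separate extension needed */
}

/* Small self-check against the compiler builtin (GCC/Clang).  */
static void check_popcount_model(void)
{
    assert(popcount_model(0u) == 0u);
    assert(popcount_model(0xF0F0u) == (uint32_t)__builtin_popcount(0xF0F0u));
    assert(popcount_model(0xFFFFFFFFu) == 32u);
}
```

Because the sum of eight byte counts is at most 64, it always fits in the
8-bit result, so reading the low 32 bits of the register gives the final
value directly.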
gcc/
    * config/aarch64/aarch64-simd.md
    (aarch64_zero_extend<GPI:mode>_reduc_plus_<VDQV_E:mode>): New pattern.
    * config/aarch64/aarch64.md (popcount<mode>2): Use it instead of
    generating separate ADDV and zero_extend patterns.
    * config/aarch64/iterators.md (VDQV_E): New iterator.

testsuite/
    * gcc.target/aarch64/popcnt2.c: New test.