| field | value | date |
|---|---|---|
| author | Pengxuan Zheng <quic_pzheng@quicinc.com> | 2024-06-12 18:23:13 -0700 |
| committer | Pengxuan Zheng <quic_pzheng@quicinc.com> | 2024-07-02 16:06:48 -0700 |
| commit | 895bbc08d38c2aca3cbbab273a247021fea73930 | |
| tree | d4da22b7e4a092598b82a79b9c8078cdba32ddf4 /gcc/DATESTAMP | |
| parent | a7ad9cb813063ddf51269910f33b56116c10462c | |
aarch64: Add vector popcount besides QImode [PR113859]

This patch improves GCC’s vectorization of __builtin_popcount for the aarch64
target by adding popcount patterns for vector modes with HImode, SImode and
DImode elements, in addition to the existing QImode ones.
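As a minimal sketch of what "vectorization of __builtin_popcount" refers to, the C loops below apply the builtin element by element to arrays of 16-, 32- and 64-bit integers, whose vectorized forms would use HImode, SImode and DImode vector modes. The function names are hypothetical and not taken from the patch or its tests; whether a given loop is actually vectorized into the sequences below depends on the optimization level, the -march flags and the vectorizer's cost model.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example loops: each applies __builtin_popcount (or the
   64-bit __builtin_popcountll) to every element.  On aarch64 with
   vectorization enabled, loops like these are candidates for the
   popcount<mode>2 expanders (HImode, SImode and DImode elements).  */

void popcount_u16 (uint16_t *restrict out, const uint16_t *restrict in,
                   size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint16_t) __builtin_popcount (in[i]);
}

void popcount_u32 (uint32_t *restrict out, const uint32_t *restrict in,
                   size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint32_t) __builtin_popcount (in[i]);
}

void popcount_u64 (uint64_t *restrict out, const uint64_t *restrict in,
                   size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
}
```

The restrict qualifiers promise the compiler that the input and output arrays do not overlap, which lets it vectorize without emitting a runtime alias check.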

With this patch, we now generate the following for V8HI:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b

For V4HI, we generate:
  cnt     v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b
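The two-instruction sequence works because CNT counts the set bits of each byte and UADDLP (unsigned add long pairwise) adds each adjacent pair of byte counts into a lane twice as wide, so every 16-bit lane ends up holding the popcount of its own two bytes. The C sketch below models the V8HI case with plain loops; it assumes the little-endian lane layout that AArch64 normally uses, and the helper names are hypothetical. V4HI is the same computation on 8 bytes instead of 16.

```c
#include <stdint.h>

/* Model of CNT on a single byte: count its set bits.  */
uint8_t cnt_byte (uint8_t b)
{
  uint8_t c = 0;
  while (b != 0)
    {
      c = (uint8_t) (c + (b & 1u));
      b = (uint8_t) (b >> 1);
    }
  return c;
}

/* Scalar model of the V8HI sequence:
     cnt    v1.16b, v0.16b   -> one popcount per byte
     uaddlp v2.8h, v1.16b    -> add adjacent byte pairs, widening to 16 bits
   Assumes little-endian layout: halfword lane i occupies bytes 2i and
   2i+1 of the .16b view.  */
void v8hi_popcount (uint16_t out[8], const uint16_t in[8])
{
  uint8_t bytes[16], counts[16];

  for (int i = 0; i < 8; i++)
    {
      bytes[2 * i] = (uint8_t) (in[i] & 0xFF);
      bytes[2 * i + 1] = (uint8_t) (in[i] >> 8);
    }
  for (int i = 0; i < 16; i++)          /* cnt */
    counts[i] = cnt_byte (bytes[i]);
  for (int i = 0; i < 8; i++)           /* uaddlp */
    out[i] = (uint16_t) (counts[2 * i] + counts[2 * i + 1]);
}
```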

For V4SI, we generate:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h

For V4SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b
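With the dot-product extension, the two UADDLP steps are replaced by a single UDOT. For each 32-bit lane, UDOT multiplies the four corresponding bytes of its two source vectors and adds the four products to that lane of the accumulator. With one source set to all ones (movi v1.16b, #1) and the accumulator zeroed (movi v0.4s, #0), this reduces to summing the four byte counts of each lane. The scalar model below makes that explicit; it assumes little-endian layout, and the names are hypothetical.

```c
#include <stdint.h>

/* Model of CNT on a single byte: count its set bits.  */
uint8_t cnt_byte (uint8_t b)
{
  uint8_t c = 0;
  while (b != 0)
    {
      c = (uint8_t) (c + (b & 1u));
      b = (uint8_t) (b >> 1);
    }
  return c;
}

/* Scalar model of the TARGET_DOTPROD sequence for V4SI:
     movi v0.4s, #0               -> zero the 32-bit accumulators
     movi v1.16b, #1              -> a vector of byte-sized ones
     cnt  v3.16b, v2.16b          -> one popcount per byte
     udot v0.4s, v3.16b, v1.16b   -> per lane, sum of 4 byte products  */
void v4si_popcount_udot (uint32_t out[4], const uint32_t in[4])
{
  uint8_t counts[16], ones[16];

  for (int i = 0; i < 16; i++)          /* movi v1.16b, #1 */
    ones[i] = 1;
  for (int i = 0; i < 4; i++)           /* cnt; byte j of lane i */
    for (int j = 0; j < 4; j++)
      counts[4 * i + j] = cnt_byte ((uint8_t) (in[i] >> (8 * j)));
  for (int i = 0; i < 4; i++)
    {
      uint32_t acc = 0;                 /* movi v0.4s, #0 */
      for (int j = 0; j < 4; j++)       /* udot */
        acc += (uint32_t) counts[4 * i + j] * ones[4 * i + j];
      out[i] = acc;
    }
}
```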

For V2SI, we generate:
  cnt     v1.8b, v0.8b
  uaddlp  v2.4h, v1.8b
  uaddlp  v3.2s, v2.4h

For V2SI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.8b, #0
  movi    v1.8b, #1
  cnt     v3.8b, v2.8b
  udot    v0.2s, v3.8b, v1.8b

For V2DI, we generate:
  cnt     v1.16b, v0.16b
  uaddlp  v2.8h, v1.16b
  uaddlp  v3.4s, v2.8h
  uaddlp  v4.2d, v3.4s
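The sequences without dot product differ only in how many UADDLP steps follow CNT. Each step halves the number of partial sums per lane and doubles their width, so a lane of 2^k bytes needs k steps: one for HImode, two for SImode and three for DImode, matching the V8HI, V4SI and V2DI listings. The generic single-lane C model below makes that count explicit; the function name and the steps out-parameter are illustrative only, and little-endian byte order is assumed.

```c
#include <stdint.h>

/* Model of CNT on a single byte: count its set bits.  */
uint8_t cnt_byte (uint8_t b)
{
  uint8_t c = 0;
  while (b != 0)
    {
      c = (uint8_t) (c + (b & 1u));
      b = (uint8_t) (b >> 1);
    }
  return c;
}

/* Scalar model of CNT followed by a chain of UADDLP steps for one lane
   of LANE_BYTES bytes (1, 2, 4 or 8).  *STEPS receives the number of
   UADDLP steps the reduction took.  Returns 0 for unsupported sizes.  */
uint64_t popcount_lane_uaddlp_chain (uint64_t lane, unsigned lane_bytes,
                                     unsigned *steps)
{
  uint64_t sums[8];
  unsigned n = lane_bytes;

  *steps = 0;
  if (n == 0 || n > 8 || (n & (n - 1)) != 0)
    return 0;

  for (unsigned j = 0; j < n; j++)      /* cnt: one count per byte */
    sums[j] = cnt_byte ((uint8_t) (lane >> (8 * j)));

  while (n > 1)                         /* uaddlp, repeated */
    {
      /* In-place pairwise add: slot j is written only after slots
         2j and 2j+1 have been read.  */
      for (unsigned j = 0; j < n / 2; j++)
        sums[j] = sums[2 * j] + sums[2 * j + 1];
      n /= 2;
      ++*steps;
    }
  return sums[0];
}
```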

For V2DI with TARGET_DOTPROD, we generate the following instead:
  movi    v0.4s, #0
  movi    v1.16b, #1
  cnt     v3.16b, v2.16b
  udot    v0.4s, v3.16b, v1.16b
  uaddlp  v0.2d, v0.4s
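The last sequence ends in uaddlp v0.2d, v0.4s, so its results are 64-bit (V2DI) lanes: UDOT first produces four 32-bit partial counts, and one UADDLP then adds each adjacent pair of them into a 64-bit lane. A scalar sketch of that two-level reduction follows, with hypothetical names and little-endian layout assumed.

```c
#include <stdint.h>

/* Model of CNT on a single byte: count its set bits.  */
uint8_t cnt_byte (uint8_t b)
{
  uint8_t c = 0;
  while (b != 0)
    {
      c = (uint8_t) (c + (b & 1u));
      b = (uint8_t) (b >> 1);
    }
  return c;
}

/* Scalar model of the TARGET_DOTPROD sequence for V2DI:
     movi   v0.4s, #0
     movi   v1.16b, #1
     cnt    v3.16b, v2.16b
     udot   v0.4s, v3.16b, v1.16b   -> four 32-bit partial popcounts
     uaddlp v0.2d, v0.4s            -> add pairs, widening to 64 bits  */
void v2di_popcount_udot (uint64_t out[2], const uint64_t in[2])
{
  uint8_t counts[16];
  uint32_t partial[4] = {0, 0, 0, 0};   /* movi v0.4s, #0 */

  for (int i = 0; i < 2; i++)           /* cnt; byte j of lane i */
    for (int j = 0; j < 8; j++)
      counts[8 * i + j] = cnt_byte ((uint8_t) (in[i] >> (8 * j)));

  for (int k = 0; k < 4; k++)           /* udot with v1 = all ones */
    for (int j = 0; j < 4; j++)
      partial[k] += (uint32_t) counts[4 * k + j] * 1u;

  for (int i = 0; i < 2; i++)           /* uaddlp: 32-bit lanes 2i, 2i+1 */
    out[i] = (uint64_t) partial[2 * i] + partial[2 * i + 1];
}
```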

PR target/113859

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_<su>addlp<mode>): Rename to...
(@aarch64_<su>addlp<mode>): ... This.
(popcount<mode>2): New define_expand.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-udot.c: New test.
* gcc.target/aarch64/popcnt-vec.c: New test.

Signed-off-by: Pengxuan Zheng <quic_pzheng@quicinc.com>