field | value | date |
---|---|---|
author | Ricardo Jesus <rjj@nvidia.com> | 2025-06-30 09:04:45 +0100 |
committer | GitHub <noreply@github.com> | 2025-06-30 09:04:45 +0100 |
commit | b563e763065deb0bb5365a3dbdab283ae852dc7e (patch) | |
tree | ec1fad35917af9c8e3b2524f1674e0caafd7882e /llvm/lib/MC/MCCodeView.cpp | |
parent | 597ee882a5575987b63d82805e3bbaf3cedc7cc5 (diff) | |
[AArch64] Improve scalar and Neon popcount with SVE CNT. (#143870)
When available, we can use SVE's CNT instruction to improve the lowering
of scalar and fixed-length popcount (CTPOP) since the SVE instruction
supports types that the Neon variant doesn't.
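For context, here is a minimal sketch of the kind of source-level popcount that Clang typically turns into a CTPOP node, which is the operation this lowering targets. The helper names (`popcount_reference`, `popcount32`, `popcount64`, `popcount128`) are hypothetical and do not come from the patch. The `__builtin_popcount*` calls are GCC/Clang builtins. The 128-bit case is written as two 64-bit halves purely for portability and is not how the backend handles `i128`.

```cpp
#include <cassert>
#include <cstdint>

// Reference popcount: clears the lowest set bit on each iteration
// (Kernighan's method). Used here only to cross-check the builtins.
inline int popcount_reference(std::uint64_t x) {
    int n = 0;
    while (x != 0) {
        x &= x - 1;  // drop the lowest set bit
        ++n;
    }
    return n;
}

// Hypothetical wrappers around the GCC/Clang builtins. With Clang these
// are normally emitted as llvm.ctpop.i32 / llvm.ctpop.i64 in the IR.
inline int popcount32(std::uint32_t x) { return __builtin_popcount(x); }
inline int popcount64(std::uint64_t x) { return __builtin_popcountll(x); }

// Assumption: a 128-bit value represented as two 64-bit halves.
inline int popcount128(std::uint64_t hi, std::uint64_t lo) {
    return popcount64(hi) + popcount64(lo);
}

// Small self-check comparing the builtins against the reference loop.
inline void popcount_self_check() {
    assert(popcount32(0xF0F0F0F0u) == popcount_reference(0xF0F0F0F0u));
    assert(popcount64(~0ull) == popcount_reference(~0ull));
    assert(popcount128(1, 3) == 3);
}
```

Compiling such functions for an AArch64 target with and without SVE enabled is one way to inspect how the generated code differs. The exact instruction sequences depend on the compiler version and flags.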
For the scalar types, I see the following speedups on an NVIDIA Grace CPU:
| size (bits) | before (Gibit/s) | after (Gibit/s) | speedup |
|------------:|-----------------:|----------------:|--------:|
| 32 | 75.20 | 86.79 | 1.15 |
| 64 | 149.87 | 173.70 | 1.16 |
| 128 | 158.56 | 164.88 | 1.04 |
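The speedup column appears to be the ratio of the "after" throughput to the "before" throughput, rounded to two decimals. The sketch below recomputes it from the table values. The `speedup` helper is hypothetical and is not part of the benchmark used for the commit.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: speedup as after/before throughput.
inline double speedup(double before_gibit_s, double after_gibit_s) {
    return after_gibit_s / before_gibit_s;
}

// Rounds to two decimals, matching the precision shown in the table.
inline double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}

// Recomputes the three table rows and compares them against the
// reported speedups with a small tolerance for floating-point error.
inline void speedup_self_check() {
    assert(std::fabs(round2(speedup(75.20, 86.79)) - 1.15) < 1e-9);    // 32-bit
    assert(std::fabs(round2(speedup(149.87, 173.70)) - 1.16) < 1e-9);  // 64-bit
    assert(std::fabs(round2(speedup(158.56, 164.88)) - 1.04) < 1e-9);  // 128-bit
}
```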
Diffstat (limited to 'llvm/lib/MC/MCCodeView.cpp')
0 files changed, 0 insertions, 0 deletions