path: root/llvm/lib/MC/MCCodeView.cpp
author: Ricardo Jesus <rjj@nvidia.com> 2025-06-30 09:04:45 +0100
committer: GitHub <noreply@github.com> 2025-06-30 09:04:45 +0100
commit: b563e763065deb0bb5365a3dbdab283ae852dc7e
tree: ec1fad35917af9c8e3b2524f1674e0caafd7882e /llvm/lib/MC/MCCodeView.cpp
parent: 597ee882a5575987b63d82805e3bbaf3cedc7cc5
[AArch64] Improve scalar and Neon popcount with SVE CNT. (#143870)
When available, we can use SVE's CNT instruction to improve the lowering of scalar and fixed-length popcount (CTPOP) since the SVE instruction supports types that the Neon variant doesn't.

For the scalar types, I see the following speedups on NVIDIA Grace CPU:

| size (bits) | before (Gibit/s) | after (Gibit/s) | speedup |
|------------:|-----------------:|----------------:|--------:|
|          32 |            75.20 |           86.79 |    1.15 |
|          64 |           149.87 |          173.70 |    1.16 |
|         128 |           158.56 |          164.88 |    1.04 |
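
For reference, the operation being lowered (CTPOP) counts the set bits in a value. The sketch below is a plain C++ model of that semantics for the three scalar widths in the table. It is not the code LLVM emits and not the benchmark used for the numbers above. The helper names (`popcount32`, `popcount64`, `popcount128`) and the `U128` struct are hypothetical and exist only for illustration.

```cpp
#include <cstdint>

// Reference popcount for 64-bit values.
// Each iteration clears the lowest set bit (Kernighan's method),
// so the loop runs once per set bit.
inline unsigned popcount64(std::uint64_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;  // drop the lowest set bit
        ++n;
    }
    return n;
}

// 32-bit variant: zero-extend and reuse the 64-bit helper.
inline unsigned popcount32(std::uint32_t x) {
    return popcount64(static_cast<std::uint64_t>(x));
}

// Hypothetical 128-bit value, modelled as two 64-bit halves
// so the example stays within standard C++.
struct U128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// The popcount of a 128-bit value is the sum over its halves.
inline unsigned popcount128(U128 v) {
    return popcount64(v.lo) + popcount64(v.hi);
}
```

A compiler would normally reach CTPOP through a builtin or intrinsic rather than a loop like this; the loop only pins down what result any lowering, Neon or SVE, has to produce.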
Diffstat (limited to 'llvm/lib/MC/MCCodeView.cpp')
0 files changed, 0 insertions, 0 deletions