author: Ricardo Jesus <rjj@nvidia.com>, 2025-04-17 08:41:17 +0100
committer: GitHub <noreply@github.com>, 2025-04-17 08:41:17 +0100
commit: 34f9ddf1ce2d775d8df1ca9c9806710f3ba8361d
tree: a96aebaeeed4a6dbcb51e942f73963bdf3f01234 /clang/lib/CodeGen/CodeGenModule.cpp
parent: 62d32c2c27a83261af9f2529a961a22605df8a2b
[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280)

Currently, given:

```cpp
uint64_t incb(uint64_t x) {
  return x+svcntb();
}
```

LLVM generates:

```gas
incb:
        addvl   x0, x0, #1
        ret
```

Which is equivalent to:

```gas
incb:
        incb    x0
        ret
```

However, on microarchitectures like the Neoverse V2 and Neoverse V3, the second form (with INCB) can have significantly better latency and throughput (according to their SWOG). On the Neoverse V2, for example, ADDVL has a latency and throughput of 2, whereas some forms of INCB have a latency of 1 and a throughput of 4. The same applies to DECB.

This patch adds patterns to prefer the cheaper INCB/DECB forms over ADDVL where applicable.

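The equivalence the message relies on can be sketched with a small scalar model. This is an illustration only, not code from the patch: the helper names `addvl`, `incb` and `decb` and the `vl_bytes` parameter are hypothetical, and the model assumes the default `ALL` predicate pattern, where INCB/DECB adjust the register by a multiple of the vector length in bytes, the same quantity that ADDVL scales its immediate by.

```cpp
#include <cstdint>

// Hypothetical scalar model (not part of the patch).
// vl_bytes stands for the SVE vector length in bytes, i.e. what svcntb() returns.

// ADDVL Xd, Xn, #imm : Xd = Xn + imm * VL_bytes (imm is signed).
constexpr uint64_t addvl(uint64_t x, int64_t imm, uint64_t vl_bytes) {
  // Unsigned wrap-around makes a negative imm behave as a subtraction.
  return x + static_cast<uint64_t>(imm) * vl_bytes;
}

// INCB Xdn, ALL, MUL #mul : Xdn += mul * VL_bytes.
constexpr uint64_t incb(uint64_t x, uint64_t mul, uint64_t vl_bytes) {
  return x + mul * vl_bytes;
}

// DECB Xdn, ALL, MUL #mul : Xdn -= mul * VL_bytes.
constexpr uint64_t decb(uint64_t x, uint64_t mul, uint64_t vl_bytes) {
  return x - mul * vl_bytes;
}

// Checked at compile time for a 128-bit (16-byte) and a 512-bit (64-byte) vector length.
static_assert(addvl(1000, 1, 16) == incb(1000, 1, 16), "ADDVL #1 matches INCB");
static_assert(addvl(1000, -1, 16) == decb(1000, 1, 16), "ADDVL #-1 matches DECB");
static_assert(addvl(1000, 1, 64) == incb(1000, 1, 64), "holds for other vector lengths");
```

Under this model the two forms compute the same value, so preferring one over the other is purely a question of instruction cost, which is what the commit message argues.
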
Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.cpp')
0 files changed, 0 insertions, 0 deletions