riscv-gnu-toolchain/llvm.git

author	Ricardo Jesus <rjj@nvidia.com>	2025-04-17 08:41:17 +0100
committer	GitHub <noreply@github.com>	2025-04-17 08:41:17 +0100
commit	34f9ddf1ce2d775d8df1ca9c9806710f3ba8361d (patch)
tree	a96aebaeeed4a6dbcb51e942f73963bdf3f01234 /clang/lib/CodeGen/CodeGenModule.cpp
parent	62d32c2c27a83261af9f2529a961a22605df8a2b (diff)

[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280)

Currently, given:
```cpp
uint64_t incb(uint64_t x) {
  return x+svcntb();
}
```
LLVM generates:
```gas
incb:
  addvl x0, x0, #1
  ret
```
Which is equivalent to:
```gas
incb:
  incb x0
  ret
```

However, on microarchitectures like the Neoverse V2 and Neoverse V3, the second form (with INCB) can have significantly better latency and throughput (according to their SWOG). On the Neoverse V2, for example, ADDVL has a latency and throughput of 2, whereas some forms of INCB have a latency of 1 and a throughput of 4. The same applies to DECB.

This patch adds patterns to prefer the cheaper INCB/DECB forms over ADDVL where applicable.
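The equivalence claimed above can be sketched as a small portable model. This is an illustration only, not the code added by the patch: `vl_bytes` is a hypothetical stand-in for the value `svcntb()` returns at run time, and `addvl`, `incb`, `decb` and `pick_form` are made-up helper names. The range check in `pick_form` assumes the architectural immediate ranges (a multiplier of 1 to 16 for INCB/DECB, and a signed immediate of -32 to 31 for ADDVL); the actual conditions under which the patch's patterns fire ("where applicable") are not stated here and may be narrower.

```cpp
#include <cstdint>
#include <string>

// Model of "addvl xd, xn, #imm": adds imm * VL (in bytes) to x.
// vl_bytes is a hypothetical stand-in for the runtime value of svcntb().
constexpr std::uint64_t addvl(std::uint64_t x, int imm, std::uint64_t vl_bytes) {
    // Sign-extend imm first so negative immediates wrap correctly modulo 2^64.
    return x + static_cast<std::uint64_t>(static_cast<std::int64_t>(imm)) * vl_bytes;
}

// Model of "incb xdn, all, mul #mul": adds mul * VL (in bytes) to x.
constexpr std::uint64_t incb(std::uint64_t x, unsigned mul, std::uint64_t vl_bytes) {
    return x + mul * vl_bytes;
}

// Model of "decb xdn, all, mul #mul": subtracts mul * VL (in bytes) from x.
constexpr std::uint64_t decb(std::uint64_t x, unsigned mul, std::uint64_t vl_bytes) {
    return x - mul * vl_bytes;
}

// Hypothetical selection helper (NOT LLVM's actual pattern): names the form
// that can encode a given ADDVL-style immediate, assuming INCB/DECB accept
// multipliers in the range 1..16.
inline std::string pick_form(int imm) {
    if (imm >= 1 && imm <= 16) return "incb";
    if (imm >= -16 && imm <= -1) return "decb";
    return "addvl";
}
```

In this model, `addvl(x, 1, vl)` and `incb(x, 1, vl)` produce the same value for any `vl`, which mirrors the `x+svcntb()` example; the only difference on real hardware is the cost of the instruction chosen.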

Diffstat (limited to 'clang/lib/CodeGen/CodeGenModule.cpp')

0 files changed, 0 insertions, 0 deletions
