author | Ricardo Jesus <rjj@nvidia.com> | 2025-04-17 08:41:17 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-04-17 08:41:17 +0100 |
commit | 34f9ddf1ce2d775d8df1ca9c9806710f3ba8361d (patch) | |
tree | a96aebaeeed4a6dbcb51e942f73963bdf3f01234 /clang/lib/CodeGen/CodeGenModule.cpp | |
parent | 62d32c2c27a83261af9f2529a961a22605df8a2b (diff) | |
[AArch64][SVE] Fold ADD+CNTB to INCB/DECB (#118280)
Currently, given:
```cpp
#include <arm_sve.h>

uint64_t incb(uint64_t x) {
  return x + svcntb();
}
```
LLVM generates:
```gas
incb:
  addvl x0, x0, #1
  ret
```
This is equivalent to:
```gas
incb:
  incb x0
  ret
```
However, on microarchitectures such as the Neoverse V2 and Neoverse V3,
the second form (with INCB) can have significantly better latency and
throughput, according to their Software Optimization Guides (SWOGs). On
the Neoverse V2, for example, ADDVL has a latency of 2 and a throughput
of 2, whereas some forms of INCB have a latency of 1 and a throughput
of 4. The same applies to DECB.
This patch adds patterns to prefer the cheaper INCB/DECB forms over
ADDVL where applicable.
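To illustrate why the two forms are interchangeable, here is a minimal, portable sketch of the arithmetic. It is not the patch's TableGen patterns. The names `addvl`, `incb`, `decb`, `Form`, `select_form` and `check_fold` are hypothetical helpers invented for this example, and `vl_bytes` stands in for the value `svcntb()` would return. The ranges are the architectural ones: the INCB/DECB multiplier is 1 to 16 and the ADDVL immediate is -32 to 31. Any further conditions the patch applies (for example, subtarget checks) are not modelled.
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical models of the scalar results; vl_bytes plays the role of svcntb().
constexpr uint64_t addvl(uint64_t x, int imm, uint64_t vl_bytes) {
  // ADDVL adds imm * VL bytes; a negative imm wraps modulo 2^64, like the register.
  return x + static_cast<uint64_t>(static_cast<int64_t>(imm)) * vl_bytes;
}
constexpr uint64_t incb(uint64_t x, unsigned mul, uint64_t vl_bytes) {
  return x + mul * vl_bytes;  // INCB Xdn, ALL, MUL #mul
}
constexpr uint64_t decb(uint64_t x, unsigned mul, uint64_t vl_bytes) {
  return x - mul * vl_bytes;  // DECB Xdn, ALL, MUL #mul
}

enum class Form { IncB, DecB, AddVL };

// Assumed selection rule: use INCB/DECB whenever the multiplier is encodable
// (1..16), otherwise keep ADDVL.
constexpr Form select_form(int imm) {
  if (imm >= 1 && imm <= 16) return Form::IncB;
  if (imm >= -16 && imm <= -1) return Form::DecB;
  return Form::AddVL;
}

// Asserts that the selected form computes the same value as ADDVL.
void check_fold(uint64_t x, int imm, uint64_t vl_bytes) {
  switch (select_form(imm)) {
  case Form::IncB:
    assert(incb(x, static_cast<unsigned>(imm), vl_bytes) == addvl(x, imm, vl_bytes));
    break;
  case Form::DecB:
    assert(decb(x, static_cast<unsigned>(-imm), vl_bytes) == addvl(x, imm, vl_bytes));
    break;
  case Form::AddVL:
    break;  // nothing to fold
  }
}
```
For instance, `check_fold(1000, 1, 16)` compares the `incb x0` form with `addvl x0, x0, #1` for a 128-bit vector length (16 bytes).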