author | Sander de Smalen <sander.desmalen@arm.com> | 2025-10-03 11:07:07 +0200 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-10-03 10:07:07 +0100 |
commit | cc9c64d525ece2167a6fae657578a7379541ac6e | |
tree | edee05beeaaf73c55ba74b92fd160282dede4650 /libcxx/include/__algorithm/comp.h | |
parent | 5cd3db3bed62c07790c17bf1947e98bc903472a9 | |
[AArch64] Refactor and refine cost-model for partial reductions (#158641)
This cost-model takes into account any type-legalisation that would
happen on vectors, such as splitting and promotion. This results in wider
VFs being chosen for loops that can use partial reductions.

The cost-model now also assumes that, when SVE is available, the SVE dot
instructions for i16 -> i64 dot products can be used for fixed-length
vectors. In practice this means that loops with non-scalable VFs are now
vectorized using partial reductions where they previously were not, e.g.:
```
#include <stdint.h>

int64_t foo2(int8_t *src1, int8_t *src2, int N) {
  int64_t sum = 0;
  for (int i = 0; i < N; ++i)
    sum += (int64_t)src1[i] * (int64_t)src2[i];
  return sum;
}
```
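For readers unfamiliar with the term, the following is a minimal scalar sketch of what a dot-product partial reduction computes for a loop like the one above. It is an illustration only, not the code the vectorizer emits: the function name `dot_partial_reduce` and the constants `ACC_LANES` and `GROUP` are hypothetical, and the lane and group sizes are chosen arbitrarily for readability.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_LANES 4 /* hypothetical: number of accumulator lanes */
#define GROUP 4     /* hypothetical: inputs folded into one lane per step */

/* Scalar emulation of a partial reduction: each step consumes
 * ACC_LANES * GROUP input pairs but only updates ACC_LANES accumulators,
 * so the accumulator "vector" is narrower than the input "vector".
 * For simplicity this sketch requires n to be a multiple of the step. */
int64_t dot_partial_reduce(const int8_t *a, const int8_t *b, int n) {
  assert(n >= 0 && n % (ACC_LANES * GROUP) == 0);
  int64_t acc[ACC_LANES] = {0};

  for (int i = 0; i < n; i += ACC_LANES * GROUP) {
    for (int lane = 0; lane < ACC_LANES; ++lane) {
      /* Widen, multiply, and fold GROUP adjacent products into one lane. */
      for (int k = 0; k < GROUP; ++k) {
        int idx = i + lane * GROUP + k;
        acc[lane] += (int64_t)a[idx] * (int64_t)b[idx];
      }
    }
  }

  /* Final reduction across the accumulator lanes, done once after the loop. */
  int64_t sum = 0;
  for (int lane = 0; lane < ACC_LANES; ++lane)
    sum += acc[lane];
  return sum;
}
```

The result matches the plain scalar loop; only the order of additions differs, which is harmless for integer arithmetic.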
These changes also fix an issue where previously a partial reduction
would be used for mixed sign/zero-extends (USDOT), even when +i8mm was
not available.
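As an illustration of the mixed-extend case, a loop of roughly the following shape multiplies a zero-extended operand by a sign-extended one. This is a sketch with a hypothetical function name (`dot_mixed`), not a test case taken from the patch; whether such a loop is vectorized with USDOT depends on the target features in effect.

```c
#include <assert.h>
#include <stdint.h>

/* Mixed-sign dot product: a[i] is zero-extended (unsigned source),
 * b[i] is sign-extended (signed source) before the multiply. */
int32_t dot_mixed(const uint8_t *a, const int8_t *b, int n) {
  assert(n >= 0);
  int32_t sum = 0;
  for (int i = 0; i < n; ++i)
    sum += (int32_t)a[i] * (int32_t)b[i];
  return sum;
}
```

With `a = {255, 1}` and `b = {-1, 2}` the result is -253, because 255 is zero-extended rather than being treated as -1.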