author: Sander de Smalen <sander.desmalen@arm.com> 2025-10-03 11:07:07 +0200
committer: GitHub <noreply@github.com> 2025-10-03 10:07:07 +0100
commit: cc9c64d525ece2167a6fae657578a7379541ac6e
tree: edee05beeaaf73c55ba74b92fd160282dede4650 /libcxx/include/__algorithm/comp.h
parent: 5cd3db3bed62c07790c17bf1947e98bc903472a9
[AArch64] Refactor and refine cost-model for partial reductions (#158641)
This cost model takes into account any type legalisation that would happen on vectors, such as splitting and promotion. This results in wider VFs being chosen for loops that can use partial reductions.

The cost model now also assumes that, when SVE is available, the SVE dot instructions for i16 -> i64 dot products can be used for fixed-length vectors. In practice this means that loops with non-scalable VFs are now vectorized using partial reductions where they previously were not, e.g.:

```
#include <stdint.h>

int64_t foo2(int8_t *src1, int8_t *src2, int N) {
  int64_t sum = 0;
  for (int i=0; i<N; ++i)
    sum += (int64_t)src1[i] * (int64_t)src2[i];
  return sum;
}
```

These changes also fix an issue where a partial reduction was previously used for mixed sign/zero-extends (USDOT) even when +i8mm was not available.
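To make the idea of a partial reduction concrete, here is a scalar C sketch of how `foo2` could be computed with one: products are accumulated into an accumulator with fewer lanes than the vectorization factor, in the style of AArch64 dot-product instructions (where each accumulator lane absorbs four products per step), and the lanes are summed only once at the end. The names `dot_partial`, `ACC_LANES`, `GROUP` and `VF`, the lane counts, and the mapping of input elements to lanes are illustrative assumptions; this models the arithmetic only, not the code LLVM generates or the cost model itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model parameters (not taken from LLVM). */
#define ACC_LANES 4            /* lanes in the narrow accumulator */
#define GROUP 4                /* products folded into each lane per step */
#define VF (ACC_LANES * GROUP) /* input elements consumed per vector step */

/* Reference: the scalar loop from the commit message. */
int64_t foo2(int8_t *src1, int8_t *src2, int N) {
  int64_t sum = 0;
  for (int i = 0; i < N; ++i)
    sum += (int64_t)src1[i] * (int64_t)src2[i];
  return sum;
}

/* Scalar model of the same loop vectorized with a partial reduction. */
int64_t dot_partial(const int8_t *src1, const int8_t *src2, int n) {
  assert(n >= 0);
  int64_t acc[ACC_LANES] = {0};
  int i = 0;
  /* "Vector" body: each accumulator lane absorbs GROUP products per step,
     so the accumulator has VF / GROUP lanes rather than VF lanes. */
  for (; i + VF <= n; i += VF)
    for (int lane = 0; lane < ACC_LANES; ++lane)
      for (int k = 0; k < GROUP; ++k) {
        int idx = i + lane * GROUP + k;
        acc[lane] += (int64_t)src1[idx] * (int64_t)src2[idx];
      }
  /* Only the final horizontal sum is meaningful: the element-to-lane
     mapping above is one arbitrary choice among many. */
  int64_t sum = 0;
  for (int lane = 0; lane < ACC_LANES; ++lane)
    sum += acc[lane];
  /* Scalar epilogue for the remaining n % VF elements. */
  for (; i < n; ++i)
    sum += (int64_t)src1[i] * (int64_t)src2[i];
  return sum;
}
```

For any `n >= 0`, `dot_partial(a, b, n)` should return the same value as `foo2(a, b, n)`. Because only the final sum is observable, the vectorizer is free to keep a narrower accumulator, which is the property that lets a target fold several multiply-accumulate steps into one dot-product instruction.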
Diffstat (limited to 'libcxx/include/__algorithm/comp.h')
0 files changed, 0 insertions, 0 deletions