author | David Sherwood <david.sherwood@arm.com> | 2025-01-06 13:17:14 +0000 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-01-06 13:17:14 +0000 |
commit | 346185c42c59c344fcf0d9fd476c85d287181baf (patch) | |
tree | 5a215686f9de9743af2701f6ae44ba7722fbe372 /clang/lib/CodeGen/CodeGenFunction.cpp | |
parent | 8f17c908e3858c0a2a9b1bed3f6506fec3c6f910 (diff) | |
[AArch64] Improve codegen of vectorised early exit loops (#119534)
Once PR #112138 lands we will be able to vectorise more loops
that have uncountable early exits. The typical loop structure
looks like this:
  vector.body:
    ...
    %pred = icmp eq <2 x ptr> %wide.load, %broadcast.splat
    ...
    %or.reduc = tail call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> %pred)
    %iv.cmp = icmp eq i64 %index.next, 4
    %exit.cond = or i1 %or.reduc, %iv.cmp
    br i1 %exit.cond, label %middle.split, label %vector.body
  middle.split:
    br i1 %or.reduc, label %found, label %notfound
  found:
    ret i64 1
  notfound:
    ret i64 0
The problem with this is that %or.reduc is kept live after the loop,
and since this is a boolean it typically requires making a copy of
the condition code register. For AArch64 this requires an additional
cset instruction, which is quite expensive for a typical find loop
that only contains 6 or 7 instructions.
This patch attempts to improve the codegen by sinking the reduction
out of the loop to the location of its user. It is a lot cheaper to
keep the vector predicate live across the loop exit if its type is
legal and plenty of registers are available to hold it. The potential
downside is that a little more work is required after the loop, but I
believe this is worth it since we are likely to spend most of our
time in the loop.
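
As a rough illustration only, the two shapes can be modelled in plain C++. This is a scalar sketch of the idea, not the patch itself: the 2-lane width, the `Pred` type, and the functions `reduce_or`, `compare_lanes`, `find_reduc_live` and `find_reduc_sunk` are all hypothetical names chosen to mirror the IR above, and the trip count is assumed to be a multiple of the vector width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 2-lane model of the vector loop; all names are illustrative.
constexpr std::size_t VF = 2;
using Pred = std::array<bool, VF>;

// Models llvm.vector.reduce.or on a <2 x i1> predicate.
static bool reduce_or(const Pred &p) { return p[0] || p[1]; }

// Models the lane-wise `icmp eq` of a wide load against a broadcast value.
static Pred compare_lanes(const std::uintptr_t *data, std::size_t index,
                          std::uintptr_t key) {
  return Pred{data[index] == key, data[index + 1] == key};
}

// Before: the scalar reduction result is kept live across the loop exit
// and is what the block after the loop branches on.
std::int64_t find_reduc_live(const std::uintptr_t *data, std::size_t n,
                             std::uintptr_t key) {
  assert(n % VF == 0 && "sketch assumes a trip count that is a multiple of VF");
  bool or_reduc = false;
  for (std::size_t index = 0; index < n; index += VF) {
    or_reduc = reduce_or(compare_lanes(data, index, key));
    if (or_reduc)
      break; // early exit
  }
  return or_reduc ? 1 : 0; // corresponds to the branch in middle.split
}

// After: only the predicate vector stays live; the reduction is recomputed
// next to its user once the loop has exited.
std::int64_t find_reduc_sunk(const std::uintptr_t *data, std::size_t n,
                             std::uintptr_t key) {
  assert(n % VF == 0 && "sketch assumes a trip count that is a multiple of VF");
  Pred pred{}; // all lanes false
  for (std::size_t index = 0; index < n; index += VF) {
    pred = compare_lanes(data, index, key);
    if (reduce_or(pred))
      break; // the loop exit test still reduces inside the loop
  }
  return reduce_or(pred) ? 1 : 0; // sunk copy of the reduction
}
```

Both functions return the same result for any input. The only difference is which value crosses the loop exit: a boolean in the first, the predicate vector in the second, at the cost of one extra reduction after the loop.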