path: root/clang/lib/CodeGen/CodeGenFunction.cpp
author: David Sherwood <david.sherwood@arm.com> 2025-01-06 13:17:14 +0000
committer: GitHub <noreply@github.com> 2025-01-06 13:17:14 +0000
commit: 346185c42c59c344fcf0d9fd476c85d287181baf
tree: 5a215686f9de9743af2701f6ae44ba7722fbe372 /clang/lib/CodeGen/CodeGenFunction.cpp
parent: 8f17c908e3858c0a2a9b1bed3f6506fec3c6f910
[AArch64] Improve codegen of vectorised early exit loops (#119534)

Once PR #112138 lands we are able to start vectorising more loops
that have uncountable early exits. The typical loop structure
looks like this:

    vector.body:
      ...
      %pred = icmp eq <2 x ptr> %wide.load, %broadcast.splat
      ...
      %or.reduc = tail call i1 @llvm.vector.reduce.or.v2i1(<2 x i1> %pred)
      %iv.cmp = icmp eq i64 %index.next, 4
      %exit.cond = or i1 %or.reduc, %iv.cmp
      br i1 %exit.cond, label %middle.split, label %vector.body

    middle.split:
      br i1 %or.reduc, label %found, label %notfound

    found:
      ret i64 1

    notfound:
      ret i64 0

The problem with this is that %or.reduc is kept live after the loop,
and since it is a boolean, keeping it live typically requires making
a copy of the condition code register. For AArch64 this requires an
additional cset instruction, which is quite expensive for a typical
find loop that only contains 6 or 7 instructions.

This patch attempts to improve the codegen by sinking the reduction
out of the loop to the location of its user. It is a lot cheaper to
keep the predicate alive if its type is legal and there are plenty of
registers available for it. There is a potential downside in that a
little more work is required after the loop, but I believe this is
worth it since we are likely to spend most of our time in the loop.
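
As a rough illustration of the idea, the following C++ sketch models the loop in scalar code. It is not the patch itself: `VF`, `Mask`, `reduce_or`, `compare_chunk`, `find_before` and `find_after` are hypothetical names introduced here, `reduce_or` merely stands in for `llvm.vector.reduce.or`, and the element count is assumed to be a multiple of the vector width. The loop in `find_after` still tests the reduced predicate to decide whether to exit; what changes is that the value used after the loop is recomputed from the predicate vector rather than carried out of the loop as a boolean.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vector width, chosen to match the <2 x ptr> in the IR.
constexpr std::size_t VF = 2;
using Mask = std::array<bool, VF>;

// Stand-in for llvm.vector.reduce.or on a <VF x i1> predicate.
inline bool reduce_or(const Mask &m) {
  bool any = false;
  for (bool b : m)
    any |= b;
  return any;
}

// Lane-wise comparison of one VF-wide chunk against the needle.
inline Mask compare_chunk(const int *const *p, const int *needle) {
  Mask m{};
  for (std::size_t lane = 0; lane < VF; ++lane)
    m[lane] = (p[lane] == needle);
  return m;
}

// Before: the reduced boolean is computed in the loop and stays live
// after it, where the final branch reuses it.
// Assumes n is a multiple of VF.
inline std::int64_t find_before(const int *const *data, std::size_t n,
                                const int *needle) {
  bool or_reduc = false;
  for (std::size_t i = 0; i < n; i += VF) {
    or_reduc = reduce_or(compare_chunk(data + i, needle));
    if (or_reduc)
      break;
  }
  return or_reduc ? 1 : 0;
}

// After: only the predicate vector stays live across the loop exit, and
// the reduction is repeated next to its user.
// Assumes n is a multiple of VF.
inline std::int64_t find_after(const int *const *data, std::size_t n,
                               const int *needle) {
  Mask pred{};
  for (std::size_t i = 0; i < n; i += VF) {
    pred = compare_chunk(data + i, needle);
    if (reduce_or(pred)) // the loop exit test still needs a reduction
      break;
  }
  return reduce_or(pred) ? 1 : 0; // reduction sunk to the user
}
```

Both functions return the same result for any input; the second form does slightly more work after the loop in exchange for not keeping a boolean live across the exit.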
Diffstat (limited to 'clang/lib/CodeGen/CodeGenFunction.cpp')
0 files changed, 0 insertions, 0 deletions