path: root/clang/lib/AST/ByteCode/Compiler.cpp
author     Luke Lau <luke@igalia.com>          2025-09-15 12:14:04 +0800
committer  GitHub <noreply@github.com>         2025-09-15 04:14:04 +0000
commit     65ad21d730d25789454d18e811f8ff5db79cb5d4 (patch)
tree       24e87afdce8784ba28574ce05edb5810e12a0795 /clang/lib/AST/ByteCode/Compiler.cpp
parent     c44e015b49e832c9f3749c33cf5c9d5aacaf60a4 (diff)
[RISCV] Handle recurrences in RISCVVLOptimizer (#151285)
After #144666 we now support vectorizing loops with induction variables with EVL tail folding. The induction updates don't use VP intrinsics, to avoid VL toggles, but instead rely on RISCVVLOptimizer. However, RISCVVLOptimizer can't reason about cycles or recurrences today, which means we are left with a VL toggle to VLMAX:

    # %bb.1:                                # %for.body.preheader
            li      a2, 0
            vsetvli a3, zero, e32, m2, ta, ma
            vid.v   v8
    .LBB0_2:                                # %vector.body
                                            # =>This Inner Loop Header: Depth=1
            sub     a3, a1, a2
            sh2add  a4, a2, a0
            vsetvli a3, a3, e32, m2, ta, ma
            vle32.v v10, (a4)
            add     a2, a2, a3
            vadd.vv v10, v10, v8
            vse32.v v10, (a4)
            vsetvli a4, zero, e32, m2, ta, ma
            vadd.vx v8, v8, a3
            bne     a2, a1, .LBB0_2

This patch teaches RISCVVLOptimizer to reason about recurrences so we can remove the VLMAX toggle:

    # %bb.1:                                # %for.body.preheader
            li      a2, 0
            vsetvli a3, zero, e32, m2, ta, ma
            vid.v   v8
    .LBB0_2:                                # %vector.body
                                            # =>This Inner Loop Header: Depth=1
            sub     a3, a1, a2
            sh2add  a4, a2, a0
            vsetvli a3, a3, e32, m2, ta, ma
            vle32.v v10, (a4)
            add     a2, a2, a3
            vadd.vv v10, v10, v8
            vse32.v v10, (a4)
            vadd.vx v8, v8, a3
            bne     a2, a1, .LBB0_2

With this we remove a significant number of VL toggles and vsetvli instructions across llvm-test-suite and SPEC CPU 2017 with tail folding enabled, since it affects every loop with an induction variable.

This builds upon the work in #124530, where we started computing what VL each instruction demanded, and generalizes it to an optimistic sparse dataflow analysis (see the sketch after the commit message):

- We begin by optimistically assuming no VL is demanded by any instruction, and push instructions onto the worklist starting from the bottom.
- For each instruction on the worklist we apply the transfer function, which propagates the VL needed by that instruction upwards to the instructions it uses. If the demanded VL of one of those operand-defining instructions changes, it is added to the worklist.
- Eventually this converges to a fixpoint once all uses have been processed and every demanded VL has been propagated throughout the entire use-def chain. Only after this is the DemandedVL map accurate.

Some implementation details:

- The roots are stores (or other unsupported instructions not in `isSupportedInstr`) or copies to physical registers (they fail the `any_of(MI.defs(), isPhysical)` check).
- This patch untangles `getMinimumVLForUser` and `checkUsers`: `getMinimumVLForUser` now returns how many lanes of an operand are read by an instruction, whilst `checkUsers` checks that an instruction and its users have compatible EEW/EMULs.
- The `DemandedVL` struct was added so that we have a default value of 0 in `DenseMap<const MachineInstr *, DemandedVL> DemandedVLs`, so we don't need to check whether a key exists when looking things up.

There was no measurable compile-time impact on llvm-test-suite or SPEC CPU 2017.

The analysis will always terminate; there are more details in this EuroLLVM talk: https://www.youtube.com/watch?v=Mfb5fRSdJAc

Fixes #149354
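Editor's note: the commit message above describes the demanded-VL propagation at a high level. The following is a minimal C++ sketch of such an optimistic worklist fixpoint, assuming a greatly simplified IR; the Instr struct, minimumVLForUse helper and VLMAX sentinel are hypothetical stand-ins for illustration, not the actual RISCVVLOptimizer types or API.

    // Minimal sketch of the optimistic sparse dataflow described above.
    // NOTE: Instr, Root, minimumVLForUse and VLMAX are hypothetical,
    // simplified stand-ins, not the real RISCVVLOptimizer data structures.
    #include <algorithm>
    #include <cstdint>
    #include <deque>
    #include <unordered_map>
    #include <vector>

    struct Instr {
      std::vector<Instr *> Defs; // instructions whose results this one reads
      bool Root = false;         // store, copy-to-physreg or unsupported instr
    };

    constexpr uint64_t VLMAX = UINT64_MAX; // "all lanes demanded"

    // Transfer function: how many lanes of an operand does User read?
    // Roots conservatively demand everything; otherwise an instruction only
    // needs its operands to be valid up to the VL it is itself demanded at.
    static uint64_t minimumVLForUse(const Instr &User, uint64_t UserDemand) {
      return User.Root ? VLMAX : UserDemand;
    }

    std::unordered_map<const Instr *, uint64_t>
    computeDemandedVL(const std::vector<Instr *> &Program) {
      // Optimistic start: no instruction demands any lanes (map defaults to 0).
      std::unordered_map<const Instr *, uint64_t> Demanded;
      std::deque<const Instr *> Worklist;

      // Seed the worklist from the bottom with the roots.
      for (const Instr *I : Program)
        if (I->Root)
          Worklist.push_back(I);

      while (!Worklist.empty()) {
        const Instr *I = Worklist.front();
        Worklist.pop_front();
        // Propagate the VL this instruction needs upwards to its operands.
        for (const Instr *Def : I->Defs) {
          uint64_t New =
              std::max(Demanded[Def], minimumVLForUse(*I, Demanded[I]));
          if (New != Demanded[Def]) {
            Demanded[Def] = New; // demand only ever grows, so this terminates
            Worklist.push_back(Def);
          }
        }
      }
      return Demanded; // fixpoint reached: the map is now accurate
    }

In this sketch the demanded VL of an instruction can only increase, and it is bounded above, so the worklist drains and the analysis terminates, mirroring the termination argument referenced in the commit message.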
Diffstat (limited to 'clang/lib/AST/ByteCode/Compiler.cpp')
0 files changed, 0 insertions, 0 deletions