riscv-gnu-toolchain/llvm.git

author	Luke Lau <luke@igalia.com>	2025-07-09 11:09:13 +0800
committer	GitHub <noreply@github.com>	2025-07-09 11:09:13 +0800
commit	7c812ea01a2d11545033bbed8f5094c4a4763124 (patch)
tree	2d708b81a0d2676306c4fbc3f501a486ddc452c2 /flang/lib/Frontend/CompilerInvocation.cpp
parent	a8280c4be4cec474eea839bf4caf91b0d071d45b (diff)

[RISCV] Avoid vl toggles when lowering vector_splice/experimental_vp_splice and add +vl-dependent-latency tuning feature (#146746)

When vectorizing a loop with a fixed-order recurrence we use a splice, which gets lowered to a vslidedown and vslideup pair.

However, with the way we lower it today we end up with extra vl toggles in the loop, especially with EVL tail folding, e.g.:

    .LBB0_5:                        # %vector.body
                                    # =>This Inner Loop Header: Depth=1
        sub     a5, a2, a3
        sh2add  a6, a3, a1
        zext.w  a7, a4
        vsetvli a4, a5, e8, mf2, ta, ma
        vle32.v v10, (a6)
        addi    a7, a7, -1
        vsetivli zero, 1, e32, m2, ta, ma
        vslidedown.vx v8, v8, a7
        sh2add  a6, a3, a0
        vsetvli zero, a5, e32, m2, ta, ma
        vslideup.vi v8, v10, 1
        vadd.vv v8, v10, v8
        add     a3, a3, a4
        vse32.v v8, (a6)
        vmv2r.v v8, v10
        bne     a3, a2, .LBB0_5

Because the vslideup overwrites all but UpOffset elements from the vslidedown, we currently set the vslidedown's AVL to said offset. But in the vslideup we use either VLMAX or the EVL, which causes a toggle.

This patch increases the AVL of the vslidedown so it matches the vslideup, even if the extra elements are overwritten, to avoid the toggle.

A new tuning feature +vl-dependent-latency has been added which keeps the old behaviour for microarchitectures that dynamically dispatch uops based on vl, e.g. sifive-x280.

+vl-dependent-latency can be reused for the recently proposed Ovlt optimization directive if/when it's ratified: https://lists.riscv.org/g/tech-privileged/message/2487

If we wanted to aggressively optimise for vl at the expense of introducing more toggles we could probably look at doing this in RISCVVLOptimizer.
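The claim that raising the vslidedown's AVL does not change the spliced result can be checked with a small model. The Python sketch below is illustrative only and is not part of the patch: `VLMAX`, `TAIL`, `vslidedown`, `vslideup` and `splice_last` are hypothetical names, tail-agnostic elements are modelled as a `None` marker, and for simplicity the same EVL is assumed for the previous and current iteration.

```python
VLMAX = 8     # assumed number of elements per vector register group
TAIL = None   # marker for tail-agnostic elements (contents unspecified)


def vslidedown(vs2, offset, vl):
    """Model vslidedown: vd[i] = vs2[i + offset] for i < vl."""
    out = []
    for i in range(VLMAX):
        if i < vl:
            src = i + offset
            # Source indices at or beyond VLMAX read as zero.
            out.append(vs2[src] if src < VLMAX else 0)
        else:
            out.append(TAIL)
    return out


def vslideup(vd, vs2, offset, vl):
    """Model vslideup: vd[i] = vs2[i - offset] for offset <= i < vl."""
    out = list(vd)
    for i in range(VLMAX):
        if i >= vl:
            out[i] = TAIL             # tail element
        elif i >= offset:
            out[i] = vs2[i - offset]  # overwritten by the slide
        # Elements below offset keep the value already in vd.
    return out


def splice_last(prev, cur, evl, slidedown_vl):
    """Last active element of prev followed by the first evl - 1 of cur."""
    down = vslidedown(prev, evl - 1, slidedown_vl)
    return vslideup(down, cur, 1, evl)


prev = list(range(10, 10 + VLMAX))
cur = list(range(20, 20 + VLMAX))

old = splice_last(prev, cur, evl=5, slidedown_vl=1)  # AVL = UpOffset
new = splice_last(prev, cur, evl=5, slidedown_vl=5)  # AVL matches vslideup
print(old == new)  # prints True
```

In this model the extra elements written by the longer vslidedown all lie at index 1 or above, which is exactly the range the vslideup overwrites, so only the vl used differs between the two lowerings.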

Diffstat (limited to 'flang/lib/Frontend/CompilerInvocation.cpp')

0 files changed, 0 insertions, 0 deletions

