rocket-tools/riscv-gnu-toolchain/llvm.git

author	Philip Reames <preames@rivosinc.com>	2023-10-01 17:42:07 -0700
committer	GitHub <noreply@github.com>	2023-10-01 17:42:07 -0700
commit	f0505c3dbe50f533e55d21dfcb584fcac44bd80c (patch)
tree	1a0b55813d098ab850d79dc3191167abd2d85699 /flang/lib/Frontend/CompilerInvocation.cpp
parent	2375d84f0656ea97d221a68bba5a6b125e2f0856 (diff)

[RISCV] Form vredsum from explode_vector + scalar (left) reduce (#67821)

This change adds two related DAG combines which together take a left-reduce scalar add tree of an explode_vector and incrementally form a vector reduction of the vector prefix. If the entire vector is reduced, the result is a reduction over the entire vector.

Profitability-wise, this relies on vredsum being cheaper than a pair of extracts and a scalar add. Given that vredsum is linear in LMUL, and the vslidedown required for the extract is *also* linear in LMUL, this is clearly true at higher index values. At N=2, it's a bit questionable, but I think the vredsum form is probably a better canonical form anyway.

Note that this only matches left reduces. This happens to be the motivating example I have (from spec2017 x264). This approach could be generalized to handle right reduces without much effort, and could be generalized to handle any reduce whose tree starts with adjacent elements if desired. The approach fails for a reduce such as (A+C)+(B+D) because we can't find a root to start the reduce with without scanning the entire associative add expression. We could maybe explore using masked reduces for the root node, but that seems of questionable profitability. (As in, worth questioning - I haven't explored in any detail.)

This is covering up a deficiency in SLP. If SLP encounters the scalar form of reduce_or(A) + reduce_sum(a) where a is some common vectorizable tree, SLP will sometimes fail to revisit one of the reductions after vectorizing the other. Fixing this in SLP is hard, and there's no good reason not to handle the easy cases in the backend.

Another option here would be to do this in VectorCombine or generic DAG. I chose not to, as the profitability of the non-legal typed prefix cases is very target dependent. I think this makes sense as a starting point, even if we move it elsewhere later.

This is currently restricted to add reduces only, but obviously makes sense for any associative reduction operator. Once this is approved, I plan to extend it in this manner. I'm simply staging work in case we decide to go in another direction.
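
The incremental matching described in this message can be illustrated with a toy model. The sketch below is not the code from this patch: it is a small Python expression tree, and the names Extract, Add, ReducePrefix, combine, evaluate and left_reduce are all hypothetical, chosen only for illustration. It assumes a single source vector and models two rewrite rules: one that turns elt[0] + elt[1] into a reduction of the 2-element prefix, and one that grows an existing prefix reduction by the next element.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Extract:
    """Scalar read of element `index` of the single source vector."""
    index: int


@dataclass(frozen=True)
class Add:
    """Scalar add of two sub-expressions."""
    lhs: "Node"
    rhs: "Node"


@dataclass(frozen=True)
class ReducePrefix:
    """Sum of elements [0, length) of the source vector (stand-in for a vector reduce)."""
    length: int


Node = Union[Extract, Add, ReducePrefix]


def combine(node: Node) -> Node:
    """Rewrite bottom-up, applying the two toy rules wherever they match."""
    if not isinstance(node, Add):
        return node
    lhs, rhs = combine(node.lhs), combine(node.rhs)
    # Rule 1 (assumed form): elt[0] + elt[1] -> reduce of the 2-element prefix.
    if lhs == Extract(0) and rhs == Extract(1):
        return ReducePrefix(2)
    # Rule 2 (assumed form): reduce(prefix of k) + elt[k] -> reduce(prefix of k + 1).
    if isinstance(lhs, ReducePrefix) and rhs == Extract(lhs.length):
        return ReducePrefix(lhs.length + 1)
    return Add(lhs, rhs)


def evaluate(node: Node, vec: list) -> int:
    """Evaluate an expression against concrete vector contents."""
    if isinstance(node, Extract):
        return vec[node.index]
    if isinstance(node, ReducePrefix):
        return sum(vec[:node.length])
    return evaluate(node.lhs, vec) + evaluate(node.rhs, vec)


def left_reduce(n: int) -> Node:
    """Build ((elt[0] + elt[1]) + elt[2]) + ... over the first n elements."""
    tree: Node = Extract(0)
    for i in range(1, n):
        tree = Add(tree, Extract(i))
    return tree


# A full left reduce collapses to a single prefix reduction.
full = combine(left_reduce(4))

# (A+C)+(B+D): neither rule finds a root, so the tree is left as scalar adds.
balanced = Add(Add(Extract(0), Extract(2)), Add(Extract(1), Extract(3)))
unchanged = combine(balanced)
```

In this model a left reduce over four elements becomes ReducePrefix(4), while the (A+C)+(B+D) shape is returned unchanged because no subtree starts with elements 0 and 1 as adjacent operands. Only the left-operand form is matched here, mirroring the "left reduces only" restriction stated above.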

Diffstat (limited to 'flang/lib/Frontend/CompilerInvocation.cpp')

0 files changed, 0 insertions, 0 deletions

