author    Philip Reames <preames@rivosinc.com>    2024-09-20 08:34:36 -0700
committer GitHub <noreply@github.com>             2024-09-20 08:34:36 -0700
commit    7f6bbb3c4f2c5f03d2abd3a79c85f1e0e4e03e05 (patch)
tree      71dbd65761cab40acd55a374a17bf4639f4a12b1 /clang/lib/Frontend/CompilerInvocation.cpp
parent    02071a8d83b963da525236ac2a1bd0299d1647e1 (diff)
[RISCV][TTI] Reduce cost of a build_vector pattern (#108419)
This change is actually two related changes, but they're very hard to meaningfully separate: the second balances the first, yet doesn't do much good on its own.

First, we can reduce the cost of a build_vector pattern. Our current costing for this defers to generic insertelement costing, which isn't unreasonable, but also isn't correct. While inserting N elements requires N-1 slides and N vmv.s.x, doing the full build_vector only requires N vslide1down. (Note that there are other cases our build_vector lowering can handle more cheaply; this is simply the easiest upper bound, which appears to be "good enough" for SLP costing purposes.)

Second, we need to tell SLP that calls don't preserve vector registers. Without this, SLP will vectorize scalar code which performs e.g. 4 x float @exp calls as two <2 x float> @exp intrinsic calls. Oddly, the costing works out such that this is in fact the optimal choice - except that we don't actually have a <2 x float> @exp, and unroll during DAG lowering. This would be fine (or at least cost neutral) except that the libcall for the scalar @exp clobbers all vector registers, so the net effect is that we added a bunch of spills SLP had no idea about. Thankfully, AArch64 has a similar problem, and has taught SLP how to reason about spill cost once the right TTI hook is implemented.

Now, for some implications...

The SLP solution for spill costing has some inaccuracies. In particular, it basically just guesses whether an intrinsic will be lowered to a call or not, and can be wrong in both directions. It also has no mechanism to differentiate on calling convention.

This has the effect of making partial vectorization (i.e. starting in scalar) more profitable. In practice, the major effect of this is to make it more likely that SLP will vectorize part of a tree in an intersecting forest, and then vectorize the remaining tree once those uses have been removed.
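The cost comparison in the first change can be sketched with a toy model (this is illustrative arithmetic, not LLVM's actual TTI code; the function names and unit costs are assumptions):

```python
# Toy model of the two costings for building an N-element vector from
# scalars on RISC-V. Each instruction is assumed to cost 1 unit.

def generic_insertelement_cost(n: int) -> int:
    """Old costing: N inserts modeled as N vmv.s.x plus N-1 slides."""
    return n + (n - 1)

def build_vector_cost(n: int) -> int:
    """New upper bound: the whole build_vector lowers to N vslide1down."""
    return n

for n in (2, 4, 8):
    print(n, generic_insertelement_cost(n), build_vector_cost(n))
```

The gap grows linearly with N, which is why the old costing made build_vector sequences look roughly twice as expensive as they actually are.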
This has the effect of biasing us slightly away from strided or indexed loads during vectorization, because the scalar cost is more accurately modeled and these instructions look relatively less profitable.
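The spill-cost guessing described above can be sketched as a toy model (an assumption-laden simplification, not SLP's real walk; the intrinsic list, helper names, and per-vector cost are all hypothetical):

```python
# Toy model of charging spill cost when a call that clobbers vector
# registers sits between parts of a vectorizable tree. The vectorizer
# must guess which "intrinsic" calls actually lower to libcalls.

SCALARIZED_INTRINSICS = {"exp", "sin", "cos"}  # guess: these become libcalls

def spill_cost(instrs, live_vectors, cost_per_vec=2):
    """instrs: opcode strings in program order, calls spelled 'call.<name>'.
    Charges a spill + reload per live vector value around each real call."""
    cost = 0
    for op in instrs:
        if not op.startswith("call."):
            continue
        callee = op.split(".", 1)[1]
        # Intrinsics that stay intrinsics (modeled here as llvm.*) are free;
        # anything else is assumed to clobber every vector register.
        if callee in SCALARIZED_INTRINSICS or not callee.startswith("llvm"):
            cost += cost_per_vec * live_vectors
    return cost

print(spill_cost(["fmul", "call.exp", "fadd"], live_vectors=2))  # → 4
```

Because the guess can be wrong in both directions, as the message notes, trees that cross real calls get a scalar-vs-vector comparison that now reflects the spills, making partial (scalar-start) vectorization look relatively better.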
Diffstat (limited to 'clang/lib/Frontend/CompilerInvocation.cpp')
0 files changed, 0 insertions, 0 deletions