riscv-gnu-toolchain/llvm.git

author	David Sherwood <david.sherwood@arm.com>	2021-07-07 13:18:20 +0100
committer	David Sherwood <david.sherwood@arm.com>	2021-07-26 10:26:06 +0100
commit	0aff1798b5721d5f95d16f465b99d357012bb8d1 (patch)
tree	33ef05a5ac939f76d568dfae922f3d6ccc8f540d /llvm/lib/CodeGen/MachineFunction.cpp
parent	f924a3d47492b7b586ccfd1333ca086a7e2d88b2 (diff)
download	llvm-0aff1798b5721d5f95d16f465b99d357012bb8d1.zip llvm-0aff1798b5721d5f95d16f465b99d357012bb8d1.tar.gz llvm-0aff1798b5721d5f95d16f465b99d357012bb8d1.tar.bz2

[Analysis] Add simple cost model for strict (in-order) reductions

I have added a new FastMathFlags parameter to getArithmeticReductionCost
to indicate what type of reduction we are performing:

  1. Tree-wise. This is the typical fast-math reduction that involves
     continually splitting a vector up into halves and adding each
     half together until we get a scalar result. This is the default
     behaviour for integers, whereas for floating point we only do this
     if reassociation is allowed.
  2. Ordered. This now allows us to estimate the cost of performing
     a strict vector reduction by treating it as a series of scalar
     operations in lane order. This is the case when FP reassociation
     is not permitted. For scalable vectors this is more difficult
     because at compile time we do not know how many lanes there are,
     and so we use the worst case maximum vscale value.

I have also fixed getTypeBasedIntrinsicInstrCost to pass in the
FastMathFlags, which meant fixing up some X86 tests where we always
assumed the vector.reduce.fadd/mul intrinsics were 'fast'.

New tests have been added here:

  Analysis/CostModel/AArch64/reduce-fadd.ll
  Analysis/CostModel/AArch64/sve-intrinsics.ll
  Transforms/LoopVectorize/AArch64/strict-fadd-cost.ll
  Transforms/LoopVectorize/AArch64/sve-strict-fadd-cost.ll

Differential Revision: https://reviews.llvm.org/D105432
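
The two reduction shapes described above can be illustrated with a small
standalone sketch. This is not the code from the patch and does not use any
LLVM API: the OpCosts struct, the function names, the per-operation cost
fields, and the choice to charge one lane extract per scalar operation are all
assumptions made for illustration. It only shows how a tree-wise estimate
grows with log2 of the lane count, how an ordered estimate grows linearly, and
how a scalable vector might be costed using a worst-case vscale.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-operation costs. A real target would supply its own
// numbers; these fields are illustrative and not part of LLVM.
struct OpCosts {
  uint64_t VectorOp; // one vector add/mul at some level of the tree
  uint64_t Shuffle;  // splitting a vector to obtain its upper half
  uint64_t ScalarOp; // one scalar add/mul
  uint64_t Extract;  // moving one lane into a scalar register
};

// Tree-wise reduction: halve the vector log2(NumLanes) times, paying one
// shuffle and one vector op per level, then extract the final scalar.
uint64_t treeReductionCost(uint64_t NumLanes, const OpCosts &C) {
  assert(NumLanes != 0 && (NumLanes & (NumLanes - 1)) == 0 &&
         "sketch assumes a power-of-two lane count");
  uint64_t Levels = 0;
  for (uint64_t N = NumLanes; N > 1; N /= 2)
    ++Levels;
  return Levels * (C.Shuffle + C.VectorOp) + C.Extract;
}

// Ordered (strict) reduction: treated as a chain of scalar operations in
// lane order. Charging an extract per lane is an assumption of this sketch.
uint64_t orderedReductionCost(uint64_t NumLanes, const OpCosts &C) {
  return NumLanes * (C.Extract + C.ScalarOp);
}

// Scalable vectors: the lane count is MinLanes * vscale, and vscale is not
// known at compile time, so this sketch uses the worst-case maximum vscale.
uint64_t orderedScalableReductionCost(uint64_t MinLanes, uint64_t MaxVScale,
                                      const OpCosts &C) {
  return orderedReductionCost(MinLanes * MaxVScale, C);
}

// Integers always take the tree-wise form; floating point only does so when
// reassociation is allowed, otherwise the ordered form applies.
uint64_t reductionCost(uint64_t NumLanes, bool IsFloatingPoint,
                       bool AllowReassoc, const OpCosts &C) {
  if (IsFloatingPoint && !AllowReassoc)
    return orderedReductionCost(NumLanes, C);
  return treeReductionCost(NumLanes, C);
}
```

With every cost set to 1, an 8-lane reduction comes out as 3 * 2 + 1 = 7 in
tree form and 8 * 2 = 16 in ordered form under these assumptions. The gap
widens as the lane count grows, which is the effect a strict cost model is
meant to make visible to the vectorizer.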

Diffstat (limited to 'llvm/lib/CodeGen/MachineFunction.cpp')

0 files changed, 0 insertions, 0 deletions

