diff options
author | Spenser Bauman <sbauman@mathworks.com> | 2023-12-01 10:16:51 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-12-01 15:16:51 +0000 |
commit | 0d87e2577914a6384f4ad5952b8fa9b0d8e48da8 (patch) | |
tree | a5c9227169fdfb701db41192140a230457cf4e1a /llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp | |
parent | faebb1b2e6891687e4f608b74205985ec78ade40 (diff) | |
download | llvm-0d87e2577914a6384f4ad5952b8fa9b0d8e48da8.zip llvm-0d87e2577914a6384f4ad5952b8fa9b0d8e48da8.tar.gz llvm-0d87e2577914a6384f4ad5952b8fa9b0d8e48da8.tar.bz2 |
[mlir][tosa] Improve lowering to tosa.fully_connected (#73049)
The current lowering of tosa.fully_connected produces a linalg.matmul
followed by a linalg.generic to add the bias. The IR looks like the
following:
%init = tensor.empty()
%zero = linalg.fill ins(0 : f32) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%zero)

// Add the bias
%initB = tensor.empty()
%result = linalg.generic ins(%prod, %bias) outs(%initB) {
  // add bias and product
}
This has two downsides:
1. The tensor.empty operations typically result in additional
allocations after bufferization.
2. There is a redundant traversal of the data to add the bias to the
matrix product.
This extra work can be avoided by leveraging the outs operand of
linalg.matmul, which accumulates the product into its initial value.
The new IR sequence is:
%init = tensor.empty()
%broadcast = linalg.broadcast ins(%bias) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%broadcast)
In my experiments, this eliminates one loop and one allocation
(post-bufferization) from the generated code.
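The equivalence the new lowering relies on, that accumulating the matmul into a tensor pre-filled with the broadcast bias matches matmul followed by a separate bias add, can be sketched in NumPy (a hedged illustration of the semantics, not the actual lowering; all names and shapes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float32)    # input activations
B = rng.standard_normal((8, 3)).astype(np.float32)    # weights
bias = rng.standard_normal(3).astype(np.float32)      # per-output-channel bias

# Old lowering: zero-initialized matmul, then a second pass adding the bias.
zero = np.zeros((4, 3), dtype=np.float32)             # linalg.fill
prod = zero + A @ B                                   # linalg.matmul accumulates into outs
old = prod + bias                                     # extra linalg.generic traversal

# New lowering: broadcast the bias into the init tensor,
# then let the matmul accumulate on top of it.
broadcast = np.broadcast_to(bias, (4, 3))             # linalg.broadcast
new = broadcast + A @ B                               # single linalg.matmul, no extra pass

assert np.allclose(old, new)
```

Because linalg.matmul adds its product to the outs operand rather than overwriting it, seeding outs with the broadcast bias folds the bias add into the same traversal as the matrix product.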
Diffstat (limited to 'llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp')
0 files changed, 0 insertions, 0 deletions