path: root/llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
author: Spenser Bauman <sbauman@mathworks.com> 2023-12-01 10:16:51 -0500
committer: GitHub <noreply@github.com> 2023-12-01 15:16:51 +0000
commit: 0d87e2577914a6384f4ad5952b8fa9b0d8e48da8 (patch)
tree: a5c9227169fdfb701db41192140a230457cf4e1a /llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
parent: faebb1b2e6891687e4f608b74205985ec78ade40 (diff)
[mlir][tosa] Improve lowering to tosa.fully_connected (#73049)
The current lowering of tosa.fully_connected produces a linalg.matmul followed by a linalg.generic to add the bias. The IR looks like the following:

    %init = tensor.empty()
    %zero = linalg.fill ins(0 : f32) outs(%init)
    %prod = linalg.matmul ins(%A, %B) outs(%zero)

    // Add the bias
    %initB = tensor.empty()
    %result = linalg.generic ins(%prod, %bias) outs(%initB) {
      // add bias and product
    }

This has two downsides:

1. The tensor.empty operations typically result in additional allocations after bufferization.
2. There is a redundant traversal of the data to add the bias to the matrix product.

This extra work can be avoided by leveraging the out-param of linalg.matmul. The new IR sequence is:

    %init = tensor.empty()
    %broadcast = linalg.broadcast ins(%bias) outs(%init)
    %prod = linalg.matmul ins(%A, %B) outs(%broadcast)

In my experiments, this eliminates one loop and one allocation (post bufferization) from the generated code.
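The equivalence the new lowering relies on can be sketched in NumPy. This is only an illustration of the semantics, not the actual compiler output: the shapes and names are hypothetical, and NumPy does not literally fuse the accumulation the way linalg.matmul's out-param does. The point is that seeding the output buffer with the broadcast bias and accumulating the product into it yields the same result as a zero-filled matmul followed by a separate bias-add pass:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, N = 4, 8, 3                      # hypothetical sizes for illustration
A = rng.standard_normal((M, K))
B = rng.standard_normal((K, N))
bias = rng.standard_normal(N)

# Old lowering: matmul into a zero-filled buffer, then a second pass
# over the result to add the bias (two result traversals, two buffers).
zero = np.zeros((M, N))                 # tensor.empty + linalg.fill
prod = zero + A @ B                     # linalg.matmul outs(%zero)
old = prod + bias                       # linalg.generic adding the bias

# New lowering: broadcast the bias into the output buffer first, then
# accumulate the matmul into it (one result traversal, one buffer).
out = np.broadcast_to(bias, (M, N)).copy()   # linalg.broadcast ins(%bias)
out += A @ B                                 # linalg.matmul outs(%broadcast)

assert np.allclose(old, out)
```

In the bufferized code this difference matters: the broadcast-seeded buffer doubles as the matmul accumulator, so the separate bias-add loop and its temporary disappear.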
Diffstat (limited to 'llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp')
0 files changed, 0 insertions, 0 deletions