diff options
author | Spenser Bauman <sbauman@mathworks.com> | 2023-12-01 10:16:51 -0500 |
---|---|---|
committer | GitHub <noreply@github.com> | 2023-12-01 15:16:51 +0000 |
commit | 0d87e2577914a6384f4ad5952b8fa9b0d8e48da8 (patch) | |
tree | a5c9227169fdfb701db41192140a230457cf4e1a /llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp | |
parent | faebb1b2e6891687e4f608b74205985ec78ade40 (diff) | |
download | llvm-0d87e2577914a6384f4ad5952b8fa9b0d8e48da8.zip llvm-0d87e2577914a6384f4ad5952b8fa9b0d8e48da8.tar.gz llvm-0d87e2577914a6384f4ad5952b8fa9b0d8e48da8.tar.bz2 |
[mlir][tosa] Improve lowering to tosa.fully_connected (#73049)
The current lowering of tosa.fully_connected produces a linalg.matmul
followed by a linalg.generic to add the bias. The IR looks like the
following:
%init = tensor.empty()
%zero = linalg.fill ins(0 : f32) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%zero)

// Add the bias
%initB = tensor.empty()
%result = linalg.generic ins(%prod, %bias) outs(%initB) {
  // add bias and product
}
This has two downsides:
1. The tensor.empty operations typically result in additional
allocations after bufferization.
2. There is a redundant traversal of the data to add the bias to the
matrix product.
This extra work can be avoided by leveraging the outs operand of
linalg.matmul, which accumulates the product into its initial value.
The new IR sequence is:
%init = tensor.empty()
%broadcast = linalg.broadcast ins(%bias) outs(%init)
%prod = linalg.matmul ins(%A, %B) outs(%broadcast)
In my experiments, this eliminates one loop and one allocation
(post-bufferization) from the generated code.
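The equivalence the new lowering relies on, that accumulating the matmul into a tensor pre-filled with the broadcast bias matches matmul followed by a separate bias add, can be sketched in NumPy (a hedged illustration of the semantics, not the actual lowering; all names and shapes here are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8)).astype(np.float32)    # input activations
B = rng.standard_normal((8, 3)).astype(np.float32)    # weights
bias = rng.standard_normal(3).astype(np.float32)      # per-output-channel bias

# Old lowering: zero-initialized matmul, then a second pass adding the bias.
zero = np.zeros((4, 3), dtype=np.float32)             # linalg.fill
prod = zero + A @ B                                   # linalg.matmul accumulates into outs
old = prod + bias                                     # extra linalg.generic traversal

# New lowering: broadcast the bias into the init tensor,
# then let the matmul accumulate on top of it.
broadcast = np.broadcast_to(bias, (4, 3))             # linalg.broadcast
new = broadcast + A @ B                               # single linalg.matmul, no extra pass

assert np.allclose(old, new)
```

Because linalg.matmul adds its product to the outs operand rather than overwriting it, seeding outs with the broadcast bias folds the bias add into the same traversal as the matrix product.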
Diffstat (limited to 'llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp')
0 files changed, 0 insertions, 0 deletions