author | Vivian Zhang <zhyuhang88@gmail.com> | 2025-07-29 09:58:30 -0700 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-07-29 09:58:30 -0700 |
commit | dc6d7f0637e7c80e39e8b7f0e8b61515b4961b0f (patch) | |
tree | 862ca7d4aeff1b0c77d50734b4b2352e2e01264c /llvm/lib/CodeGen/TargetLoweringBase.cpp | |
parent | 8a1b252a994dee0c30238f2e6c07516ec523cb70 (diff) | |
[mlir][linalg] Fix padding shape computation in PadTilingInterface for convs (#149576)
This PR fixes the computation of padded shapes for convolution-style
affine maps (e.g., d0 + d1) in `PadTilingInterface`. Previously, the
code used the direct sum of the loop upper bounds, which led to
over-padding. For example, when only the c dimensions of the following
`conv_2d_nhwc_fhwc` op are padded to multiples of 16, the convolved
dimensions are also padded incorrectly, producing the wrong input shape:
```
%padded = tensor.pad %arg0 low[0, 0, 0, 0] high[0, 1, 1, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
  tensor.yield %cst : f32
} : tensor<1x16x16x4xf32> to tensor<1x17x17x16xf32>
%padded_0 = tensor.pad %arg1 low[0, 0, 0, 0] high[0, 0, 0, 12] {
^bb0(%arg3: index, %arg4: index, %arg5: index, %arg6: index):
  tensor.yield %cst : f32
} : tensor<16x3x3x4xf32> to tensor<16x3x3x16xf32>
%0 = linalg.conv_2d_nhwc_fhwc {dilations = dense<1> : tensor<2xi64>, strides = dense<1> : tensor<2xi64>} ins(%padded, %padded_0 : tensor<1x17x17x16xf32>, tensor<16x3x3x16xf32>) outs(%arg2 : tensor<1x14x14x16xf32>) -> tensor<1x14x14x16xf32>
return %0 : tensor<1x14x14x16xf32>
```
The new implementation uses the maximum accessed index as the input to
the affine map and then adds 1 after aggregating all the terms to get
the final padded size. This fixes
https://github.com/llvm/llvm-project/issues/148679.
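To illustrate the arithmetic, here is a minimal Python sketch of the two
computations for a single convolved dimension. It is not the actual
`PadTilingInterface` code: the function names and the
`(coefficient, upper_bound)` term representation are hypothetical, and
the handling of non-unit coefficients (strides or dilations) is an
assumption about how the rule generalizes.
```python
# Illustrative sketch only; names and data layout are hypothetical.
# Each term of a convolution-style map such as d0 + d1 is modeled as
# (coefficient, loop_upper_bound), where the loop runs over [0, upper_bound).

def padded_size_old(terms):
    """Old behavior: evaluate the map directly at the loop upper bounds."""
    return sum(coeff * ub for coeff, ub in terms)


def padded_size_new(terms):
    """New behavior: evaluate the map at the maximum accessed index
    (upper_bound - 1) of every loop, then add 1 once after summing."""
    max_index = sum(coeff * (ub - 1) for coeff, ub in terms)
    return max_index + 1


# H dimension of the example: output size 14 (d0), kernel size 3 (d1),
# unit stride and dilation, so the input is indexed by d0 + d1.
conv_h = [(1, 14), (1, 3)]
print(padded_size_old(conv_h))  # 17, the over-padded size in the snippet above
print(padded_size_new(conv_h))  # 16, the original input size, so no padding
```
With this rule the H and W dimensions of the input stay at 16, so only
the c dimension would need padding.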
Diffstat (limited to 'llvm/lib/CodeGen/TargetLoweringBase.cpp')
0 files changed, 0 insertions, 0 deletions