diff options
| author | Andrzej WarzyĆski <andrzej.warzynski@arm.com> | 2024-09-24 14:03:30 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-09-24 14:03:30 +0100 |
| commit | b47d1787b51f55d69ef1b4f88e72cd54af451649 (patch) | |
| tree | 40529c877f5c38c91aed3304106ea258a6b10f1f /llvm/lib/Object/MachOObjectFile.cpp | |
| parent | 12033e550b186f3b3e4d2ca3ce9cfc3d3a3fa6e1 (diff) | |
| download | llvm-b47d1787b51f55d69ef1b4f88e72cd54af451649.zip llvm-b47d1787b51f55d69ef1b4f88e72cd54af451649.tar.gz llvm-b47d1787b51f55d69ef1b4f88e72cd54af451649.tar.bz2 | |
[mlir][vector] Refine vectorisation of tensor.extract (#109580)
This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the
following case is vectorised as `vector.gather` (as opposed to
attempting a contiguous load):
```mlir
func.func @index_from_output_column_vector_gather_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<8x1xf32>
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel"]
} outs(%0 : tensor<8x1xf32>) {
^bb0(%arg1: f32):
%1 = linalg.index 0 : index
%extracted = tensor.extract %src[%1, %c0] : tensor<8x128xf32>
linalg.yield %extracted : f32
} -> tensor<8x1xf32>
return %res : tensor<8x1xf32>
}
```
Specifically, when looking for loop-invariant indices in
`tensor.extract` Ops, any `linalg.index` Op that's used in address
colcluation should only access loop dims that are == 1. In the example
above, the following does not meet that criteria:
```mlir
%1 = linalg.index 0 : index
```
Note that this PR also effectively addresses the issue fixed in #107922,
i.e. exercised by:
* `@vectorize_nd_tensor_extract_load_1d_column_vector_using_gather_load`
`getNonUnitLoopDim` introduced in #107922 is still valid though. In
fact, it is required to identify that the following case is a contiguous
load:
```mlir
func.func @index_from_output_column_vector_contiguous_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
%c0 = arith.constant 0 : index
%0 = tensor.empty() : tensor<8x1xf32>
%res = linalg.generic {
indexing_maps = [#map],
iterator_types = ["parallel", "parallel"]
} outs(%0 : tensor<8x1xf32>) {
^bb0(%arg1: f32):
%1 = linalg.index 0 : index
%extracted = tensor.extract %src[%c0, %1] : tensor<8x128xf32>
linalg.yield %extracted : f32
} -> tensor<8x1xf32>
return %res : tensor<8x1xf32>
}
```
Some logic is still missing to lower the above to
`vector.transfer_read`, so it is conservatively lowered to
`vector.gather` instead (see TODO in
`getTensorExtractMemoryAccessPattern`).
There's a few additional changes:
* `getNonUnitLoopDim` is simplified and renamed as
`getTrailingNonUnitLoopDimIdx`, additional comments are added (note
that the functionality didn't change);
* extra comments in a few places, variable names in comments update to
use Markdown (which is the preferred approach in MLIR).
This is a follow-on for:
* https://github.com/llvm/llvm-project/pull/107922
* https://github.com/llvm/llvm-project/pull/102321
Diffstat (limited to 'llvm/lib/Object/MachOObjectFile.cpp')
0 files changed, 0 insertions, 0 deletions
