| author | Andrzej Warzyński <andrzej.warzynski@arm.com> | 2024-09-24 14:03:30 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-09-24 14:03:30 +0100 | 
| commit | b47d1787b51f55d69ef1b4f88e72cd54af451649 (patch) | |
| tree | 40529c877f5c38c91aed3304106ea258a6b10f1f /llvm/lib/Object/SymbolSize.cpp | |
| parent | 12033e550b186f3b3e4d2ca3ce9cfc3d3a3fa6e1 (diff) | |
[mlir][vector] Refine vectorisation of tensor.extract (#109580)
This PR fixes a bug in `isLoopInvariantIdx`. It makes sure that the
following case is vectorised as `vector.gather` (as opposed to
attempting a contiguous load):
```mlir
  func.func @index_from_output_column_vector_gather_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
    %c0 = arith.constant 0 : index
    %0 = tensor.empty() : tensor<8x1xf32>
    %res = linalg.generic {
      indexing_maps = [#map],
      iterator_types = ["parallel", "parallel"]
    } outs(%0 : tensor<8x1xf32>) {
    ^bb0(%arg1: f32):
      %1 = linalg.index 0 : index
      %extracted = tensor.extract %src[%1, %c0] : tensor<8x128xf32>
      linalg.yield %extracted : f32
    } -> tensor<8x1xf32>
    return %res : tensor<8x1xf32>
  }
```
Specifically, when looking for loop-invariant indices in
`tensor.extract` Ops, any `linalg.index` Op that's used in address
calculation should only access loop dims of size 1. In the example
above, the following does not meet that criterion:
```mlir
  %1 = linalg.index 0 : index
```
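The rule above can be modelled outside of MLIR as a small, self-contained
C++ sketch. This is not the actual `isLoopInvariantIdx` implementation:
`ExtractIndex` and `isLoopInvariantIdxSketch` are hypothetical names, and
the loop ranges `{8, 1}` mentioned below assume that `#map` is the identity
map over the `8x1` output.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one index operand of `tensor.extract`.
// Not an MLIR type; it only records where the index value comes from.
struct ExtractIndex {
  bool isConstant;  // true for values such as `%c0`
  unsigned loopDim; // loop dim read by `linalg.index`; unused if constant
};

// Simplified model of the check described above: a constant is always
// loop-invariant, while a `linalg.index` result is loop-invariant only if
// the loop dim it reads has size 1.
bool isLoopInvariantIdxSketch(const ExtractIndex &idx,
                              const std::vector<int64_t> &loopRanges) {
  if (idx.isConstant)
    return true;
  assert(idx.loopDim < loopRanges.size() && "loop dim out of range");
  return loopRanges[idx.loopDim] == 1;
}
```

With loop ranges `{8, 1}`, an index produced by `linalg.index 0` is not
loop-invariant (dim 0 has size 8), whereas `%c0` is.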
Note that this PR also effectively addresses the issue fixed in #107922,
i.e. exercised by:
  * `@vectorize_nd_tensor_extract_load_1d_column_vector_using_gather_load`
`getNonUnitLoopDim` introduced in #107922 is still valid though. In
fact, it is required to identify that the following case is a contiguous
load:
```mlir
  func.func @index_from_output_column_vector_contiguous_load(%src: tensor<8x128xf32>) -> tensor<8x1xf32> {
    %c0 = arith.constant 0 : index
    %0 = tensor.empty() : tensor<8x1xf32>
    %res = linalg.generic {
      indexing_maps = [#map],
      iterator_types = ["parallel", "parallel"]
    } outs(%0 : tensor<8x1xf32>) {
    ^bb0(%arg1: f32):
      %1 = linalg.index 0 : index
      %extracted = tensor.extract %src[%c0, %1] : tensor<8x128xf32>
      linalg.yield %extracted : f32
    } -> tensor<8x1xf32>
    return %res : tensor<8x1xf32>
  }
```
Some logic is still missing to lower the above to
`vector.transfer_read`, so it is conservatively lowered to
`vector.gather` instead (see TODO in
`getTensorExtractMemoryAccessPattern`).
There are a few additional changes:
  * `getNonUnitLoopDim` is simplified and renamed as
    `getTrailingNonUnitLoopDimIdx`, additional comments are added (note
    that the functionality didn't change);
  * extra comments are added in a few places, and variable names in
    comments are updated to use Markdown (which is the preferred approach
    in MLIR).
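As a rough illustration of what a helper such as
`getTrailingNonUnitLoopDimIdx` computes, here is a self-contained C++
sketch over a plain vector of static loop ranges. The name
`trailingNonUnitLoopDimIdxSketch` and the signature are hypothetical; the
real helper operates on a Linalg Op and may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: walk the loop ranges from the innermost dim outwards
// and return the index of the first dim whose size is not 1. If every dim
// has size 1, index 0 is returned.
std::size_t
trailingNonUnitLoopDimIdxSketch(const std::vector<int64_t> &loopRanges) {
  assert(!loopRanges.empty() && "expected at least one loop dim");
  std::size_t idx = loopRanges.size() - 1;
  while (idx != 0 && loopRanges[idx] == 1)
    --idx;
  return idx;
}
```

For loop ranges `{8, 1}`, as in both examples above (assuming `#map` is the
identity map), the sketch returns 0, i.e. the dim read by
`linalg.index 0`.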
This is a follow-on for:
  * https://github.com/llvm/llvm-project/pull/107922
  * https://github.com/llvm/llvm-project/pull/102321
