author | Andrzej Warzyński <andrzej.warzynski@arm.com> | 2025-04-24 18:05:41 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-04-24 18:05:41 +0100 |
commit | 2de936b6eb38e7a37224a97c2a22aa79b9dfb9dc (patch) | |
tree | 6e183f9c0e53f2f3d6bca16096c01ed3ea277bd0 /clang/lib/Frontend/CompilerInstance.cpp | |
parent | e78b763568e47e685926614195c3075afa35668c (diff) | |
[mlir][vector] Fix emulation of "narrow" type `vector.store` (#133231)
Below are two examples of "narrow" `vector.store`s. The first example
does not require partial stores and hence needs no read-modify-write
(RMW) stores. This is currently emulated correctly.
```mlir
func.func @example_1(%arg0: vector<4xi2>) {
  %0 = memref.alloc() : memref<13xi2>
  %c4 = arith.constant 4 : index
  vector.store %arg0, %0[%c4] : memref<13xi2>, vector<4xi2>
  return
}
```
The second example requires a partial (and hence RMW) store because the
offset (`%c3`) is not aligned to the emulated type (i8) boundary, so the
stored vector straddles two i8 elements.
```mlir
func.func @example_2(%arg0: vector<4xi2>) {
  %0 = memref.alloc() : memref<13xi2>
  %c3 = arith.constant 3 : index
  vector.store %arg0, %0[%c3] : memref<13xi2>, vector<4xi2>
  return
}
```
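To make the difference between the two examples concrete, here is a minimal standalone sketch of the arithmetic involved. `StoreLayout` and `computeLayout` are hypothetical names introduced for illustration only; they are not part of MLIR, and the sketch assumes a constant, non-negative offset into a 1-D buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not part of the MLIR code base.
struct StoreLayout {
  int64_t frontPadElems;  // narrow elements skipped in the first container
  int64_t numContainers;  // number of wide (emulated) elements touched
  bool needsPartialStore; // true if some touched container is only partly written
};

inline StoreLayout computeLayout(int64_t offset, int64_t numElems,
                                 int64_t elemBits, int64_t containerBits) {
  assert(offset >= 0 && numElems > 0 && containerBits % elemBits == 0);
  const int64_t scale = containerBits / elemBits; // e.g. i8 / i2 = 4
  const int64_t frontPad = offset % scale;        // offset within the first container
  const int64_t end = offset + numElems;          // one past the last stored element
  const int64_t numContainers = (end - 1) / scale - offset / scale + 1;
  // A partial store is needed if the store starts or ends mid-container.
  const bool partial = frontPad != 0 || end % scale != 0;
  return StoreLayout{frontPad, numContainers, partial};
}
```

Under these assumptions, `computeLayout(4, 4, 2, 8)` (the first example) yields a front padding of 0 and a single fully written container, while `computeLayout(3, 4, 2, 8)` (the second example) yields a front padding of 3 and two partially written containers.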
This is currently incorrectly emulated as a single "full" store (note
that the offset is incorrect) instead of partial stores:
```mlir
func.func @example_2(%arg0: vector<4xi2>) {
  %alloc = memref.alloc() : memref<4xi8>
  %0 = vector.bitcast %arg0 : vector<4xi2> to vector<1xi8>
  %c0 = arith.constant 0 : index
  vector.store %0, %alloc[%c0] : memref<4xi8>, vector<1xi8>
  return
}
```
The incorrect emulation stems from this simplified (i.e. incomplete)
calculation of the front padding:
```cpp
std::optional<int64_t> foldedNumFrontPadElems =
    isDivisibleInSize ? 0
                      : getConstantIntValue(linearizedInfo.intraDataOffset);
```
Since `isDivisibleInSize` is `true` (i8 / i2 = 4):
* front padding is set to `0` and, as a result,
* the input offset (`%c3`) is ignored, and
* we incorrectly assume that partial stores won't be needed.
Note that in both examples we are storing `vector<4xi2>` into
`memref<13xi2>` (note _different_ trailing dims) and hence partial
stores might in fact be required. The condition above is updated to:
```cpp
std::optional<int64_t> foldedNumFrontPadElems =
    (isDivisibleInSize && trailingDimsMatch)
        ? 0
        : getConstantIntValue(linearizedInfo.intraDataOffset);
```
This change ensures that the input offset is properly taken into
account, which fixes the issue. It doesn't affect `@example_1`.
Additional comments are added to clarify the current logic.
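As a rough illustration of how the two conditions differ, the following standalone sketch models them outside of MLIR. The function names are hypothetical, and `intraDataOffset` stands in for the result of `getConstantIntValue(linearizedInfo.intraDataOffset)`, with `std::nullopt` modelling an offset that is not a compile-time constant.

```cpp
#include <cstdint>
#include <optional>

using MaybeOffset = std::optional<int64_t>;

// Models the condition before this change (hypothetical stand-in).
inline MaybeOffset foldFrontPadOld(bool isDivisibleInSize,
                                   MaybeOffset intraDataOffset) {
  return isDivisibleInSize ? MaybeOffset(0) : intraDataOffset;
}

// Models the updated condition: front padding is folded to 0 only when the
// trailing dims of the memref and the stored vector also match.
inline MaybeOffset foldFrontPadNew(bool isDivisibleInSize,
                                   bool trailingDimsMatch,
                                   MaybeOffset intraDataOffset) {
  return (isDivisibleInSize && trailingDimsMatch) ? MaybeOffset(0)
                                                  : intraDataOffset;
}
```

For `@example_2`, `isDivisibleInSize` is `true`, the trailing dims differ (13 vs 4), and the intra-container offset is 3: the old form returns 0, while the new form returns 3. For `@example_1` the intra-container offset is already 0, so both forms agree.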
Diffstat (limited to 'clang/lib/Frontend/CompilerInstance.cpp')
0 files changed, 0 insertions, 0 deletions