| author | Charitha Saumya <136391709+charithaintc@users.noreply.github.com> | 2025-11-04 13:15:32 -0800 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2025-11-04 13:15:32 -0800 |
| commit | 9703bda95b088bb6a455ef9faffdb41c537aff2f (patch) | |
| tree | fa30be9f0439ed7d1efc519c13f31b0260226a68 /llvm/lib/Target/WebAssembly/WebAssemblyFixFunctionBitcasts.cpp | |
| parent | 2141edf506baab7e526f3a305bcdb6d6f2c772bc (diff) | |
[mlir][xegpu] Add OptimizeBlockLoads pass. (#165483)
This pass rewrites certain xegpu `CreateNd` and `LoadNd` operations that
feed into `vector.transpose` into a more optimal form to improve
performance. Specifically, low-precision (bitwidth < 32) `LoadNd` ops
that feed into transpose ops are rewritten to i32 loads with a valid
transpose layout, so that later passes can use the load-with-transpose
HW feature to accelerate such load ops.
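The shape arithmetic behind viewing a low-precision tile as i32 can be sketched as follows. This is only an illustration, not the pass's actual logic: the helper name `packed_i32_shape` is hypothetical, and it assumes the innermost dimension is the one whose elements get packed into 32-bit words.

```python
def packed_i32_shape(shape, elem_bitwidth):
    """Return the tile shape after packing narrow elements into 32-bit words.

    Assumption (not taken from the commit): the innermost dimension is packed.
    """
    if elem_bitwidth >= 32 or 32 % elem_bitwidth != 0:
        raise ValueError("expected a bitwidth below 32 that divides 32")
    factor = 32 // elem_bitwidth  # number of narrow elements per i32 word
    *outer, inner = shape
    if inner % factor != 0:
        raise ValueError("innermost dimension must be a multiple of the packing factor")
    return (*outer, inner // factor)


# Illustrative tile sizes, chosen only for this example.
print(packed_i32_shape((16, 16), 16))  # a 16-bit tile viewed as i32
print(packed_i32_shape((16, 32), 8))   # an 8-bit tile viewed as i32
```

The point of the sketch is that the total number of bits in the tile is unchanged; only the element type and the extent of one dimension change.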
**Update:**
The pass is renamed to `OptimizeBlockLoads` because we plan to add the
array length optimization to this pass later as well. That optimization
will break down a larger load (like `32x32xf16`) into more DPAS-favorable
array length loads (`32x16xf16` with array length = 2). Both of these
optimizations require rewriting `CreateNd` and `LoadNd`, so it makes
sense to have a common pass for both.
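The planned array length split from the example above can be illustrated with a small sketch. The helper `split_into_array_blocks` and its `block_width` parameter are hypothetical names used only here; how the real pass picks the block width is not stated in this message.

```python
def split_into_array_blocks(shape, block_width):
    """Split a 2D load shape along its columns into equal-width blocks.

    Returns (block_shape, array_length). block_width is an assumed input;
    the commit message does not say how the pass would choose it.
    """
    rows, cols = shape
    if block_width <= 0 or cols % block_width != 0:
        raise ValueError("column count must be a positive multiple of block_width")
    return (rows, block_width), cols // block_width


block_shape, array_length = split_into_array_blocks((32, 32), 16)
print(block_shape, array_length)
```

With the `32x32` shape from the message and a block width of 16, this yields a `32x16` block with array length 2, matching the example given above.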
Diffstat (limited to 'llvm/lib/Target/WebAssembly/WebAssemblyFixFunctionBitcasts.cpp')
0 files changed, 0 insertions, 0 deletions
