riscv-gnu-toolchain/llvm.git

author	Manish Gupta <manigupta@google.com>	2022-11-29 20:36:40 -0800
committer	Manish Gupta <manigupta@google.com>	2022-12-01 18:26:33 -0800
commit	9774cd17e80fc413cef73e1e7e9bac20ef21ebae (patch)
tree	74ae3184285affc633c7b65b0a326217ec4f47d4 /clang/lib/Lex/ModuleMap.cpp
parent	1c2ee6bdceaa608fee9095f948dca724ddf15b83 (diff)
download	llvm-9774cd17e80fc413cef73e1e7e9bac20ef21ebae.zip llvm-9774cd17e80fc413cef73e1e7e9bac20ef21ebae.tar.gz llvm-9774cd17e80fc413cef73e1e7e9bac20ef21ebae.tar.bz2

[mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref

This patch fixes and simplifies the ldmatrix affine map arithmetic by abstracting the affine expressions in terms of a pitch-linear layout (strided and contiguous dimensions). It then applies the maps for the strided and contiguous dimensions in both row-major and col-major layouts.

LdMatrixOp collaboratively (32 threads in a warp) loads tiles (8 rows x 128b cols) of data. It can load x1, x2, or x4 tiles. Additionally, it can transpose at 16-bit granularity when moving data from Shared Memory to registers.

This patch fixes the affine map (laneid -> coordinate index that a thread points to in a tile):
- Loading x4 tiles needs all 32 lanes (T0-31) to each point to a contiguous chunk of 128b. The issue was exposed when running this case.
- Loading x2 and x1 tiles needs threads T0-15 and T0-7, respectively, to each point to a contiguous chunk of 128b. The patch is NFC for these cases.

Differential Revision: https://reviews.llvm.org/D138978
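The lane-to-coordinate mapping described above can be illustrated with a small sketch. This is not the code from the patch: the helper names (`active_lanes`, `lane_to_pitch_linear`, `to_matrix_coord`) and the exact arithmetic are assumptions chosen to match the description, namely that each participating lane points to one contiguous 128b chunk, expressed first in pitch-linear (strided, contiguous) terms and then mapped to row-major or col-major coordinates.

```python
TILE_ROWS = 8        # each ldmatrix tile has 8 rows
TILE_ROW_BITS = 128  # each row is one contiguous 128b chunk


def active_lanes(num_tiles):
    """Lanes that supply an address: T0-7 (x1), T0-15 (x2), T0-31 (x4)."""
    if num_tiles not in (1, 2, 4):
        raise ValueError("ldmatrix loads x1, x2, or x4 tiles")
    return TILE_ROWS * num_tiles


def lane_to_pitch_linear(lane_id, strided_extent, bits_per_element=16):
    """Hypothetical map: lane id -> (strided, contiguous) element coordinate.

    Assumption: lanes walk the strided dimension first, then step along the
    contiguous dimension by one 128b chunk at a time.
    """
    elements_per_chunk = TILE_ROW_BITS // bits_per_element
    strided = lane_id % strided_extent
    contiguous = (lane_id // strided_extent) * elements_per_chunk
    return strided, contiguous


def to_matrix_coord(strided, contiguous, row_major=True):
    """Row-major: contiguous dim is the column. Col-major: it is the row."""
    return (strided, contiguous) if row_major else (contiguous, strided)


# x4 load of a 16x16 operand with 16-bit elements (illustrative sizes only).
coords = [lane_to_pitch_linear(lane, strided_extent=16)
          for lane in range(active_lanes(4))]
```

Under these assumptions, lanes T0-15 cover the 16 strided positions at contiguous offset 0 and lanes T16-31 cover the same positions at offset 8 (one 128b chunk of 16-bit elements further along), so all 32 lanes point to distinct chunks.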

Diffstat (limited to 'clang/lib/Lex/ModuleMap.cpp')

0 files changed, 0 insertions, 0 deletions

