author     Durgadoss R <durgadossr@nvidia.com>  2024-11-07 15:21:53 +0530
committer  GitHub <noreply@github.com>  2024-11-07 15:21:53 +0530
commit     1b01064faad2cd93c516341cfaf047b7a0f8da42
tree       e84a1dd29dec16a07b63bf6aca3376eb717bb0de /llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp
parent     2d7f34f2a5df9396a33a0ea044cfe3ddf33e1f5c

[NVPTX] Add TMA bulk tensor copy intrinsics (#96083)

This patch adds NVVM intrinsics and NVPTX codegen for:
* cp.async.bulk.tensor.S2G.1D -> 5D variants, supporting both Tile and Im2Col modes. These intrinsics optionally support cache_hints, as indicated by the boolean flag argument.
* cp.async.bulk.tensor.G2S.1D -> 5D variants, with support for both Tile and Im2Col modes. The Im2Col variants take an extra set of offsets as parameters. These intrinsics optionally support multicast and cache_hints, as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers each call to the appropriate PTX instruction.
* Lit tests are added for all combinations of these intrinsics in cp-async-bulk-tensor-g2s/s2g.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in the NVPTXUsage.rst file.
* PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-tensor

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>

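The commit message says the backend looks through the trailing boolean flags to pick the PTX instruction. A minimal sketch of that decision for the G2S direction follows, modeled as a string builder. The `TensorMode` enum and `g2sMnemonic` function are hypothetical and do not come from the LLVM sources. The instruction spelling follows the PTX `cp.async.bulk.tensor` syntax from the spec reference in the commit message. A real backend would instead choose among predefined instruction definitions during instruction selection rather than concatenating strings.

```cpp
#include <stdexcept>
#include <string>

// Load mode of the tensor copy (hypothetical type for this sketch).
enum class TensorMode { Tile, Im2Col };

// Hypothetical helper: returns the PTX mnemonic that a global-to-shared
// (G2S) bulk tensor copy would be lowered to, given the tensor rank, the
// load mode, and the two trailing boolean flags (multicast, cache hint).
std::string g2sMnemonic(unsigned dim, TensorMode mode, bool multicast,
                        bool cacheHint) {
  // The intrinsics cover 1D through 5D tensors.
  if (dim < 1 || dim > 5)
    throw std::invalid_argument("tensor rank must be in [1, 5]");
  // Im2Col carries (dim - 2) offsets, so it needs at least three dimensions.
  if (mode == TensorMode::Im2Col && dim < 3)
    throw std::invalid_argument("im2col mode needs a rank of at least 3");

  std::string ptx = "cp.async.bulk.tensor." + std::to_string(dim) + "d";
  ptx += ".shared::cluster.global";  // destination, then source state space
  ptx += (mode == TensorMode::Tile) ? ".tile" : ".im2col";
  ptx += ".mbarrier::complete_tx::bytes";  // completion mechanism
  // Optional qualifiers are appended only when the matching flag is set.
  if (multicast)
    ptx += ".multicast::cluster";
  if (cacheHint)
    ptx += ".L2::cache_hint";
  return ptx;
}
```

For example, a 3D Im2Col copy with both flags set yields a mnemonic ending in `.im2col.mbarrier::complete_tx::bytes.multicast::cluster.L2::cache_hint`, while a plain 1D Tile copy stops after `.mbarrier::complete_tx::bytes`.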
Diffstat (limited to 'llvm/lib/Target/ARM/Disassembler/ARMDisassembler.cpp')
0 files changed, 0 insertions, 0 deletions