author | Durgadoss R <durgadossr@nvidia.com> | 2025-05-15 16:08:01 +0530 |
---|---|---|
committer | GitHub <noreply@github.com> | 2025-05-15 16:08:01 +0530 |
commit | c507a0830df2e4fd0c234eee035aac2109de6d6e (patch) | |
tree | 98cda74dc70e29430000ac58897f8a1488a22fb0 /clang/lib/AST/ByteCode/Compiler.cpp | |
parent | d5da557782dd47395fb41e03d7663df6319d7ea6 (diff) | |
[NVPTX] Add TMA Bulk Copy Intrinsics (#138679)
This patch adds a new variant of the TMA Bulk Copy
intrinsics, introduced in sm100+. This variant
takes an additional byte_mask operand that selects
which bytes are copied.
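To illustrate what a byte mask means for a copy, here is a minimal host-side sketch. It is not the intrinsic and not an LLVM or CUDA API: the function name maskedBulkCopy is hypothetical, and the mask layout (bit i of a 16-bit mask selects byte i of every 16-byte chunk) is an assumption based on a reading of the PTX spec linked below, so check it against the spec before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a byte-masked bulk copy.
// Assumption: bit i of byteMask selects byte i of each 16-byte chunk.
// Bytes whose mask bit is clear are left untouched in dst.
void maskedBulkCopy(std::vector<std::uint8_t> &dst,
                    const std::vector<std::uint8_t> &src,
                    std::size_t size, std::uint16_t byteMask) {
  assert(size <= dst.size() && size <= src.size());
  for (std::size_t i = 0; i < size; ++i) {
    // Position of this byte within its 16-byte chunk.
    const unsigned lane = static_cast<unsigned>(i % 16);
    if (byteMask & (1u << lane))
      dst[i] = src[i];
  }
}
```

With a mask of 0x0005, for example, this model copies only bytes 0 and 2 of each 16-byte chunk and leaves the rest of the destination unchanged.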
* Selection is now done entirely through TableGen,
so this patch removes the corresponding
SelectCpAsyncBulkS2G() function.
* The lit tests are verified with the ptxas
executable from CUDA 12.8.
PTX Spec link:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-bulk-copy
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>