author | Durgadoss R <durgadossr@nvidia.com> | 2024-01-10 01:34:13 +0530 |
---|---|---|
committer | GitHub <noreply@github.com> | 2024-01-09 12:04:13 -0800 |
commit | 340cc1702e21128b62799c5dfbf2875c3c2c96a1 (patch) | |
tree | 766fbc2aa6d700c1ec9d84cc64d530b632df3db6 /llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp | |
parent | c7c68f1764ddd38d940946007c634b4bacb902b2 (diff) | |
[LLVM][NVPTX]: Add intrinsic for setmaxnreg (#77289)

This patch adds an intrinsic for the setmaxnreg PTX instruction.

* PTX Doc link for this instruction:
  https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
* The i32 argument, an immediate value, specifies the absolute
  register count for the instruction.
* The `setmaxnreg` instruction is available on SM90a, so this patch
  adds a 'hasSM90a' predicate for use in the NVPTX backend.
* lit tests are added to verify the lowering of the intrinsic.
* Verifier logic (and tests) are added to check the register
  count range and divisibility-by-8 requirements.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
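
The range and divisibility-by-8 requirements mentioned in the commit message can be illustrated with a small standalone check. This is only a sketch, not the verifier code from the patch: the helper name `isValidSetMaxNRegCount` and the constants are hypothetical, and the bounds of 24 to 256 (inclusive) are an assumption taken from the PTX ISA description of `setmaxnreg`, since the commit message itself does not state them.

```cpp
// Hypothetical helper illustrating the constraints; not the code from the patch.
// Bounds are assumed from the PTX ISA description of setmaxnreg.
constexpr unsigned kMinRegCount = 24;   // assumed lower bound (inclusive)
constexpr unsigned kMaxRegCount = 256;  // assumed upper bound (inclusive)

// Returns true when the immediate register count lies in the assumed
// range and is a multiple of 8.
constexpr bool isValidSetMaxNRegCount(unsigned RegCount) {
  return RegCount >= kMinRegCount && RegCount <= kMaxRegCount &&
         RegCount % 8 == 0;
}

// Compile-time usage examples.
static_assert(isValidSetMaxNRegCount(96), "96 is in range and divisible by 8");
static_assert(!isValidSetMaxNRegCount(100), "100 is not divisible by 8");
```

Because the argument is an immediate, a check of this shape can run on the constant at IR verification time, which is presumably why the patch places it in the verifier.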