riscv-gnu-toolchain/llvm.git

author	Durgadoss R <durgadossr@nvidia.com>	2024-01-10 01:34:13 +0530
committer	GitHub <noreply@github.com>	2024-01-09 12:04:13 -0800
commit	340cc1702e21128b62799c5dfbf2875c3c2c96a1 (patch)
tree	766fbc2aa6d700c1ec9d84cc64d530b632df3db6 /llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp
parent	c7c68f1764ddd38d940946007c634b4bacb902b2 (diff)
download	llvm-340cc1702e21128b62799c5dfbf2875c3c2c96a1.zip llvm-340cc1702e21128b62799c5dfbf2875c3c2c96a1.tar.gz llvm-340cc1702e21128b62799c5dfbf2875c3c2c96a1.tar.bz2

[LLVM][NVPTX]: Add intrinsic for setmaxnreg (#77289)

This patch adds an intrinsic for setmaxnreg PTX instruction.

* PTX Doc link for this instruction:
  https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#miscellaneous-instructions-setmaxnreg
* The i32 argument, an immediate value, specifies the actual absolute register count for the instruction.
* The `setmaxnreg` instruction is available in SM90a. So, this patch adds 'hasSM90a' predicate to use in the NVPTX backend.
* lit tests are added to verify the lowering of the intrinsic.
* Verifier logic (and tests) are added to test the register count range and divisibility-by-8 requirements.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
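The range and divisibility-by-8 checks mentioned above can be illustrated with a minimal standalone sketch. This is not the code from the patch: the helper name `isValidSetMaxNRegCount` and the constants are hypothetical, and the bounds of 24 and 256 (inclusive) are an assumption taken from the PTX documentation for `setmaxnreg`, since the commit message only says "range".

```cpp
#include <cstdint>

// Assumed bounds (inclusive), per the PTX documentation for setmaxnreg.
// The commit message itself does not state the numeric range.
constexpr std::uint32_t kMinRegCount = 24;
constexpr std::uint32_t kMaxRegCount = 256;

// Hypothetical helper, not the name used in the LLVM verifier.
// Returns true when the immediate register count is within the assumed
// range and is a multiple of 8.
constexpr bool isValidSetMaxNRegCount(std::uint32_t regCount) {
  return regCount >= kMinRegCount && regCount <= kMaxRegCount &&
         regCount % 8 == 0;
}

// Compile-time checks of the sketch itself.
static_assert(isValidSetMaxNRegCount(24), "lower bound is accepted");
static_assert(isValidSetMaxNRegCount(256), "upper bound is accepted");
static_assert(!isValidSetMaxNRegCount(16), "below the assumed range");
static_assert(!isValidSetMaxNRegCount(100), "not a multiple of 8");
```

Under these assumptions, a value such as 96 would pass, while 20 (too small), 100 (not divisible by 8), and 264 (too large) would be rejected.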

Diffstat (limited to 'llvm/lib/CodeGen/TargetLoweringObjectFileImpl.cpp')

0 files changed, 0 insertions, 0 deletions

