path: root/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
Age | Commit message | Author | Files | Lines
2025-09-03[AMDGPU] Support cluster_load_async_to_lds instructions on gfx1250 (#156595)Changpeng Fang1-0/+11
2025-09-02[AMDGPU] Support cluster load instructions for gfx1250 (#156548)Changpeng Fang1-0/+10
2025-08-28AMDGPU: Refactor lowering of s_barrier to split barriers (#154648)Nicolai Hähnle1-37/+0
Let's do the lowering of non-split into split barriers in a new IR pass, AMDGPULowerIntrinsics. That way, there is no code duplication between SelectionDAG and GlobalISel. This simplifies some upcoming extensions to the code.
2025-08-12[AMDGPU] Add s_barrier_init|join|leave instructions (#153296)Stanislav Mekhanoshin1-1/+10
2025-08-12[AMDGPU][GISel] Only fold flat offsets if they are inbounds (#153001)Fabian Ritter1-20/+35
For flat memory instructions where the address is supplied as a base address register with an immediate offset, the memory aperture test ignores the immediate offset. Currently, ISel does not respect that, which leads to miscompilations where valid input programs crash when the address computation relies on the immediate offset to get the base address in the proper memory aperture. Global or scratch instructions are not affected. This patch only selects flat instructions with immediate offsets from address computations with the inbounds flag: If the address computation does not leave the bounds of the allocated object, it cannot leave the bounds of the memory aperture and is therefore safe to handle with an immediate offset. Relevant tests are in fold-gep-offset.ll. Analogous to #132353 for SDAG (which is not yet in a mergeable state, its progress is currently blocked by #146076). Fixes SWDEV-516125 for GISel.
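The hazard described above can be shown with a toy model. This is only a sketch: the aperture bounds and the names `classify`, `apertureSeenByHardware`, `apertureIntended` and `canFoldFlatOffset` are hypothetical and are not LLVM APIs. It assumes, as the commit message states, that the aperture test looks at the base register and ignores the immediate offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical address classification; the ranges below are made up.
enum class Aperture { Global, LDS };

constexpr uint64_t LdsBase = 0x1000;  // assumed start of the LDS aperture
constexpr uint64_t LdsLimit = 0x2000; // assumed end of the LDS aperture (exclusive)

constexpr Aperture classify(uint64_t Addr) {
  return (Addr >= LdsBase && Addr < LdsLimit) ? Aperture::LDS
                                              : Aperture::Global;
}

// Model of a flat access with a folded immediate: per the commit message,
// the aperture test only sees the base register.
constexpr Aperture apertureSeenByHardware(uint64_t Base, int64_t /*Imm*/) {
  return classify(Base);
}

// What the input program meant: the aperture of the full address.
constexpr Aperture apertureIntended(uint64_t Base, int64_t Imm) {
  return classify(Base + static_cast<uint64_t>(Imm));
}

// Selection rule from the patch, reduced to its core: fold the offset only
// when the address computation carries the inbounds flag.
constexpr bool canFoldFlatOffset(bool IsInBounds) { return IsInBounds; }
```

With a base of `0x0F00` and an offset of `0x200`, the two aperture functions disagree, which is the miscompilation the patch avoids by refusing to fold non-inbounds offsets.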
2025-08-04[AMDGPU] Use SDNodeXForm to select a few VOP3P modifiers, NFC (#151907)Changpeng Fang1-60/+32
It is not necessary to use ComplexPattern to select VOP3PModsNeg, VOP3PModsNegs and VOP3PModsNegAbs. We can use SDNodeXForm instead.
2025-07-30[AMDGPU] Fix destination op_sel for v_cvt_scale32_* and v_cvt_sr_* (#151411)Changpeng Fang1-2/+2
GFX950 uses OP_SEL[MSB:LSB] for both src reads and dest writes. So this patch essentially reverts the work from https://github.com/llvm/llvm-project/pull/151286 regarding dest writes.
2025-07-30[AMDGPU] Fix op_sel settings for v_cvt_scale32_* and v_cvt_sr_* (#151286)Changpeng Fang1-7/+8
For OPF_OPSEL_SRCBYTE: The vector instruction uses OPSEL[1:0] to specify a byte select for the first source operand. So op_sel [0, 0], [1, 0], [0, 1] and [1, 1] should map to byte 0, 1, 2 and 3, respectively. For OPF_OPSEL_DSTBYTE: OPSEL is used as a destination byte select. OPSEL[2:3] specify which byte of the destination to write to. Note that the order of the bits is different from that of OPF_OPSEL_SRCBYTE. So the mapping should be: op_sel [0, 0], [0, 1], [1, 0] and [1, 1] map to byte 0, 1, 2 and 3, respectively. Fixes: SWDEV-544901
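The two mappings stated in this commit message can be written as small functions. This is a sketch of the arithmetic only: the function names are hypothetical, the two arguments are the two op_sel entries in the order they are written in the message, and it reflects this message alone (the 2025-07-30 follow-up #151411 listed above changed the destination handling again).

```cpp
#include <cassert>

// Source byte select as described: [0,0]->0, [1,0]->1, [0,1]->2, [1,1]->3.
// The first op_sel entry is the low bit of the byte index.
constexpr unsigned srcByteFromOpSel(bool Sel0, bool Sel1) {
  return (Sel0 ? 1u : 0u) | (Sel1 ? 2u : 0u);
}

// Destination byte select as described: [0,0]->0, [0,1]->1, [1,0]->2, [1,1]->3.
// Here the bit order is swapped: the first op_sel entry is the high bit.
constexpr unsigned dstByteFromOpSel(bool Sel0, bool Sel1) {
  return (Sel0 ? 2u : 0u) | (Sel1 ? 1u : 0u);
}

static_assert(srcByteFromOpSel(true, false) == 1, "src [1,0] selects byte 1");
static_assert(dstByteFromOpSel(true, false) == 2, "dst [1,0] selects byte 2");
```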
2025-07-29[AMDGPU] Bitop3 opcodes for gfx1250 (#151235)Stanislav Mekhanoshin1-0/+3
2025-07-29[AMDGPU] Implement v_mad_u32/v_mad_nc_u|i64_u32 on gfx1250 (#151226)Stanislav Mekhanoshin1-0/+9
2025-07-29[AMDGPU] Support builtin/intrinsics for async loads/stores on gfx1250 (#151058)Changpeng Fang1-0/+11
2025-07-24[AMDGPU] Support builtin/intrinsics for load monitors on gfx1250 (#150540)Changpeng Fang1-0/+10
2025-07-24[AMDGPU] Select VMEM prefetch for llvm.prefetch on gfx1250 (#150493)Stanislav Mekhanoshin1-0/+11
We have a choice to use a scalar or vector prefetch for a uniform pointer. Since we do not have scalar stores, our scalar cache is practically read-only. The rw argument of the prefetch intrinsic is used to force a vector operation even for a uniform case. On GFX12 scalar prefetch will be used anyway; it is still useful, but it will only bring data to L2.
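The choice described above, restricted to uniform pointers, might be sketched as follows. The names `PrefetchKind`, `prefetchForUniformPointer` and the `HasVectorPrefetch` flag are hypothetical, and the rule that a uniform read defaults to the scalar form is an assumption read from the message, not taken from the patch itself. `Rw` follows the `llvm.prefetch` convention (0 = read, 1 = write).

```cpp
#include <cassert>

enum class PrefetchKind { Scalar, Vector };

// Sketch for a uniform pointer only.
// Rw: llvm.prefetch rw argument, 0 = read, 1 = write.
// HasVectorPrefetch: hypothetical subtarget query; true models gfx1250,
// false models GFX12, where the message says scalar prefetch is used anyway.
constexpr PrefetchKind prefetchForUniformPointer(int Rw,
                                                 bool HasVectorPrefetch) {
  if (!HasVectorPrefetch)
    return PrefetchKind::Scalar;
  // A write hint forces the vector form, since the scalar cache is
  // effectively read-only.
  return Rw != 0 ? PrefetchKind::Vector : PrefetchKind::Scalar;
}
```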
2025-07-22[AMDGPU] Select scale_offset for scratch instructions on gfx1250 (#150111)Stanislav Mekhanoshin1-5/+15
2025-07-22[AMDGPU] Select scale_offset for global instructions on gfx1250 (#150107)Stanislav Mekhanoshin1-13/+50
Also switches immediate offset to signed for the subtarget.
2025-07-22[AMDGPU] Select scale_offset with SMEM instructions (#150078)Stanislav Mekhanoshin1-18/+144
2025-07-21[AMDGPU] ISel & PEI for whole wave functions (#145858)Diana Picus1-0/+4
Whole wave functions are functions that will run with a full EXEC mask. They will not be invoked directly, but instead will be launched by way of a new intrinsic, `llvm.amdgcn.call.whole.wave` (to be added in a future patch). These functions are meant as an alternative to the `llvm.amdgcn.init.whole.wave` or `llvm.amdgcn.strict.wwm` intrinsics. Whole wave functions will set EXEC to -1 in the prologue and restore the original value of EXEC in the epilogue. They must have a special first argument, `i1 %active`, that is going to be mapped to EXEC. They may have either the default calling convention or amdgpu_gfx. The inactive lanes need to be preserved for all registers used, active lanes only for the CSRs. At the IR level, arguments to a whole wave function (other than `%active`) contain poison in their inactive lanes. Likewise, the return value for the inactive lanes is poison. This patch contains the following work: * 2 new pseudos, SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN used for managing the EXEC mask. SI_SETUP_WHOLE_WAVE_FUNC will return a SReg_1 representing `%active`, which needs to be passed into SI_WHOLE_WAVE_FUNC_RETURN. * SelectionDAG support for generating these 2 new pseudos and the special handling of %active. Since the return may be in a different basic block, it's difficult to add the virtual reg for %active to SI_WHOLE_WAVE_FUNC_RETURN, so we initially generate an IMPLICIT_DEF which is later replaced via a custom inserter. * Expansion of the 2 pseudos during prolog/epilog insertion. PEI also marks any used VGPRs as WWM registers, which are then spilled and restored with the usual logic. Future patches will include the `llvm.amdgcn.call.whole.wave` intrinsic and a lot of optimization work (especially in order to reduce spills around function calls). --------- Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com> Co-authored-by: Shilei Tian <i@tianshilei.me>
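The EXEC handling of the two pseudos can be modelled in a few lines. This is a toy sketch, not backend code: `Wave`, `setupWholeWaveFunc` and `wholeWaveFuncReturn` are hypothetical stand-ins for SI_SETUP_WHOLE_WAVE_FUNC and SI_WHOLE_WAVE_FUNC_RETURN, and a 64-lane mask is assumed.

```cpp
#include <cassert>
#include <cstdint>

// One bit per lane; a 64-lane wave is assumed for illustration.
struct Wave {
  uint64_t Exec;
};

// Models the prologue pseudo: enable every lane and hand back the original
// mask, which plays the role of `%active`.
inline uint64_t setupWholeWaveFunc(Wave &W) {
  uint64_t Active = W.Exec;
  W.Exec = ~uint64_t(0); // EXEC = -1
  return Active;
}

// Models the epilogue pseudo: it needs the value produced by the prologue
// in order to restore the caller's EXEC mask.
inline void wholeWaveFuncReturn(Wave &W, uint64_t Active) { W.Exec = Active; }
```

The sketch makes the data dependency visible: the return side cannot restore EXEC without the value the setup side produced, which is why `%active` has to be threaded from one pseudo to the other.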
2025-07-18[AMDGPU] Select flat GVS atomics on gfx1250 (#149554)Stanislav Mekhanoshin1-2/+18
2025-07-15AMDGPU: Support intrinsic selection for gfx1250 wmma instructions (#148957)Changpeng Fang1-0/+93
Co-authored-by: Stanislav Mekhanoshin <Stanislav.Mekhanoshin@amd.com> Co-authored-by: Shilei Tian <Shilei.Tian@amd.com>
2025-07-08[AMDGPU] Fix broken uses of isLegalFLATOffset and splitFlatOffset (#147469)Fabian Ritter1-1/+2
The last parameter of these functions used to be `Signed`, and it looks like a few calls weren't updated when that was changed to `FlatVariant`. Effectively, the functions were called with `FlatVariant=SALU` due to integer promotions, which doesn't make any sense.
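The class of bug described above is easy to reproduce in isolation. In this sketch the flag values, `variantSeenByCallee`, `staleCall` and `FlatKind` are all hypothetical; only the shape (a `bool` argument silently converting to a flag word) mirrors the message. The scoped enum at the end is one possible hardening, not something the patch is stated to do.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

namespace flags {
// Bit values assumed for illustration.
constexpr uint64_t SALU = uint64_t(1) << 0;
constexpr uint64_t FlatGlobal = uint64_t(1) << 1;
} // namespace flags

// After the signature change the last parameter is a flag word. The body
// simply echoes it so that the effect of a stale call is visible.
constexpr uint64_t variantSeenByCallee(int64_t /*Offset*/,
                                       uint64_t FlatVariant) {
  return FlatVariant;
}

// A call site that still passes the old `Signed` bool compiles without any
// diagnostic: `true` converts to the integer 1.
constexpr uint64_t staleCall() { return variantSeenByCallee(16, true); }

// A scoped enum parameter would have rejected the stale call at compile time,
// because bool does not implicitly convert to it.
enum class FlatKind : uint64_t { Flat, Global, Scratch };
static_assert(!std::is_convertible_v<bool, FlatKind>,
              "bool must not convert to the scoped enum");
```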
2025-07-08[AMDGPU] Re-Re-apply: Implement vop3p complex pattern optmization for gisel ↵Shoreshen1-32/+553
(#146984) Reverts llvm/llvm-project#146982 Fix up reported building error for https://github.com/llvm/llvm-project/pull/136262 with: ``` FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths 
[-Werror,-Wreturn-type] 4566 | } | ^ 1 error generated. ninja: build stopped: subcommand failed. ``` --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
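The log does not show the offending function, so the following is only a generic illustration of how `-Wreturn-type` arises and how it is usually silenced. `signOf` is a hypothetical example. In LLVM code the trailing statement would typically be `llvm_unreachable(...)`; this standalone sketch uses `std::abort()` instead.

```cpp
#include <cassert>
#include <cstdlib>

int signOf(int X) {
  if (X < 0)
    return -1;
  if (X == 0)
    return 0;
  if (X > 0)
    return 1;
  // Every int is handled above, but the compiler does not reason about the
  // values, so it sees a path that falls off the end of a non-void function.
  // Without the next line, -Werror,-Wreturn-type rejects the function.
  std::abort();
}
```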
2025-07-04Revert "[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for ↵Shoreshen1-563/+32
gisel" (#146982) Reverts llvm/llvm-project#136262 Due to building error: ``` FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /home/b/sanitizer-aarch64-linux/build/llvm_build0/bin/clang++ -DGTEST_HAS_RTTI=0 -DLLVM_EXPORTS -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux/build/build_default/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux/build/build_default/include -I/home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden -fno-exceptions -funwind-tables -fno-rtti -UNDEBUG -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUInstructionSelector.cpp.o -c /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4566:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] 4566 | } | ^ 1 error generated. 
ninja: build stopped: subcommand failed. ```
2025-07-04[AMDGPU] Re-apply: Implement vop3p complex pattern optmization for gisel ↵Shoreshen1-32/+563
(#136262) This is a fix up for patch https://github.com/llvm/llvm-project/pull/130234, which is reverted in https://github.com/llvm/llvm-project/pull/136249 The main reason of building failure are: 1. ``` /home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp: In function ‘llvm::SmallVector<std::pair<const llvm::MachineOperand*, SrcStatus> > getSrcStats(const llvm::MachineOperand*, const llvm::MachineRegisterInfo&, searchOptions, int)’: /home/botworker/bbot/amdgpu-offload-rhel-9-cmake-build-only/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4669: error: could not convert ‘Statlist’ from ‘SmallVector<[...],4>’ to ‘SmallVector<[...],3>’ 4669 | return Statlist; ``` 2. ``` /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4554:1: error: non-void function does not return a value in all control paths [-Werror,-Wreturn-type] 4554 | } | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4644:39: error: overlapping comparisons always evaluate to true [-Werror,-Wtautological-overlap-compare] 4644 | (Stat >= SrcStatus::NEG_START || Stat <= SrcStatus::NEG_END)) { | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4893:66: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4893 | [=](MachineInstrBuilder &MIB) { MIB.addImm(getAllKindImm(Op)); }, | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9: note: 'Op' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4894:52: error: captured structured bindings are a C++20 extension 
[-Werror,-Wc++20-extensions] 4894 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13: note: 'Mods' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4899:50: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4899 | [=](MachineInstrBuilder &MIB) { MIB.addReg(Op->getReg()); }, | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:9: note: 'Op' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4900:50: error: captured structured bindings are a C++20 extension [-Werror,-Wc++20-extensions] 4900 | [=](MachineInstrBuilder &MIB) { MIB.addImm(Mods); } // src_mods | ^ /home/b/sanitizer-aarch64-linux/build/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp:4890:13: note: 'Mods' declared here 4890 | auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); | ^ 6 errors generated. ``` Both error cannot be reproduced at my local machine, the fix applied are: 1. In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function `getSrcStats` replace ``` SmallVector<std::pair<const MachineOperand *, SrcStatus>, 4> Statlist; ``` with ``` SmallVector<std::pair<const MachineOperand *, SrcStatus>> Statlist; ``` 2. 
In `llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp` function `AMDGPUInstructionSelector::selectVOP3PRetHelper` replace ``` auto [Op, Mods] = selectVOP3PModsImpl(&Root, MRI, IsDOT); ``` with ``` auto Results = selectVOP3PModsImpl(&Root, MRI, IsDOT); const MachineOperand *Op = Results.first; unsigned Mods = Results.second; ``` These changes have not been tested, since neither error can be reproduced locally.
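The second fix can be tried outside LLVM. In this sketch `selectModsImpl` and `makeModsRenderer` are hypothetical stand-ins (the real code returns a `MachineOperand` pointer and builds renderer lambdas); the point is only the C++17-conforming pattern of unpacking the pair into ordinary locals before a lambda captures them.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-in for selectVOP3PModsImpl: returns (operand, modifiers).
inline std::pair<int, unsigned> selectModsImpl(int Root) { return {Root, 3u}; }

inline std::function<unsigned()> makeModsRenderer(int Root) {
  // Writing `auto [Op, Mods] = selectModsImpl(Root);` and then capturing
  // `Op` or `Mods` in a lambda captures a structured binding, which clang
  // reports under -Wc++20-extensions when building as C++17.
  // Unpacking into named locals avoids the issue.
  auto Results = selectModsImpl(Root);
  int Op = Results.first;
  unsigned Mods = Results.second;
  return [=]() { return Mods + static_cast<unsigned>(Op); };
}
```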
2025-06-23AMDGPU: Avoid report_fatal_error on ds ordered intrinsics (#145202)Matt Arsenault1-6/+15
2025-06-19AMDGPU/GFX12: Fix s_barrier_signal_isfirst for single-wave workgroups (#143634)Nicolai Hähnle1-0/+3
Barrier instructions are no-ops in single-wave workgroups. This includes s_barrier_signal_isfirst, which will leave SCC unmodified. Model this correctly (via an implicit use of SCC) and ensure SCC==1 before the barrier instruction (if the wave is the only one of the workgroup, then it is the first). --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
2025-05-28Warn on misuse of DiagnosticInfo classes that hold Twines (#137397)Justin Bogner1-3/+3
This annotates the `Twine` passed to the constructors of the various DiagnosticInfo subclasses with `[[clang::lifetimebound]]`, which causes us to warn when we would try to print the twine after it had already been destructed. We also update `DiagnosticInfoUnsupported` to hold a `const Twine &` like all of the other DiagnosticInfo classes, since this warning allows us to clean up all of the places where it was being used incorrectly.
2025-05-19[AMDGPU] Add a new amdgcn.load.to.lds intrinsic (#137425)Krzysztof Drewniak1-0/+5
This PR adds an amdgcn_load_to_lds intrinsic that abstracts over loads to LDS from global (address space 1) pointers and buffer fat pointers (address space 7), since they use the same API and "gather from a pointer to LDS" is something of an abstract operation. This commit adds the intrinsic and its lowerings for addrspaces 1 and 7, and updates the MLIR wrappers to use it (loosening up the restrictions on loads to LDS along the way to match the ground truth from target features). It also plumbs the intrinsic through to clang.
2025-05-05AMDGPU: Fix -Wextra (#138539)Matt Arsenault1-2/+3
Another stupid gcc warning. Ideally we would directly use the enum type, but subregister indexes are emitted as an anonymous enum. Fixes #125548
2025-05-05[AMDGPU] Support arbitrary types in amdgcn.dead (#134841)Diana Picus1-6/+0
Legalize the amdgcn.dead intrinsic to work with types other than i32. It still generates IMPLICIT_DEFs. Remove some of the previous code for selecting/reg bank mapping it for 32-bit types, since everything is done in the legalizer now.
2025-04-24[AMDGPU] Use variadic isa<>. NFC. (#137016)Jay Foad1-2/+1
2025-04-17Revert "[AMDGPU] Implement vop3p complex pattern optmization for gisel" (#136249)Shoreshen1-571/+33
Reverts llvm/llvm-project#130234
2025-04-18[AMDGPU] Implement vop3p complex pattern optmization for gisel (#130234)Shoreshen1-33/+571
Seeking opportunities to optimize VOP3P instructions by altering opsel, opsel_hi, neg, neg_hi bits
Tests differences:
1. fix op_sel_hi bit for inline constant:
   1. `CodeGen/AMDGPU/packed-fp32.ll`
2. use neg bit to remove xor with 0x80008000
   1. `CodeGen/AMDGPU/strict_fsub.f16.ll`
   2. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.fdot2.ll`
   3. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot4.ll`
   4. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot8.ll`
   5. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot2.ll`
   6. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot4.ll`
   7. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.udot8.ll`
3. Remove xor 0x80008000, and use opsel, opsel_hi to remove alignbit
   1. `CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sdot2.ll`
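The "xor with 0x80008000" item rests on a bit-level fact: in a 32-bit register holding two packed 16-bit floats, that constant flips exactly the two sign bits. A sketch of the idea follows; `NegMods` and `negModsForXorMask` are hypothetical names, and the handling of the single-half masks is an extrapolation that the commit message does not mention.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t SignLo = 0x00008000u; // sign bit of the low 16-bit half
constexpr uint32_t SignHi = 0x80000000u; // sign bit of the high 16-bit half

struct NegMods {
  bool Valid; // the xor can be expressed purely with neg modifiers
  bool Neg;   // negate the low half
  bool NegHi; // negate the high half
};

// If the xor mask touches nothing but sign bits, the xor can be dropped and
// replaced by the corresponding source modifiers.
constexpr NegMods negModsForXorMask(uint32_t Mask) {
  if ((Mask & ~(SignLo | SignHi)) != 0)
    return {false, false, false};
  return {true, (Mask & SignLo) != 0, (Mask & SignHi) != 0};
}

static_assert((SignLo | SignHi) == 0x80008000u, "both sign bits together");
```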
2025-04-02[AMDGPU] Use a target feature to enable __builtin_amdgcn_global_load_lds on gfx9/10 (#133055)Juan Manuel Martinez Caamaño1-1/+5
This patch introduces the `vmem-to-lds-load-insts` target feature, which can be used to enable builtins `__builtin_amdgcn_global_load_lds` and `__builtin_amdgcn_raw_ptr_buffer_load_lds` on platforms which have this feature. This feature is only available on gfx9/10. A limitation of using a common target feature for both builtins is that we could have made `__builtin_amdgcn_raw_ptr_buffer_load_lds` available on gfx6,7,8.
2025-03-29[GlobalISel][NFC] Rename GISelKnownBits to GISelValueTracking (#133466)Tim Gymnich1-20/+21
- rename `GISelKnownBits` to `GISelValueTracking` to analyze more than just `KnownBits` in the future
2025-03-19[AMDGPU] Support image_bvh8_intersect_ray instruction and intrinsic. (#130041)Mariusz Sikora1-0/+1
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-19[AMDGPU] Add intrinsic and MI for image_bvh_dual_intersect_ray (#130038)Mariusz Sikora1-3/+6
- Add llvm.amdgcn.image.bvh.dual.intersect.ray intrinsic and image_bvh_dual_intersect_ray machine instruction. - Add llvm_v10i32_ty and llvm_v10f32_ty --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-17[AMDGPU] Add intrinsics and MIs for ds_bvh_stack_* (#130007)Mariusz Sikora1-1/+18
New intrinsics / instructions : int_amdgcn_ds_bvh_stack_push4_pop1_rtn / ds_bvh_stack_push4_pop1_rtn_b32 int_amdgcn_ds_bvh_stack_push8_pop1_rtn / ds_bvh_stack_push8_pop1_rtn_b32 int_amdgcn_ds_bvh_stack_push8_pop2_rtn / ds_bvh_stack_push8_pop2_rtn_b64 Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com>
2025-03-13[AMDGPU][True16][CodeGen] gisel true16 for ICMP (#128913)Brox Chen1-4/+16
GlobalIsel true16 selection for ICMP
2025-03-06[AMDGPU][NFC] Update name for BVH Intersect Ray (#130036)Mariusz Sikora1-3/+4
Co-authored-by: Ivan Kosarev <ivan.kosarev@amd.com>
2025-03-04[AMDGPU] Remove unused s_barrier_{init,join,leave} instructions (#129548)Mariusz Sikora1-10/+1
2025-03-03[AMDGPU] Simplify conditional expressions. NFC. (#129228)Jay Foad1-2/+2
Simplify `cond ? val : false` to `cond && val` and similar.
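For `bool` operands the rewrite is a plain identity, which can be checked exhaustively. The helper names below are hypothetical; the second pair shows one plausible reading of "and similar" (`cond ? true : val` becoming `cond || val`), which is an assumption about what else the patch touched.

```cpp
#include <cassert>

constexpr bool selectFalse(bool Cond, bool Val) { return Cond ? Val : false; }
constexpr bool logicalAnd(bool Cond, bool Val) { return Cond && Val; }

constexpr bool selectTrue(bool Cond, bool Val) { return Cond ? true : Val; }
constexpr bool logicalOr(bool Cond, bool Val) { return Cond || Val; }

// Compare both forms on all four input combinations.
constexpr bool formsAgree() {
  for (int C = 0; C < 2; ++C)
    for (int V = 0; V < 2; ++V) {
      if (selectFalse(C != 0, V != 0) != logicalAnd(C != 0, V != 0))
        return false;
      if (selectTrue(C != 0, V != 0) != logicalOr(C != 0, V != 0))
        return false;
    }
  return true;
}

static_assert(formsAgree(), "the conditional and logical forms are equivalent");
```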
2025-02-21[AMDGPU][True16][CodeGen] build_vector pattern in true16 (#118904)Brox Chen1-1/+1
build_vector pattern in true16 SDAG
2025-02-20[AMDGPU] Add llvm.amdgcn.dead intrinsic (#123190)Diana Picus1-0/+6
Shaders that use the llvm.amdgcn.init.whole.wave intrinsic need to explicitly preserve the inactive lanes of VGPRs of interest by adding them as dummy arguments. The code usually looks something like this:
```
define amdgcn_cs_chain void f(active vgpr args..., i32 %inactive.vgpr1, ..., i32 %inactive.vgprN) {
entry:
  %c = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %c, label %shader, label %tail

shader:
  [...]

tail:
  %inactive.vgpr.arg1 = phi i32 [ %inactive.vgpr1, %entry], [poison, %shader]
  [...] ; %inactive.vgpr* then get passed into a llvm.amdgcn.cs.chain call
```
Unfortunately, this kind of phi node will get optimized away and the backend won't be able to figure out that it's ok to use the active lanes of `%inactive.vgpr*` inside `shader`. This patch fixes the issue by introducing a llvm.amdgcn.dead intrinsic, whose result can be used as a PHI operand instead of the poison. This will be selected to an IMPLICIT_DEF, which the backend can work with. At the moment, the llvm.amdgcn.dead intrinsic works only on i32 values. Support for other types can be added later if needed.
2025-02-19[AMDGPU] Replace gfx940 and gfx941 with gfx942 in llvm (#126763)Fabian Ritter1-1/+1
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all non-documentation occurrences of gfx940/gfx941 from the llvm directory, and the remaining occurrences in clang. Documentation changes will follow. For SWDEV-512631
2025-01-30[AMDGPU][True16][CodeGen] true16 codegen for icmp and is_fpclass (#124757)Brox Chen1-2/+1
True16 codegen pattern for icmp patterns and is_fpclass
2025-01-24AMDGPU/GlobalISel: AMDGPURegBankLegalize (#112864)Petar Avramovic1-1/+89
Lower G_ instructions that can't be inst-selected with register bank assignment from AMDGPURegBankSelect based on uniformity analysis.
- Lower instruction to perform it on assigned register bank
- Put uniform value in vgpr because SALU instruction is not available
- Execute divergent instruction in SALU - "waterfall loop"

Given LLTs on all operands after legalizer, some register bank assignments require lowering while others do not. Note: cases where all register bank assignments would require lowering are lowered in legalizer.

AMDGPURegBankLegalize goals:
- Define Rules: when and how to perform lowering
- Goal of defining Rules is to provide high level table-like brief overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent).
- Fast search of Rules, depends on how complicated Rule.Predicate is
- For some opcodes there would be too many Rules that are essentially all the same just for different combinations of types and banks. Write custom function that handles all cases.
- Rules are made from enum IDs that correspond to each operand. Names of IDs are meant to give brief description what lowering does for each operand or the whole instruction.
- AMDGPURegBankLegalizeHelper implements lowering algorithms

Since this is the first patch that actually enables -new-reg-bank-select here is the summary of regression tests that were added earlier:
- if instruction is uniform always select SALU instruction if available
- eliminate back to back vgpr to sgpr to vgpr copies of uniform values
- fast rules: small differences for standard and vector instruction
- enabling Rule based on target feature - salu_float
- how to specify lowering algorithm - vgpr S64 AND to S32
- on G_TRUNC in reg, it is up to user to deal with truncated bits. G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32
- sgpr S1 phi lowering
- new opcodes for vcc-to-scc and scc-to-vcc copies
- lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC)
- S1 zext and sext lowering to select
- uniform and divergent S1 AND (OR and XOR) lowering - inst-selected into SALU instruction
- divergent phi with uniform inputs
- divergent instruction with temporal divergent use, source instruction is defined as uniform (AMDGPURegBankSelect) - missing temporal divergence lowering
- uniform phi, because of undef incoming, is assigned to vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine uniformity analysis.
2025-01-10[AMDGPU] Remove s_wakeup_barrier instruction (#122277)Mirko Brkušanin1-5/+0
2025-01-10[AMDGPU] Allow selection of BITOP3 for some 2 opcodes and B32 cases (#122267)Jakub Chlanda1-7/+6
This came up in downstream static analysis as dead code. Admittedly, it depends on what the intention was when checking for [`if (NumOpcodes == 2 && IsB32)`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp#L3792C3-L3792C32) and I took a guess that for certain cases the selection should take place. If that's incorrect, that whole if statement can be removed, as it is after a check for: [`if (NumOpcodes < 4)`](https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp#L3788)
2025-01-03[AMDGPU][True16][MC] disable incorrect VOPC t16 instruction (#120271)Brox Chen1-1/+2
The current VOPC t16 instructions are not implemented with the correct t16 pseudo. Thus the current t16/fake16 instructions are all in fake16 format. The plan is to remove the incorrect t16 instructions and refactor them. The first step is to remove them in this patch. The next step will be updating the t16/fake16 pseudo to the correct format and add back true16 instruction one by one in the upcoming patches.
2024-11-30AMDGPU/GlobalISel: Do not try to form v_bitop3_b32 for SGPR results (#117940)Matt Arsenault1-5/+7