path: root/llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
Age | Commit message | Author | Files | Lines
12 days ago | [PowerPC] Use millicode call for strlen instead of lib call (#153600) | zhijian lin | 1 file, -3/+2
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.
14 days ago | [IR] NFC: Remove 'experimental' from partial.reduce.add intrinsic (#158637) | Sander de Smalen | 1 file, -1/+1
The partial reduction intrinsics are no longer experimental, because they've been used in production for a while and are unlikely to change.
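A minimal IR sketch of the renamed intrinsic (the vector widths here are illustrative): a wide input vector is accumulated into a narrower accumulator.

```llvm
declare <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32>, <16 x i32>)

define <4 x i32> @partial_sum(<4 x i32> %acc, <16 x i32> %in) {
  ; Reduces the 16 input lanes into the 4 accumulator lanes
  ; (the lane-to-lane mapping is deliberately unspecified).
  %r = call <4 x i32> @llvm.vector.partial.reduce.add.v4i32.v16i32(<4 x i32> %acc, <16 x i32> %in)
  ret <4 x i32> %r
}
```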
2025-09-16 | [SelectionDAGBuilder][PPC] Use getShiftAmountConstant. (#158400) | Craig Topper | 1 file, -16/+11
The PowerPC changes are caused by shifts created by different IR operations now being CSEd. This allows consecutive loads to be turned into vectors earlier, which affects the ordering of other combines and legalizations and leads to some improvements and some regressions.
2025-09-04 | [AMDGPU] Tail call support for whole wave functions (#145860) | Diana Picus | 1 file, -21/+33
Support tail calls to whole wave functions (trivial) and from whole wave functions (slightly more involved, because we need a new pseudo for the tail call return that patches up the EXEC mask). Move the expansion of whole wave function return pseudos (regular and tail call returns) to prolog/epilog insertion, since that's where we patch up the EXEC mask.
2025-09-02 | [Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007) | Sam Tebbs | 1 file, -0/+12
It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on, and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and `loop.dependence.raw.mask` the opposite. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations determines whether the chain of those instructions can be executed safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element type. This will be used by #100579.
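A hedged sketch of how such a mask might look in IR (the `v4i1` mangling and the operand order are assumptions based on the description above; the trailing immediate is the element size in bytes):

```llvm
declare <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr, ptr, i64)

define <4 x i1> @war_safe_lanes(ptr %load_ptr, ptr %store_ptr) {
  ; Lanes are enabled only where the later store cannot overlap the earlier load.
  %mask = call <4 x i1> @llvm.loop.dependence.war.mask.v4i1(ptr %load_ptr, ptr %store_ptr, i64 4)
  ret <4 x i1> %mask
}
```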
2025-08-27 | [CodeGen][RISCV] Add support for nontemporal metadata on vector predication instructions (#153033) | daniel-trujillo-bsc | 1 file, -10/+26
This PR teaches VP intrinsics to honor nontemporal metadata.
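A sketch of a VP load carrying the metadata (the vp.load signature follows the LangRef; `!0` marks the access nontemporal):

```llvm
declare <4 x i32> @llvm.vp.load.v4i32.p0(ptr, <4 x i1>, i32)

define <4 x i32> @nt_vp_load(ptr %p, <4 x i1> %m, i32 %evl) {
  ; With this change the !nontemporal hint is propagated through isel.
  %v = call <4 x i32> @llvm.vp.load.v4i32.p0(ptr %p, <4 x i1> %m, i32 %evl), !nontemporal !0
  ret <4 x i32> %v
}

!0 = !{i32 1}
```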
2025-08-22 | [llvm] Remove unused includes of SmallSet.h (NFC) (#154893) | Kazu Hirata | 1 file, -1/+0
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing the redirection found in SmallSet.h. With that, we no longer need to include SmallSet.h in many files.
2025-08-18 | [llvm] Replace SmallSet with SmallPtrSet (NFC) (#154068) | Kazu Hirata | 1 file, -1/+1
This patch replaces SmallSet<T *, N> with SmallPtrSet<T *, N>. Note that SmallSet.h "redirects" SmallSet to SmallPtrSet for pointer element types: `template <typename PointeeType, unsigned N> class SmallSet<PointeeType *, N> : public SmallPtrSet<PointeeType *, N> {};` We only have 140 instances that rely on this "redirection", with the vast majority of them under llvm/. Since relying on the redirection doesn't improve readability, this patch replaces SmallSet with SmallPtrSet for pointer element types.
2025-08-18 | DAG: Remove unnecessary getPointerTy call (#154055) | Matt Arsenault | 1 file, -5/+2
getValueType already did this
2025-08-18 | [CodeGen][Mips] Remove fp128 libcall list (#153798) | Nikita Popov | 1 file, -9/+21
Mips requires fp128 args/returns to be passed differently than i128. It handles this by inspecting the pre-legalization type. However, for soft float libcalls, the original type is currently not provided (it will look like a i128 call). To work around that, MIPS maintains a list of libcalls working on fp128. This patch removes that list by providing the original, pre-softening type to calling convention lowering. This is done by carrying additional information in CallLoweringInfo, as we unfortunately do need both types (we want the un-softened type for OrigTy, but we need the softened type for the actual register assignment etc.) This is in preparation for completely removing all the custom pre-analysis code in the Mips backend and replacing it with use of OrigTy.
2025-08-15 | [CodeGen] Give ArgListEntry a proper constructor (NFC) (#153817) | Nikita Popov | 1 file, -43/+14
This ensures that the required fields are set, and also makes the construction more convenient.
2025-08-15 | [SDAGBuilder] Rename RetTys -> RetVTs (NFC) | Nikita Popov | 1 file, -15/+15
Make it clearer that this is a vector of EVTs, not IR types. Based on: https://github.com/llvm/llvm-project/pull/153798#discussion_r2279066696
2025-08-15 | [SDAG] Remove IndexType manipulation in getUniformBase and callers (#151578) | Philip Reames | 1 file, -34/+23
All paths set it to the same value, just propagate that value to the consumer.
2025-08-15 | Reapply "[AMDGPU] Intrinsic for launching whole wave functions" (#153584) | Diana Picus | 1 file, -0/+37
This reverts commit 14cd1339318b16e08c1363ec6896bd7d1e4ae281. The buildbot failure seems to have been a cmake issue which has been discussed in more detail in this Discourse post: https://discourse.llvm.org/t/cmake-doesnt-regenerate-all-tablegen-target-files/87901

If any buildbots fail to select arbitrary intrinsics with this patch, it's worth considering using clean builds with ccache instead of incremental builds, as recommended here: https://llvm.org/docs/HowToAddABuilder.html#:~:text=Use%20CCache%20and%20NOT%20incremental%20builds

The original commit message for this patch: Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave functions. This will take as its first argument the callee with the amdgpu_gfx_whole_wave calling convention, followed by the call parameters, which must match the signature of the callee except for the first function argument (the i1 original EXEC mask, which doesn't need to be passed in). Indirect calls are not allowed. Make direct calls to amdgpu_gfx_whole_wave functions a verifier error. Tail calls are handled in a future patch.
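For reference, a whole wave function itself looks roughly like this (a sketch: the leading `i1 %active` is the original EXEC mask, which callers of the intrinsic do not pass explicitly):

```llvm
define amdgpu_gfx_whole_wave i32 @wwf(i1 %active, i32 %x) {
  ; Runs with all lanes enabled; %active tells the body which lanes
  ; were live at the call site.
  %r = add i32 %x, 1
  ret i32 %r
}
```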
2025-08-14 | [TargetLowering] Store Context in variable (NFC) | Nikita Popov | 1 file, -19/+17
Avoid repeating CLI.RetTy->getContext() many times.
2025-08-13 | [CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414) | Nikita Popov | 1 file, -28/+32
https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.
2025-08-13 | [CodeGen] Remove default ctors for InputArg and OutputArg (#153205) | Nikita Popov | 1 file, -16/+14
These make it easy to forget to initialize some members, like the newly added OrigTy. Force these to always go through the ctor instead.
2025-08-12 | [SelectionDAGBuilder] Look for appropriate INLINEASM_BR instruction to verify (#152591) | XChy | 1 file, -4/+9
Partially fixes #149023. The original code `MRI.def_begin(Reg)->getParent()` could return the wrong MI, as the physical register `Reg` may have multiple definitions. This patch selects the correct MI to verify by comparing the MBB of each definition. A new testcase hangs with -O1/2/3 enabled; BranchFolding may be to blame.
2025-08-11 | [CodeGen] Provide original IR type to CC lowering (NFC) (#152709) | Nikita Popov | 1 file, -5/+7
It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from a i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.
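An IR-level illustration of why the original type matters: on some targets both of the arguments below are split into two i64 parts during legalization, yet the ABI may assign them differently; OrigTy lets the CCAssignFn tell the parts apart.

```llvm
; Both may lower to two i64 argument parts, but under different ABI rules.
define void @takes_fp128(fp128 %x) { ret void }
define void @takes_i128(i128 %x) { ret void }
```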
2025-08-08 | [IR] Introduce the `ptrtoaddr` instruction | Alexander Richardson | 1 file, -0/+5
This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences:
1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance.
2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer.

For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses, such as AMDGPU fat pointers or CHERI capabilities.

This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet, so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction.

RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54
Reviewed By: nikic
Pull Request: https://github.com/llvm/llvm-project/pull/139357
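A small sketch contrasting the two casts:

```llvm
define i64 @as_int(ptr %p) {
  ; Captures provenance; converts the full pointer representation.
  %i = ptrtoint ptr %p to i64
  ret i64 %i
}

define i64 @as_addr(ptr %p) {
  ; No provenance capture; only the low index-width (address) bits.
  %a = ptrtoaddr ptr %p to i64
  ret i64 %a
}
```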
2025-08-08 | [IR] Remove size argument from lifetime intrinsics (#150248) | Nikita Popov | 1 file, -1/+1
Now that #149310 has restricted lifetime intrinsics to only work on allocas, we can also drop the explicit size argument. Instead, the size is implied by the alloca. This removes the ability to only mark a prefix of an alloca alive/dead. We never used that capability, so we should remove the need to handle that possibility everywhere (though many key places, including stack coloring, did not actually respect this).
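A sketch of the new form; the old form carried an explicit size (e.g. `i64 16`) as the first operand:

```llvm
declare void @llvm.lifetime.start.p0(ptr)
declare void @llvm.lifetime.end.p0(ptr)

define void @scoped() {
  %buf = alloca [16 x i8]
  ; The live range now always covers the whole 16-byte alloca.
  call void @llvm.lifetime.start.p0(ptr %buf)
  call void @llvm.lifetime.end.p0(ptr %buf)
  ret void
}
```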
2025-08-07 | [PowerPC][AIX] Use millicode for memcmp instead of libcall (#147093) | zhijian lin | 1 file, -1/+1
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __memcmp routine is a millicode implementation; we use millicode for the memcmp function instead of a library call to improve performance.
2025-08-07 | [CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319) | Nikita Popov | 1 file, -8/+8
The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from the CCAssignFn, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides a better default (which is IsVarArg=false).
2025-08-06 | Revert "[AMDGPU] Intrinsic for launching whole wave functions" (#152286) | Diana Picus | 1 file, -37/+0
Reverts llvm/llvm-project#145859 because it broke a HIP test:
```
[34/59] Building CXX object External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
FAILED: External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o
/home/botworker/bbot/clang-hip-vega20/botworker/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG -O3 -DNDEBUG -w -Werror=date-time --rocm-path=/opt/botworker/llvm/External/hip/rocm-6.3.0 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.3.0.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /home/botworker/bbot/clang-hip-vega20/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
fatal error: error in backend: Cannot select: intrinsic %llvm.amdgcn.readfirstlane
```
2025-08-06 | [AMDGPU] Intrinsic for launching whole wave functions (#145859) | Diana Picus | 1 file, -0/+37
Add the llvm.amdgcn.call.whole.wave intrinsic for calling whole wave functions. This will take as its first argument the callee with the amdgpu_gfx_whole_wave calling convention, followed by the call parameters which must match the signature of the callee except for the first function argument (the i1 original EXEC mask, which doesn't need to be passed in). Indirect calls are not allowed. Make direct calls to amdgpu_gfx_whole_wave functions a verifier error. Unspeakable horrors happen around calls from whole wave functions, the plan is to improve the handling of caller/callee-saved registers in a future patch. Tail calls are also handled in a future patch.
2025-08-05 | [VP][RISCV] Add a vp.load.ff intrinsic for fault-only-first load (#128593) | Craig Topper | 1 file, -0/+31
There's been some interest in supporting early-exit loops recently. https://discourse.llvm.org/t/rfc-supporting-more-early-exit-loops/84690 This patch was extracted from our downstream where we've been using it in our vectorizer.
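A sketch of the intrinsic (fixed-width types chosen for illustration): the second result is the number of lanes actually loaded before a fault would have occurred.

```llvm
declare { <4 x i32>, i32 } @llvm.vp.load.ff.v4i32.p0(ptr, <4 x i1>, i32)

define { <4 x i32>, i32 } @first_faulting(ptr %p, <4 x i1> %m, i32 %evl) {
  ; Loads at least the first lane; trailing lanes that would fault are skipped.
  %r = call { <4 x i32>, i32 } @llvm.vp.load.ff.v4i32.p0(ptr %p, <4 x i1> %m, i32 %evl)
  ret { <4 x i32>, i32 } %r
}
```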
2025-08-04 | [IR] Allow poison argument to lifetime markers (#151148) | Nikita Popov | 1 file, -1/+3
This slightly relaxes the invariant established in #149310, by also allowing the lifetime argument to be poison. This is to support the typical pattern of RAUWing with poison when removing an instruction. It's worth noting that this does not require any conservative assumptions, lifetimes with poison arguments can simply be skipped. Fixes https://github.com/llvm/llvm-project/issues/151119.
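The pattern this enables (note: at the time of this commit the intrinsics still carried the size operand removed by the later change above):

```llvm
declare void @llvm.lifetime.start.p0(i64, ptr)

define void @dead_marker() {
  ; A lifetime marker whose alloca was RAUW'd with poison is simply skipped.
  call void @llvm.lifetime.start.p0(i64 8, ptr poison)
  ret void
}
```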
2025-07-29 | [IR][SDAG] Remove lifetime size handling from SDAG (#150944) | Nikita Popov | 1 file, -3/+1
Split out from https://github.com/llvm/llvm-project/pull/150248: Specify that the argument of lifetime.start/lifetime.end is ignored and will be removed in the future. Remove lifetime size handling from SDAG. The size was previously discarded during isel, so was always ignored for stack coloring anyway. Where necessary, obtain the size of the full frame index.
2025-07-29 | [SelectionDAG] Remove `UnsafeFPMath` in LegalizeDAG (#146316) | paperchalice | 1 file, -1/+5
These global flags hinder further improvements like [[RFC] Honor pragmas with -ffp-contract=fast](https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast) and pass concurrency support. Remove them incrementally.
2025-07-22 | [CodeGen] Remove handling for lifetime.start/end on non-alloca (#149838) | Nikita Popov | 1 file, -24/+9
After https://github.com/llvm/llvm-project/pull/149310 we are guaranteed that the argument is an alloca, so we don't need to look at underlying objects (which was not a correct thing to do anyway). This also drops the offset argument for lifetime nodes in SDAG. The offset is fixed to zero now. (Peculiarly, while SDAG pretended to have an offset, it just gets silently dropped during selection.)
2025-07-17 | [SelectionDAG] Fix misplaced commas in operand bundle errors (#149331) | Fraser Cormack | 1 file, -7/+5
2025-07-15 | [SelectionDAG] Improve error messages for invalid operand bundles (#148945) | Florian Mayer | 1 file, -34/+34
2025-07-14 | [SelectionDAG] Improve error message for invalid op bundles (#148722) | Florian Mayer | 1 file, -6/+23
2025-07-14 | [SelectionDAG] [KCFI] Allow "kcfi" on invoke (#148742) | Florian Mayer | 1 file, -1/+1
This is handled in CallBase, so it is valid for both call and invoke
2025-07-02 | [SDAG] Prefer scalar for prefix of vector GEP expansion (#146719) | Philip Reames | 1 file, -14/+19
When generating SDAG for a getelementptr with a vector result, we were previously generating splats for each scalar operand. This essentially has the effect of aggressively vectorizing the sequence, leaving it to later combines to scalarize if profitable. Instead, we can keep the accumulating address as a scalar for as long as the prefix of operands allows, before lazily converting to vector on the first vector operand. This both better fits hardware, which frequently has a scalar base on scatter/gather instructions, and reduces the addressing cost, since otherwise we end up with a scalar-to-vector domain crossing for each scalar operand. Note that constant splat offsets are treated as scalar for the above, and only variable offsets can force a conversion to vector.

Co-authored-by: Craig Topper <craig.topper@sifive.com>
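A vector GEP of the kind affected: the scalar `%base` and `%i` form the scalar prefix, and only the final vector index forces the accumulated address into a vector.

```llvm
define <4 x ptr> @addrs(ptr %base, i64 %i, <4 x i64> %offs) {
  ; Scalar operands are implicitly broadcast once the result becomes a vector.
  %p = getelementptr [64 x float], ptr %base, i64 %i, <4 x i64> %offs
  ret <4 x ptr> %p
}
```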
2025-06-24 | DAG: Move get_dynamic_area_offset type check to IR verifier (#145268) | Matt Arsenault | 1 file, -6/+0
Also fix the LangRef to match the implementation. This was checking against the alloca address space size rather than the default address space. The check was also more permissive than the LangRef. The error check permitted any size less than the pointer size; follow the stricter wording of the LangRef.
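The intrinsic whose result type is now checked by the verifier; the result must match the pointer width of the default address space.

```llvm
declare i64 @llvm.get.dynamic.area.offset.i64()

define i64 @dyn_area_offset() {
  ; Offset from the native stack pointer to the most recent dynamic alloca.
  %off = call i64 @llvm.get.dynamic.area.offset.i64()
  ret i64 %off
}
```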
2025-06-18 | [RemoveDIs][NFC] Remove dbg intrinsic handling code from SelectionDAG ISel (#144702) | Orlando Cazalet-Hyams | 1 file, -63/+0
2025-06-17 | [DebugInfo][RemoveDIs] Remove a swathe of debug-intrinsic code (#144389) | Jeremy Morse | 1 file, -4/+1
Seeing how we can't generate any debug intrinsics any more: delete a variety of codepaths where they're handled. For the most part these are plain deletions; in others I've tweaked comments to remain coherent, or added a type to (what were) type-generic lambdas. This isn't all the DbgInfoIntrinsic call sites, but it's most of the simple scenarios.

Co-authored-by: Nikita Popov <github@npopov.com>
2025-06-17 | [LLVM][CodeGen] Lower ConstantInt vectors like shufflevector-based splats (#144395) | Paul Walker | 1 file, -2/+20
ConstantInt vectors utilise DAG.getConstant() when constructing the initial DAG. This can have the effect of legalising the constant before the DAG combiner is run, significantly altering the generated code. To mitigate this (hopefully as a temporary measure) we instead try to construct the DAG in the same way as shufflevector-based splats.
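For comparison, this is the shape a variable splat takes in IR; the change makes constant splats follow the same insert+shuffle structure while the DAG is built, instead of being legalised immediately via DAG.getConstant().

```llvm
define <4 x i32> @var_splat(i32 %x) {
  %ins   = insertelement <4 x i32> poison, i32 %x, i64 0
  %splat = shufflevector <4 x i32> %ins, <4 x i32> poison, <4 x i32> zeroinitializer
  ret <4 x i32> %splat
}
```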
2025-06-12 | [CodeGen] Inline stack guard check on Windows (#136290) | Omair Javaid | 1 file, -17/+78
This patch optimizes the Windows security cookie check mechanism by moving the comparison inline and only calling __security_check_cookie when the check fails. This reduces the overhead of making a DLL call for every function return.

Previously, we implemented this optimization through a machine pass (X86WinFixupBufferSecurityCheckPass) in PR #95904 submitted by @mahesh-attarde. We have reverted that pass in favor of this new approach. We have also abandoned the AArch64-specific implementation of the same pass in PR #121938 in favor of this more general solution.

The old machine instruction pass approach:
- Scanned the generated code to find __security_check_cookie calls
- Modified these calls by splitting basic blocks
- Added comparison logic and conditional branching
- Required complex block management and live register computation

The new approach:
- Implements the same optimization during instruction selection
- Directly emits the comparison and conditional branching
- Needs no post-processing or basic block manipulation
- Disables the optimization at -Oz

Thanks @tamaspetz, @efriedma-quic and @arsenm for their help.
2025-06-09 | [SelectionDAG][X86] Handle `llvm.type.test` in DAGBuilder (#142939) | Abhishek Kaushik | 1 file, -0/+5
Closes #142937
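The intrinsic now handled directly in the DAG builder (the type identifier string below is illustrative):

```llvm
declare i1 @llvm.type.test(ptr, metadata)

define i1 @vtable_ok(ptr %vtable) {
  ; True if %vtable is a member of the identified type's set.
  %ok = call i1 @llvm.type.test(ptr %vtable, metadata !"_ZTS1A")
  ret i1 %ok
}
```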
2025-06-04 | [SelectionDAG] Use reportFatalUsageError() for invalid operand bundles (#142613) | Nikita Popov | 1 file, -15/+17
Replace the asserts with reportFatalUsageError(), as these can be reached with invalid user-provided IR. Fixes https://github.com/llvm/llvm-project/issues/142531.
2025-06-03 | [ARM,AArch64] Don't put BTI at asm goto branch targets (#141562) | Simon Tatham | 1 file, -1/+5
In 'asm goto' statements ('callbr' in LLVM IR), you can specify one or more labels / basic blocks in the containing function which the assembly code might jump to. If you're also compiling with branch target enforcement via BTI, then previously listing a basic block as a possible jump destination of an asm goto would cause a BTI instruction to be placed at the start of the block, in case the assembly code used an _indirect_ branch instruction (i.e. to a destination address read from a register) to jump to that location. Now it doesn't do that any more: branches to destination labels from the assembly code are assumed to be direct branches (to a relative offset encoded in the instruction), which don't require a BTI at their destination.

This change was proposed in https://discourse.llvm.org/t/85845 and there seemed to be no disagreement. The rationale is:
1. It brings clang's handling of asm goto on Arm and AArch64 in line with gcc's, which didn't generate BTIs at the target labels in the first place.
2. It improves performance in the Linux kernel, which uses a lot of 'asm goto' in which the assembly language just contains a NOP, and the label's address is saved elsewhere to let the kernel self-modify at run time to swap between the original NOP and a direct branch to the label. This allows hot code paths to be instrumented for debugging, at only the cost of a NOP when the instrumentation is turned off, instead of the larger cost of an indirect branch. In this situation a BTI is unnecessary (if the branch happens, it's direct), and since the code paths are hot, also a noticeable performance hit.

Implementation: `SelectionDAGBuilder::visitCallBr` is the place where 'asm goto' target labels are handled. It calls `setIsInlineAsmBrIndirectTarget()` on each target `MachineBasicBlock`. Previously it also called `setMachineBlockAddressTaken()`, which made `hasAddressTaken()` return true, which caused a BTI to be added in the Arm backends. Now `visitCallBr` doesn't call `setMachineBlockAddressTaken()` any more on asm goto targets, but `hasAddressTaken()` also checks the flag set by `setIsInlineAsmBrIndirectTarget()`. So call sites that were using `hasAddressTaken()` don't need to be modified. But the Arm backends don't call `hasAddressTaken()` any more: instead they test two more specific query functions that cover all the reasons `hasAddressTaken()` might have returned true _except_ being an asm goto target.

Testing: the new test `AArch64/callbr-asm-label-bti.ll` tests the actual change, expecting not to see a `bti` instruction after `[[LABEL]]`. The rest of the test changes are churn due to the flags on basic blocks changing; actual output code hasn't changed in any of the existing tests, only comments and diagnostics.

Further work: `RISCVIndirectBranchTracking.cpp` and `X86IndirectBranchTracking.cpp` also call `hasAddressTaken()` in a way that might benefit from the same more specific check used in `ARMBranchTargets.cpp` and `AArch64BranchTargets.cpp`. But I'm not sure of that, so this commit only changes the Arm backends.
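A minimal `callbr` ('asm goto') for reference: after this change, %indirect no longer receives a BTI, because any assembly branch to it is assumed direct.

```llvm
define void @asm_goto() {
entry:
  ; "!i" marks the bracketed label as a possible asm jump target.
  callbr void asm "", "!i"() to label %fallthrough [label %indirect]

fallthrough:
  ret void

indirect:
  ret void
}
```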
2025-05-30 | [SDAG] Remove noundef workaround for range metadata/attributes (#141745) | Nikita Popov | 1 file, -13/+3
In https://reviews.llvm.org/D157685 I changed SDAG to only transfer range metadata to SDAG if it also has !noundef. At the time, this was necessary because SDAG incorrectly propagated poison when folding logical and/or to bitwise and/or. The root cause of that issue has since been addressed by https://github.com/llvm/llvm-project/pull/84924, so drop the workaround now.
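The kind of IR affected: previously the `!range` below was only transferred to SDAG when accompanied by `!noundef`; now it is transferred on its own.

```llvm
define i32 @load_ranged(ptr %p) {
  %v = load i32, ptr %p, !range !0
  ret i32 %v
}

!0 = !{i32 0, i32 100} ; loaded value is known to be in [0, 100)
```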
2025-05-28 | [SelectionDAG] Introduce ISD::PTRADD (#140017) | Fabian Ritter | 1 file, -10/+8
This opcode represents the addition of a pointer value (first operand) and an integer offset (second operand). PTRADD nodes are only generated if the TargetMachine opts in by overriding TargetMachine::shouldPreservePtrArith(). The PTRADD node and respective visitPTRADD() function were adapted by @rgwott from the CHERI/Morello LLVM tree. Original authors: @davidchisnall, @jrtc27, @arichardson. The changes in this PR were extracted from PR #105669.

Co-authored-by: David Chisnall <github@theravensnest.org>
Co-authored-by: Jessica Clarke <jrtc27@jrtc27.com>
Co-authored-by: Alexander Richardson <alexrichardson@google.com>
Co-authored-by: Rodolfo Wottrich <rodolfo.wottrich@arm.com>
2025-05-26 | [IR] Add llvm.vector.[de]interleave{4,6,8} (#139893) | Luke Lau | 1 file, -0/+18
This adds [de]interleave intrinsics for factors of 4, 6 and 8, so that every interleaved memory operation supported by the in-tree targets can be represented by a single intrinsic. For context, [de]interleaves of fixed-length vectors are represented by a series of shufflevectors. The intrinsics are needed for scalable vectors, and we don't currently scalably vectorize all possible factors of interleave groups supported by RISC-V/AArch64.

The underlying reason for this is that higher factors are currently represented by interleaving multiple interleaves themselves, which made sense at the time of the discussion in https://github.com/llvm/llvm-project/pull/89018. But after trying to integrate these for higher factors on RISC-V, I think we should revisit this design choice:
- Matching these in InterleavedAccessPass is non-trivial: we currently only support factors that are a power of 2, and detecting this requires a good chunk of code.
- The shufflevector masks used for [de]interleaves of fixed-length vectors are much easier to pattern match as they are strided patterns, but for the intrinsics it's much more complicated to match as the structure is a tree.
- Unlike shufflevectors, there's no optimisation that happens on [de]interleave2 intrinsics.
- For non-power-of-2 factors, e.g. 6, there are multiple possible ways a [de]interleave could be represented; see the discussion in #139373.
- We already have intrinsics for 2, 3, 5 and 7, so by avoiding 4, 6 and 8 we're not really saving much.

By representing these higher factors as interleaved interleaves, we can in theory support arbitrarily high interleave factors. However I'm not sure this is actually needed in practice: SVE only has instructions for factors 2, 3 and 4, whilst RVV only supports up to factor 8.

This patch would make it much easier to support scalable interleaved accesses in the loop vectorizer for RISC-V for factors 3, 5, 6 and 7, as the loop vectorizer and InterleavedAccessPass wouldn't need to construct and match trees of interleaves. For interleave factors above 8, for which there are no hardware memory operations to match in the InterleavedAccessPass, we can still keep the wide load + recursive interleaving in the loop vectorizer.
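A hedged sketch of the factor-4 form (the name mangling is assumed to follow the existing [de]interleave2 convention):

```llvm
declare { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.vector.deinterleave4.v16i32(<16 x i32>)

define { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @split4(<16 x i32> %v) {
  ; One intrinsic call instead of a tree of deinterleave2 calls.
  %r = call { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } @llvm.vector.deinterleave4.v16i32(<16 x i32> %v)
  ret { <4 x i32>, <4 x i32>, <4 x i32>, <4 x i32> } %r
}
```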
2025-05-22 | [CodeGen] Add SSID & Atomic Ordering to IntrinsicInfo (#140896) | Pierre van Houtryve | 1 file, -3/+10
getTgtMemIntrinsic should be able to propagate such information to the MMO
2025-05-19 | [AMDGPU] Set AS8 address width to 48 bits | Alexander Richardson | 1 file, -3/+12
Of the 128-bits of buffer descriptor only 48 bits are address bits, so following the discussion on https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54, the logic conclusion is to set the index width to 48 bits instead of the current value of 128. Most of the test changes are mechanical datalayout updates, but there is one actual change: the ptrmask test now uses .i48 instead of .i128 and I had to update SelectionDAGBuilder to correctly extend the mask. Reviewed By: krzysz00 Pull Request: https://github.com/llvm/llvm-project/pull/139419
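The mask-width consequence mentioned above, sketched with llvm.ptrmask (the `.p8.i48` mangling is an assumption from the standard suffix scheme):

```llvm
declare ptr addrspace(8) @llvm.ptrmask.p8.i48(ptr addrspace(8), i48)

define ptr addrspace(8) @mask_rsrc(ptr addrspace(8) %p, i48 %m) {
  ; The mask type matches the 48-bit index width, not the 128-bit representation.
  %r = call ptr addrspace(8) @llvm.ptrmask.p8.i48(ptr addrspace(8) %p, i48 %m)
  ret ptr addrspace(8) %r
}
```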
2025-05-15 | [SelectionDAG] Add an ISD node for get.active.lane.mask (#139084) | Kerry McLaughlin | 1 file, -2/+3
For now expansion still happens in SelectionDAGBuilder when GET_ACTIVE_LANE_MASK is not legal on the target. This patch also includes changes in AArch64ISelLowering to replace handling of the get.active.lane.mask intrinsic to use the ISD node. Tablegen patterns are added which match to whilelo for scalable types. A follow up change will add support for more types to be lowered to GET_ACTIVE_LANE_MASK by allowing splitting of the node.
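The intrinsic that now maps to the new ISD node:

```llvm
declare <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64, i64)

define <4 x i1> @lane_mask(i64 %base, i64 %tripcount) {
  ; Lane i is active iff %base + i < %tripcount.
  %m = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 %base, i64 %tripcount)
  ret <4 x i1> %m
}
```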
2025-05-15 | CodeGen: Add ISD::AssertNoFPClass (#138839) | YunQiang Su | 1 file, -3/+12
It is used to mark a value that we are sure is not of some floating-point class. Examples include:
- a function argument marked with nofpclass
- the output value of an intrinsic that is known not to be of some class

This lets the following operations make corresponding assumptions.
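The first example in IR form; the `nofpclass(nan)` attribute is what becomes an AssertNoFPClass node on the argument value:

```llvm
define float @no_nan_input(float nofpclass(nan) %x) {
  ; Consumers may assume %x is never a NaN.
  ret float %x
}
```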