path: root/llvm/lib/CodeGen/TargetLoweringBase.cpp
Age | Commit message | Author | Files | Lines
2025-09-02[NFC] RuntimeLibcalls: Prefix the impls with 'Impl_' (#153850)Daniel Paoliello1-13/+13
As noted in #153256, TableGen is generating reserved names for RuntimeLibcalls, which resulted in a build failure for Arm64EC since `vcruntime.h` defines `__security_check_cookie` as a macro. To avoid using reserved names, all impl names will now be prefixed with `Impl_`. `NumLibcallImpls` was lifted out as a `constexpr size_t` instead of being an enum field. While I was churning the dependent code, I also removed the TODO to move the impl enum into its own namespace and use an `enum class`: I experimented with using an `enum class` and adding a namespace, but we decided it was too verbose so it was dropped.
2025-09-02[Intrinsics][AArch64] Add intrinsics for masking off aliasing vector lanes (#117007)Sam Tebbs1-0/+3
It can be unsafe to load a vector from an address and write a vector to an address if those two addresses have overlapping lanes within a vectorised loop iteration. This PR adds intrinsics designed to create a mask with lanes disabled if they overlap between the two pointer arguments, so that only safe lanes are loaded, operated on and stored. The `loop.dependence.war.mask` intrinsic represents cases where the store occurs after the load, and the opposite for `loop.dependence.raw.mask`. The distinction between write-after-read and read-after-write is important, since the ordering of the read and write operations affects if the chain of those instructions can be done safely. Along with the two pointer parameters, the intrinsics also take an immediate that represents the size in bytes of the vector element types. This will be used by #100579.
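The lane-masking idea can be illustrated with a small scalar model. This is only a sketch under stated assumptions, not the intrinsic's specified semantics: `warMaskModel` is a hypothetical name, lane `i` is assumed to load from `load_addr + i * element_size` and later store to `store_addr + i * element_size`, and the distance between the two pointers is assumed to be a multiple of the element size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a write-after-read lane mask (assumption:
// the pointer distance is a multiple of element_size).
std::vector<bool> warMaskModel(std::uint64_t load_addr,
                               std::uint64_t store_addr,
                               std::uint64_t element_size,
                               std::size_t lanes) {
  std::vector<bool> mask(lanes, true);
  // Stores at or below the loads never overwrite data that a later lane
  // still has to read, so every lane stays enabled.
  if (store_addr <= load_addr)
    return mask;
  const std::uint64_t distance = store_addr - load_addr;
  // A lane is safe only if its load lies below the first stored byte.
  for (std::size_t lane = 0; lane < lanes; ++lane)
    mask[lane] = lane * element_size < distance;
  return mask;
}
```

With 4-byte elements and a store pointer 8 bytes above the load pointer, this model keeps lanes 0 and 1 and disables the rest, because lane 2 would read the address that lane 0 writes.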
2025-08-27[CodeGen][RISCV] Add support of RISCV nontemporal to vector predication instructions. (#153033)daniel-trujillo-bsc1-0/+28
This PR adds support for VP intrinsics to be aware of the nontemporal metadata information.
2025-08-23RuntimeLibcalls: Add entries for stackprotector globals (#154930)Matt Arsenault1-18/+26
Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie, and __guard_local. As far as I can tell these are all just different names for the same shaped functionality on different systems. These aren't really functions, but special global variable names. They should probably be treated the same way; all the same contexts that need to know about emittable function names also need to know about this. This avoids a special case check in IRSymtab. This isn't a complete change, there's a lot more cleanup which should be done. The stack protector configuration system is a complete mess. There are multiple overlapping controls, used in 3 different places. Some of the target control implementations overlap with conditions used in the emission points, and some use correlated but not identical conditions in different contexts. i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and insertSSPDeclarations are all used in inconsistent ways so I don't know if I've tracked the intention of the system correctly. The PowerPC test change is a bug fix on linux. Previously the manual conditions were based around !isOSOpenBSD, which is not the condition where __stack_chk_guard is used. Now getSDagStackGuard returns the proper global reference, resulting in LOAD_STACK_GUARD getting a MachineMemOperand which allows scheduling.
2025-08-13[CodeGen] Make OrigTy in CC lowering the non-aggregate type (#153414)Nikita Popov1-6/+6
https://github.com/llvm/llvm-project/pull/152709 exposed the original IR argument type to the CC lowering logic. However, in SDAG, this used the raw type, prior to aggregate splitting. This PR changes it to use the non-aggregate type instead. (This matches what happened in the GlobalISel case already.) I've also added some more detailed documentation on the InputArg/OutputArg fields, to explain how they differ. In most cases ArgVT is going to be the EVT of OrigTy, so they encode very similar information (OrigTy just preserves some additional information lost in EVTs, like pointer types). One case where they do differ is in post-legalization lowering of libcalls, where ArgVT is going to be a legalized type, while OrigTy is going to be the original non-legalized type.
2025-08-12PreISelIntrinsicLowering: Lower llvm.log to a loop if scalable vec arg (#129744)Stephen Long1-0/+2
Similar to ab976a1, but for llvm.log.
2025-08-11[CodeGen] Provide original IR type to CC lowering (NFC) (#152709)Nikita Popov1-1/+1
It is common to have ABI requirements for illegal types: For example, two i64 argument parts that originally came from an fp128 argument may have a different call ABI than ones that came from an i128 argument. The current calling convention lowering does not provide access to this information, so backends come up with various hacks to support it (like additional pre-analysis cached in CCState, or bypassing the default logic entirely). This PR adds the original IR type to InputArg/OutputArg and passes it down to CCAssignFn. It is not actually used anywhere yet, this just does the mechanical changes to thread through the new argument.
2025-08-08[IR] Introduce the `ptrtoaddr` instructionAlexander Richardson1-0/+1
This introduces a new `ptrtoaddr` instruction which is similar to `ptrtoint` but has two differences: 1) Unlike `ptrtoint`, `ptrtoaddr` does not capture provenance 2) `ptrtoaddr` only extracts (and then extends/truncates) the low index-width bits of the pointer For most architectures, difference 2) does not matter since index (address) width and pointer representation width are the same, but this does make a difference for architectures that have pointers that aren't just plain integer addresses such as AMDGPU fat pointers or CHERI capabilities. This commit introduces textual and bitcode IR support as well as basic code generation, but optimization passes do not handle the new instruction yet so it may result in worse code than using ptrtoint. Follow-up changes will update capture tracking, etc. for the new instruction. RFC: https://discourse.llvm.org/t/clarifiying-the-semantics-of-ptrtoint/83987/54 Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/139357
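Difference 2) can be pictured with a toy model of a pointer that is wider than its address. This is only a sketch: `FatPointer`, `addrOf` and `addrOf32` are hypothetical names, and the 64-bit metadata plus 64-bit address layout is an assumption chosen for illustration, not the layout of any real target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 128-bit "fat" pointer: metadata above a 64-bit address.
struct FatPointer {
  std::uint64_t metadata; // high bits, e.g. bounds or permissions
  std::uint64_t address;  // low index-width bits
};

// Model of the ptrtoaddr idea: only the low index-width bits are
// extracted; the metadata never reaches the result.
std::uint64_t addrOf(const FatPointer &P) { return P.address; }

// Same extraction, then truncated to a narrower destination type.
std::uint32_t addrOf32(const FatPointer &P) {
  return static_cast<std::uint32_t>(P.address);
}
```

In this model a `ptrtoint`-style conversion would have to account for all 128 bits, whereas the address-only conversion ignores `metadata` entirely.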
2025-08-07[CodeGen] Remove an unnecessary cast (NFC) (#152441)Kazu Hirata1-1/+1
getActiveBits() already returns unsigned.
2025-08-07[CodeGen] Move IsFixed into ArgFlags (NFCI) (#152319)Nikita Popov1-1/+1
The information whether a specific argument is vararg or fixed is currently stored separately from all the other argument information in ArgFlags. This means that it is not accessible from CCAssign, and backends have developed all kinds of workarounds for how they can access it after all. Move this information to ArgFlags to make it directly available in all relevant places. I've opted to invert this and store it as IsVarArg, as I think that both makes the meaning more obvious and provides for a better default (which is IsVarArg=false).
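A minimal sketch of why storing the bit in the flags helps. `ArgFlagsModel` and `assignLocation` are hypothetical stand-ins, not LLVM's `ISD::ArgFlagsTy` or a real `CCAssignFn`, and the "stack"/"register" policy is invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for per-argument flags.
struct ArgFlagsModel {
  unsigned IsSExt : 1;
  unsigned IsZExt : 1;
  unsigned IsVarArg : 1; // defaults to false, i.e. a fixed argument
  ArgFlagsModel() : IsSExt(0), IsZExt(0), IsVarArg(0) {}
};

// Hypothetical assignment callback: it only receives the flags, which is
// sufficient once the variadic bit travels with them.
std::string assignLocation(const ArgFlagsModel &Flags) {
  return Flags.IsVarArg ? "stack" : "register";
}
```

Because the default-constructed flags describe a fixed argument, code that never sets `IsVarArg` keeps its old behavior in this model.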
2025-08-05[LLVM][CGP] Allow finer control for sinking compares. (#151366)Paul Walker1-1/+0
Compare sinking is selectable based on the result of hasMultipleConditionRegisters. This function is too coarse grained by not taking into account the differences between scalar and vector compares. This PR extends the interface to take an EVT to allow finer control. The new interface is used by AArch64 to disable sinking of scalable vector compares, but with isProfitableToSinkOperands updated to maintain the cases that are specifically tested.
2025-08-04[DAG] Combine `store + vselect` to `masked_store` (#145176)Abhishek Kaushik1-0/+2
Add a new combine to replace
```
(store ch (vselect cond truevec (load ch ptr offset)) ptr offset)
```
with
```
(mstore ch truevec ptr offset cond)
```
This saves a blend operation on targets that support conditional stores.
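The equivalence behind this combine can be checked with a scalar model. This is an illustrative sketch only: `storeOfSelect` and `maskedStore` are hypothetical names, and a 4-lane `int` vector stands in for the real vector types.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec4 = std::array<int, 4>;
using Mask4 = std::array<bool, 4>;

// Pattern before the combine: load, blend with the select, store everything.
void storeOfSelect(Vec4 &Mem, const Mask4 &Cond, const Vec4 &TrueVec) {
  const Vec4 Loaded = Mem; // (load ch ptr offset)
  Vec4 Blended{};
  for (std::size_t I = 0; I < Mem.size(); ++I)
    Blended[I] = Cond[I] ? TrueVec[I] : Loaded[I]; // vselect
  Mem = Blended; // full-width store
}

// Pattern after the combine: write only the lanes whose mask bit is set.
void maskedStore(Vec4 &Mem, const Mask4 &Cond, const Vec4 &TrueVec) {
  for (std::size_t I = 0; I < Mem.size(); ++I)
    if (Cond[I])
      Mem[I] = TrueVec[I];
}
```

Both functions leave memory in the same state; the second form needs neither the load nor the blend.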
2025-07-29[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638)jeremyd20191-1/+1
Cygwin and MinGW share the auto import behavior that could result in __stack_check_guard being non-dso-local. Allow windres to assume a Cygwin target as well as a MinGW one, so defines like _WIN32 would not be present on Cygwin.
2025-07-28[CodeGen] More consistently expand float ops by default (#150597)Nikita Popov1-17/+17
These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.
2025-07-15SafeStack: Check if __safestack_pointer_address is available (#147917)Matt Arsenault1-3/+14
Start using RuntimeLibcalls in the base implementation of getSafeStackPointerLocation instead of hardcoding the function names.
2025-07-10TargetLowering: Avoid a use of PointerType::getUnqual (#147884)Matt Arsenault1-1/+3
Use the default globals address space
2025-07-09RuntimeLibcalls: Remove table of soft float compare cond codes (#146082)Matt Arsenault1-33/+73
Previously we had a table of entries for every Libcall for the comparison to use against an integer 0 if it was a soft float compare function. This was only relevant to a handful of opcodes, so it was wasteful. Now that we can distinguish the abstract libcall for the compare with the concrete implementation, we can just directly hardcode the comparison against the libcall impl without this configuration system.
2025-07-09DAG: Fall back to separate sin and cos when softening sincos (#147468)Matt Arsenault1-0/+8
Fix asserting in the error case.
2025-07-08[DAG] Add generic expansion for ISD::FCANONICALIZE nodes (#142105)Dominik Steenken1-0/+4
This PR takes the work previously done by @pawan-nirpal-031 on X86 in #106370, and makes it available in common code. This should enable all targets to use `__builtin_canonicalize` for all `f(16|32|64|128)` data types. Canonicalization is implemented here as multiplication by `1.0`, as suggested in [the docs](https://llvm.org/docs/LangRef.html#llvm-canonicalize-intrinsic).
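The multiply-by-one idea can be sketched in plain C++. `canonicalizeModel` is a hypothetical helper, not the DAG expansion itself, and the `volatile` is only there so the compiler does not fold the multiplication away.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of canonicalization as multiplication by 1.0.
double canonicalizeModel(double X) {
  volatile double One = 1.0; // keeps the multiply from being optimized out
  return X * One;
}
```

Ordinary values and the sign of zero pass through unchanged, and a NaN input still yields a NaN.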
2025-07-07DAG: Add RTLIB::getPOW helper (#147274)Matt Arsenault1-0/+4
Co-authored-by: Paul Walker <paul.walker@arm.com>
2025-07-04[llvm] Use llvm::fill instead of std::fill(NFC) (#146911)Austin1-3/+2
Use llvm::fill instead of std::fill
2025-06-23RuntimeLibcalls: Pass in ABI name from MCOptions (#144894)Matt Arsenault1-1/+2
ARM needs this to compute the available libcalls.
2025-06-19RuntimeLibcalls: Pass in exception handling type (#144696)Matt Arsenault1-2/+2
All of the ABI options that influence libcall decisions need to be passed in.
2025-06-19RuntimeLibcalls: Pass in FloatABI and EABI type (#144691)Matt Arsenault1-1/+2
We need the full set of ABI options to accurately compute the full set of libcalls. This partially resolves missing information required to compute the set of ARM calls.
2025-06-16[TargetLowering][RISCV] Allow scalable non-simple EVTs to be split even if the element type isn't a legal scalar type. (#144007)Craig Topper1-1/+1
This fixes an inconsistency in i64 vector handling between RV32 and RV64. Even if i64 isn't legal as a scalar, we should still be able to split a large i64 vector to get down to a legal vector type. We only need to give up if we need to split a vscale x 1 vector.
2025-05-27IR: Make Module::getOrInsertGlobal() return a GlobalVariable.Peter Collingbourne1-4/+3
After pointer element types were removed this function can only return a GlobalVariable, so reflect that in the type and comments and clean up callers. Reviewers: nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/141323
2025-04-23[AArch64][SVE] Add dot product lowering for PARTIAL_REDUCE_MLA node (#130933)Nicholas Guy1-4/+0
Add lowering in tablegen for PARTIAL_REDUCE_U/SMLA ISD nodes. Only happens when the combine has been performed on the ISD node. Also adds in check to only do the DAG combine when the node can then eventually be lowered, so changes neon tests too. --------- Co-authored-by: James Chesterman <james.chesterman@arm.com>
2025-04-14[CodeGen] Prune headers and move code out of line for build efficiency, NFC (#135622)Reid Kleckner1-0/+4
I noticed these destructors taking time with -ftime-trace and moved some of them for minor build efficiency improvements. The main impact of moving destructors out of line is that the types held in container fields no longer need to be complete, i.e. one can have uptr<T> or vector<T> as a field with an incomplete type T, and that means we can reduce transitive includes, as with LegalizerInfo.h. Move expensive getDebugOperandsForReg template out-of-line. The std::function instantiation shows up in time trace even if you don't use the function.
2025-03-31Fix crash lowering stack guard on OpenBSD/aarch64. (#125416)34056915821-0/+3
TargetLoweringBase::getIRStackGuard refers to a platform-specific guard variable. Before this change, TargetLoweringBase::getSDagStackGuard only referred to a different variable. This means that SelectionDAGBuilder's getLoadStackGuard does not get memory operands. However, AArch64InstrInfo::expandPostRAPseudo assumes that the passed MachineInstr has nonzero memoperands, causing a segfault. We have two possible options here: either disabling the LOAD_STACK_GUARD node entirely in AArch64TargetLowering::useLoadStackGuardNode or just making the platform-specific values match across TargetLoweringBase. Here, we try the latter.
2025-03-07[RISCV][LibCall] Add libcall for i64 -> bf16 (#130024)Jim Lin1-0/+4
Add support for lowering i64 -> bf16 with libcall.
2025-02-18[SelectionDAG] Add PARTIAL_REDUCE_U/SMLA ISD Nodes (#125207)James Chesterman1-0/+4
Add signed and unsigned PARTIAL_REDUCE_MLA ISD nodes. Add command line argument (aarch64-enable-partial-reduce-nodes) that indicates whether the intrinsic experimental_vector_partial_reduce_add will be transformed into the new ISD node. Lowering with the new ISD nodes will, for now, always be done as an expand.
2025-02-11[RTLIB] Rename getFSINCOS() to getSINCOS (NFC) (#126705)Benjamin Maxwell1-1/+1
This makes the name more consistent with the other helpers.
2025-02-11[IR] Add llvm.sincospi intrinsic (#125873)Benjamin Maxwell1-1/+6
This adds the `llvm.sincospi` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.sincospi` intrinsic takes a floating-point value and returns both the sine and cosine of the value multiplied by pi. It computes the result more accurately than the naive approach of doing the multiplication ahead of time, especially for large input values.
```
declare { float, float } @llvm.sincospi.f32(float %Val)
declare { double, double } @llvm.sincospi.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincospi.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincospi.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincospi.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincospi.v4f32(<4 x float> %Val)
```
Currently, the default lowering of this intrinsic relies on the `sincospi[f|l]` functions being available in the target's runtime (e.g. libc).
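One way to see why multiplying by pi up front loses accuracy is to reduce the argument first. The sketch below is an illustration only, not the lowering (which calls the runtime's `sincospi` functions); `SinCosPi` and `sincospiModel` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result pair for sin(pi * x) and cos(pi * x).
struct SinCosPi {
  double Sine;
  double Cosine;
};

// Reduce x modulo the period 2 before scaling by pi. std::fmod is exact,
// so no rounding error is introduced ahead of the multiplication.
SinCosPi sincospiModel(double X) {
  const double Pi = 3.14159265358979323846;
  const double Reduced = std::fmod(X, 2.0);
  return {std::sin(Pi * Reduced), std::cos(Pi * Reduced)};
}
```

For a large input such as `1000000.5`, the reduced argument is exactly `0.5`, so the rounding error of the product with pi no longer grows with the magnitude of the input.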
2025-02-07[IR] Add `llvm.modf` intrinsic (#121948)Benjamin Maxwell1-3/+8
This adds the `llvm.modf` intrinsic, legalization, and lowering (mostly reusing the lowering for sincos and frexp). The `llvm.modf` intrinsic takes a floating-point value and returns both the integral and fractional parts (as a struct).
```
declare { float, float } @llvm.modf.f32(float %Val)
declare { double, double } @llvm.modf.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.modf.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.modf.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.modf.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.modf.v4f32(<4 x float> %Val)
```
This corresponds to the libm `modf` function but returns multiple values in a struct (rather than take output pointers), which makes it easier to vectorize.
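The difference between the output-pointer form and the struct-returning form can be sketched by wrapping `std::modf`. `ModfParts` and `modfParts` are hypothetical names, and the named fields deliberately say nothing about the field order of the intrinsic's result.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical struct holding both results.
struct ModfParts {
  double Integral;
  double Fractional;
};

// Wrap the output-pointer libm interface so both parts come back by value.
ModfParts modfParts(double X) {
  double Integral = 0.0;
  const double Fractional = std::modf(X, &Integral);
  return {Integral, Fractional};
}
```

Returning both values avoids a memory write through a pointer, which is the property the commit message credits with making vectorization easier.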
2025-01-24PreISelIntrinsicLowering: Lower llvm.exp/llvm.exp2 to a loop if scalable vec arg (#117568)Stephen Long1-0/+11
2025-01-20[SDAG] Add an ISD node to help lower vector.extract.last.active (#118810)Graham Hunter1-0/+3
Based on feedback from the clastb codegen PR, I'm refactoring basic codegen for the vector.extract.last.active intrinsic to lower to an ISD node in SelectionDAGBuilder then expand in LegalizeVectorOps, instead of doing everything in the builder. The new ISD node (vector_find_last_active) only covers finding the index of the last active element of the mask, and extracting the element + handling passthru is left to existing ISD nodes.
2024-12-13[GISel] Remove unused DataLayout operand from getApproximateEVTForLLT (#119833)Craig Topper1-1/+1
2024-12-09[TargetLowering] Return Align from getByValTypeAlignment (NFC) (#119233)Sergei Barannikov1-6/+3
2024-11-12[X86][BF16] Add libcall for FP128 -> BF16 (#115825)Feng Zou1-0/+2
This is to fix #115710.
2024-11-04SafeStack: Respect alloca addrspace (#112536)Matt Arsenault1-1/+4
Just insert addrspacecast in cases where the alloca uses a different address space, since I don't know what else you could possibly do.
2024-10-29[IR] Add `llvm.sincos` intrinsic (#109825)Benjamin Maxwell1-2/+3
This adds the `llvm.sincos` intrinsic, legalization, and lowering. The `llvm.sincos` intrinsic takes a floating-point value and returns both the sine and cosine (as a struct).
```
declare { float, float } @llvm.sincos.f32(float %Val)
declare { double, double } @llvm.sincos.f64(double %Val)
declare { x86_fp80, x86_fp80 } @llvm.sincos.f80(x86_fp80 %Val)
declare { fp128, fp128 } @llvm.sincos.f128(fp128 %Val)
declare { ppc_fp128, ppc_fp128 } @llvm.sincos.ppcf128(ppc_fp128 %Val)
declare { <4 x float>, <4 x float> } @llvm.sincos.v4f32(<4 x float> %Val)
```
The lowering is built on top of the existing FSINCOS ISD node, with additional type legalization to allow for f16, f128, and vector values.
2024-10-28Check hasOptSize() in shouldOptimizeForSize() (#112626)Ellis Hoag1-1/+0
2024-10-16[X86][CodeGen] Add base atan2 intrinsic lowering (p4) (#110760)Tex Riddell1-3/+4
This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294
Based on example PR #96222 and fix PR #101268, with some differences due to 2-arg intrinsic and intermediate refactor (RuntimeLibCalls.cpp).
- Add llvm.experimental.constrained.atan2 - Intrinsics.td, ConstrainedOps.def, LangRef.rst
- Add to ISDOpcodes.h and TargetSelectionDAG.td, connect to intrinsic in BasicTTIImpl.h, and LibFunc_ in SelectionDAGBuilder.cpp
- Update LegalizeDAG.cpp, LegalizeFloatTypes.cpp, LegalizeVectorOps.cpp, and LegalizeVectorTypes.cpp
- Update isKnownNeverNaN in SelectionDAG.cpp
- Update SelectionDAGDumper.cpp
- Update libcalls - RuntimeLibcalls.def, RuntimeLibcalls.cpp
- TargetLoweringBase.cpp - Expand for vectors, promote f16
- X86ISelLowering.cpp - Expand f80, promote f32 to f64 for MSVC
Part 4 for Implement the atan2 HLSL Function #70096.
2024-09-24[SDAG] Avoid creating redundant stack slots when lowering FSINCOS (#108401)Benjamin Maxwell1-0/+5
When lowering `FSINCOS` to a library call (that takes output pointers) we can avoid creating new stack allocations if the results of the `FSINCOS` are being stored. Instead, we can take the destination pointers from the stores and pass those to the library call. --- Note: As a NFC this also adds (and uses) `RTLIB::getFSINCOS()`.
2024-09-19Reland "[X86][BF16] Add libcall for F80 -> BF16 (#109116)" (#109143)Phoebe Wang1-0/+2
This reverts commit ababfee78714313a0cad87591b819f0944b90d09. Add X86 FP80 check.
2024-09-18Revert "[X86][BF16] Add libcall for F80 -> BF16" (#109140)Phoebe Wang1-2/+0
Reverts llvm/llvm-project#109116
2024-09-18[X86][BF16] Add libcall for F80 -> BF16 (#109116)Phoebe Wang1-0/+2
This fixes #108936, but the calling convention doesn't match with GCC. I doubt we have such a lib function for now, so leave the calling convention as is.
2024-08-31Revert "[RISCV] RISCV vector calling convention (2/2)" (#97994)Brandon Wu1-10/+2
This reverts commit 91dd844aa499d69c7ff75bf3156e2e3593a88057. Stacked on https://github.com/llvm/llvm-project/pull/97993
2024-08-21Scalarize the vector inputs to llvm.lround intrinsic by default. (#101054)Sumanth Gundapaneni1-2/+3
The verifier is updated in a different patch to allow vector types for the llvm.lround and llvm.llround intrinsics.
2024-08-15Intrinsic: introduce minimumnum and maximumnum for IR and SelectionDAG (#96649)YunQiang Su1-0/+1
C23 introduced new functions fminimum_num and fmaximum_num, and they follow the minimumNumber and maximumNumber of IEEE754-2019. Let's introduce new intrinsics to support them. This patch introduces support only for scalar values. The support of vector (vp, vp.reduce, vector.reduce), experimental.constrained will be added in future patches. With this patch, MIPSr6 and LoongArch can work out of the box with fcanonical and fmax/fmin. Aarch64/PowerPC64 can use the same logic as MIPSr6 and LoongArch, while they have no fcanonical support yet. I will add it in future patches. The FMIN/FMAX of RISC-V instructions follows the minimumNumber/maximumNumber of IEEE754-2019. We can just add it in a future patch. Background https://discourse.llvm.org/t/rfc-fix-llvm-min-f-and-llvm-max-f-intrinsics/79735 Currently we have fminnum/fmaxnum, which have different behavior on different platforms for NUM vs sNaN: 1) Fallback to fmin(3)/fmax(3): return qNaN. 2) ARM64/ARM32+Neon: same as libc. 3) MIPSr6/LoongArch/RISC-V: return NUM. The fix of fminnum/fmaxnum to follow minNUM/maxNUM of IEEE754-2008 will be submitted as separate patches.
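The minimumNumber behavior referred to here can be sketched as a scalar reference function. `minimumNumModel` is a hypothetical name, and the sketch treats every NaN as a missing operand, since portable C++ cannot easily tell quiet and signaling NaNs apart; it is a model of the IEEE754-2019 operation, not of the LLVM lowering.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical scalar model of IEEE754-2019 minimumNumber.
double minimumNumModel(double A, double B) {
  // A NaN operand is treated as missing: return the other operand.
  if (std::isnan(A))
    return std::isnan(B) ? std::numeric_limits<double>::quiet_NaN() : B;
  if (std::isnan(B))
    return A;
  // Equal values only differ for signed zeros: -0.0 orders below +0.0.
  if (A == B)
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}
```

The result is a NaN only when both operands are NaN, which is the "return NUM" behavior listed in the background section.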