path: root/llvm/lib/CodeGen/TargetLoweringBase.cpp
Age | Commit message | Author | Files | Lines
2024-08-14 | [DAG] Support saturated truncate (#99418) | hanbeom | 1 | -0/+5
A truncate is considered saturated if no additional conversion is required between the target and return values. If the target is saturated when attempting to truncate from a vector, there is an opportunity to optimize it. Previously, each architecture had its own attempt at optimization, leading to redundant code. This patch implements common logic by introducing three new ISDs:
- `ISD::TRUNCATE_SSAT_S`: When the operand is a signed value and the range of values matches the range of signed values of the destination type.
- `ISD::TRUNCATE_SSAT_U`: When the operand is a signed value and the range of values matches the range of unsigned values of the destination type.
- `ISD::TRUNCATE_USAT_U`: When the operand is an unsigned value and the range of values matches the range of unsigned values of the destination type.
These ISDs indicate a saturated truncate. Fixes https://github.com/llvm/llvm-project/issues/85903
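As an illustrative aside (not part of the commit): a minimal sketch of how a backend with native saturating-narrow instructions might opt in to the new nodes from its TargetLowering constructor. The vector types are made up for the example.

```cpp
// Sketch: inside a hypothetical target's TargetLowering constructor.
// The chosen vector types are illustrative only.
for (MVT VT : {MVT::v8i8, MVT::v4i16, MVT::v2i32}) {
  setOperationAction(ISD::TRUNCATE_SSAT_S, VT, Legal); // signed input, signed saturation
  setOperationAction(ISD::TRUNCATE_SSAT_U, VT, Legal); // signed input, unsigned saturation
  setOperationAction(ISD::TRUNCATE_USAT_U, VT, Legal); // unsigned input, unsigned saturation
}
```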
2024-07-24 | [AMDGPU] Implement llvm.lrint intrinsic lowering (#98931) | Sumanth Gundapaneni | 1 | -9/+9
This patch enables the target-independent lowering of llvm.lrint via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU.
2024-07-23 | [AMDGPU] Implement llvm.lround intrinsic lowering. (#98970) | Sumanth Gundapaneni | 1 | -7/+10
This patch enables the target-independent lowering of llvm.lround via GlobalISel. For SelectionDAG, the intrinsic is custom lowered for AMDGPU. In order to support vector floating point input for llvm.lround, this patch extends the target-independent APIs and provides support for scalarizing. PR #98950 is needed to let the verifier allow vector floating point types.
2024-07-20 | Reapply "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)" | Joseph Huber | 1 | -381/+35
This reverts commit 740161a9b98c9920dedf1852b5f1c94d0a683af5. I moved the `ISD` dependencies into the CodeGen portion of the handling; it's a little awkward, but it's the easiest solution I can think of for now.
2024-07-20 | Reformat | NAKAMURA Takumi | 1 | -3/+3
2024-07-20 | Revert "[LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512)" | NAKAMURA Takumi | 1 | -3/+384
This reverts commit c05126bdfc3b02daa37d11056fa43db1a6cdef69. (llvmorg-19-init-17714-gc05126bdfc3b) See #99610
2024-07-17 | [LLVM] Add `llvm.experimental.vector.compress` intrinsic (#92289) | Lawrence Benson | 1 | -0/+3
This PR adds a new vector intrinsic `@llvm.experimental.vector.compress` to "compress" data within a vector based on a selection mask, i.e., it moves all selected values (i.e., where `mask[i] == 1`) to consecutive lanes in the result vector. A `passthru` vector can be provided, from which remaining lanes are filled.

The main reason for this is that the existing `@llvm.masked.compressstore` has very strong constraints in that it can only write values that were selected, resulting in guard branches for all targets except AVX-512 (and even there the AMD implementation is _very_ slow). More instruction sets support "compress" logic, but only within registers. So to store the values, an additional store is needed. But this combination is likely significantly faster on many targets as it avoids branches.

In follow-up PRs, my plan is to add target-specific lowerings for x86, SVE, and possibly RISCV. I also want to combine this with a store instruction, as this is probably a common case and we can avoid some memory writes in that case. See the [discussion in forum](https://discourse.llvm.org/t/new-intrinsic-for-masked-vector-compress-without-store/78663) for initial discussion on the design.
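For illustration only (not the LLVM lowering), here is a scalar C++ model of the semantics described above, assuming unselected tail lanes simply keep the corresponding `passthru` values:

```cpp
#include <array>
#include <cstddef>

// Scalar sketch of @llvm.experimental.vector.compress semantics: selected
// values are packed into consecutive lanes; the remaining lanes come from
// the passthru vector.
template <typename T, std::size_t N>
std::array<T, N> compressModel(const std::array<T, N> &Vec,
                               const std::array<bool, N> &Mask,
                               const std::array<T, N> &Passthru) {
  std::array<T, N> Result = Passthru;
  std::size_t Out = 0;
  for (std::size_t I = 0; I < N; ++I)
    if (Mask[I])
      Result[Out++] = Vec[I];
  return Result;
}
```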
2024-07-16 | [LLVM][LTO] Factor out RTLib calls and allow them to be dropped (#98512) | Joseph Huber | 1 | -384/+3
Summary: The LTO pass and LLD linker have logic in them that forces extraction and prevents internalization of needed runtime calls. However, these currently take all RTLibcalls into account, even if the target does not support them. The target opts out of a libcall if it sets its name to nullptr. This patch pulls this logic out into a class in the header so that LTO / lld can use it to determine if a symbol actually needs to be kept. This is important for targets like AMDGPU that want to be able to use `lld` to perform the final link step, but do not want the overhead of uncalled functions. (This trivially adds about a second to the link time.)
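A hedged sketch of the query this enables (the helper name is hypothetical; the actual patch adds a dedicated class): a runtime symbol only needs to be preserved if the target still registers a name for the corresponding libcall.

```cpp
#include "llvm/CodeGen/TargetLowering.h"

// Hypothetical helper: a libcall can only be emitted by the backend if the
// target has not cleared its name (targets opt out by setting it to nullptr).
static bool targetMayEmitLibcall(const llvm::TargetLoweringBase &TLI,
                                 llvm::RTLIB::Libcall LC) {
  return TLI.getLibcallName(LC) != nullptr;
}
```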
2024-07-12 | [NVPTX] Disable all RTLib libcalls (#98672) | Joseph Huber | 1 | -0/+7
Summary: This patch explicitly disables runtime calls to be emitted from the NVPTX backend. This allows other utilities to know that we do not need to worry about emitting these.
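A minimal sketch of what disabling every RTLib call amounts to, assuming the long-standing `setLibcallName` interface; where exactly the real patch places this loop (and how it keys off the NVPTX triple) is not shown here.

```cpp
// Sketch: clear every runtime libcall name so the backend can never emit a
// call to one of them.
for (unsigned LC = 0; LC != llvm::RTLIB::UNKNOWN_LIBCALL; ++LC)
  setLibcallName(static_cast<llvm::RTLIB::Libcall>(LC), nullptr);
```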
2024-07-11 | [Darwin] Fix availability of exp10 for watchOS, tvOS, xROS. (#98542) | Florian Hahn | 1 | -9/+8
Update availability information added in 1eb7f055d9a. exp10 is available on iOS >= 7.0 and macOS >= 10.9. On all other platforms, it is available on any version. Also drop the x86 check, as the availability only depends on the OS version, not the target platform. PR: https://github.com/llvm/llvm-project/pull/98542
2024-07-11 | [X86][CodeGen] Add base trig intrinsic lowerings (#96222) | Farzon Lotfi | 1 | -7/+17
This change is an implementation of https://github.com/llvm/llvm-project/issues/87367's investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294. This change adds constrained intrinsics and some lowering cases for `acos`, `asin`, `atan`, `cosh`, `sinh`, and `tanh`. The only x86-specific change was for f80. Related issues: https://github.com/llvm/llvm-project/issues/70079 https://github.com/llvm/llvm-project/issues/70080 https://github.com/llvm/llvm-project/issues/70081 https://github.com/llvm/llvm-project/issues/70083 https://github.com/llvm/llvm-project/issues/70084 https://github.com/llvm/llvm-project/issues/95966 The x86 lowering is going to be done in three PRs, with this being the first. A second PR will be put up for the LoopVectorizer and then the SLPVectorizer. The constrained intrinsics work is also going to be in multiple parts, but just two: this part covers the LLVM-specific changes, and part 2 will cover Clang-specific changes and legalization for backends that have special legalization requirements, like AArch64 and wasm.
2024-07-11 | [LLVM] Factor disabled Libcalls into the initializer (#98421) | Joseph Huber | 1 | -0/+138
Summary: These libcalls represent which functions are available to the backend. If a runtime call is not available, the target sets the name to `nullptr`. Currently, this logic is spread around the various targets. This patch pulls all of the locations that disable libcalls into the initializer. This patch is effectively NFC. The motivation behind this patch is that currently the LTO handling uses the list of all runtime calls to determine which functions cannot be internalized and must be extracted from static libraries. We do not want this to happen for libcalls that are not emitted by the backend. A follow-up patch will move out this logic so the LTO pass can know which rtlib calls are actually used by the backend.
2024-07-04 | [SelectionDAG] Remove LegalTypes argument from getShiftAmountTy. NFC (#97757) | Craig Topper | 1 | -2/+2
This argument is no longer used inside the function. Remove it from the interface.
2024-07-04 | [SelectionDAG] Ignore LegalTypes parameter in TargetLoweringBase::getShiftAmountTy. (#97645) | Craig Topper | 1 | -2/+1
When this flag was false, `getShiftAmountTy` would return `PointerTy` instead of the target's preferred shift amount type for scalar shifts. This used to be needed when the target's preferred type wasn't large enough to support the shift amount needed for an illegal type. For example, any scalar type larger than i256 on X86, since X86's preferred shift amount type is i8. For a while now, we've had code that uses `MVT::i32` if `LegalTypes` is true, but the target's preferred type is too small. This fixed a repeated cause of crashes where the `LegalTypes` flag wasn't set to false when illegal types could be present. This has made it unnecessary to set the `LegalTypes` flag correctly, and as a result more and more places don't. So I think it's time for this flag to go away. This first patch just disconnects the flag. The interface and all callers will be cleaned up in follow-up patches. The X86 test change is because we now have the same shift type for both shifts in a (srl (sub C, (shl X, 32)), 32) sequence. This makes the shift amounts appear equal in value and type, which is needed to enable a combine.
2024-06-21 | Revert "Intrinsic: introduce minimumnum and maximumnum (#93841)" | Nikita Popov | 1 | -1/+0
As far as I can tell, this pull request was not approved, and did not go through an RFC on discourse. This reverts commit 89881480030f48f83af668175b70a9798edca2fb. This reverts commit 225d8fc8eb24fb797154c1ef6dcbe5ba033142da.
2024-06-21 | Intrinsic: introduce minimumnum and maximumnum (#93841) | YunQiang Su | 1 | -0/+1
Currently, on different platforms, the behavior of llvm.minnum is different if one operand is sNaN. When we compare sNaN vs NUM:
- ARM/AArch64/PowerPC: follow IEEE754-2008's minNum: return qNaN.
- RISC-V/Hexagon: follow IEEE754-2019's minimumNumber: return NUM.
- X86: returns NUM, but not the same as IEEE754-2019's minimumNumber, as +0.0 is not always treated as greater than -0.0.
- MIPS/LoongArch/Generic: return NUM.
- LIBCALL: returns qNaN.
So, let's introduce llvm.minimumnum/llvm.maximumnum, which always follow IEEE754-2019's minimumNumber/maximumNumber. Half-fix: #93033
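For illustration, a scalar C++ model of the IEEE754-2019 minimumNumber behaviour described above (a sketch, not the compiler's expansion): NaNs, including sNaN, lose to numbers, and -0.0 orders below +0.0.

```cpp
#include <cmath>
#include <limits>

// Sketch of IEEE754-2019 minimumNumber: returns the number when exactly one
// operand is NaN, a quiet NaN when both are, and treats -0.0 as smaller
// than +0.0.
double minimumNumberModel(double A, double B) {
  if (std::isnan(A))
    return std::isnan(B) ? std::numeric_limits<double>::quiet_NaN() : B;
  if (std::isnan(B))
    return A;
  if (A == B) // +0.0 == -0.0 here, so break the tie on the sign bit
    return std::signbit(A) ? A : B;
  return A < B ? A : B;
}
```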
2024-06-17 | [SelectionDAG] Add support for the 3-way comparison intrinsics [US]CMP (#91871) | Poseydon42 | 1 | -0/+3
This PR adds initial support for the `scmp`/`ucmp` 3-way comparison intrinsics in the SelectionDAG. Some of the expansions/lowerings are not optimal yet.
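A scalar sketch of the semantics (illustrative only; the intrinsics are type-parameterised): the result is -1, 0, or 1 for less-than, equal, and greater-than, with `scmp` comparing signed and `ucmp` unsigned values.

```cpp
#include <cstdint>

// Sketch of the scmp/ucmp 3-way comparison semantics.
int8_t scmpModel(int64_t LHS, int64_t RHS) {
  return LHS < RHS ? -1 : (LHS > RHS ? 1 : 0); // signed compare
}
int8_t ucmpModel(uint64_t LHS, uint64_t RHS) {
  return LHS < RHS ? -1 : (LHS > RHS ? 1 : 0); // unsigned compare
}
```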
2024-06-14 | [CodeGen] Support vectors across all backends (#95518) | Farzon Lotfi | 1 | -1/+2
Add a default f16 type promotion
2024-06-05 | [x86] Add tan intrinsic part 4 (#90503) | Farzon Lotfi | 1 | -1/+3
This change is an implementation of #87367's investigation on supporting IEEE math operations as intrinsics, which was discussed in this RFC: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294. Much of this change was following how G_FSIN and G_FCOS were used. Changes:
- `llvm/docs/GlobalISel/GenericOpcode.rst` - Document the `G_FTAN` opcode
- `llvm/docs/LangRef.rst` - Document the tan intrinsic
- `llvm/include/llvm/Analysis/VecFuncs.def` - Associate the tan intrinsic as a vector function similar to the tanf libcall.
- `llvm/include/llvm/CodeGen/BasicTTIImpl.h` - Map the tan intrinsic to `ISD::FTAN`
- `llvm/include/llvm/CodeGen/ISDOpcodes.h` - Define ISD opcodes for `FTAN` and `STRICT_FTAN`
- `llvm/include/llvm/IR/Intrinsics.td` - Create the tan intrinsic
- `llvm/include/llvm/IR/RuntimeLibcalls.def` - Define tan libcall mappings
- `llvm/include/llvm/Target/GenericOpcodes.td` - Define the `G_FTAN` opcode
- `llvm/include/llvm/Support/TargetOpcodes.def` - Create a `G_FTAN` opcode handler
- `llvm/include/llvm/Target/GlobalISel/SelectionDAGCompat.td` - Map `G_FTAN` to `ftan`
- `llvm/include/llvm/Target/TargetSelectionDAG.td` - Define `ftan`, `strict_ftan`, and `any_ftan` and map them to the ISD opcodes for `FTAN` and `STRICT_FTAN`
- `llvm/lib/Analysis/VectorUtils.cpp` - Associate the tan intrinsic as a vector intrinsic
- `llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp` - Map the tan intrinsic to the `G_FTAN` opcode
- `llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp` - Add `G_FTAN` to the list of floating point math operations; also associate `G_FTAN` with the `TAN_F` runtime lib.
- `llvm/lib/CodeGen/GlobalISel/Utils.cpp` - More floating point math operation common behaviors.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp` - List the function expansion operations for `FTAN` and `STRICT_FTAN`. Also define both opcodes in `PromoteNode`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp` - More `FTAN` and `STRICT_FTAN` handling in the legalizer
- `llvm/lib/CodeGen/SelectionDAG/LegalizeTypes.h` - Define `SoftenFloatRes_FTAN` and `ExpandFloatRes_FTAN`.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp` - Define `FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp` - Define `FTAN` as a legal vector operation.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp` - Define tan as an intrinsic that doesn't return NaN.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp` - Map `LibFunc_tan`, `LibFunc_tanf`, and `LibFunc_tanl` to `ISD::FTAN`. Map `Intrinsic::tan` to `ISD::FTAN` and add selection dag handling for `Intrinsic::tan`.
- `llvm/lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp` - Define `ftan` and `strict_ftan` names for the equivalent ISD opcodes.
- `llvm/lib/CodeGen/TargetLoweringBase.cpp` - Define a Tan128 libcall and ISD::FTAN as a target lowering action.
- `llvm/lib/Target/X86/X86ISelLowering.cpp` - Add x86_64 lowering for tan intrinsic
Resolves https://github.com/llvm/llvm-project/issues/70082
2024-05-30 | [SelectionDAG] Add an ISD::CLEAR_CACHE node to lower llvm.clear_cache (#93795) | Roger Ferrer Ibáñez | 1 | -0/+4
The current way of lowering `llvm.clear_cache` is a bit unusual. As suggested by Matt Arsenault, we are better off using an ISD node. This change introduces a new `ISD::CLEAR_CACHE` node, registers a new libcall named `__clear_cache` by default, and makes a libcall the default legalisation. This is preparatory work for a custom lowering of `ISD::CLEAR_CACHE` needed by RISC-V on some platforms.
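For context, a small usage sketch (the function name is made up): the GCC-compatible builtin below is what typically produces `llvm.clear_cache` in IR, which by default now legalises to the `__clear_cache` libcall registered here.

```cpp
// Hypothetical helper around the builtin that lowers to llvm.clear_cache.
void flushInstructionCache(char *Begin, char *End) {
  __builtin___clear_cache(Begin, End); // invalidate the icache for [Begin, End)
}
```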
2024-05-29 | [ValueTypes] Remove MVT::MAX_ALLOWED_VALUETYPE. NFC (#93654) | Craig Topper | 1 | -3/+0
Despite the comment, this isn't used to size bit vectors or tables. That's done by VALUETYPE_SIZE. MAX_ALLOWED_VALUETYPE is only used by some static_asserts that compare it to VALUETYPE_SIZE. This patch removes it and most of the static_asserts. I left one where I compared VALUETYPE_SIZE to token, which is the first type that isn't part of the VALUETYPE range. This isn't strictly needed; we'd probably catch the duplication error from VTEmitter.cpp first.
2024-05-20 | CodeGen: Fix libcall names for exp10 on the various darwins (#92520) | Matt Arsenault | 1 | -0/+28
It's really great that we have the same information duplicated in TargetLibraryInfo and RuntimeLibcalls, which both assume everything is available by default. Should fix the issue reported after #92287.
2024-05-09 | [Analysis] Add cost model for experimental.cttz.elts intrinsic (#90720) | David Sherwood | 1 | -0/+18
In PR #88385 I've added support for auto-vectorisation of some early exit loops, which requires using the experimental.cttz.elts to calculate final indices in the early exit block. We need a more accurate cost model for this intrinsic to better reflect the cost of work required in the early exit block. I've tried to accurately represent the expansion code for the intrinsic when the target does not have efficient lowering for it. It's quite tricky to model because you need to first figure out what types will actually be used in the expansion. The type used can have a significant effect on the cost if you end up using illegal vector types. Tests added here: Analysis/CostModel/AArch64/cttz_elts.ll Analysis/CostModel/RISCV/cttz_elts.ll
2024-05-07 | [Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304) | Kazu Hirata | 1 | -2/+2
I'm planning to remove StringRef::equals in favor of StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of 53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
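A before/after illustration of the mechanical change (sketch only; function names are made up):

```cpp
#include "llvm/ADT/StringRef.h"

bool isFoo(llvm::StringRef S) {
  // Before: return S.equals("foo");
  return S == "foo"; // operator== mirrors std::string_view
}

bool isNotStr(llvm::StringRef LongExpression) {
  // Before: return !LongExpression.equals("str");
  return LongExpression != "str";
}
```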
2024-04-16 | Recommit [RISCV] RISCV vector calling convention (2/2) (#79096) (#87736) | Brandon Wu | 1 | -2/+10
Bug fix: Handle RVV return type in calling convention correctly. Return values are handled in the same way as function arguments. One thing to mention is that if a type can be broken down into homogeneous vector types, e.g. {<vscale x 4 x i32>, {<vscale x 4 x i32>, <vscale x 4 x i32>}}, it is considered a vector tuple type and needs to be handled by the tuple type rule.
2024-03-28 | [ISel] Move handling of atomic loads from SystemZ to DAGCombiner (NFC). (#86484) | Jonas Paulsson | 1 | -0/+6
The folding of sign/zero extensions into an atomic load by specifying an extension type is not target specific, and therefore belongs in the DAGCombiner rather than in the SystemZ backend.
- Handle atomic loads similarly to regular loads by adding AtomicLoadExtActions with set/get methods.
- Move SystemZ extendAtomicLoad() to DAGCombiner.cpp.
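A hedged sketch of how a backend might use the new hook, assuming it mirrors the existing setLoadExtAction signature (extension kind, result type, memory type, action); the types are illustrative.

```cpp
// Sketch: declare that sign/zero-extending an i32 atomic load to i64 is
// legal, so the DAGCombiner may fold the extension into the load.
setAtomicLoadExtAction(ISD::SEXTLOAD, MVT::i64, MVT::i32, Legal);
setAtomicLoadExtAction(ISD::ZEXTLOAD, MVT::i64, MVT::i32, Legal);
```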
2024-03-27 | [FreeBSD] Mark __stack_chk_guard dso_local except for PPC64 (#86665) | Justin Cady | 1 | -1/+2
Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645a7a. D113443 is the original attempt to bring this FreeBSD patch to llvm-project, but it never landed. This change is required to build FreeBSD kernel modules with -fstack-protector using a standard LLVM toolchain. The FreeBSD kernel loader does not handle R_X86_64_REX_GOTPCRELX relocations. Fixes #50932.
2024-03-20 | [AArch64] Support scalable offsets with isLegalAddressingMode (#83255) | Graham Hunter | 1 | -0/+4
Allows us to indicate that an addressing mode featuring a vscale-relative immediate offset is supported.
2024-03-11 | [CodeGen] Do not pass MF into MachineRegisterInfo methods. NFC. (#84770) | Jay Foad | 1 | -1/+1
MachineRegisterInfo already knows the MF so there is no need to pass it in as an argument.
2024-03-04 | [SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056) | Shilei Tian | 1 | -0/+3
This patch adds the support for `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16`.
2024-03-04 | Revert "[SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056)" | Shilei Tian | 1 | -3/+0
This reverts commit b0c158bd947c360a4652eb0de3a4794f46deb88b. The changes in `compiler-rt` broke tests.
2024-03-04 | [SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056) | Shilei Tian | 1 | -0/+3
This patch adds the support for `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16`.
2024-02-13 | [X86][CodeGen] Restrict F128 lowering to GNU environment (#81664) | Pranav Kant | 1 | -1/+1
Otherwise it breaks some environments, like X64 Android, that don't have f128 functions available in their libc. Follow-up to #79611.
2024-02-13 | [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331) | Joseph Huber | 1 | -0/+3
Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter that some targets have which returns a fixed-frequency clock that can be used to determine elapsed time; this is different from the cycle counter, which often has variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.
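Usage sketch (assumptions: the builtin returns a 64-bit counter value like `__builtin_readcyclecounter`, and `doWork` is a hypothetical workload): sample the fixed-frequency counter around a region to measure elapsed ticks.

```cpp
extern void doWork(); // hypothetical workload

unsigned long long elapsedSteadyTicks() {
  unsigned long long Start = __builtin_readsteadycounter(); // fixed-frequency counter
  doWork();
  return __builtin_readsteadycounter() - Start;
}
```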
2024-02-09 | [X86][CodeGen] Emit float128 libcalls for math functions (#79611) | Pranav Kant | 1 | -0/+40
Make LLVM emit libcalls to proper float128 variants for float128 types.
2024-02-06 | [FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803) | Craig Topper | 1 | -9/+2
The comment and code here seem to match getTypeForExtReturn. The history shows that at the time this code was added, similar code existed in SelectionDAGBuilder. The SelectionDAGBuilder code has since been refactored into getTypeForExtReturn. This patch makes FastISel match SelectionDAGBuilder. The test changes are because X86 has customization of getTypeForExtReturn, so now we only extend returns to i8. Stumbled onto this difference by accident.
2024-02-02 | [NFC] Add useFPRegsForHalfType(). (#74147) | Harald van Dijk | 1 | -4/+9
Currently, half operations can be promoted in one of two ways.
* If softPromoteHalfType() returns false, fp16 values are passed around in fp32 registers, and whole chains of fp16 operations are promoted to fp32 in one go.
* If softPromoteHalfType() returns true, fp16 values are passed around in i16 registers, and individual fp16 operations are promoted to fp32 and the result truncated to fp16 right away.
The softPromoteHalfType behavior is necessary for correctness, but changing this for an existing target breaks the ABI. Therefore, this commit adds a third option:
* If softPromoteHalfType() returns true and useFPRegsForHalfType() returns true as well, fp16 values are passed around in fp32 registers, but individual fp16 operations are promoted to fp32 and the result truncated to fp16 right away.
This change does not yet update any target to make use of it.
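A sketch of how a target could opt in to the new third mode (hook names from the description above; shown as overrides inside a hypothetical TargetLowering subclass):

```cpp
// Inside a hypothetical MyTargetLowering : public TargetLowering
bool softPromoteHalfType() const override { return true; }  // promote fp16 ops individually
bool useFPRegsForHalfType() const override { return true; } // but keep fp16 values in FP registers
```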
2024-01-25 | [llvm] Move CodeGenTypes library to its own directory (#79444) | Nico Weber | 1 | -1/+1
Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.
2024-01-11 | Set the default value for MaxAtomicSizeInBitsSupported to 0. | James Y Knight | 1 | -3/+1
This was planned since its introduction, but wasn't rolled out for a little bit longer than intended (ahem...8 years). All in-tree targets have now been adjusted to call setMaxAtomicSizeInBitsSupported explicitly where required, so this should be a no-op. The docs in docs/Atomics.rst already claimed the default was 0, so that doesn't need updating.
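For illustration, the kind of call an in-tree target makes now that the default is 0 (a sketch; the value 64 is just an example):

```cpp
// Sketch: inside a target's TargetLowering constructor, advertise the widest
// atomic operation the target supports natively; larger ones become libcalls.
setMaxAtomicSizeInBitsSupported(64);
```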
2024-01-04 | [CodeGen] Remove unused variables in TargetLoweringBase.cpp (NFC) | Jie Fu | 1 | -1/+0
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:12: error: unused variable 'ModeN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |            ^~~~~
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:19: error: unused variable 'ModelN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |                   ^~~~~~
2 errors generated.
2024-01-04 | Add out-of-line-atomics support to GlobalISel (#74588) | Thomas Preud'homme | 1 | -15/+26
This patch implements the GlobalISel counterpart to 4d7df43ffdb460dddb2877a886f75f45c3fee188.
2023-11-27 | [llvm] Replace calls to Type::getPointerTo (NFC) | Youngsuk Kim | 1 | -3/+3
Cleanup work towards removing the method Type::getPointerTo. If a call to Type::getPointerTo is used solely to support an unneeded pointer-cast, remove the call entirely.
2023-11-14 | [AMDGPU] Generic lowering for rint and nearbyint (#69596) | Acim-Maravic | 1 | -4/+3
There are three different rounding intrinsics that are brought down to the same instruction. Co-authored-by: Acim Maravic <acim.maravic@amd.com>
2023-11-07 | [NFC] Remove Type::getInt8PtrTy (#71029) | Paulo Matos | 1 | -6/+6
Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.
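A before/after sketch of the replacement (the helper name is made up):

```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

// With opaque pointers there is no element type, so the old i8 pointer type
// is just an unqualified pointer in the given context.
llvm::PointerType *getBytePtrTy(llvm::LLVMContext &Ctx) {
  // Before: return llvm::Type::getInt8PtrTy(Ctx);
  return llvm::PointerType::getUnqual(Ctx);
}
```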
2023-10-31 | insertSSPDeclarations: adjust Darwin condition that sets dso_local | Fangrui Song | 1 | -1/+2
This change is for AArch32 and not strictly needed, but it ensures that we follow the model that direct accesses are only emitted for dso_local and we do not need TargetMachine::shouldAssumeDSOLocal to force dso_local for a dso_preemptable variable. There is no behavior change to the arm/arm64 configurations listed in commit 5888dee7d04748744743a35d3aef030018bdc275.
2023-10-19 | ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924) | Ramkumar Ramachandra | 1 | -6/+6
The issue #55208 noticed that std::rint is vectorized by the SLPVectorizer, but a very similar function, std::lrint, is not. std::lrint corresponds to ISD::LRINT in the SelectionDAG, and std::llrint is a familiar cousin corresponding to ISD::LLRINT. Now, neither ISD::LRINT nor ISD::LLRINT have a corresponding vector variant, and the LangRef makes this clear in the documentation of llvm.lrint.* and llvm.llrint.*. This patch extends the LangRef to include vector variants of llvm.lrint.* and llvm.llrint.*, and lays the necessary ground-work of scalarizing it for all targets. However, this patch would be devoid of motivation unless we show the utility of these new vector variants. Hence, the RISCV target has been chosen to implement a custom lowering to the vfcvt.x.f.v instruction. The patch also includes a CostModel for RISCV, and a trivial follow-up can potentially enable the SLPVectorizer to vectorize std::lrint and std::llrint, fixing #55208. The patch includes tests, obviously for the RISCV target, but also for the X86, AArch64, and PowerPC targets to justify the addition of the vector variants to the LangRef.
2023-09-01 | IR: Add llvm.exp10 intrinsic | Matt Arsenault | 1 | -3/+3
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871
2023-08-24 | [FPEnv] Intrinsics for access to FP control modes | Serge Pavlov | 1 | -0/+6
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and 'reset_fpmode'. They manage all target dynamic floating-point control modes, which include, for instance, rounding direction, precision, and treatment of denormals. The intrinsics do the same operations as the C library functions 'fegetmode' and 'fesetmode'. By default they are lowered to calls to these functions.

Two main use cases are supported by this implementation.

1. Local modification of the control modes. In this case the code usually has a pattern (in pseudocode):

    saved_modes = get_fpmode()
    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    set_fpmode(saved_modes)

In the case when it is known that the current FP environment is default, the code may be shorter:

    set_fpmode(<new_modes>)
    ...
    <do operations under the new modes>
    ...
    reset_fpmode()

Such patterns appear not only in user code but also in implementations of various FP controlling pragmas. In particular, the implementation of `#pragma STDC FENV_ROUND` requires similar code if the target does not support static rounding mode.

2. Portable control of FP modes. Usually FP control modes are set by writing to some control register. Different targets have different layouts of this register, and the way the register is accessed may also differ. Using a set of target-specific definitions for the control register bits together with these intrinsic functions provides a portable enough way to handle control modes across a wide range of hardware.

This change defines only the LLVM intrinsic functions, which implement the access required for the aforementioned use cases.

Differential Revision: https://reviews.llvm.org/D82525
2023-07-31 | [AArch64] Add some basic handling for bf16 constants. | David Green | 1 | -1/+1
This adds some basic handling for bf16 constants, attempting to treat them a lot like fp16 constants where it can. Zero immediates get lowered to FMOVH0, others either get lowered to FMOVWHr(MOVi32imm) or use FMOVHi if they can. Without fp16 they get expanded. This may not always be optimal, but fixes a gap in our lowering. See llvm/test/CodeGen/AArch64/f16-imm.ll for the equivalent fp16 test. Differential Revision: https://reviews.llvm.org/D156649
2023-06-28 | IR: Add llvm.frexp intrinsic | Matt Arsenault | 1 | -2/+13
Add an intrinsic which returns the two pieces as multiple return values. Alternatively could introduce a pair of intrinsics to separately return the fractional and exponent parts. AMDGPU has native instructions to return the two halves, but could use some generic legalization and optimization handling. For example, we should be able to handle legalization of f16 on older targets, and for bf16. Additionally antique targets need a hardware workaround which would be better handled in the backend rather than in library code where it is now.
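For reference, the libm operation the intrinsic models, returned as a pair rather than through an out-parameter (a sketch; names are illustrative):

```cpp
#include <cmath>
#include <utility>

// frexp decomposes X into Fract * 2^Exp with |Fract| in [0.5, 1) (or 0).
std::pair<double, int> frexpPair(double X) {
  int Exp = 0;
  double Fract = std::frexp(X, &Exp);
  return {Fract, Exp};
}
```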