path: root/llvm/lib/CodeGen/TargetLoweringBase.cpp
Age | Commit message | Author | Files | Lines
2024-05-09 | [Analysis] Add cost model for experimental.cttz.elts intrinsic (#90720) | David Sherwood | 1 | -0/+18
In PR #88385 I've added support for auto-vectorisation of some early exit loops, which requires using the experimental.cttz.elts intrinsic to calculate final indices in the early exit block. We need a more accurate cost model for this intrinsic to better reflect the cost of the work required in the early exit block. I've tried to accurately represent the expansion code for the intrinsic when the target does not have efficient lowering for it. It's quite tricky to model because you first need to figure out what types will actually be used in the expansion. The type used can have a significant effect on the cost if you end up using illegal vector types.
Tests added here:
  Analysis/CostModel/AArch64/cttz_elts.ll
  Analysis/CostModel/RISCV/cttz_elts.ll
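To see why the types in the expansion matter, here is a scalar model of a step-vector plus unsigned-max-reduction expansion of cttz.elts. It is a hedged sketch, not LLVM code: the function name is hypothetical, and the expansion shape is assumed to resemble the generic lowering. Each active lane holds `VL - lane`, so the intermediate element type must be able to hold values up to VL, which is one way a wide vector can push the expansion onto illegal vector types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a step-vector/umax-reduction expansion of
// cttz.elts (not LLVM code). Returns the index of the first active lane, or
// Mask.size() when no lane is active (the zero-is-not-poison case).
uint64_t cttzEltsExpanded(const std::vector<bool> &Mask) {
  const uint64_t VL = Mask.size();
  uint64_t Max = 0;
  for (uint64_t Lane = 0; Lane < VL; ++Lane) {
    // splat(VL) minus the step vector <0, 1, ..., VL-1> gives VL - Lane;
    // masking keeps that value only in active lanes.
    const uint64_t Masked = Mask[Lane] ? VL - Lane : 0;
    Max = std::max(Max, Masked); // unsigned-max reduction
  }
  return VL - Max; // first active lane, or VL if none is active
}
```

For example, the mask `{0, 0, 1, 0}` produces masked values `{0, 0, 2, 0}`, a maximum of 2, and a result of 4 - 2 = 2.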
2024-05-07 | [Analysis, CodeGen, DebugInfo] Use StringRef::operator== instead of StringRef::equals (NFC) (#91304) | Kazu Hirata | 1 | -2/+2
I'm planning to remove StringRef::equals in favor of StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of 53 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
2024-04-16 | Recommit [RISCV] RISCV vector calling convention (2/2) (#79096) (#87736) | Brandon Wu | 1 | -2/+10
Bug fix: Handle the RVV return type in the calling convention correctly. Return values are handled in the same way as function arguments. Note that if a type can be broken down into homogeneous vector types, e.g. {<vscale x 4 x i32>, {<vscale x 4 x i32>, <vscale x 4 x i32>}}, it is considered a vector tuple type and needs to be handled by the tuple type rule.
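The "homogeneous vector types" test can be sketched by flattening a nested struct and checking that every leaf is the same scalable vector type. This is a hypothetical model: `TypeNode`, the string spelling of types, and the requirement of at least two leaves are assumptions for illustration, not the RISC-V backend's actual data structures or rule.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an IR type: a leaf such as "<vscale x 4 x i32>",
// or a struct whose Members are further types.
struct TypeNode {
  std::string Leaf;              // used when Members is empty
  std::vector<TypeNode> Members; // non-empty for struct types
};

// Collects the leaf types of a (possibly nested) struct in order.
void flatten(const TypeNode &T, std::vector<std::string> &Out) {
  if (T.Members.empty()) {
    Out.push_back(T.Leaf);
    return;
  }
  for (const TypeNode &M : T.Members)
    flatten(M, Out);
}

// True if T breaks down into two or more identical scalable vector leaves
// (the "two or more" threshold is an assumption of this sketch).
bool isHomogeneousVectorTuple(const TypeNode &T) {
  std::vector<std::string> Leaves;
  flatten(T, Leaves);
  if (Leaves.size() < 2)
    return false;
  for (const std::string &L : Leaves)
    if (L != Leaves.front() || L.rfind("<vscale x", 0) != 0)
      return false;
  return true;
}
```

With this model, the example type from the commit message flattens to three `<vscale x 4 x i32>` leaves and is classified as a tuple.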
2024-03-28 | [ISel] Move handling of atomic loads from SystemZ to DAGCombiner (NFC). (#86484) | Jonas Paulsson | 1 | -0/+6
The folding of sign/zero extensions into an atomic load by specifying an extension type is not target specific, and therefore belongs in the DAGCombiner rather than in the SystemZ backend.
- Handle atomic loads similarly to regular loads by adding AtomicLoadExtActions with set/get methods.
- Move SystemZ extendAtomicLoad() to DAGCombiner.cpp.
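The set/get idea can be sketched as a lookup table keyed by value width, memory width, and extension kind, which a combine consults before folding an extension into an atomic load. Everything here is hypothetical (class, enum, and function names, and the default of Expand for unlisted entries); it illustrates the shape of an action table, not LLVM's actual signatures.

```cpp
#include <cassert>
#include <map>
#include <tuple>

enum class ExtType { ZExt, SExt };
enum class Action { Legal, Expand };

// Hypothetical action table keyed by (value bits, memory bits, extension).
class AtomicLoadExtTable {
  std::map<std::tuple<unsigned, unsigned, ExtType>, Action> Actions;

public:
  void set(unsigned ValBits, unsigned MemBits, ExtType E, Action A) {
    Actions[std::make_tuple(ValBits, MemBits, E)] = A;
  }
  // Unlisted combinations default to Expand (an assumption of this sketch).
  Action get(unsigned ValBits, unsigned MemBits, ExtType E) const {
    auto It = Actions.find(std::make_tuple(ValBits, MemBits, E));
    return It == Actions.end() ? Action::Expand : It->second;
  }
};

// A generic combine folds the extension only if the target marked it Legal.
bool canFoldExtIntoAtomicLoad(const AtomicLoadExtTable &T, unsigned ValBits,
                              unsigned MemBits, ExtType E) {
  return T.get(ValBits, MemBits, E) == Action::Legal;
}
```

Keeping the table generic is what lets the folding logic live in one target-independent place while each target only fills in its entries.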
2024-03-27 | [FreeBSD] Mark __stack_chk_guard dso_local except for PPC64 (#86665) | Justin Cady | 1 | -1/+2
Adjust logic of 1cb9f37a17ab to match freebsd/freebsd-src@9a4d48a645a7a. D113443 is the original attempt to bring this FreeBSD patch to llvm-project, but it never landed. This change is required to build FreeBSD kernel modules with -fstack-protector using a standard LLVM toolchain. The FreeBSD kernel loader does not handle R_X86_64_REX_GOTPCRELX relocations. Fixes #50932.
2024-03-20 | [AArch64] Support scalable offsets with isLegalAddressingMode (#83255) | Graham Hunter | 1 | -0/+4
Allows us to indicate that an addressing mode featuring a vscale-relative immediate offset is supported.
2024-03-11 | [CodeGen] Do not pass MF into MachineRegisterInfo methods. NFC. (#84770) | Jay Foad | 1 | -1/+1
MachineRegisterInfo already knows the MF so there is no need to pass it in as an argument.
2024-03-04 | [SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056) | Shilei Tian | 1 | -0/+3
This patch adds support for `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16`.
2024-03-04 | Revert "[SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056)" | Shilei Tian | 1 | -3/+0
This reverts commit b0c158bd947c360a4652eb0de3a4794f46deb88b. The changes in `compiler-rt` broke tests.
2024-03-04 | [SelectionDAG] Add `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16` (#80056) | Shilei Tian | 1 | -0/+3
This patch adds support for `STRICT_BF16_TO_FP` and `STRICT_FP_TO_BF16`.
2024-02-13 | [X86][CodeGen] Restrict F128 lowering to GNU environment (#81664) | Pranav Kant | 1 | -1/+1
Otherwise it breaks environments such as X64 Android, whose libc does not provide the f128 functions. Followup to #79611.
2024-02-13 | [LLVM] Add `__builtin_readsteadycounter` intrinsic and builtin for realtime clocks (#81331) | Joseph Huber | 1 | -0/+3
Summary: This patch adds a new intrinsic and builtin function mirroring the existing `__builtin_readcyclecounter`. The difference is that this implementation targets a separate counter, present on some targets, that runs at a fixed frequency and can therefore be used to determine elapsed time; the cycle counter, by contrast, often has a variable frequency. This patch only adds support for the NVPTX and AMDGPU targets. This is done as a new and separate builtin rather than an argument to `readcyclecounter` to avoid needing to change existing code and to make the separation more explicit.
2024-02-09 | [X86][CodeGen] Emit float128 libcalls for math functions (#79611) | Pranav Kant | 1 | -0/+40
Make LLVM emit libcalls to proper float128 variants for float128 types.
2024-02-06 | [FastISel][X86] Use getTypeForExtReturn in GetReturnInfo. (#80803) | Craig Topper | 1 | -9/+2
The comment and code here seem to match getTypeForExtReturn. The history shows that at the time this code was added, similar code existed in SelectionDAGBuilder. The SelectionDAGBuilder code has since been refactored into getTypeForExtReturn. This patch makes FastISel match SelectionDAGBuilder. The test changes are because X86 customizes getTypeForExtReturn, so now we only extend returns to i8. I stumbled onto this difference by accident.
2024-02-02 | [NFC] Add useFPRegsForHalfType(). (#74147) | Harald van Dijk | 1 | -4/+9
Currently, half operations can be promoted in one of two ways.
* If softPromoteHalfType() returns false, fp16 values are passed around in fp32 registers, and whole chains of fp16 operations are promoted to fp32 in one go.
* If softPromoteHalfType() returns true, fp16 values are passed around in i16 registers, and individual fp16 operations are promoted to fp32 and the result truncated to fp16 right away.
The softPromoteHalfType behavior is necessary for correctness, but changing this for an existing target breaks the ABI. Therefore, this commit adds a third option:
* If softPromoteHalfType() returns true and useFPRegsForHalfType() returns true as well, fp16 values are passed around in fp32 registers, but individual fp16 operations are promoted to fp32 and the result truncated to fp16 right away.
This change does not yet update any target to make use of it.
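The three promotion schemes reduce to a small decision table over the two hooks. This sketch uses hypothetical names (`HalfRegs`, `HalfStrategy`, `pickHalfStrategy`) and only encodes what the commit message states: which registers carry fp16 values, and whether each operation is truncated back to fp16 immediately.

```cpp
#include <cassert>

enum class HalfRegs { FP32, I16 };

// Hypothetical summary of one promotion scheme.
struct HalfStrategy {
  HalfRegs Regs;         // registers fp16 values are passed around in
  bool RoundAfterEachOp; // true: promote one op, then truncate to fp16
};

HalfStrategy pickHalfStrategy(bool SoftPromoteHalf, bool UseFPRegsForHalf) {
  if (!SoftPromoteHalf)
    return {HalfRegs::FP32, false}; // whole chains promoted to fp32 at once
  if (UseFPRegsForHalf)
    return {HalfRegs::FP32, true};  // the new third option
  return {HalfRegs::I16, true};     // soft promotion in i16 registers
}
```

Note that useFPRegsForHalfType() only has an effect when softPromoteHalfType() is true, which is how the new option keeps the fp32-register ABI while gaining per-operation rounding.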
2024-01-25 | [llvm] Move CodeGenTypes library to its own directory (#79444) | Nico Weber | 1 | -1/+1
Finally addresses https://reviews.llvm.org/D148769#4311232 :) No behavior change.
2024-01-11 | Set the default value for MaxAtomicSizeInBitsSupported to 0. | James Y Knight | 1 | -3/+1
This was planned since its introduction, but wasn't rolled out for a little bit longer than intended (ahem...8 years). All in-tree targets have now been adjusted to call setMaxAtomicSizeInBitsSupported explicitly where required, so this should be a no-op. The docs in docs/Atomics.rst already claimed the default was 0, so that doesn't need updating.
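The effect of the limit can be sketched as a single predicate: an atomic access stays native only if its size is within the target's maximum and it is naturally aligned; otherwise it becomes an `__atomic_*` library call. The function is hypothetical and is assumed to mirror the AtomicExpand-style check rather than quoting it. Under this model, a limit of 0 sends every atomic to a libcall, which is why targets must set the limit explicitly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the native-vs-libcall decision for atomics.
bool useNativeAtomic(unsigned SizeInBits, uint64_t AlignInBytes,
                     unsigned MaxAtomicSizeInBitsSupported) {
  return SizeInBits <= MaxAtomicSizeInBitsSupported && // fits the limit
         AlignInBytes * 8 >= SizeInBits;               // naturally aligned
}
```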
2024-01-04 | [CodeGen] Remove unused variables in TargetLoweringBase.cpp (NFC) | Jie Fu | 1 | -1/+0
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:12: error: unused variable 'ModeN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |            ^~~~~
llvm-project/llvm/lib/CodeGen/TargetLoweringBase.cpp:570:19: error: unused variable 'ModelN' [-Werror,-Wunused-variable]
  570 |   unsigned ModeN, ModelN;
      |                   ^~~~~~
2 errors generated.
2024-01-04 | Add out-of-line-atomics support to GlobalISel (#74588) | Thomas Preud'homme | 1 | -15/+26
This patch implements the GlobalISel counterpart to 4d7df43ffdb460dddb2877a886f75f45c3fee188.
2023-11-27 | [llvm] Replace calls to Type::getPointerTo (NFC) | Youngsuk Kim | 1 | -3/+3
Cleanup work towards removing the method Type::getPointerTo. If a call to Type::getPointerTo is used solely to support an unneeded pointer-cast, remove the call entirely.
2023-11-14 | [AMDGPU] Generic lowering for rint and nearbyint (#69596) | Acim-Maravic | 1 | -4/+3
There are three different rounding intrinsics that are lowered to the same instruction.
Co-authored-by: Acim Maravic <acim.maravic@amd.com>
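Why rint and nearbyint can share an instruction is easiest to see from their C semantics: both round to an integral value in the current rounding mode, and per C Annex F they differ only in that rint raises FE_INEXACT when the result differs from the input. This sketch (hypothetical function name) checks that the values agree; the link to a target that does not observe the inexact flag is an interpretation, not something the commit states.

```cpp
#include <cassert>
#include <cmath>

// Returns true if rint and nearbyint produce the same value for every input.
bool rintMatchesNearbyint(const double *Xs, int N) {
  for (int I = 0; I < N; ++I) {
    const double A = std::rint(Xs[I]);
    const double B = std::nearbyint(Xs[I]);
    if (std::isnan(A) && std::isnan(B))
      continue; // NaN != NaN, so treat two NaNs as agreeing
    if (A != B)
      return false;
  }
  return true;
}
```

In the default round-to-nearest-even mode, both map 2.5 to 2.0 and 3.5 to 4.0.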
2023-11-07 | [NFC] Remove Type::getInt8PtrTy (#71029) | Paulo Matos | 1 | -6/+6
Replace this with PointerType::getUnqual(). Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-10-31 | insertSSPDeclarations: adjust Darwin condition that sets dso_local | Fangrui Song | 1 | -1/+2
This change is for AArch32 and not strictly needed, but it ensures that we follow the model that direct accesses are only emitted for dso_local and we do not need TargetMachine::shouldAssumeDSOLocal to force dso_local for a dso_preemptable variable. There is no behavior change to the arm/arm64 configurations listed in commit 5888dee7d04748744743a35d3aef030018bdc275.
2023-10-19 | ISel: introduce vector ISD::LRINT, ISD::LLRINT; custom RISCV lowering (#66924) | Ramkumar Ramachandra | 1 | -6/+6
Issue #55208 noted that std::rint is vectorized by the SLPVectorizer, but a very similar function, std::lrint, is not. std::lrint corresponds to ISD::LRINT in the SelectionDAG, and std::llrint is a close cousin corresponding to ISD::LLRINT. Neither ISD::LRINT nor ISD::LLRINT has a corresponding vector variant, and the LangRef makes this clear in the documentation of llvm.lrint.* and llvm.llrint.*. This patch extends the LangRef to include vector variants of llvm.lrint.* and llvm.llrint.*, and lays the necessary groundwork for scalarizing them on all targets. However, this patch would lack motivation unless we show the utility of these new vector variants. Hence, the RISCV target has been chosen to implement a custom lowering to the vfcvt.x.f.v instruction. The patch also includes a CostModel for RISCV, and a trivial follow-up could potentially enable the SLPVectorizer to vectorize std::lrint and std::llrint, fixing #55208. The patch includes tests, obviously for the RISCV target, but also for the X86, AArch64, and PowerPC targets to justify the addition of the vector variants to the LangRef.
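The source pattern in question looks like the following: four adjacent, independent std::lrint calls, the straight-line shape the SLPVectorizer tries to combine into one vector operation. This is a sketch of the input shape only (the function name is hypothetical); whether a given compiler actually vectorizes it depends on the target and version.

```cpp
#include <cassert>
#include <cmath>

// Four independent lrint calls on adjacent elements. Each rounds in the
// current rounding mode (round-to-nearest-even by default) and converts to
// long, which is what a vector lrint would do lane by lane.
void lrint4(const double *In, long *Out) {
  Out[0] = std::lrint(In[0]);
  Out[1] = std::lrint(In[1]);
  Out[2] = std::lrint(In[2]);
  Out[3] = std::lrint(In[3]);
}
```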
2023-09-01 | IR: Add llvm.exp10 intrinsic | Matt Arsenault | 1 | -3/+3
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10 to fix this asymmetry. AMDGPU already has most of the code for f32 exp10 expansion implemented alongside exp, so the current implementation is duplicating nearly identical effort between the compiler and library which is inconvenient. https://reviews.llvm.org/D157871
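The relationship exp10 has with the existing intrinsics is the identity 10^x = 2^(x · log2 10). The following naive expansion is an illustrative sketch (hypothetical function name), not the AMDGPU expansion: rounding the product `x * log2(10)` costs accuracy for large |x|, which is why careful implementations split the constant into high and low parts.

```cpp
#include <cassert>
#include <cmath>

// Naive exp10 via exp2: 10^x == 2^(x * log2(10)). Accuracy degrades for
// large |x| because the product is rounded before exp2 sees it.
double exp10ViaExp2(double X) {
  const double Log2Of10 = 3.32192809488736234787;
  return std::exp2(X * Log2Of10);
}
```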
2023-08-24 | [FPEnv] Intrinsics for access to FP control modes | Serge Pavlov | 1 | -0/+6
The change introduces intrinsics 'get_fpmode', 'set_fpmode' and 'reset_fpmode'. They manage all target dynamic floating-point control modes, which include, for instance, rounding direction, precision, treatment of denormals and so on. The intrinsics do the same operations as the C library functions 'fegetmode' and 'fesetmode'. By default they are lowered to calls to these functions.
Two main use cases are supported by this implementation.
1. Local modification of the control modes. In this case the code usually has a pattern (in pseudocode):
     saved_modes = get_fpmode()
     set_fpmode(<new_modes>)
     ...
     <do operations under the new modes>
     ...
     set_fpmode(saved_modes)
   When it is known that the current FP environment is the default one, the code may be shorter:
     set_fpmode(<new_modes>)
     ...
     <do operations under the new modes>
     ...
     reset_fpmode()
   Such patterns appear not only in user code but also in implementations of various FP controlling pragmas. In particular, the implementation of `#pragma STDC FENV_ROUND` requires similar code if the target does not support a static rounding mode.
2. Portable control of FP modes. Usually FP control modes are set by writing to some control register. Different targets have different layouts of this register, and the way the register is accessed may also differ. Using a set of target-specific definitions for the control register bits together with these intrinsic functions provides a sufficiently portable way to handle control modes across a wide range of hardware.
This change defines only the LLVM intrinsic functions, which implement the access required for the aforementioned use cases.
Differential Revision: https://reviews.llvm.org/D82525
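The save/set/restore pattern can be written in standard C++ if it is restricted to one control mode, the rounding direction, using std::fegetround and std::fesetround (fegetmode/fesetmode cover more modes but are not part of standard C++). This is a hedged sketch with a hypothetical function name. The `volatile` qualifiers are a common workaround to discourage the compiler from folding or moving the rounding call; `#pragma STDC FENV_ACCESS ON` is the standard mechanism but is not honored by every compiler.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Rounds X to an integral value under rounding mode Mode, then restores the
// caller's rounding mode, following the pseudocode pattern above.
double nearbyintUnderRounding(double X, int Mode) {
  const int Saved = std::fegetround();    // saved_modes = get_fpmode()
  std::fesetround(Mode);                  // set_fpmode(<new_modes>)
  volatile double In = X;
  volatile double R = std::nearbyint(In); // operation under the new mode
  std::fesetround(Saved);                 // set_fpmode(saved_modes)
  return R;
}
```

For example, 2.5 rounds to 3.0 under FE_UPWARD and to 2.0 under FE_DOWNWARD, and the default mode is back in effect afterwards.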
2023-07-31 | [AArch64] Add some basic handling for bf16 constants. | David Green | 1 | -1/+1
This adds some basic handling for bf16 constants, attempting to treat them a lot like fp16 constants where it can. Zero immediates get lowered to FMOVH0, others either get lowered to FMOVWHr(MOVi32imm) or use FMOVHi if they can. Without fp16 they get expanded. This may not always be optimal, but fixes a gap in our lowering. See llvm/test/CodeGen/AArch64/f16-imm.ll for the equivalent fp16 test. Differential Revision: https://reviews.llvm.org/D156649
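A bf16 value is the top 16 bits of an IEEE fp32 value, so materializing a bf16 constant means producing that 16-bit pattern; only the all-zero pattern is covered by a dedicated zero move such as FMOVH0. The following sketch (hypothetical function name) computes the pattern from a float with round-to-nearest-even; NaN handling is deliberately omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Converts a float to its bf16 bit pattern with round-to-nearest-even.
// NaN inputs are not handled specially in this sketch.
uint16_t floatToBF16Bits(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof(Bits));  // well-defined bit cast
  const uint32_t LSB = (Bits >> 16) & 1; // ties round to the even result
  Bits += 0x7FFF + LSB;
  return static_cast<uint16_t>(Bits >> 16);
}
```

For example, 1.0f (0x3F800000) becomes 0x3F80 and 0.0f becomes 0x0000; the tie 1.00390625f rounds down to the even pattern 0x3F80.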
2023-06-28 | IR: Add llvm.frexp intrinsic | Matt Arsenault | 1 | -2/+13
Add an intrinsic which returns the two pieces as multiple return values. Alternatively could introduce a pair of intrinsics to separately return the fractional and exponent parts. AMDGPU has native instructions to return the two halves, but could use some generic legalization and optimization handling. For example, we should be able to handle legalization of f16 on older targets, and for bf16. Additionally antique targets need a hardware workaround which would be better handled in the backend rather than in library code where it is now.
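The two results the intrinsic returns correspond to C's frexp, which splits x into a fraction and a power-of-two exponent with x = fraction · 2^exponent and |fraction| in [0.5, 1) for finite nonzero x. This sketch (hypothetical function name) returns both pieces together, the way a multiple-return-value intrinsic does.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Returns {fraction, exponent} such that X == fraction * 2^exponent.
std::pair<double, int> frexpPair(double X) {
  int Exp = 0;
  const double Frac = std::frexp(X, &Exp);
  return {Frac, Exp};
}
```

For example, 8.0 splits into 0.5 and 4 because 8 = 0.5 · 2^4.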
2023-06-23 | Darwin: Use the GOT to reference ___stack_chk_guard. | Amara Emerson | 1 | -1/+2
e018cbf7208b changed the default behaviour for Darwin, and this breaks some existing software. rdar://110350601
2023-06-13 | [Intrinsic] Introduce reduction intrinsics for minimum/maximum | Anna Thomas | 1 | -1/+2
This patch introduces the reduction intrinsic for floating point minimum and maximum which has the same semantics (for NaN and signed zero) as llvm.minimum and llvm.maximum. Reviewed-By: nikic Differential Revision: https://reviews.llvm.org/D152370
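The NaN and signed-zero semantics of llvm.minimum carry over to the reduction: any NaN makes the result NaN, and -0.0 is treated as less than +0.0. This is unlike std::fmin, which ignores NaNs. The following scalar model is a sketch with a hypothetical name; it starts from +infinity, the identity for minimum.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Scalar model of a minimum reduction with llvm.minimum-style semantics.
double reduceMinimum(const std::vector<double> &V) {
  double Acc = std::numeric_limits<double>::infinity();
  for (double X : V) {
    if (std::isnan(X))
      return X;                               // NaN propagates
    if (X < Acc || (X == Acc && std::signbit(X)))
      Acc = X;                                // -0.0 wins over +0.0
  }
  return Acc;
}
```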
2023-06-06 | IR: Add llvm.ldexp and llvm.experimental.constrained.ldexp intrinsics | Matt Arsenault | 1 | -1/+14
AMDGPU has native instructions and target intrinsics for this, but these really should be subject to legalization and generic optimizations. This will enable legalization of f16->f32 on targets without f16 support. Implement a somewhat horrible inline expansion for targets without libcall support. This could be better if we could introduce control flow (GlobalISel version not yet implemented). Support for strictfp legalization is less complete but works for the simple cases.
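The operation being expanded is ldexp(x, n) = x · 2^n. The following reference model (hypothetical name) scales one power of two at a time. It is deliberately not the inline expansion the commit adds, since it relies on loops, that is, control flow. Each step is exact while values stay normal, so it matches std::ldexp there; it can round differently once intermediate values become subnormal, which hints at why a straight-line expansion is more involved.

```cpp
#include <cassert>
#include <cmath>

// Reference model of ldexp: X * 2^N by repeated doubling or halving.
// Illustrative only: O(|N|) steps, and subnormal results may differ from
// std::ldexp because of repeated rounding.
double naiveLdexp(double X, int N) {
  while (N > 0) { X *= 2.0; --N; }
  while (N < 0) { X *= 0.5; ++N; }
  return X;
}
```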
2023-06-05 | [FPEnv] Intrinsics for access to FP environment | Serge Pavlov | 1 | -0/+8
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'. They are used to read the floating-point environment, set it, or reset it to some default state. They do the same actions as the C library functions 'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls to these functions.
The new intrinsics specify the FP environment as a value of integer type, which is convenient for most targets, where the FP state is the content of some register. Some targets, however, use long representations. On X86 the size of the FP environment is 256 bits, and even half of this size is not a legal integer type. To facilitate legalization in such cases, two sets of DAG nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP environment may be represented by a legal integer type. Nodes GET_FPENV_MEM and SET_FPENV_MEM treat the FP environment as a region in memory, much like `fesetenv` and `fegetenv` do. They are used when the target has a long representation for the floating-point state.
Differential Revision: https://reviews.llvm.org/D71742
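C's std::fenv_t is the memory-style counterpart of these intrinsics: an opaque, implementation-defined object holding the whole FP environment (control modes and status flags). This sketch (hypothetical function name) saves the environment, changes the rounding direction, and restores everything with one fesetenv call.

```cpp
#include <cassert>
#include <cfenv>

// Saves the full FP environment, changes the rounding mode, then restores
// the saved environment. Returns true if the change and the restore both
// took effect.
bool fesetenvRestoresRounding() {
  const int Initial = std::fegetround();
  std::fenv_t Saved;
  if (std::fegetenv(&Saved) != 0)
    return false;
  if (std::fesetround(FE_TOWARDZERO) != 0)
    return false;
  const bool Changed = std::fegetround() == FE_TOWARDZERO;
  if (std::fesetenv(&Saved) != 0)
    return false;
  return Changed && std::fegetround() == Initial;
}
```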
2023-05-23 | [IR] Make stack protector symbol dso_local according to -f[no-]direct-access-external-data | Fangrui Song | 1 | -1/+1
There are two motivations.
The `__stack_chk_guard` created by `-fno-pic -fstack-protector -mstack-protector-guard=global` is referenced directly on all ELF OSes except FreeBSD. This patch allows referencing the symbol indirectly with -fno-direct-access-external-data.
Some Linux kernel folks want the `__stack_chk_guard` created by `-fno-pic -fstack-protector -mstack-protector-guard-reg=gs -mstack-protector-guard-symbol=__stack_chk_guard` to be referenced directly, avoiding R_X86_64_REX_GOTPCRELX (even if the relocation may be optimized out by the linker). https://github.com/llvm/llvm-project/issues/60116 Why they need this isn't so clear to me.
---
Add module flag "direct-access-external-data" and set the dso_local property of the stack protector symbol. The module flag can benefit other LLVMCodeGen synthesized symbols that are not represented in LLVM IR. Nowadays, with `-fno-pic` being uncommon, ideally we should set "direct-access-external-data" when it is true. However, doing so would require ~90 clang/test tests to be updated, which is too many. As a compromise, we set "direct-access-external-data" only when it's different from the implied default value.
Reviewed By: nickdesaulniers
Differential Revision: https://reviews.llvm.org/D150841
2023-05-03 | Restore CodeGen/MachineValueType.h from `Support` | NAKAMURA Takumi | 1 | -1/+1
This is a rework of:
- rG13e77db2df94 (r328395; MVT)
Since `LowLevelType.h` has been restored to `CodeGen`, `MachineValueType.h` can be restored as well.
Depends on D148767
Differential Revision: https://reviews.llvm.org/D149024
2023-04-29 | [SelectionDAG] Rename ADDCARRY/SUBCARRY to UADDO_CARRY/USUBO_CARRY (NFC) | Sergei Barannikov | 1 | -2/+2
This will make them consistent with other overflow-aware nodes. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D148196
2023-04-05 | [SelectionDAG] Expand VP SDNodes by default. | Craig Topper | 1 | -4/+4
Differential Revision: https://reviews.llvm.org/D147643
2023-03-01 | [CodeGen] Always expand division larger than i128 | Nikita Popov | 1 | -1/+3
Default MaxDivRemBitWidthSupported to 128, so that divisions larger than 128 bits are always expanded, without requiring additional configuration from the target. Note that this may still emit calls to __udivti3 on 32-bit targets, which likely don't have an implementation of that builtin. However, I believe this is sufficient to fix https://github.com/llvm/llvm-project/issues/60531, because Zig must already be defining those builtins. Differential Revision: https://reviews.llvm.org/D144871
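When a division is too wide for the target, it has to be expanded into ordinary integer operations. A classic building block is shift-subtract (restoring) division, sketched below with uint64_t standing in for the wide type. The commit does not say which algorithm the expansion emits, so treat this as an illustration of the technique, not as the emitted code; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Restoring (shift-subtract) unsigned division: one quotient bit per step.
// R never exceeds the dividend bits consumed so far (at most 63 bits before
// the final shift), so R << 1 cannot overflow.
void udivrem(uint64_t N, uint64_t D, uint64_t &Q, uint64_t &R) {
  assert(D != 0 && "division by zero");
  Q = 0;
  R = 0;
  for (int Bit = 63; Bit >= 0; --Bit) {
    R = (R << 1) | ((N >> Bit) & 1); // bring down the next dividend bit
    if (R >= D) {                    // divisor fits: subtract, set Q bit
      R -= D;
      Q |= uint64_t(1) << Bit;
    }
  }
}
```

The same loop works at any bit width; only the number of iterations and the width of the shift and compare change.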
2023-02-15 | Use llvm::has_single_bit<uint32_t> (NFC) | Kazu Hirata | 1 | -1/+1
This patch replaces isPowerOf2_32 with llvm::has_single_bit<uint32_t> where the argument is wider than uint32_t.
2023-02-14 | Revert "[CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation" | Jake Egan | 1 | -37/+0
These commits are causing a test-suite build failure on AIX. Reverting for now to allow time to investigate.
https://lab.llvm.org/buildbot/#/builders/214/builds/5779/steps/9/logs/stdio
This reverts commit bd87a2449da0c82e63cebdf9c131c54a5472e3a7 and 4c72266830ffa332ebb7cf1d3bbd6c56d001fa0f.
2023-02-14 | [CodeGen] Trivial simplification of some getRegisterType calls. NFC. | Jay Foad | 1 | -1/+1
2023-02-09 | Fix call to deprecated API in bd87a2449da0c82e63cebdf9c131c54a5472e3a7 | Alex Richardson | 1 | -1/+1
2023-02-09 | [CGP] Add generic TargetLowering::shouldAlignPointerArgs() implementation | Alex Richardson | 1 | -0/+37
This function was added for ARM targets, but aligning global/stack pointer arguments passed to memcpy/memmove/memset can improve code size and performance for all targets that don't have fast unaligned accesses. This adds a generic implementation that adjusts the alignment to pointer size if unaligned accesses are slow. Review D134168 suggests that this significantly improves performance on synthetic benchmarks such as Dhrystone on RV32 as it avoids memcpy() calls. Reviewed By: efriedma Differential Revision: https://reviews.llvm.org/D134282
2023-02-07 | [NFC][TargetParser] Remove llvm/ADT/Triple.h | Archibald Elliott | 1 | -1/+1
I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.
2023-01-28 | Use llvm::bit_ceil (NFC) | Kazu Hirata | 1 | -2/+1
Note that:
  std::has_single_bit(X) ? X : llvm::NextPowerOf2(X);
is equivalent to:
  std::bit_ceil(X)
even for input 0.
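The equivalence, including the input 0, can be checked with hand-rolled stand-ins that need no C++20. All names are hypothetical. `nextPowerOf2` is assumed to behave like LLVM's NextPowerOf2, returning the smallest power of two strictly greater than its argument, so it returns 1 for 0. Both spellings then return 1 for 0 and X itself for powers of two.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two strictly greater than A (1 for A == 0), computed by
// smearing the highest set bit into all lower positions.
uint64_t nextPowerOf2(uint64_t A) {
  A |= A >> 1;
  A |= A >> 2;
  A |= A >> 4;
  A |= A >> 8;
  A |= A >> 16;
  A |= A >> 32;
  return A + 1;
}

bool hasSingleBit(uint64_t X) { return X != 0 && (X & (X - 1)) == 0; }

// Smallest power of two >= X, with bitCeil(0) == 1 like std::bit_ceil.
// Valid for X <= 2^63.
uint64_t bitCeil(uint64_t X) {
  uint64_t R = 1;
  while (R < X)
    R <<= 1;
  return R;
}

// The spelling the commit replaces.
uint64_t oldSpelling(uint64_t X) {
  return hasSingleBit(X) ? X : nextPowerOf2(X);
}
```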
2023-01-13 | DAG/GlobalISel: Fix broken/redundant setting of MODereferenceable | Matt Arsenault | 1 | -4/+6
This was incorrectly setting dereferenceable on unaligned operands. getLoadMemOperandFlags does the dereferenceability check without alignment, and then both paths went on to check isDereferenceableAndAlignedPointer. Make getLoadMemOperandFlags check isDereferenceableAndAlignedPointer, and remove the second call.
2023-01-11 | [NFC] Use TypeSize::getKnownMinValue() instead of TypeSize::getKnownMinSize() | Guillaume Chatelet | 1 | -1/+1
This change is one of a series to implement the discussion from https://reviews.llvm.org/D141134.
2022-12-31 | [DAGCombiner][TLI] Do not fuse bitcast to <1 x ?> into a load/store of a vector | Roman Lebedev | 1 | -0/+6
Single-element vectors are legalized by splitting, so the memory operations would also get scalarized. While we do have some support to reconstruct scalarized loads, we clearly don't catch everything. The comment for the affected AArch64 store suggests that having two stores was the desired outcome in the first place. This was showing as a source of *many* regressions with more aggressive ZERO_EXTEND_VECTOR_INREG recognition.
2022-12-31 | [NFC][TLI] Move `isLoadBitCastBeneficial()` implementation into source file | Roman Lebedev | 1 | -0/+22
... so any change to it does not cause 700 source files to be recompiled.
2022-12-01 | [X86] Add ExpandLargeFpConvert Pass and enable for X86 | Freddy Ye | 1 | -0/+2
As stated in https://discourse.llvm.org/t/rfc-llc-add-expandlargeintfpconvert-pass-for-fp-int-conversion-of-large-bitint/65528, this implementation is very similar to ExpandLargeDivRem, which expands ‘fptoui .. to’, ‘fptosi .. to’, ‘uitofp .. to’, ‘sitofp .. to’ instructions with a bitwidth above a threshold into auto-generated functions. This is useful for targets like x86_64 that cannot lower fp conversions with more than 128 bits. The expansions are derived from the IR generated by `compiler-rt/lib/builtins/floattidf.c`, `compiler-rt/lib/builtins/fixdfti.c`, etc.
Corner cases:
1. For fp16: as there are no related builtins in compiler-rt, I mainly used the fp32 <-> fp16 lib calls to implement it.
2. For fp80: as this pass is soft fp emulation and no fp80 instructions can help with this problem, I recommend that users avoid this usage. For now, the implementation uses fp128 as the temporary conversion type and inserts fptrunc/ext at the top/end of the function.
3. For bf16: as the clang FE currently doesn't support bf16 arithmetic operations (convert to int, float, +, -, *, ...), this patch doesn't consider bf16 for now.
4. For unsigned FPToI: since both default hardware behaviors and libgcc ignore the "returns 0 for negative input" spec, this pass follows that old behavior and ignores it for unsigned FPToI. See this example: https://gcc.godbolt.org/z/bnv3jqW1M
The end-to-end tests are uploaded at https://reviews.llvm.org/D138261
Reviewed By: LuoYuanke, mgehre-amd
Differential Revision: https://reviews.llvm.org/D137241
2022-11-22 | [X86] Allow no X87 on 32-bit | Phoebe Wang | 1 | -0/+9
This patch is an alternative of D100091. It solved the problems in `f80` type lowering. Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D137946