path: root/llvm/lib/CodeGen
Age  Commit message  Author  Files  Lines
10 hoursSelectionDAG/expandFMINNUM_FMAXNUM: skips vector if SETCC/VSELECT is not ↵YunQiang Su1-0/+5
legal (#109570) If SETCC or VSELECT is not legal for the vector type, we should not expand the node; instead we can split the vector. That way, simple scalar instructions can be emitted instead of pairs of comparison+selection.
13 hours[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)Jeffrey Byrnes1-3/+3
Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.
19 hours[NFC] Format MachineVerifier.cpp to remove extra indentation (#111602)Ellis Hoag1-250/+251
Many structs in this class have the wrong indentation. To generate this diff, I touched the first line of each struct and then ran `git clang-format`. This will make blaming more difficult, but this autoformatting is difficult to avoid triggering. I think it's best to push this as one NFC PR.
25 hoursDAG: Preserve more flags when expanding gep (#110815)Matt Arsenault1-8/+24
This allows selecting the addressing mode for stack instructions in cases where we need to prove the sign bit is zero.
33 hours[LiveDebugValues][NVPTX]VarLocBasedImpl handle vregs, enable for NVPTX (#111456)William G Hatch2-13/+20
This patch handles virtual registers in the VarLocBasedImpl of the LiveDebugValues pass, allowing it to be used on architectures that depend on virtual registers in debugging, like NVPTX. It enables the pass for NVPTX.
42 hours[DAG] foldVSelectToSignBitSplatMask - pull out repeated code and use ↵Simon Pilgrim1-3/+4
getShiftAmountConstant helper. We're assuming shift amount type matches the result type - which is true for vectors, but I'm hoping to generalize this fold in the future.
2 days[NFC][EarlyIfConverter] Rename SSAIfConv::runOnMachineFunction to ↵Juan Manuel Martinez Caamaño1-4/+4
SSAIfConv::init (#111500)
2 daysFix comment typo in ExpandFCOPYSIGN (#111489)Ralf Jung1-1/+2
I noticed this while following https://github.com/llvm/llvm-project/pull/111269. It makes little sense that FCOPYSIGN would look at the sign of `x`, right? Surely this must be `y`. Also fix the inconsistency where it's sometimes `x` and sometimes `X`.
2 daysRevert "[NFC][EarlyIfConverter] Turn SSAIfConv into a local variable ↵Juan Manuel Martinez Caamaño1-21/+22
(#107390)" (#111385) This reverts commit 09a4c23eb410d4be52202bed21c967a3653c3544.
3 daysRevert "[NFC][EarlyIfConverter] Replace boolean Predicate for a class ↵Juan Manuel Martinez Caamaño1-121/+136
(#108519)" (#111372) This reverts commit 9e7315912656628b606e884e39cdeb261b476f16.
3 daysRevert "[NFC][EarlyIfConverter] Remove unused member variables"Juan Manuel Martinez Caamaño1-0/+8
This reverts commit 3c83102f0615c7d66f6df698ca472ddbf0e9483d.
3 days[LLVM][CodeGen] Add lowering for scalable vector bfloat operations. (#109803)Paul Walker2-4/+65
Specifically: fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm, fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt & ftrunc
3 days[LegalizeVectorTypes] Always widen fabs (#111298)Luke Lau1-2/+1
fabs and fneg are similar nodes in that they can always be expanded to integer ops, but currently they diverge when widened. If the widened vector fabs is marked as expand (and the corresponding scalar type is too), LegalizeVectorTypes thinks that it may be turned into a libcall and so will unroll it to avoid the overhead on the undef elements. However unlike the other ops in that list like fsin, fround, flog etc., an fabs marked as expand will never be legalized into a libcall. Like fneg, it can always be expanded into an integer op. This moves it below unrollExpandedOp to bring it in line with fneg, which fixes an issue on RISC-V with f16 fabs being unexpectedly scalarized when there's no zfhmin.
3 days[LegalizeVectorTypes] When widening don't check for libcalls if promoted ↵Luke Lau1-1/+1
(#111297) When widening some FP ops, LegalizeVectorTypes will check to see if the widened op may be scalarized and then turned into a bunch of libcalls, and if so unroll early to avoid unnecessary libcalls of the padded undef elements. It checks if the widened op is legal or custom to see if it will be scalarized, but promoted ops will also avoid scalarization. This relaxes the check to account for promoted ops, which stops some illegal vector types on RISC-V from being scalarized when they could be widened.
4 days[CodeGen] Avoid repeated hash lookups (NFC) (#111274)Kazu Hirata1-6/+4
5 days[GISel] Don't preserve NSW flag when converting G_MUL of INT_MIN to G_SHL. ↵Craig Topper1-0/+2
(#111230) mul and shl have different meanings for the nsw flag. We need to drop it when converting a multiply by the minimum negative value.
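A minimal sketch of why the flag cannot be carried over, assuming 32-bit integers and modelling the `nsw` rules in plain C++ (the helper names `mulNswIsPoison` and `shlNswIsPoison` are hypothetical, not LLVM APIs). Multiplying 1 by INT_MIN is exact, but the equivalent shift by 31 moves a 1 into the sign bit while shifting out only zeros, which `shl nsw` treats as overflow:

```cpp
#include <cstdint>

// Model of `mul nsw` on i32: poison when the exact product does not fit.
constexpr bool mulNswIsPoison(int32_t A, int32_t B) {
  const int64_t Wide = static_cast<int64_t>(A) * static_cast<int64_t>(B);
  return Wide < INT32_MIN || Wide > INT32_MAX;
}

// Model of `shl nsw` on i32 for 0 < Amt < 32: poison when any shifted-out
// bit differs from the sign bit of the result.
constexpr bool shlNswIsPoison(int32_t A, unsigned Amt) {
  const uint32_t Bits = static_cast<uint32_t>(A);
  const uint32_t Result = Bits << Amt;
  const uint32_t ShiftedOut = Bits >> (32 - Amt);
  const uint32_t AllOnes = (uint32_t{1} << Amt) - 1;
  return (Result >> 31) ? ShiftedOut != AllOnes : ShiftedOut != 0;
}

// x = 1: the multiply is well defined, the shift with nsw is not.
static_assert(!mulNswIsPoison(1, INT32_MIN), "1 * INT_MIN fits in i32");
static_assert(shlNswIsPoison(1, 31), "1 << 31 flips the sign bit");
```

In this model, keeping `nsw` on the shift would turn a defined result into poison for x = 1, so dropping the flag is the conservative choice.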
6 daysRevert "[CFIFixup] Factor CFI remember/restore insertion into a helper ↵Daniel Hoekwater1-30/+16
(NFC)" (#111168) Reverts llvm/llvm-project#111066. This seems to be breaking some builds: - https://lab.llvm.org/buildbot/#/builders/51/builds/4732 - https://lab.llvm.org/buildbot/#/builders/41/builds/2534 - https://lab.llvm.org/buildbot/#/builders/73/builds/6601
6 days[CFIFixup] Factor CFI remember/restore insertion into a helper (NFC) (#111066)Daniel Hoekwater1-16/+30
Inserting a remember/restore pair is a very clean abstraction, so we can split the logic out into a helper function. Additionally, cleaning this up will make it easier to add logic for handling functions that are split across multiple sections.
6 days[NFC][CodeGen] Remove unused HasFakeUses MachineFunctionPropertyStephen Tozer1-1/+0
A previous commit d826b0c9 accidentally added a new MachineFunctionProperty, HasFakeUses, that was unused by the commit (and results in an uncovered-switch warning, which was fixed by a separate followup 1811e872); this patch removes that enum value.
6 days[CodeGen] Fix enumeration value 'HasFakeUses' not handled in switch (NFC)Jie Fu1-0/+1
llvm-project/llvm/lib/CodeGen/MachineFunction.cpp:95:10: error: enumeration value 'HasFakeUses' not handled in switch [-Werror,-Wswitch] switch(Prop) { ^~~~ 1 error generated.
6 days[LLVM] Add HasFakeUses to MachineFunction (#110097)Stephen Tozer5-8/+32
Following the addition of the llvm.fake.use intrinsic and corresponding MIR instruction, two further changes are planned: to add an -fextend-lifetimes flag to Clang that emits these intrinsics, and to have -Og enable this flag by default. Currently, some logic for handling fake uses is gated by the optdebug attribute, which is intended to be switched on by -fextend-lifetimes (and by extension -Og later on). However, the decision was made that a general optdebug attribute should be incompatible with other opt_ attributes (e.g. optsize, optnone), since they all express different intents for how to optimize the program. We would still like to allow -fextend-lifetimes with optsize however (i.e. -Os -fextend-lifetimes should be legal), since it may be a useful configuration and there is no technical reason to not allow it. This patch resolves this by tracking MachineFunctions that have fake uses, allowing us to run passes that interact with them and skip passes that clash with them.
7 days[NVPTX] add support for .debug_loc section (#110905)William G Hatch2-22/+27
Enable .debug_loc section for NVPTX backend. This commit makes NVPTX omit DW_AT_low_pc (and DW_AT_high_pc) for DW_TAG_compile_unit. This is because cuda-gdb uses the compile unit's low_pc as a base address, and adds the addresses in the debug_loc section to it. Removing low_pc is equivalent to setting that base address to zero, so addition doesn't break the location ranges. Additionally, this patch forces debug_loc label emission to emit single labels with no subtraction or base. This would not be necessary if we could emit `label1 - label2` expressions in PTX. The PTX documentation at https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives-section makes it seem like this is supported, but it doesn't actually work. I believe when that documentation says that you can subtract “label addresses between labels in the same dwarf section”, it doesn't merely mean that the labels need to be in the same section as each other, but in fact they need to be in the same section as the use. If label subtraction is ever supported such that, in the debug_loc section, you can subtract labels from the main code section, then we can remove the workarounds added in this PR. Also, since this now emits valid .debug_loc sections, it replaces the empty .debug_loc section (previously emitted to force the existence of at least one debug section) with an empty .debug_macinfo section, which matches what nvcc does.
7 days[SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000)Luke Lau1-3/+0
In https://reviews.llvm.org/D153848, promotion was added for a variety of f16 ops with zvfhmin, including VP reductions. However I don't believe it's correct to promote f16 fadd or fmul reductions to f32 since we need to round the intermediate results. Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two different results depending on whether we compiled with +zvfh or +zvfhmin, for example with a 3 element reduction:

    ; v9 = [0.1563, 5.97e-8, 0.00006104]
    ; zvfh
    vsetivli x0, 3, e16, m1, ta, ma
    vmv.v.i v8, 0
    vfredosum.vs v8, v9, v8
    vfmv.f.s fa0, v8
    ; fa0 = 0.1563
    ; zvfhmin
    vsetivli x0, 3, e16, m1, ta, ma
    vfwcvt.f.f.v v10, v9
    vsetivli x0, 3, e32, m1, ta, ma
    vmv.v.i v8, 0
    vfredosum.vs v8, v10, v8
    vfmv.f.s fa0, v8
    fcvt.h.s fa0, fa0
    ; fa0 = 0.1564

This same thing happens with reassociative reductions e.g. vfredusum.vs, and this also applies for bf16. I couldn't find anything in the LangRef for reductions that suggests the excess precision is allowed. There may be something we can do in Clang with -fexcess-precision=fast, but I haven't looked into this yet. I presume the same precision issue occurs with fmul, but not with fmin/fmax/fminimum/fmaximum. I can't think of another way of lowering these other than scalarizing, and we can't scalarize scalable vectors, so this just removes the promotion and adjusts the cost model to return an invalid cost. (It looks like we also don't currently cost fmul reductions, so presumably they also have an invalid cost?) I think this should be enough to stop the loop vectorizer or SLP from emitting these intrinsics.
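The discrepancy can be reproduced on the host with a small sketch. It assumes the three printed elements are the binary16 values 0.15625, 2^-24 and 2^-14, and it emulates binary16 rounding in software; `roundToHalf`, `reduceInHalf` and `reduceInFloatThenNarrow` are hypothetical helpers, not LLVM or RISC-V APIs. Rounding after every addition loses both small addends, while accumulating in f32 and narrowing once pushes the result past the tie:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Round X to the nearest binary16 value, ties to even. Assumes the default
// rounding mode and that X is finite and within binary16 range.
double roundToHalf(double X) {
  if (X == 0.0)
    return X;
  int Exp = 0;
  std::frexp(X, &Exp); // |X| = M * 2^Exp with 0.5 <= M < 1
  // binary16 keeps 11 significant bits; subnormals bottom out at 2^-24.
  const int Quantum = std::max(Exp - 11, -24);
  return std::ldexp(std::nearbyint(std::ldexp(X, -Quantum)), Quantum);
}

// Ordered reduction with every partial sum rounded to binary16 (zvfh path).
double reduceInHalf(const std::vector<double> &V) {
  double Acc = 0.0;
  for (double E : V)
    Acc = roundToHalf(Acc + E); // the double sum of two halves is exact
  return Acc;
}

// Ordered reduction accumulated in binary32 and narrowed once at the end
// (the promoted zvfhmin path).
double reduceInFloatThenNarrow(const std::vector<double> &V) {
  float Acc = 0.0f;
  for (double E : V)
    Acc += static_cast<float>(E);
  return roundToHalf(Acc);
}

void checkExample() {
  const std::vector<double> V = {0.15625, std::ldexp(1.0, -24),
                                 std::ldexp(1.0, -14)};
  assert(reduceInHalf(V) == 0.15625);                        // ~0.1563
  assert(reduceInFloatThenNarrow(V) == 0.15625 + std::ldexp(1.0, -13)); // ~0.1564
}
```

Calling `checkExample()` from a `main` exercises both paths; the two results differ by one binary16 ulp (2^-13) for this input.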
7 daysFix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef ↵Mehdi Amini1-1/+1
(#110938) This macro is always defined: either 0 or 1. The correct pattern is to use #if. Re-apply #110185 with more fixes for debug build with the ABI breaking checks disabled.
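A minimal sketch of the difference, using a hypothetical stand-in macro `DEMO_ABI_BREAKING_CHECKS` defined to 0 rather than the real LLVM configuration header. `#ifdef` only asks whether the macro exists, so it takes the "enabled" branch even when the value is 0:

```cpp
// Hypothetical stand-in for a macro that the build system always defines,
// here with the "disabled" value.
#define DEMO_ABI_BREAKING_CHECKS 0

#ifdef DEMO_ABI_BREAKING_CHECKS
constexpr bool EnabledPerIfdef = true; // taken: the macro exists
#else
constexpr bool EnabledPerIfdef = false;
#endif

#if DEMO_ABI_BREAKING_CHECKS
constexpr bool EnabledPerIf = true;
#else
constexpr bool EnabledPerIf = false; // taken: the macro's value is 0
#endif

static_assert(EnabledPerIfdef, "#ifdef only tests that the macro exists");
static_assert(!EnabledPerIf, "#if tests the macro's value");
```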
8 daysRevert "Fix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if inst… ↵Christopher Di Bella1-1/+1
(#110923) …ead of #ifdef (#110883)" This reverts commit 1905cdbf4ef15565504036c52725cb0622ee64ef, which causes lots of failures where LLVM doesn't have the right header guards. The errors can be seen on [BuildKite](https://buildkite.com/llvm-project/upstream-bazel/builds/112362#01924eae-231c-4d06-ba87-2c538cf40e04), where the source uses `#ifndef NDEBUG`, but the content in question is defined when `LLVM_ENABLE_ABI_BREAKING_CHECKS == 1`. For example, `llvm/include/llvm/Support/GenericDomTreeConstruction.h` has the following:
```cpp
// Helper struct used during edge insertions.
struct InsertionInfo {
  // ...
#ifdef LLVM_ENABLE_ABI_BREAKING_CHECKS
  SmallVector<TreeNodePtr, 8> VisitedUnaffected;
#endif
};

// ...

InsertionInfo II;
// ...
#ifndef NDEBUG
II.VisitedUnaffected.push_back(SuccTN);
#endif
```
8 days[CodeGen] Fix InstructionCount remarks for MI bundles (#107621)Francis Visoiu Mistrih1-7/+33
For MI bundles, the instruction count remark doesn't count the instructions inside the bundle.
8 days[CodeLayout] Size-aware machine block placement (#109711)spupyrev1-33/+84
This is an implementation of a new "size-aware" machine block placement. The idea is to reorder blocks so that the number of fall-through jumps is maximized. Observe that profile data is ignored for the optimization, and it is applied only for instances with hasOptSize()=true. This strategy has two benefits: (i) it eliminates jump instructions, which results in smaller text size; (ii) we avoid using profile data while reordering blocks, which yields more "uniform" functions, thus helping ICF and machine outliner/merger. For large (mobile) apps, the size benefits of (i) and (ii) are roughly the same; combined, they provide up to 0.5% uncompressed and up to 1% compressed size savings on top of the current solution. The optimization is turned off by default.
8 daysFix LLVM_ENABLE_ABI_BREAKING_CHECKS macro check: use #if instead of #ifdef ↵Mehdi Amini1-1/+1
(#110883) This macro is always defined: either 0 or 1. The correct pattern is to use #if. Reapply https://github.com/llvm/llvm-project/pull/110185 with fixes.
8 daysDAG: Preserve disjoint flag when emitting final instructions (#110795)Matt Arsenault1-0/+3
8 days[CodeGen][RAGreedy] Inform LiveDebugVariables about snippets spilled by ↵Bevin Hansson2-2/+18
InlineSpiller. (#109962) RAGreedy invokes InlineSpiller to spill a particular virtreg inline. When the spiller does this, it also identifies small, adjacent liveranges called snippets. These are also spilled or rematerialized in the process. However, the spiller does not inform RA that it has spilled these regs. This means that debug variable locations referencing these regs/ranges are lost. Mark any spilled regs which do not have a stack slot assigned to them as allocated to the slot being spilled to. This tells LDV that those regs are located in that slot, even though the regs might no longer exist in the program after regalloc is finished. Also, inform RA about all of the regs which were replaced (spilled or rematted), not just the one that was requested, so that it can properly manage the ranges of the debug vars.
9 days[RISCV][GISEL] Legalize G_EXTRACT_SUBVECTOR (#109426)Michael Maitland1-0/+61
This is heavily based on the SelectionDAG lowerEXTRACT_SUBVECTOR code.
9 days[RegisterPressure] NFC: Clean up RP handling for instructions with ↵Jeffrey Byrnes1-8/+2
overlapping Def/Use (#109875) The current RP handling for uses of an MI that overlap with defs is confusing and unnecessary. Moreover, the lane masks do not accurately model the liveness behavior of the subregs. This cleans things up a bit and more accurately models subreg lane liveness by sinking the use handling into the subsequent Uses loop. The effect of this PR is to replace A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)` with B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)`. Note that A (Defs loop) and B (Uses loop) have different definitions of LiveBefore: A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes` and B. `LiveBefore = LiveAfter | UseLanes`. Also note, `increaseRegPressure` will exit if `PrevMask` (`LiveAfter` for both A/B) has any active lanes, thus these calls will only have an effect if `LiveAfter` is 0. A. NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) => (1 & UseLanes) => UseLanes = (0 | UseLanes) => (LiveAfter | UseLanes) = NewMask B.
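The equivalence claimed above can be checked exhaustively on small masks. This is only a sketch: it models lane masks as 4-bit values in a `uint32_t`, and `newMaskA` / `newMaskB` are hypothetical names for the third argument passed to `increaseRegPressure` in each variant.

```cpp
#include <cstdint>

// New mask passed by the old Defs-loop call (variant A).
constexpr uint32_t newMaskA(uint32_t LiveAfter, uint32_t DefLanes,
                            uint32_t UseLanes) {
  const uint32_t LiveBefore = (LiveAfter & ~DefLanes) | UseLanes;
  return ~LiveAfter & LiveBefore;
}

// New mask passed by the Uses-loop call (variant B).
constexpr uint32_t newMaskB(uint32_t LiveAfter, uint32_t UseLanes) {
  return LiveAfter | UseLanes;
}

// Check every 4-bit DefLanes/UseLanes pair for the only case in which the
// call has an effect, LiveAfter == 0.
constexpr bool masksAgreeWhenNothingLiveAfter() {
  for (uint32_t Def = 0; Def < 16; ++Def)
    for (uint32_t Use = 0; Use < 16; ++Use)
      if (newMaskA(0, Def, Use) != newMaskB(0, Use))
        return false;
  return true;
}

static_assert(masksAgreeWhenNothingLiveAfter(),
              "A and B pass the same mask when LiveAfter is 0");
```

With a non-zero `LiveAfter` the two expressions can differ, which is harmless under the stated early exit.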
9 daysRevert 412d59f0a510a05c08ed45545943dfd2f901bc5d "[DAG] combineShiftToMULH - ↵Simon Pilgrim1-4/+2
handle zext nneg as sext" Reverting until I can investigate a miscompilation reported by @mstorsjo
10 days[NFC] Use initial-stack-allocations for more data structures (#110544)Jeremy Morse4-6/+8
This replaces some of the most frequent offenders of using a DenseMap that cause a malloc, where the typical element-count is small enough to fit in an initial stack allocation. Most of these are fairly obvious, one to highlight is the collectOffset method of GEP instructions: if there's a GEP, of course it's going to have at least one offset, but every time we've called collectOffset we end up calling malloc as well for the DenseMap in the MapVector.
10 days[GlobalISel] Import extract/insert subvector (#110287)Thorsten Schütt1-0/+103
Test: AArch64/GlobalISel/irtranslator-subvector.ll Reference: https://llvm.org/docs/LangRef.html#llvm-vector-extract-intrinsic https://llvm.org/docs/LangRef.html#llvm-vector-insert-intrinsic
10 days[LegalizeVectorOps] Enable ExpandFABS/COPYSIGN to use integer ops for fixed ↵Craig Topper1-6/+17
vectors in some cases. (#109232) Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and ARM.
10 days[NFC] Move intrinsic related functions to Intrinsic namespace (#110125)Rahul Joshi1-1/+1
Move static functions `Function::lookupIntrinsicID` and `Function::isTargetIntrinsic` to Intrinsic namespace.
10 days[MachineLICM] Avoid repeated hash lookups (NFC) (#110452)Kazu Hirata1-13/+9
10 days[DAG] combineShiftToMULH - handle zext nneg as sextSimon Pilgrim1-2/+4
Fixes poor codegen on AVX512 targets for a test case from #109790
10 daysFastISel: Fix incorrectly using getPointerTy (#110465)Matt Arsenault1-4/+4
This was using the default address space instead of the correct one. Fixes #56055
10 daysDAG: Handle vector legalization of minimumnum/maximumnum (#109779)Matt Arsenault2-0/+5
Follow the same patterns as the other min/max variants.
10 days[ReachingDefAnalysis] Turn MBBReachingDefsInfo into a proper class (NFC) ↵Kazu Hirata1-19/+21
(#110432) I'm trying to speed up the reaching def analysis by changing the underlying data structure. Turning MBBReachingDefsInfo into a proper class decouples the data structure and its users. This patch does not change the existing three-dimensional vector structure. Co-authored-by: Nikita Popov <github@npopov.com>
11 days[LiveDebugValues] Simplify code with MapVector::insert_or_assign (NFC) (#110398)Kazu Hirata1-6/+2
Note that we must use insert_or_assign because operator[] would require DbgValue to have the default constructor.
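The same constraint exists in the standard library, so it can be illustrated with `std::map` (C++17) standing in for LLVM's `MapVector`. This is an analogy rather than the patched code, and `Value` is a hypothetical type with no default constructor playing the role of `DbgValue`:

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical value type that cannot be default-constructed.
struct Value {
  explicit Value(int Id) : Id(Id) {}
  int Id;
};

int lastAssignedId() {
  std::map<std::string, Value> Locs;
  // Locs["x"] = Value(1); // ill-formed: operator[] must default-construct
  Locs.insert_or_assign("x", Value(1)); // inserts a new entry
  Locs.insert_or_assign("x", Value(2)); // overwrites the existing entry
  assert(Locs.size() == 1);
  return Locs.at("x").Id;
}
```

`insert_or_assign` performs a single lookup and either constructs the mapped value in place or assigns over it, so no default-constructed placeholder is ever needed.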
12 days[LiveDebugValues] Avoid repeated hash lookups (NFC) (#110379)Kazu Hirata1-6/+4
12 days[AsmPrinter] Avoid repeated hash lookups (NFC) (#110376)Kazu Hirata1-4/+2
12 days[CodeGen] Avoid repeated hash lookups (NFC) (#110203)Kazu Hirata1-5/+3
13 daysFix issues with GlobalMerge on Mach-O. (#110046)James Y Knight1-6/+28
As a side-effect of PR #101222, GlobalMerge started making transforms which are unsafe on Mach-O platforms. Two issues, in particular, are fixed here: 1. We must never merge symbols in the `__cfstring` section, as the linker assumes each object in this section is only ever referenced directly, and that it can split the section as it likes. Previously, we avoided this problem because CFString literals are identified by private-linkage symbols. This patch adds a list of section-names with special behavior, to avoid merging under Mach-O. 2. When GlobalMerge code was originally written, it had to be careful about emitting symbol aliases, due to issues with Mach-O's subsection splitting in the linker with `-dead_strip` enabled. The underlying cause of this problem was fixed in 2016, via creation of the `.alt_entry` assembler directive, which allows a symbol to not also imply the start of a new subsection. GlobalMerge's workaround for that issue was never removed. In the meantime, Apple's new ld-prime linker was written, and has a bug in `.alt_entry` handling. Therefore, even though the original issue was fixed, we must _continue_ to be careful not to emit any such symbol aliases. The existing workaround avoided it for InternalLinkage symbols, but after the above-mentioned PR, we also must avoid emitting aliases for PrivateLinkage symbols. I will file an Apple bug-report about this issue, so that it can be fixed in a future version of ld-prime. But, in the meantime, the workaround is sufficient for GlobalMerge, unless `-global-merge-on-externals` is enabled (which it is already not by default, on MachO platforms, due to the original issue). Fixes #104625
13 days[BranchRelaxation] Remove quadratic behavior in relaxation pass (#96250)Daniel Hoekwater1-27/+50
Currently, we recompute block offsets after each relaxation. This causes the complexity to be O(n^2) in the number of instructions, inflating compile time. If we instead recompute block offsets after each iteration of the outer loop, the complexity is O(n). Recomputing offsets in the outer loop will cause some out-of-range branches to be missed in the inner loop, but they will be relaxed in the next iteration of the outer loop. This change may introduce unnecessary relaxations for an architecture where the relaxed branch is smaller than the unrelaxed branch, but AFAIK there is no such architecture.
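A toy model of the two strategies, not the pass itself: blocks are just sizes plus a branch target, and `ShortRange`, `Growth`, `relaxEager` and `relaxLazy` are all hypothetical. The eager variant recomputes every offset after each relaxation; the lazy variant recomputes once per outer iteration and lets stale offsets be corrected on the next pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Block {
  int Size;     // bytes, including the terminating branch
  int Target;   // index of the block the branch jumps to
  bool Relaxed; // true once the branch uses the long form
};

constexpr int ShortRange = 15; // assumed reach of the short branch form
constexpr int Growth = 4;      // assumed extra bytes of the long form

std::vector<int> computeOffsets(const std::vector<Block> &Blocks) {
  std::vector<int> Offsets;
  int Next = 0;
  for (const Block &B : Blocks) {
    Offsets.push_back(Next);
    Next += B.Size;
  }
  return Offsets;
}

bool outOfRange(const std::vector<Block> &Blocks,
                const std::vector<int> &Offsets, std::size_t I) {
  const int BranchEnd = Offsets[I] + Blocks[I].Size;
  return std::abs(Offsets[Blocks[I].Target] - BranchEnd) > ShortRange;
}

// Both functions return how many times all offsets were recomputed.
int relaxEager(std::vector<Block> &Blocks) {
  std::vector<int> Offsets = computeOffsets(Blocks);
  int Recomputes = 1;
  for (bool Changed = true; Changed;) {
    Changed = false;
    for (std::size_t I = 0; I < Blocks.size(); ++I) {
      if (Blocks[I].Relaxed || !outOfRange(Blocks, Offsets, I))
        continue;
      Blocks[I].Relaxed = true;
      Blocks[I].Size += Growth;
      Offsets = computeOffsets(Blocks); // after every single relaxation
      ++Recomputes;
      Changed = true;
    }
  }
  return Recomputes;
}

int relaxLazy(std::vector<Block> &Blocks) {
  int Recomputes = 0;
  for (bool Changed = true; Changed;) {
    Changed = false;
    const std::vector<int> Offsets = computeOffsets(Blocks); // once per pass
    ++Recomputes;
    for (std::size_t I = 0; I < Blocks.size(); ++I) {
      if (Blocks[I].Relaxed || !outOfRange(Blocks, Offsets, I))
        continue;
      Blocks[I].Relaxed = true;
      Blocks[I].Size += Growth;
      Changed = true;
    }
  }
  return Recomputes;
}

void runToyExample() {
  std::vector<Block> A = {
      {10, 3, false}, {10, 3, false}, {10, 0, false}, {10, 3, false}};
  std::vector<Block> B = A;
  const int Eager = relaxEager(A);
  const int Lazy = relaxLazy(B);
  for (std::size_t I = 0; I < A.size(); ++I)
    assert(A[I].Relaxed == B[I].Relaxed); // same branches end up relaxed
  assert(Lazy <= Eager);
}
```

In this toy, relaxation only ever grows a block, so stale offsets can delay a relaxation but never cause a spurious one, which mirrors the caveat in the commit message about architectures where the relaxed branch would be smaller.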
13 days[AArch64][GlobalISel] Lower fp16 abs and neg without fullfp16. (#110096)David Green1-6/+2
This changes the existing promote logic to lower, so that it can use normal integer operations. A minor change was needed to fneg lower code to handle vectors.
13 days[DWARF] Don't emit DWARF5 symbols for DWARF2/3 + non-lldb (#110120)sinan1-1/+1
Modify other legacy dwarf versions to align with the dwarf4 handling approach when determining whether to generate DWARF5 or GNU extensions.