path: root/llvm/lib/CodeGen
Age  Commit message  Author  Files  Lines
8 hours[TargetLowering] Remove NoSignedZerosFPMath uses (#160975)paperchalice1-7/+5
Remove NoSignedZerosFPMath uses from the TargetLowering part; users should always use instruction-level fast-math flags.
25 hours[CodeGen] Get rid of incorrect `std` template specializations (#160804)A. Jiang1-4/+5
This patch renames comparators
- from `std::equal_to<llvm::rdf::RegisterRef>` to `llvm::rdf::RegisterRefEqualTo`, and
- from `std::less<llvm::rdf::RegisterRef>` to `llvm::rdf::RegisterRefLess`.

The original specializations don't satisfy the requirements for the original `std` templates by being stateful and non-default-constructible, so they make the program have UB due to C++17 [namespace.std]/2, C++20/23 [namespace.std]/5.

> A program may explicitly instantiate a class template defined in the standard library only if the declaration
> - depends on the name of at least one program-defined type, and
> - the instantiation meets the standard library requirements for the original template.
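The general pattern can be illustrated outside LLVM. This is a minimal sketch, assuming a hypothetical `Ref` type and a hypothetical `Context` rank table (neither is the real `llvm::rdf::RegisterRef` nor its comparator): because the ordering needs external state, it is written as a program-defined functor and handed to the container explicitly instead of being a specialization of `std::less`.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-ins for illustration only; not LLVM's rdf types.
struct Ref {
  std::uint32_t Id;
};

struct Context {
  // Rank[Id] decides the ordering, so the comparator must carry a Context.
  std::vector<int> Rank;
};

// Program-defined comparator: stateful and not default-constructible.
struct RefLess {
  explicit RefLess(const Context &C) : Ctx(&C) {}
  bool operator()(Ref A, Ref B) const {
    return Ctx->Rank[A.Id] < Ctx->Rank[B.Id];
  }

private:
  const Context *Ctx;
};

// Returns the Ids in the order imposed by the stateful comparator.
inline std::vector<std::uint32_t> sortedIds(const Context &C,
                                            const std::vector<Ref> &Refs) {
  RefLess Cmp(C);
  std::set<Ref, RefLess> Ordered(Cmp); // comparator instance passed explicitly
  for (Ref R : Refs)
    Ordered.insert(R);

  std::vector<std::uint32_t> Ids;
  for (Ref R : Ordered)
    Ids.push_back(R.Id);
  return Ids;
}
```

Naming the comparator type in the container's template arguments keeps the state requirement visible at every use site.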
37 hours[SDAG] Constant fold frexp in signed way (#161015)Hongyu Chen1-2/+2
Fixes #160981. The exponent of a floating-point number is signed. This patch stops the constant fold from treating it as unsigned.
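To see why signedness matters: `frexp` returns a negative exponent for magnitudes below 0.5. The following is a minimal sketch using the C++ standard library rather than the SelectionDAG code; the 16-bit intermediate width and the helper names are assumptions chosen only to make the difference between sign extension and zero extension visible.

```cpp
#include <cmath>
#include <cstdint>

// Exponent reported by std::frexp for X (X == fraction * 2^Exp).
inline int frexpExponent(double X) {
  int Exp = 0;
  std::frexp(X, &Exp);
  return Exp;
}

// Widen a 16-bit exponent correctly: sign extension keeps negative values.
inline std::int32_t widenSigned(std::int16_t Exp) { return Exp; }

// Widen it incorrectly: zero extension, as if the exponent were unsigned.
inline std::int32_t widenUnsigned(std::int16_t Exp) {
  return static_cast<std::uint16_t>(Exp);
}
```

For 0.125 the exponent is -2; widening it as unsigned turns it into a large positive value instead.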
3 daysPeepholeOpt: Use initializer list (#160898)Matt Arsenault1-2/+1
3 daysGreedy: Make trySplitAroundHintReg try to match hints with subreg copies ↵Matt Arsenault1-12/+28
(#160294) This is essentially the same patch as 116ca9522e89f1e4e02676b5bbe505e80c4d4933; when trying to match a physreg hint, try to find a compatible physreg if there is a subregister copy. This has the slight difference of using getSubReg on the hint instead of getMatchingSuperReg (the other use should also use getSubReg instead, it's faster). At the moment this turns out to have very little effect. The adjacent code needs better handling of subregisters, so continue adding this piecemeal. The X86 test shows a net reduction in real instructions, plus a few new kills.
3 daysRevert "[RegAlloc] Strengthen asserts in LiveRangeEdit::scanRemattable ↵Philip Reames1-3/+3
[nfc]" (#160897) Reverts llvm/llvm-project#160765. Failures on buildbot indicate that the second assertion does not in fact hold.
3 days[RegAlloc] Add printer and dump for VNInfo [nfc] (#160758)Philip Reames1-12/+18
Uses the existing format of the LiveRange printer, and just factors it out so that you can do vni->dump() when debugging, or log a vni in a debug print statement.
3 days[RegAlloc] Strengthen asserts in LiveRangeEdit::scanRemattable [nfc] (#160765)Philip Reames1-3/+3
We should always be able to find the VNInfo in the original live interval which corresponds to the subset we're trying to spill, and the only cases where we have a VNInfo without a definition instruction are if the vni is unused, or corresponds to a phi. Adjust the code structure to explicitly check for PHIDef, and assert the stronger conditions.
3 days[RegAlloc] Add additional tracing in InlineSpiller::rematerializeFor (#160761)Philip Reames1-2/+11
We didn't have trace logging for two cases in this routine, which sometimes makes it hard to tell what is going on. In addition to debug trace statements, add comments to explain the logic behind the early exits which don't mark the virtual register live. Suggestions on how to word these more precisely are very welcome; I'm not sure I understand all the intricacies of this code myself.
3 days[CodeGen] Adjust global-split remat heuristic to match LICM (#160709)Philip Reames1-1/+2
This heuristic was originally added in 40c4aa with the stated purpose of avoiding global split on long live ranges created by MachineLICM hoisting trivially rematerializable instructions. In the meantime, various backends have introduced non-trivial rematerialization cases, MachineLICM gained an explicit triviality check, and we've reworked our APIs to match naming-wise. Let's move this heuristic back to truly trivial remat only. This is a functional change, though somewhat hard to hit. This change will cause non-trivially rematerializable instructions to be globally split more often. This is likely a good thing since non-trivial remat may not be legal at all possible points in the live interval, but may cost slightly more compile time. I don't have a motivating example; I found it when reviewing the callers of isReMaterializable(MI).
3 days[SelectionDAG] Improve v2f16 maximumnum expansion (#160723)Lewis Crawford1-1/+3
On targets where f32 maximumnum is legal, but maximumnum on vectors of smaller types is not legal (e.g. v2f16), try unrolling the vector first as part of the expansion. Only fall back to expanding the full maximumnum computation into compares + selects if maximumnum on the scalar element type cannot be supported.
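The unroll-first strategy can be sketched in plain C++. This is not the SelectionDAG expansion: `float` stands in for `f16`, `std::array<float, 2>` stands in for `v2f16`, and both function names are hypothetical. The scalar helper follows the maximumnum rules (a number wins over NaN, and +0 is treated as greater than -0); the vector version simply applies it per element.

```cpp
#include <array>
#include <cmath>

// Scalar maximumnum: prefer a number over NaN; treat +0 as greater than -0.
inline float maximumNum(float A, float B) {
  if (std::isnan(A))
    return B;
  if (std::isnan(B))
    return A;
  if (A == B)
    return std::signbit(A) ? B : A; // picks +0 when the operands are -0 and +0
  return A > B ? A : B;
}

// "Unrolled" vector form: one scalar maximumnum per lane.
inline std::array<float, 2> maximumNumUnrolled(std::array<float, 2> A,
                                               std::array<float, 2> B) {
  return {maximumNum(A[0], B[0]), maximumNum(A[1], B[1])};
}
```

When the scalar operation is cheap on the target, two scalar operations are typically shorter than the full compare-and-select sequence applied to the whole vector.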
3 days[CodeGen] Ignore requiresStructuredCFG check in canSplitCriticalEdge if ↵Wenju He1-4/+13
successor is loop header (#154063) This addresses a performance issue for our downstream GPU target that sets requiresStructuredCFG to true. The issue is that the EarlyMachineLICM pass does not hoist loop invariants because a critical edge is not split. The critical edge's destination is a loop header, so splitting the critical edge will not break the structured CFG. Add an NVPTX test to demonstrate the issue, since that target also requires structured CFG. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
3 days[RegisterCoalescer] Mark implicit-defs of super-registers as dead in remat ↵Benjamin Maxwell1-12/+24
(#159110) Currently, something like:
```
$eax = MOV32ri -11, implicit-def $rax
%al = COPY $eax
```
Can be rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def $rax
```
Which marks the full $rax as used, not just $al. With this change, this is rematerialized as:
```
dead $eax = MOV32ri -11, implicit-def dead $rax, implicit-def $al
```
To indicate that only $al is used.

Note: This issue is latent right now, but is exposed when #134408 is applied, as it results in the register pressure being incorrectly calculated (unless this patch is applied too).

I think this change is in line with past fixes in this area, notably:
https://github.com/llvm/llvm-project/commit/059cead5ed7aa11ce1eae0bcc751ea0d1e23ea75
https://github.com/llvm/llvm-project/commit/69cd121dd9945429b565b6a5eb8719130de880a7
3 days[MachineSink] Remove subrange of live-ins from super register as well. (#159145)Pete Chou2-4/+22
Post-RA machine sinking could sink a copy of a sub-register into a successor. However, the sub-register might not be removed from the live-in bitmask of its super-register in the successor, and then a later pass, e.g., the if-converter, may add an implicit use of the register from the live-ins, resulting in a use of an undefined register. This change makes sure the subrange of live-ins from the super-register is removed as well.
3 days[DAGCombiner] Remove `NoSignedZerosFPMath` uses in `visitFADD` (#160635)paperchalice1-7/+5
Remove these global flags and use node level flags instead.
4 days[RegAlloc] Account for use availability when applying rematerializable ↵Luke Lau4-65/+81
weight discount (#159180) This aims to fix the issue that caused https://reviews.llvm.org/D106408 to be reverted. CalcSpillWeights will reduce the weight of an interval by half if it's considered rematerializable, so it will be evicted before others. It does this by checking TII.isTriviallyReMaterializable. However, rematerialization may still fail if any of the defining MI's uses aren't available at the locations where it needs to be rematerialized. LiveRangeEdit::canRematerializeAt calls allUsesAvailableAt to check this but CalcSpillWeights doesn't, so the two diverge. This fixes it by also checking allUsesAvailableAt in CalcSpillWeights. In practice this has zero change on AArch64/X86-64/RISC-V as measured on llvm-test-suite, but prevents weights from being perturbed in an upcoming patch which enables more rematerialization by re-attempting https://reviews.llvm.org/D106408
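The rule described above can be modeled in a few lines. This is a simplified sketch, not the CalcSpillWeights code: `IntervalInfo` and `adjustedWeight` are hypothetical names, and the two booleans stand in for the results of the trivial-remat query and the `allUsesAvailableAt` check.

```cpp
// Hypothetical summary of what the weight calculation knows about an interval.
struct IntervalInfo {
  float Weight;          // weight before any discount
  bool TriviallyRemat;   // stand-in for the trivial-remat query
  bool AllUsesAvailable; // stand-in for the allUsesAvailableAt check
};

// Halve the weight only when rematerialization can actually succeed.
inline float adjustedWeight(const IntervalInfo &I) {
  bool CanRemat = I.TriviallyRemat && I.AllUsesAvailable;
  return CanRemat ? I.Weight * 0.5f : I.Weight;
}
```

An interval whose defining instruction is rematerializable in principle, but whose operands are not available where it would be re-created, keeps its full weight in this model.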
4 days[CodeGenPrepare] Bail out of usubo creation if sub's parent is not the same ↵AZero131-0/+6
as the comparison (#160358) We match uadd's behavior here. Codegen comparison: https://godbolt.org/z/x8j4EhGno
4 days[X86][GlobalISel] Added support for llvm.set.rounding (#156591)JaydeepChauhan141-0/+3
- This implementation is adapted from **SDAG X86TargetLowering::LowerSET_ROUNDING**.
4 daysGlobalISel: Adjust insert point when expanding G_[SU]DIVREMMatt Arsenault1-0/+2
(#160683) The insert point management is messy here. We probably should have an insert point guard, and not have the dest operand utilities modify the insert point. Fixes #159716
4 days[MachineStripDebug] Remove debug instructions from inside bundles (#160297)Jay Foad1-1/+1
Some passes, like AMDGPU's SIInsertHardClauses, wrap sequences of instructions into bundles, and these bundles may end up with debug instructions in the middle. Assuming that this is allowed, this patch fixes MachineStripDebug to be able to remove these instructions from inside a bundle.
4 days[CodeGen] Extract copy-paste on PHI MachineInstr income removal. (#158634)Afanasyev Ivan5-35/+27
5 days[TII] Split isTrivialReMaterializable into two versions [nfc] (#160377)Philip Reames7-49/+23
This change builds on https://github.com/llvm/llvm-project/pull/160319 which tries to clarify which *callers* (not backends) assume that the result is actually trivial. This change itself should be NFC. Essentially, I'm just renaming the existing isTrivialRematerializable to the non-trivial version and then adding a new trivial version (with the same name as the prior function) and simplifying a few callers which want that semantic. This change does *not* enable non-trivial remat any more broadly than was already done for our targets which were lying through the old APIs; that will come separately. The goal here is simply to make the code easier to follow in terms of what assumptions are being made where. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>
5 days[TargetLowering][ExpandABD] Prefer selects over usubo if we do the same for ↵AZero132-7/+10
ucmp (#159889) Same deal we use for determining ucmp vs scmp. Using selects on platforms that like selects is better than using usubo. Rename function to be more general fitting this new description.
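For readers unfamiliar with the two strategies, here is an illustrative sketch of an unsigned absolute difference computed both ways in plain C++. The function names are hypothetical and the exact node sequences LLVM emits are not reproduced here; the point is only that a select-based form and a subtract-with-borrow form compute the same value.

```cpp
#include <cstdint>

// Select form: compare, then pick the non-wrapping difference.
inline std::uint32_t abduSelect(std::uint32_t A, std::uint32_t B) {
  return A > B ? A - B : B - A;
}

// Overflow-flag form: subtract once, then negate if the subtraction borrowed.
inline std::uint32_t abduUsubo(std::uint32_t A, std::uint32_t B) {
  std::uint32_t Diff = A - B;                            // wraps when A < B
  std::uint32_t Mask = A < B ? ~std::uint32_t{0} : 0;    // all-ones on borrow
  return (Diff ^ Mask) - Mask; // two's-complement negate when Mask is all-ones
}
```

Which form is cheaper depends on whether the target handles selects or overflow flags better, which is the choice the commit ties to the existing ucmp preference.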
5 days[Propeller] Read the CFG profile from the propeller directive. (#160422)Rahman Lavaee1-0/+53
The CFG allows us to do layout optimization in the compiler. Furthermore, it allows further branch optimization.
5 days[Debug][AArch64] Do not crash on unknown subreg register sizes. (#160442)David Green1-1/+1
The AArch64 zsub regs are scalable, so they are defined with a size of -1 (which comes through as 65535). The RegisterSize is only 128, so code that tries to find overlapping regs of a z30_z31 in DwarfEmitter can crash when trying to access out-of-range bits in a BitVector. Hexagon and x86 also contain subregs with unknown sizes. Ideally most of these would be scalable values, but in the meantime add a check that the registers are small enough to overlap with the current register size, to prevent us from crashing. This fixes the issue reported on #153810.
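The "-1 comes through as 65535" effect and the kind of guard described can be sketched as follows. This is an assumption-laden illustration, not the DWARF emission code: the 16-bit storage width is inferred from the 65535 value in the message, and `fitsInRegister` is a hypothetical helper.

```cpp
#include <cstdint>

// A size of -1 stored in a 16-bit unsigned field reads back as 65535.
inline unsigned storedSize(int Size) {
  return static_cast<std::uint16_t>(Size);
}

// Hypothetical guard: only consider a subregister when Offset + Size stays
// within the containing register, so no out-of-range bit is ever touched.
inline bool fitsInRegister(unsigned Offset, unsigned Size,
                           unsigned RegisterSize) {
  return Size <= RegisterSize && Offset <= RegisterSize - Size;
}
```

Writing the check as two comparisons avoids computing `Offset + Size` directly, which could wrap for very large inputs.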
5 days[MachineScheduler] Turn SU->isScheduled check into an assert in pickNode() ↵Jonas Paulsson1-61/+59
(#160145) It is unnecessary and confusing to have a do/while loop that checks SU->isScheduled as this should never be true. ScheduleDAGMI::updateQueues() is always called after pickNode() and it sets isScheduled on the SU. Turn this into an assertion instead.
6 daysRevert "Speculative buildbot fix after ca2e8f"Philip Reames1-5/+2
This reverts commit bd2dac98ed4f19dcf90c098ae0b9976604880b59, and part of ca2e8fc928ad103f46ca9f827e147c43db3a5c47. My speculative attempt at fixing buildbot failed, so just roll back the relevant part of the change.
6 days[CodeGen] Rename isReallyTriviallyReMaterializable [nfc]Philip Reames1-1/+1
.. to isReMaterializableImpl. The "Really" naming has always been awkward, and we're working towards removing the "Trivial" part now, so go ahead and remove both pieces in a single rename. Note that this doesn't change any aspect of the current implementation; we still "mostly" only return instructions which are trivial (meaning no virtual register uses), but some targets do lie about that today.
6 daysUpdate callers of isTriviallyReMaterializable to check trivialness (#160319)Philip Reames2-3/+10
This is a preparatory change for an upcoming reorganization of our rematerialization APIs. Despite the interface being documented as "trivial" (meaning no virtual register uses on the instruction being considered for remat), our actual implementation inconsistently supports non-trivial remat, and certain backends (AMDGPU and RISC-V mostly) lie about instructions being trivial to abuse that. We want to allow non-trivial remat more broadly, but first we need to do some cleanup to make it understandable what's going on. These three call sites are ones which appear to actually want the trivial definition, and appear fairly low risk to change. p.s. I'm deliberately *not* updating any APIs in this change; I'm going to do that as a followup once it's clear which category each callsite fits in.
6 daysRevert "[DebugInfo][DwarfDebug] Separate creation and population of abstract ↵Vladislav Dzhidzhoev11-168/+64
subprogram DIEs" (#160349) Reverts llvm/llvm-project#159104 due to the issues reported in https://github.com/llvm/llvm-project/issues/160197.
6 days[MIR] Support save/restore points with independent sets of registers (#119358)Elizaveta Noskova5-30/+79
This patch adds the MIR parsing and serialization support for save and restore points with subsets of callee saved registers. That is, it syntactically allows a function to contain two or more distinct sub-regions in which distinct subsets of registers are spilled/filled as callee save. This is useful if e.g. one of the CSRs isn't modified in one of the sub-regions, but is in the other(s). Support for actually using this capability in code generation is still forthcoming. This patch is the next logical step for multiple save/restore points support. All points are now stored in DenseMap from MBB to vector of CalleeSavedInfo. Shrink-Wrap points split Part 4. RFC: https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581 Part 1: https://github.com/llvm/llvm-project/pull/117862 (landed) Part 2: https://github.com/llvm/llvm-project/pull/119355 (landed) Part 3: https://github.com/llvm/llvm-project/pull/119357 (landed) Part 5: https://github.com/llvm/llvm-project/pull/119359 (likely to be further split)
7 daysGreedy: Make eviction broken hint cost use CopyCost units (#160084)Matt Arsenault1-3/+5
Change the eviction advisor heuristic cost based on number of broken hints to work in units of copy cost, rather than a magic number 1. The intent is to allow breaking hints for cheap subregisters in favor of more expensive register tuples.

The llvm.amdgcn.image.dim.gfx90a.ll change shows a simple example of the case I am attempting to solve. Use of tuples in ABI contexts ends up looking like this:

    %argN = COPY $vgprN
    %tuple = inst %argN
    $vgpr0 = COPY %tuple.sub0
    $vgpr1 = COPY %tuple.sub1
    $vgpr2 = COPY %tuple.sub2
    $vgpr3 = COPY %tuple.sub3

Since there are physreg copies in the input and output sequence, both have hints to a physreg. The wider tuple hint on the output should win though, since this satisfies 4 hints instead of 1.

This is the obvious part of a larger change to better handle subregister interference with register tuples, and is not sufficient to handle the original case I am looking at. There are several bugs here that are proving tricky to untangle. In particular, there is a double counting bug for all registers with multiple regunits; the cost of breaking the interfering hint is added for each interfering virtual register, which have repeat visits across regunits. Fixing the double counting badly regresses a number of RISCV tests, which seem to rely on overestimating the cost in tryFindEvictionCandidate to avoid early-exiting the eviction candidate loop (RISCV is possibly underestimating the copy costs for vector registers).
7 days[llvm][AsmPrinter] Add direct calls to callgraph section (#155706)Prabhu Rajasekaran1-16/+44
Extend CallGraphSection to include metadata about direct calls. This simplifies the design of tools that parse the .callgraph section, since they no longer need to depend on the MC layer.
7 days[Remarks] Restructure bitstream remarks to be fully standalone (#156715)Tobias Stadler1-10/+9
Currently there are two serialization modes for bitstream Remarks: standalone and separate. The separate mode splits remark metadata (e.g. the string table) from actual remark data. The metadata is written into the object file by the AsmPrinter, while the remark data is stored in a separate remarks file. This means we can't use bitstream remarks with tools like opt that don't generate an object file. Also, it is confusing to post-process bitstream remarks files, because only the standalone files can be read by llvm-remarkutil. We always need to use dsymutil to convert the separate files to standalone files, which only works for MachO. It is not possible for clang/opt to directly emit bitstream remark files in standalone mode, because the string table can only be serialized after all remarks were emitted. Therefore, this change completely removes the separate serialization mode. Instead, the remark string table is now always written to the end of the remarks file. This requires us to tell the serializer when to finalize remark serialization. This automatically happens when the serializer goes out of scope. However, often the remark file goes out of scope before the serializer is destroyed. To diagnose this, I have added an assert to alert users that they need to explicitly call finalizeLLVMOptimizationRemarks. This change paves the way for further improvements to the remark infrastructure, including more tooling (e.g. #159784), size optimizations for bitstream remarks, and more. Pull Request: https://github.com/llvm/llvm-project/pull/156715
7 days[SPIRV] Add support for the SPIR-V extension SPV_KHR_bfloat16 (#155645)YixingZhang0071-1/+1
This PR introduces the support for the SPIR-V extension `SPV_KHR_bfloat16`. This extension extends the `OpTypeFloat` instruction to enable the use of bfloat16 types with cooperative matrices and dot products. TODO: Per the `SPV_KHR_bfloat16` extension, there are a limited number of instructions that can use the bfloat16 type. For example, arithmetic instructions like `FAdd` or `FMul` can't operate on `bfloat16` values. Therefore, a future patch should be added to either emit an error or fall back to FP32 for arithmetic in cases where bfloat16 must not be used. Reference Specification: https://github.com/KhronosGroup/SPIRV-Registry/blob/main/extensions/KHR/SPV_KHR_bfloat16.asciidoc
7 daysRegalloc: Add operator >= to EvictionCost (#160070)Matt Arsenault1-1/+1
Make the actual use context less ugly.
7 daysGreedy: Simplify collectHintInfo using MachineOperands. NFCI. (#159724)Jay Foad1-13/+9
If a COPY uses Reg but only in an implicit operand then the new implementation ignores it but the old implementation would have treated it as a copy of Reg. Probably this case never occurs in practice. Other than that, this patch is NFC. Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
7 days[DAG] Add ISD::VECTOR_COMPRESS handling in ↵Kavin Gnanapandithan1-0/+22
computeKnownBits/ComputeNumSignBits (#159692) Resolves #158332
7 days[DAG] Fold rem(rem(A, BCst), Op1Cst) -> rem(A, Op1Cst) (#159517)kper1-0/+18
Fixes [157370](https://github.com/llvm/llvm-project/issues/157370) UREM General proof: https://alive2.llvm.org/ce/z/b_GQJX SREM General proof: https://alive2.llvm.org/ce/z/Whkaxh I have added it as rv32i and rv64i tests because they are the only architectures where I could verify that it works.
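The commit title does not spell out a precondition, so the sketch below makes one explicit as an assumption: the identity `(A rem B) rem C == A rem C` is checked for the case where `C` divides `B`, which is where it holds arithmetically. `foldHolds` is a hypothetical brute-force helper over a small range, not a substitute for the Alive2 proofs linked above.

```cpp
// Checks (A % B) % C == A % C for every A in [Lo, Hi].
// C++ '%' truncates toward zero, matching signed remainder semantics.
inline bool foldHolds(int Lo, int Hi, int B, int C) {
  for (int A = Lo; A <= Hi; ++A)
    if ((A % B) % C != A % C)
      return false;
  return true;
}
```

With `B = 12` and `C = 4` the identity holds for every tested value, negative ones included; with `B = 10` and `C = 4` it already fails at `A = 10`, where the left side is 0 and the right side is 2.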
7 days[CodeGen] Use MCRegister::id() to avoid implicit conversions to unsigned. ↵Craig Topper3-28/+30
NFC (#159965)
8 days[DAG] Skip `mstore` combine for `<1 x ty>` vectors (#159915)Abhishek Kaushik1-0/+6
Fixes #159912
8 days[GlobalISel] Add G_ABS computeKnownBits (#154413)Pragyansh Chaturvedi1-0/+8
The code is taken from `SelectionDAG::computeKnownBits`. This ticks off ABS from #150515
10 days[CodeGen] Untangle RegisterCoalescer from LRE's ScannedRemattable flag [nfc] ↵Philip Reames2-17/+9
(#159839) LiveRangeEdit's rematerialization checking logic is used in two quite different ways. For SplitKit and InlineSpiller, we're analyzing all defs associated with a live interval, doing that analysis up front, and then using the result a bit later. For the RegisterCoalescer, we're analyzing exactly one ValNo at a time, and using the legality result immediately. LRE had a checkRematerializable which existed basically to adapt the latter into the former usage model. Instead, this change bypasses the ScannedRemat and Remattable structures, and directly queries the underlying routines. This is easy to read, and makes it clearer which uses actually need the deferred analysis. (A following change may try to unwind that too, but it's not strictly NFC.)
10 days[CodeGenPrepare] Consider target memory intrinsics as memory use (#159638)Jeffrey Byrnes1-0/+13
When deciding to sink address instructions into their uses, we check if it is profitable to do so. The profitability check is based on the types of uses of this address instruction -- if there are users which are not memory instructions, then do not fold. However, this profitability check wasn't considering target intrinsics, which may be loads / stores. This adds some logic to handle target memory intrinsics.
10 days[KnownBits] Add setAllConflict to set all bits in Zero and One. NFC (#159815)Craig Topper2-16/+10
This is a common pattern for initializing KnownBits; it occurs before loops that call intersectWith.
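The pattern can be shown with a toy model. This is a minimal 8-bit sketch, not `llvm::KnownBits`: the `Bits` struct, `constant`, and `commonBits` are hypothetical, and only the `setAllConflict` and `intersectWith` names are taken from the commit. Starting with every bit set in both masks gives the identity element for intersection, so the loop needs no special case for its first iteration.

```cpp
#include <cstdint>
#include <initializer_list>

// Toy 8-bit known-bits model; not llvm::KnownBits itself.
struct Bits {
  std::uint8_t Zero = 0; // bits known to be 0
  std::uint8_t One = 0;  // bits known to be 1

  // Mark every bit as both known-zero and known-one.
  void setAllConflict() {
    Zero = 0xFF;
    One = 0xFF;
  }

  // Keep only the facts that hold in both operands.
  Bits intersectWith(const Bits &RHS) const {
    Bits R;
    R.Zero = Zero & RHS.Zero;
    R.One = One & RHS.One;
    return R;
  }

  // Exact knowledge of a constant value.
  static Bits constant(std::uint8_t V) {
    Bits R;
    R.One = V;
    R.Zero = static_cast<std::uint8_t>(~V);
    return R;
  }
};

// Bits that every listed value agrees on.
inline Bits commonBits(std::initializer_list<std::uint8_t> Values) {
  Bits Known;
  Known.setAllConflict(); // identity for intersectWith
  for (std::uint8_t V : Values)
    Known = Known.intersectWith(Bits::constant(V));
  return Known;
}
```

For the values `0x0C` and `0x0A`, only bit 3 is known to be one, and bits 0 and 4 through 7 are known to be zero.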
10 days[WebAssembly] Require tags for Wasm EH and Wasm SJLJ to be defined ↵Sam Clegg2-24/+1
externally (#159143) Rather than defining these tags in each object file that requires them, we can declare them as undefined and require that they be defined externally in, for example, compiler-rt or libcxxabi.
10 days[PowerPC] using millicode call for strlen instead of lib call (#153600)zhijian lin2-3/+29
AIX has "millicode" routines, which are functions loaded at boot time into fixed addresses in kernel memory. This allows them to be customized for the processor. The __strlen routine is a millicode implementation; we use millicode for the strlen function instead of a library call to improve performance.
10 days[CodeGen][NewPM] Port `ReachingDefAnalysis` to new pass manager. (#159572)Mikhail Gudim4-99/+144
In this commit: (1) Added new pass manager support for `ReachingDefAnalysis`. (2) Added printer pass. (3) Make old pass manager use `ReachingDefInfoWrapperPass`
10 daysCodeGen: Add RegisterClass by HwMode (#158269)Matt Arsenault1-2/+5
This is a generalization of the LookupPtrRegClass mechanism. AMDGPU has several use cases for swapping the register class of instruction operands based on the subtarget, but none of them really fit into the box of being pointer-like. The current system requires manual management of an arbitrary integer ID. For the AMDGPU use case, this would end up being around 40 new entries to manage. This just introduces the base infrastructure. I have ports of all the target specific usage of PointerLikeRegClass ready.
10 days[AMDGPU][SDAG] DAGCombine PTRADD -> disjoint OR (#146075)Fabian Ritter1-0/+13
If we can't fold a PTRADD's offset into its users, lowering them to disjoint ORs is preferable: Often, a 32-bit OR instruction suffices where we'd otherwise use a pair of 32-bit additions with carry. This needs to be a DAGCombine (and not a selection rule) because its main purpose is to enable subsequent DAGCombines for bitwise operations. We don't want to just turn PTRADDs into disjoint ORs whenever that's sound because this transform loses the information that the operation implements pointer arithmetic, which AMDGPU for instance needs when folding constant offsets. For SWDEV-516125.
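The arithmetic fact behind the transform is that addition and bitwise OR agree whenever the operands share no set bits, because no bit position can produce a carry. A minimal sketch with plain 64-bit integers (the helper names are hypothetical and nothing here models the DAG nodes):

```cpp
#include <cstdint>

// True when Base and Offset have no set bit in common.
inline bool isDisjoint(std::uint64_t Base, std::uint64_t Offset) {
  return (Base & Offset) == 0;
}

// For disjoint operands this equals Base + Offset.
inline std::uint64_t ptrAddAsOr(std::uint64_t Base, std::uint64_t Offset) {
  return Base | Offset;
}
```

For example, an aligned base such as `0x1000` and an offset of `0x24` are disjoint, while `0x1004` and `0x4` are not, and for that second pair the OR would give the wrong address.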