path: root/llvm/lib/CodeGen
Age | Commit message | Author | Files | Lines
3 hours[X86] Correct 32-bit immediate assertion and fix 64-bit lowering for huge ↵Wesley Wiser1-1/+1
frame offsets (#123872) The assertion previously did not work correctly because the operand was being truncated to an `int` prior to comparison. Change the assertion into a reported error as suggested in https://github.com/llvm/llvm-project/pull/101840#issuecomment-2304992425 by @arsenm. Finally, fix the lowering on 64-bit targets so that offsets larger than 32 bits are correctly addressed, and add tests for various reported issues.
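The truncation pitfall described above can be sketched in isolation. This is a minimal illustration and not the X86 backend code: both helper names are hypothetical, and it assumes a platform where `int` is 32 bits wide.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: true if Offset is representable as a signed
// 32-bit immediate.
bool fitsInInt32(int64_t Offset) {
  return Offset >= std::numeric_limits<int32_t>::min() &&
         Offset <= std::numeric_limits<int32_t>::max();
}

// Hypothetical model of the broken check: the value is narrowed to `int`
// first, so the range test only ever sees an in-range value.
bool fitsInInt32Truncated(int64_t Offset) {
  int Narrowed = static_cast<int>(Offset); // high bits are dropped here
  return fitsInInt32(Narrowed);            // always true when int is 32 bits
}
```

With a 4 GiB offset, the first check rejects the value while the truncated one accepts it, which is why a check placed after the narrowing can never fire.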
14 hours[DAG] Always use stack to promote bitcast when the source is vector (#151065)Min-Yih Hsu1-2/+3
The optimization introduced by #125637 tried to avoid using stacks to promote bitcast with vector result type. However, it wouldn't be correct if the input type is a vector. This patch limits that optimization to scalar-to-vector bitcasts.
19 hours[TargetLowering] Use getShiftAmountConstant in buildSDIVPow2WithCMov.Craig Topper1-2/+2
39 hours[SelectionDAG] Move sign pattern check from AArch64 and ARM to general ↵AZero131-4/+18
SelectionDAG (#151736) This works on all cases much like the XOR case above it in SelectionDAG.
2 days[LLVM][DAGCombiner] fold (shl (X * vscale(C0)), C1) -> (X * vscale(C0 << ↵Paul Walker1-0/+13
C1)). (#150651)
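A small sketch of the arithmetic identity behind this fold, assuming vscale(C0) denotes vscale multiplied by the constant C0 and using wrapping 64-bit unsigned arithmetic. The function names are hypothetical, and any extra preconditions the real combine checks are not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// (shl (X * vscale(C0)), C1): multiply first, then shift left.
uint64_t beforeFold(uint64_t X, uint64_t VScale, uint64_t C0, unsigned C1) {
  assert(C1 < 64 && "shift amount must be smaller than the bit width");
  return (X * (VScale * C0)) << C1;
}

// (X * vscale(C0 << C1)): the shift is absorbed into the vscale multiplier.
uint64_t afterFold(uint64_t X, uint64_t VScale, uint64_t C0, unsigned C1) {
  assert(C1 < 64 && "shift amount must be smaller than the bit width");
  return X * (VScale * (C0 << C1));
}
```

Both forms compute X * vscale * C0 * 2^C1 modulo 2^64, so they agree for every input with C1 below 64.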
2 daysAdd m_SelectCCLike matcher to match SELECT_CC or SELECT with SETCC (#149646)黃國庭1-12/+11
Fix #147282 and Follow-up to #148834 --------- Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
2 days[DAGCombiner] Add combine for vector interleave of splats (#151110)David Sherwood1-0/+48
This patch adds two DAG combines: 1. vector_interleave(splat, splat, ...) -> {splat,splat,...} 2. concat_vectors(splat, splat, ...) -> wide_splat where all the input splats are identical. Both of these together enable us to fold concat_vectors(vector_interleave(splat, splat, ...)) into a wide splat. Post-legalisation we must only do the concat_vector combine if the wider type and splat operation is legal. For fixed-width vectors the DAG combine only occurs for interleave factors of 3 or more, however it's not currently safe to test this for AArch64 since there isn't any lowering support for fixed-width interleaves. I've only added fixed-width tests for RISCV.
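The two combines above can be checked on plain vectors. This is only a model, assuming vector_interleave places lane I of input K at flat position I * Factor + K and returns the flat result as Factor equally sized vectors; `Vec`, `splat`, `concatVectors` and `vectorInterleave` are hypothetical stand-ins, not SelectionDAG APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<int>;

// A splat repeats one value in every lane.
Vec splat(int Value, std::size_t Lanes) { return Vec(Lanes, Value); }

// Models concat_vectors: append the parts in order.
Vec concatVectors(const std::vector<Vec> &Parts) {
  Vec Wide;
  for (const Vec &P : Parts)
    Wide.insert(Wide.end(), P.begin(), P.end());
  return Wide;
}

// Models vector_interleave: interleave the inputs lane by lane, then hand
// the flat result back as Factor vectors of the original length.
std::vector<Vec> vectorInterleave(const std::vector<Vec> &Ins) {
  const std::size_t Factor = Ins.size();
  assert(Factor > 0 && "need at least one input");
  const std::size_t Lanes = Ins[0].size();
  Vec Flat(Factor * Lanes);
  for (std::size_t K = 0; K < Factor; ++K) {
    assert(Ins[K].size() == Lanes && "inputs must have equal length");
    for (std::size_t I = 0; I < Lanes; ++I)
      Flat[I * Factor + K] = Ins[K][I];
  }
  std::vector<Vec> Results;
  for (std::size_t K = 0; K < Factor; ++K)
    Results.emplace_back(Flat.begin() + K * Lanes,
                         Flat.begin() + (K + 1) * Lanes);
  return Results;
}
```

When every input is the same splat, each result of the interleave is that splat again, and concatenating the results gives a splat of the wider type.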
2 days[MachineScheduler] Make cluster check more efficient (#150884)Ruiling, Song1-26/+40
2 days[RegAlloc] Fix use-after-free in `RegAllocBase::cleanupFailedVReg` (#151435)Shilei Tian1-3/+1
#128400 introduced a use-after-free bug in `RegAllocBase::cleanupFailedVReg` when removing intervals from regunits. The issue is from the `InterferenceCache` in `RAGreedy`, which holds `LiveRange*`. The current `InterferenceCache` APIs make it difficult to update it, and there isn't a straightforward way to do that. Since #128400 already mentions it's not clear about the necessity of removing intervals from regunits, this PR avoids the issue by simply skipping that step. Fixes SWDEV-527146.
3 days[llvm][AsmPrinter] Emit call graph sectionPrabhu Rajasekaran1-0/+108
Collect the necessary information for constructing the call graph section, and emit to .callgraph section of the binary. MD5 hash of the callee_type metadata string is used as the numerical type id emitted. Reviewers: ilovepi Reviewed By: ilovepi Pull Request: https://github.com/llvm/llvm-project/pull/87576
3 days[SelectionDAG] Improve the doxygen description for SDValue::isOperandOf. NFC ↵Craig Topper1-1/+1
(#151244) SDValue::isOperandOf checks the result number in addition to the SDNode*. SDNode::isOperandOf only checks the SDNode*.
3 days[TailDup] Delay aggressive computed-goto taildup to after RegAlloc. (#150911)Florian Hahn1-6/+10
https://github.com/llvm/llvm-project/pull/114990 allowed more aggressive tail duplication for computed-gotos in both pre- and post-regalloc tail duplication. In some cases, performing tail-duplication too early can lead to worse results, especially if we duplicate blocks with a number of phi nodes. This is causing a ~3% performance regression in some workloads using Python 3.12. This patch updates TailDup to delay aggressive tail-duplication for computed gotos to after register allocation. This means we can keep the non-duplicated version for a bit longer throughout the backend, which should reduce compile-time as well as allowing a number of optimizations and simplifications to trigger before drastically expanding the CFG. For the case in https://github.com/llvm/llvm-project/issues/106846, I get the same performance with and without this patch on Skylake. PR: https://github.com/llvm/llvm-project/pull/150911
3 daysMachineInstrBuilder: Introduce copyMIMetadata() function.Peter Collingbourne1-1/+1
This reduces the amount of boilerplate required when adding a new field to MIMetadata and reduces the chance of bugs like the one I fixed in TargetInstrInfo::reassociateOps. Reviewers: arsenm, nikic Reviewed By: nikic Pull Request: https://github.com/llvm/llvm-project/pull/133535
3 days[MachineBB] Make sure there are successors in terminatorIsComputedGoto. ↵Florian Hahn1-1/+1
(#151342) Currently terminatorIsComputedGoto will return true for blocks with an indirect branch terminator and no successors. If there are no successors, the terminator is likely not a computed goto, so return false in that case. Note that this is currently NFC, as the only use checks it only if there are successors, but it will be needed in https://github.com/llvm/llvm-project/pull/150911. PR: https://github.com/llvm/llvm-project/pull/151342
3 days[MachineFunction] Move CallSiteInfo constructor out of header (#151520)Prabhu Rajasekaran1-0/+20
3 days[X86][APX] Do optimizeMemoryInst for v1X masked load/store (#151331)Phoebe Wang1-0/+23
Fix redundant LEA: https://godbolt.org/z/34xEYE818
4 days[llvm] Extract and propagate callee_type metadataPrabhu Rajasekaran1-1/+2
Update MachineFunction::CallSiteInfo to extract numeric CalleeTypeIds from callee_type metadata attached to indirect call instructions. Reviewers: nikic, ilovepi Reviewed By: ilovepi Pull Request: https://github.com/llvm/llvm-project/pull/87575
4 days[CodeGen] Remove an unnecessary cast (NFC) (#151280)Kazu Hirata1-1/+1
LoopValStage is already of type int.
4 daysReland "RegisterCoalescer: Add implicit-def of super register when ↵Sander de Smalen1-15/+164
coalescing SUBREG_TO_REG" (#134408) This tries to reland #123632 (previously reverted by commit 6b1db79887df19bc8e8c946108966aa6021c8b87) This PR aims to fix coalescing of SUBREG_TO_REG when sub-register liveness tracking is enabled and this is now the so-manieth reincarnation of this effort :) This change is needed in order to enable subreg liveness tracking for AArch64, because without the implicit-def, Machine Copy Propagation would remove a 'redundant' copy because it doesn't realise that the top 32-bits of the register are zeroed, which subsequent instructions rely on. Changes compared to previous PR: * Rather than updating all instructions that define the source register (SrcReg) of the SUBREG_TO_REG, this new approach only updates instructions that define SrcReg when they dominate the SUBREG_TO_REG. The live-ranges are updated accordingly.
4 days[GISel] Introduce MIFlags::InBounds (#150900)Fabian Ritter6-3/+19
This flag applies to G_PTR_ADD instructions and indicates that the operation implements an inbounds getelementptr operation, i.e., the pointer operand is in bounds wrt. the allocated object it is based on, and the arithmetic does not change that. It is set when the IRTranslator lowers inbounds GEPs (currently only in some cases, to be extended with a future PR), and in the (build|materialize)ObjectPtrOffset functions. Inbounds information is useful in ISel when we have instructions that perform address computations whose intermediate steps must be in the same memory region as the final result. A follow-up patch will start using it for AMDGPU's flat memory instructions, where the immediate offset must not affect the memory aperture of the address. This is analogous to a concurrent effort in SDAG: #131862 (related: #140017, #141725). For SWDEV-516125.
4 days[LLVM][SelectionDAG] Align poison/undef binop folds with IR. (#149334)Paul Walker1-20/+61
The "at construction" binop folds in SelectionDAG::getNode() have different behaviour when compared to the equivalent LLVM IR. This PR makes the behaviour consistent while also extending the coverage to include signed/unsigned max/min operations.
4 days[DAG] Fold (setcc ((x | x >> c0 | ...) & mask)) sequences (#146054)Pierre van Houtryve1-1/+88
Fold sequences where we extract a bunch of contiguous bits from a value, merge them into the low bit and then check if the low bits are zero or not. Usually the and would be on the outside (the leaves) of the expression, but the DAG canonicalizes it to a single `and` at the root of the expression. The reason I put this in DAGCombiner instead of the target combiner is because this is a generic, valid transform that's also fairly niche, so there isn't much risk of a combine loop I think. See #136727
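As an illustration of the pattern being matched, take three contiguous bits merged into the low bit. The folded form shown here is one equivalent expression chosen for the sketch; the exact node sequence the combine emits is not stated above, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Unfolded: OR bits 0..2 of X into bit 0, mask, then compare against zero.
bool anyLowThreeBitsSetUnfolded(uint32_t X) {
  return ((X | (X >> 1) | (X >> 2)) & 1u) != 0;
}

// Folded: test the same three contiguous bits with a single mask.
bool anyLowThreeBitsSetFolded(uint32_t X) { return (X & 0x7u) != 0; }

// Compare both forms for every value below Limit.
bool formsAgreeUpTo(uint32_t Limit) {
  for (uint32_t X = 0; X < Limit; ++X)
    if (anyLowThreeBitsSetUnfolded(X) != anyLowThreeBitsSetFolded(X))
      return false;
  return true;
}
```

Bit 0 of the OR chain is set exactly when one of bits 0, 1 or 2 of X is set, which is what the single wider mask tests directly.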
4 days[TargetLowering] Use getShiftAmountConstant in CTTZTableLookup. NFCCraig Topper1-1/+1
5 days[ELF][AsmPrinter] Emit trailing dot for constant pool section when it has a ↵Mingming Liu1-7/+7
hotness prefix (#150859) Currently, `TargetLoweringObjectFileELF::getSectionForConstant` produces `.<section>.hot` or `.<section>.unlikely` for a constant with a non-empty section prefix. This PR changes the implementation to add a trailing dot when the section prefix is not empty, to disambiguate `.hot` as a hotness prefix from `.hot` as a (pure C) variable name. Relevant discussions are in https://github.com/llvm/llvm-project/pull/148985#discussion_r2221141273 and https://github.com/llvm/llvm-project/pull/148985#discussion_r2233382641.
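The naming rule can be sketched as a small string helper. `sectionNameFor` is a hypothetical function written for illustration and does not mirror the signature of `getSectionForConstant`; it assumes the prefix is passed without surrounding dots.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the hotness prefix plus a trailing dot when a
// prefix is present, otherwise leave the base section name untouched.
std::string sectionNameFor(const std::string &Base,
                           const std::string &Prefix) {
  if (Prefix.empty())
    return Base;
  return Base + "." + Prefix + ".";
}
```

Under this rule a hot constant pool entry would land in a section such as `.rodata.hot.`, which cannot be confused with a section named after a variable called `hot`.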
5 days[LLVM][Cygwin] Enable conditions that are shared with MinGW (#149638)jeremyd20191-1/+1
Cygwin and MinGW share the auto import behavior that could result in __stack_check_guard being non-dso-local. Allow windres to assume a Cygwin target as well as a MinGW one, so defines like _WIN32 would not be present on Cygwin.
5 days[DAG] Remove AssertZext if the input is masked (#146052)Pierre van Houtryve1-13/+21
Remove AssertZext if the input ensures the assert cannot fail.
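A rough sketch of when such an assert is redundant, assuming the input is an AND with a constant mask and that the assert states that every bit at or above position AssertedBits is zero. The helper name is hypothetical and the real combine may accept other input shapes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical check: the zero-extension assert cannot fail if the mask
// already clears every bit at or above AssertedBits.
bool assertZextIsRedundant(uint64_t Mask, unsigned AssertedBits) {
  if (AssertedBits >= 64)
    return true; // nothing above bit 63 to constrain
  return (Mask >> AssertedBits) == 0;
}
```

For example, a value masked with 0xFF always satisfies an assert that it is zero-extended from 8 bits, while a mask of 0x1FF gives no such guarantee.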
5 days[GISel] Introduce MachineIRBuilder::(build|materialize)ObjectPtrOffset (#150392)Fabian Ritter4-16/+31
These functions are for building G_PTR_ADDs when we know that the base pointer and the result are both valid pointers into (or just after) the same object. They are similar to SelectionDAG::getObjectPtrOffset. This PR also changes call sites of the generic (build|materialize)PtrAdd functions that implement pointer arithmetic to split large memory accesses to the new functions. Since memory accesses have to fit into an object in memory, pointer arithmetic to an offset into a large memory access also yields an address in that object. Currently, these (build|materialize)ObjectPtrOffset functions only add "nuw" to the generated G_PTR_ADD, but I intend to introduce an "inbounds" MIFlag in a later PR (analogous to a concurrent effort in SDAG: #131862, related: #140017, #141725) that will also be set in the (build|materialize)ObjectPtrOffset functions. Most test changes just add "nuw" to G_PTR_ADDs. Exceptions are AMDGPU's call-outgoing-stack-args.ll, flat-scratch.ll, and freeze.ll tests, where offsets are now folded into scratch instructions, and cases where the behavior of the check regeneration script changed, resulting, e.g., in better checks for "nusw G_PTR_ADD" instructions, matched empty lines, and the use of "CHECK-NEXT" in MIPS tests. For SWDEV-516125.
5 daysFix build warnings after 6fbc397964340ebc9cb04a094fd04bef9a53abc3 (#151100)David Sherwood1-7/+0
5 days[BranchFolding] Follow up #149999 crash fixOrlando Cazalet-Hyams1-2/+3
fbf6271c7da20356d7b34583b3711b4126ca1dbb introduced an assertion failure as setDebugValueUndef was called on DBG_LABELs, which isn't allowed and doesn't make sense. Fix by skipping the call for DBG_LABELs and hoisting, in line with the original behaviour.
5 days[IR][SDAG] Remove lifetime size handling from SDAG (#150944)Nikita Popov4-19/+14
Split out from https://github.com/llvm/llvm-project/pull/150248: Specify that the argument of lifetime.start/lifetime.end is ignored and will be removed in the future. Remove lifetime size handling from SDAG. The size was previously discarded during isel, so was always ignored for stack coloring anyway. Where necessary, obtain the size of the full frame index.
5 days[IR] Add new CreateVectorInterleave interface (#150931)David Sherwood1-10/+7
This PR adds a new interface to IRBuilder called CreateVectorInterleave, which can be used to create vector.interleave intrinsics of factors 2-8. For convenience I have also moved getInterleaveIntrinsicID and getDeinterleaveIntrinsicID from VectorUtils.cpp to Intrinsics.cpp, where they can be used by IRBuilder.
5 days[GlobalISel] Remove `UnsafeFPMath` references (#146319)paperchalice2-5/+3
This is the GlobalISel part of removing the `UnsafeFPMath` flag from the CodeGen pipeline.
5 days[AMDGPU] Add NoaliasAddrSpace to AAMDnodes (#149247)Shoreshen4-1/+12
This is a follow-up to https://github.com/llvm/llvm-project/pull/136553, which calculates NoaliasAddrSpace. This PR carries the calculated info into MIR by adding it to AAMDnodes.
6 days[SelectionDAG] Remove `UnsafeFPMath` in LegalizeDAG (#146316)paperchalice2-2/+6
These global flags hinder further improvements like [[RFC] Honor pragmas with -ffp-contract=fast](https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast) and pass concurrency support. Remove them incrementally.
6 daysHot-patch __ref_* variables should be placed in .rdata, not .data (#151008)sivadeilra1-0/+13
This is a refinement of #145565. That PR added support for "Windows Secure Hot-patching". In this design, functions that are compiled for hot-patching need to be modified when they access mutable global variables. The modification is to insert a level of indirection, the so-called `__ref_*` variables. Ref variables are supposed to be inserted into the `.rdata` section, not `.data`. This provides a degree of protection against modification (accidental or malicious) of ref variables during program execution. When the Windows hot-patch subsystem loads a module as a hot-patch, it finds all ref variables and changes the page protections for the pages containing them to read/write. Then it sets the ref variables to point to the real variable locations within the base image. Then it changes page protections back to read-only. This relies on the variables being placed in the `.rdata` section, not `.data`. However, it is still important that the LLVM `GlobalVariable` that is created for the ref variable be created with `isConstant = false`. This prevents LLVM from optimizing accesses to the `GlobalVariable`, i.e. assuming that the variable can never change and thus inlining its value into expressions that would ordinarily dereference it. That optimization would defeat the purpose of hot-patching, so `isConstant = false` is still the correct value for these ref variables.
6 daysReapply "[llvm] Add CalleeTypeIds field to CallSiteInfo" (#150335) (#150990)Prabhu Rajasekaran4-7/+28
This reverts commit 05e08cdb3e576cc0887d1507ebd2f756460c7db7. It adds the missing -mtriple flags to the MIR/X86 test files; their absence caused these tests to fail, which was the reason for reverting the patch.
6 daysUse F.hasOptSize() instead of checking optsize directly (#147348)Ellis Hoag1-2/+1
6 daysReapply (2) [BranchFolding] Kill common hoisted debug instructions (#149999)Orlando Cazalet-Hyams1-7/+40
Reapply #140091. branch-folder hoists common instructions from TBB and FBB into their pred. Without this patch it achieves this by splicing the instructions from TBB and deleting the common ones in FBB. That moves the debug locations and debug instructions from TBB into the pred without modification, which is not ideal. Debug locations are handled in #140063. This patch handles debug instructions - in the simplest way possible, which is to just kill (undef) them. We kill and hoist the ones in FBB as well as TBB because otherwise the fact there's an assignment on the code path is deleted (which might lead to a prior location extending further than it should). There's possibly something we could do to preserve some variable locations in some cases, but this is the easiest not-incorrect thing to do. Note I had to replace the constant DBG_VALUEs to use registers in the test- it turns out setDebugValueUndef doesn't undef constant DBG_VALUEs... which feels wrong to me, but isn't something I want to touch right now. --- Fix end-iterator-dereference and add test.
6 days[CodeGen] More consistently expand float ops by default (#150597)Nikita Popov1-17/+17
These float operations were expanded for scalar f32/f64/f128, but not for f16 and more problematically, not for vectors. A small subset of them was separately set to expand for vectors. Change these to always expand by default, and adjust targets to mark these as legal where necessary instead. This is a much safer default, and avoids unnecessary legalization failures because a target failed to manually mark them as expand. Fixes https://github.com/llvm/llvm-project/issues/110753. Fixes https://github.com/llvm/llvm-project/issues/121390.
6 days[COFF] Set .llvmbc and .llvmcmd to metadata section (#150879)Haohai Wen1-1/+2
Those are metadata sections for ELF but were not properly set for COFF.
7 days[AsmPrinter] Remove an unnecessary cast (NFC) (#150839)Kazu Hirata1-4/+3
getLabelAfterInsn() already returns MCSymbol *.
8 days[IA] Fix a bug introduced by a recent refactoringPhilip Reames1-0/+6
I had dropped the check for which intrinsics were supported. This is a quick fix to get the tree back into an unbroken state; a cleaner change may follow.
8 daysMCSectionXCOFF: Remove classofFangrui Song2-7/+9
The object file format specific derived classes are used in contexts like MCStreamer and MCObjectTargetWriter where the type is statically known. We don't use isa/dyn_cast and we want to eliminate MCSection::SectionVariant in the base class.
8 daysMCSectionCOFF: Remove classofFangrui Song3-8/+9
The object file format specific derived classes are used in contexts like MCStreamer and MCObjectTargetWriter where the type is statically known. We don't use isa/dyn_cast and we want to eliminate MCSection::SectionVariant in the base class.
8 daysDAG: Emit an error if trying to legalize read/write register with illegal ↵Matt Arsenault2-0/+58
types (#145197) This is a starting point to have better legalization failure diagnostics
9 daysMCSectionELF: Remove classofFangrui Song1-1/+1
The object file format specific derived classes are used in contexts like MCStreamer and MCObjectTargetWriter where the type is statically known. We don't use isa/dyn_cast and we want to eliminate MCSection::SectionVariant in the base class.
9 days[CodeGenPrepare] Make sure that `AddOffset` is also a loop invariant (#150625)Yingwei Zheng1-0/+4
Closes https://github.com/llvm/llvm-project/issues/150611.
9 daysRevert "[BranchFolding] Kill common hoisted debug instructions" (#150632)Orlando Cazalet-Hyams1-44/+6
Reverts llvm/llvm-project#149999 https://lab.llvm.org/buildbot/#/builders/139/builds/17622
9 daysReapply [BranchFolding] Kill common hoisted debug instructions (#149999)Orlando Cazalet-Hyams1-6/+44
Reapply #140091. branch-folder hoists common instructions from TBB and FBB into their pred. Without this patch it achieves this by splicing the instructions from TBB and deleting the common ones in FBB. That moves the debug locations and debug instructions from TBB into the pred without modification, which is not ideal. Debug locations are handled in #140063. This patch handles debug instructions - in the simplest way possible, which is to just kill (undef) them. We kill and hoist the ones in FBB as well as TBB because otherwise the fact there's an assignment on the code path is deleted (which might lead to a prior location extending further than it should). There's possibly something we could do to preserve some variable locations in some cases, but this is the easiest not-incorrect thing to do. Note I had to replace the constant DBG_VALUEs to use registers in the test- it turns out setDebugValueUndef doesn't undef constant DBG_VALUEs... which feels wrong to me, but isn't something I want to touch right now.
9 days[IA] Recognize repeated masks which come from shuffle vectors (#150285)Philip Reames1-0/+21
This extends the fixed vector lowering to support the case where the mask is formed via shufflevector idiom. --------- Co-authored-by: Luke Lau <luke_lau@icloud.com>