path: root/llvm/lib/CodeGen
Age | Commit message | Author | Files | Lines
2021-08-19[AArch64][GlobalISel] Add G_VECREDUCE fewerElements support for full scalarization.Amara Emerson2-33/+94
For some reductions like G_VECREDUCE_OR on AArch64, we need to scalarize completely if the source is <= 64b. This change adds support for that in the legalizer. If the source has a power-of-2 number of elements, then we can do a tree reduction using the scalar operation on the individual elements. Otherwise, we just create a sequential chain of operations. For AArch64, we only need to scalarize if the input is <64b. If it's greater than 64b then we can first do a fewerElements step to 64b, taking advantage of vector instructions until we reach the point of scalarization. I also had to relax the verifier checks for reductions because the intrinsics support <1 x EltTy> types, which we lower to scalars for GlobalISel. Differential Revision: https://reviews.llvm.org/D108276
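The two scalarization strategies described above can be illustrated outside of LLVM. The following is a minimal sketch, not the legalizer code: `reduceOr` is a hypothetical helper that OR-reduces already-extracted scalar elements, using a pairwise tree when the element count is a power of two and a sequential chain otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an LLVM API): OR-reduce scalar elements.
std::uint32_t reduceOr(std::vector<std::uint32_t> elts) {
  assert(!elts.empty() && "reduction needs at least one element");
  std::size_t n = elts.size();
  const bool isPow2 = (n & (n - 1)) == 0;
  if (isPow2) {
    // Tree reduction: combine neighbouring pairs, halving the count each round.
    while (n > 1) {
      for (std::size_t i = 0; i < n / 2; ++i)
        elts[i] = elts[2 * i] | elts[2 * i + 1];
      n /= 2;
    }
    return elts[0];
  }
  // Sequential chain: fold the elements one after another.
  std::uint32_t acc = elts[0];
  for (std::size_t i = 1; i < n; ++i)
    acc |= elts[i];
  return acc;
}
```

Both paths compute the same value; the tree form only shortens the dependency chain from n-1 operations to log2(n) rounds.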
2021-08-19Move function definition out-of-line to fix the modularized build (NFC)Adrian Prantl1-0/+11
2021-08-19Revert "[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand."Craig Topper2-34/+29
This reverts commit add08c874147638e52d89eb07e40797dbc98d73b. There was a compile time jump on tramp3d-v4 on https://llvm-compile-time-tracker.com/ Want to see if it goes away with this reverted.
2021-08-19[ISel] Expand saddsat and ssubsat via asr and xorDavid Green2-8/+6
This changes the lowering of saddsat and ssubsat so that instead of using:
  r,o = saddo x, y
  c = setcc r < 0
  s = c ? INTMAX : INTMIN
  ret o ? s : r
it uses asr and xor to materialize the INTMAX/INTMIN constants:
  r,o = saddo x, y
  s = ashr r, BW-1
  x = xor s, INTMIN
  ret o ? x : r
https://alive2.llvm.org/ce/z/TYufgD
This seems to reduce the instruction count in most testcases across most architectures. X86 has some custom lowering added to compensate for cases where it can increase instruction count. Differential Revision: https://reviews.llvm.org/D105853
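As a rough illustration of why the asr/xor form yields the same saturation constant, here is a sketch for 32-bit integers. The function names are hypothetical, `saddo` is a stand-in for the overflow-reporting add in the pseudocode, and the code assumes two's complement with an arithmetic right shift on signed values (guaranteed from C++20, and the behaviour of mainstream compilers before that).

```cpp
#include <cstdint>
#include <limits>

constexpr std::int32_t kIntMin = std::numeric_limits<std::int32_t>::min();
constexpr std::int32_t kIntMax = std::numeric_limits<std::int32_t>::max();

// Stand-in for `r,o = saddo x, y`: wrapping add plus an overflow flag.
bool saddo(std::int32_t x, std::int32_t y, std::int32_t &r) {
  r = static_cast<std::int32_t>(static_cast<std::uint32_t>(x) +
                                static_cast<std::uint32_t>(y));
  // Overflow iff both operands differ in sign from the wrapped result.
  return ((x ^ r) & (y ^ r)) < 0;
}

// Old form: compare and select between INTMAX and INTMIN.
std::int32_t saddsatSelect(std::int32_t x, std::int32_t y) {
  std::int32_t r;
  const bool o = saddo(x, y, r);
  const std::int32_t s = (r < 0) ? kIntMax : kIntMin;
  return o ? s : r;
}

// New form: s is all-ones when r is negative, zero otherwise, so
// s ^ INTMIN is INTMAX or INTMIN respectively.
std::int32_t saddsatAsrXor(std::int32_t x, std::int32_t y) {
  std::int32_t r;
  const bool o = saddo(x, y, r);
  const std::int32_t s = r >> 31;  // BW-1 for a 32-bit value
  const std::int32_t sat = s ^ kIntMin;
  return o ? sat : r;
}
```

On overflow the wrapped result has the opposite sign of the true sum, which is why a negative `r` selects INTMAX and a non-negative `r` selects INTMIN.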
2021-08-19[SelectionDAGBuilder] Compute and cache PreferredExtendType on demand.Craig Topper2-29/+34
Previously we pre-calculated this and cached it for every instruction in the function. Most of the calculated results will never be used. So instead calculate it only on the first use, and then cache it. The cache was originally added to fix a compile time issue which caused r216066 to be reverted. This change exposed that we weren't pre-computing the Value for Arguments. I've explicitly disabled that for now as it seemed to regress some tests on AArch64 which has sext built into its compare instructions. Spotted while investigating how to improve heuristics to work better with RISCV preferring sign extend for unsigned compares for i32 on RV64. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107976
2021-08-19[TypePromotion] Use Instruction* instead of Value* for a couple functions. NFCCraig Topper1-13/+7
This matches how they are called and allows some isa/cast/dyn_cast to be removed. Differential Revision: https://reviews.llvm.org/D108333
2021-08-19[LegalizeTypes][VP] Add widening support for binary VP opsFraser Cormack2-4/+47
This patch adds the beginnings of more thorough support in the legalizers for vector-predicated (VP) operations. The first step is the ability to widen illegal vectors. The more complicated scenario in which the result/operands need widening but the mask doesn't has not been handled here. That would require a lot of code without an in-tree target on which to test it. Reviewed By: RKSimon Differential Revision: https://reviews.llvm.org/D107904
2021-08-18[SampleFDO] Flow Sensitive Sample FDO (FSAFDO) profile loaderRong Xu3-2/+390
This patch implements Flow Sensitive Sample FDO (FSAFDO) profile loader. We have two profile loaders for FS profile, one before RegAlloc and one before BlockPlacement. To enable it, when -fprofile-sample-use=<profile> is specified, add "-enable-fs-discriminator=true \ -disable-ra-fsprofile-loader=false \ -disable-layout-fsprofile-loader=false" to turn on the FS profile loaders. Differential Revision: https://reviews.llvm.org/D107878
2021-08-18[NFC][DebugInfo] getDwarfCompileUnitIDKyungwoo Lee2-7/+16
This is a refactoring for the use in https://reviews.llvm.org/D108261 Reviewed By: dblaikie Differential Revision: https://reviews.llvm.org/D108271
2021-08-18[NFC] Remove some unnecessary AttributeList methodsArthur Eubanks1-2/+2
These rely on methods I'm trying to cleanup.
2021-08-18[GlobalISel] Implement lowering for G_ISNAN + use it in AArch64Jessica Paquette1-0/+34
GlobalISel equivalent to `TargetLowering::expandISNAN`. Use it in AArch64 and add a testcase. Differential Revision: https://reviews.llvm.org/D108227
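The commit message does not spell out the expansion, so the following is only a sketch of two common ways an is-NaN test can be lowered for a 32-bit IEEE-754 float: an unordered self-compare and an integer bit test. Both function names are hypothetical, and neither is claimed to match what the GlobalISel lowering emits.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Floating-point form: a NaN is the only value that compares unordered
// with itself. Assumes the compiler is not in a fast-math mode.
bool isNaNUnordered(float f) { return f != f; }

// Integer form: clear the sign bit and check that the remaining bits
// exceed the encoding of infinity (all-ones exponent, zero mantissa).
bool isNaNBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return (bits & 0x7FFFFFFFu) > 0x7F800000u;
}
```

The integer form avoids a floating-point compare, which can matter where such a compare could raise an exception on a signaling NaN.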
2021-08-18[GlobalISel] Add IRTranslator support for G_ISNANJessica Paquette1-0/+8
Translate the `@llvm.isnan` intrinsic to G_ISNAN when we see it. This is pretty much the same as the associated SelectionDAGBuilder code. Main difference is that we don't expand it here. It makes more sense to do that during legalization in GlobalISel. GlobalISel will just legalize the generated illegal types. Differential Revision: https://reviews.llvm.org/D108226
2021-08-18[GlobalISel] Add G_ISNANJessica Paquette1-0/+19
Add a generic opcode equivalent to the `llvm.isnan` intrinsic + MachineVerifier support for it. We need an opcode here because we may want target-specific lowering later on. Differential Revision: https://reviews.llvm.org/D108222
2021-08-18Revert "Allow rematerialization of virtual reg uses"Petr Hosek1-2/+7
This reverts commit 877572cc193a470f310eec46a7ce793a6cc97c2f which introduced PR51516.
2021-08-17[NFC] More get/removeAttribute() cleanupArthur Eubanks1-2/+1
2021-08-17[NFC] Cleanup more AttributeList::addAttribute()Arthur Eubanks2-3/+2
2021-08-18[RegAlloc] Remove addAllocPriorityToGlobalRanges hookQiu Chaofan1-3/+1
It was introduced in 1a6dc92 and only enabled on PowerPC/AMDGPU. That should be enabled for all targets. Reviewed By: arsenm Differential Revision: https://reviews.llvm.org/D108010
2021-08-18[DAGCombiner] Teach isKnownToBeAPowerOfTwo handle SPLAT_VECTORjacquesguan1-0/+6
Make DAGCombine turn mul by power of 2 into shl for scalable vector. Reviewed By: frasercrmck Differential Revision: https://reviews.llvm.org/D107883
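The effect of the combine can be sketched on plain data: when every lane is multiplied by the same splatted constant and that constant is a power of two, the multiply can be replaced by a left shift. `isPowerOf2` and `mulBySplat` below are hypothetical helpers for illustration, not DAGCombiner code.

```cpp
#include <cstdint>
#include <vector>

bool isPowerOf2(std::uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Multiply every lane by the same (splatted) constant.
std::vector<std::uint32_t> mulBySplat(std::vector<std::uint32_t> lanes,
                                      std::uint32_t splat) {
  if (isPowerOf2(splat)) {
    // mul x, 2^k  ==>  shl x, k
    unsigned shift = 0;
    while ((1u << shift) != splat)
      ++shift;
    for (auto &lane : lanes)
      lane <<= shift;
  } else {
    for (auto &lane : lanes)
      lane *= splat;
  }
  return lanes;
}
```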
2021-08-18[X86] AVX512FP16 instructions enabling 3/6Wang, Pengfei1-0/+2
Enable FP16 conversion instructions. Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html Reviewed By: LuoYuanke Differential Revision: https://reviews.llvm.org/D105265
2021-08-17SelectionDAGBuilder::visitInlineAsm - don't dereference dyn_cast<> results.Simon Pilgrim1-1/+1
dyn_cast<> can return nullptr if the cast is illegal, use cast<> instead which will assert that the cast is correct. Fixes static analyser warning.
2021-08-17[VP] Add vector-predicated reduction intrinsicsFraser Cormack2-0/+145
This patch adds vector-predicated ("VP") reduction intrinsics corresponding to each of the existing unpredicated `llvm.vector.reduce.*` versions. Unlike the unpredicated reductions, all VP reductions have a start value. This start value is returned when no vector element is active. Support for expansion on targets without native vector-predication support is included. This patch is based on the ["reduction slice"](https://reviews.llvm.org/D57504#1732277) of the LLVM-VP reference patch (https://reviews.llvm.org/D57504). Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D104308
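The role of the start value can be shown with a scalar model of a predicated add reduction. This is a sketch under assumptions: `vpReduceAdd` is a hypothetical function, and it treats a lane as active when its mask bit is set and its index is below an explicit vector length `evl`, which is how VP operations are generally predicated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of a predicated add reduction (hypothetical helper).
std::int64_t vpReduceAdd(std::int64_t start,
                         const std::vector<std::int64_t> &vec,
                         const std::vector<bool> &mask, std::size_t evl) {
  assert(vec.size() == mask.size() && "one mask bit per lane");
  std::int64_t acc = start;
  for (std::size_t i = 0; i < vec.size() && i < evl; ++i)
    if (mask[i])
      acc += vec[i];  // only active lanes contribute
  return acc;         // equals `start` when no lane is active
}
```

With an all-false mask or a zero vector length the loop adds nothing, so the start value comes back unchanged.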
2021-08-17[GlobalISel] Add combine for PTR_ADD with regbanksSebastian Neubauer1-5/+16
Combine two G_PTR_ADDs, but keep the register bank of the constant. That way, the combine can be used in post-regbank-select combines. Introduce two helper methods in CombinerHelper, getRegBank and setRegBank that get and set an optional register bank to a register. That way, they can be used before and after register bank selection. Differential Revision: https://reviews.llvm.org/D103326
2021-08-17[CodeGenPrepare] The instruction to be sunk should be inserted before its user in a blockTiehu Zhang1-2/+12
In the current implementation, the instruction to be sunk is inserted before the target instruction without considering the def-use tree, which may cause an "Instruction does not dominate all uses" error. We need to choose a suitable insertion point according to the use chain. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D107262
2021-08-17[DebugInfo][InstrRef] Honour too-much-debug-info cutoutsJeremy Morse4-36/+59
This reapplies 54a61c94f93 and its follow-up in 547b712500e, which were reverted in 95fe61e63954. Original commit message: VarLoc based LiveDebugValues will abandon variable location propagation if there are too many blocks and variable assignments in the function. If it didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end up with 1 million DBG_VALUEs just at the start of blocks. Instruction-referencing LiveDebugValues should honour this limitation too (because the same limitation applies to it). Hoist the relevant command line options into LiveDebugValues.cpp and pass it down into the implementation classes as an argument to ExtendRanges. I've duplicated all the run-lines in live-debug-values-cutoffs.mir to have an instruction-referencing flavour. Differential Revision: https://reviews.llvm.org/D107823
2021-08-16[NFC] Remove/replace some confusing attribute getters on FunctionArthur Eubanks1-1/+1
2021-08-16[AsmPrinter] fix nullptr dereference for MBBs with hasAddressTaken property without BBAfanasyev Ivan1-3/+3
The basic block pointer is dereferenced unconditionally for MBBs with the hasAddressTaken property. MBBs might have the hasAddressTaken property without a reference to a BB. Backend developers must assign a fake BB to the MBB to work around this issue, and it should be fixed. Reviewed By: rnk Differential Revision: https://reviews.llvm.org/D108092
2021-08-16[Remarks] Emit optimization remarks for atomics generating CAS loopAnshil Gandhi1-1/+16
Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891
2021-08-16Allow rematerialization of virtual reg usesStanislav Mekhanoshin1-7/+2
Currently the isReallyTriviallyReMaterializableGeneric() implementation prevents rematerialization on any virtual register use on the grounds that it is not a trivial rematerialization and that we do not want to extend live ranges. It appears that LRE logic does not attempt to extend a live range of a source register for rematerialization, so that is not an issue. That is checked in LiveRangeEdit::allUsesAvailableAt(). The only non-trivial aspect of it is accounting for tied-defs, which normally represent a read-modify-write operation and are not rematerializable. The test for a tied-def situation already exists in /CodeGen/AMDGPU/remat-vop.mir, test_no_remat_v_cvt_f32_i32_sdwa_dst_unused_preserve. The change has affected ARM/Thumb, Mips, RISCV, and x86. For the targets where I more or less understand the asm it seems to reduce spilling (as expected) or be neutral. However, it needs a review by all targets' specialists. Differential Revision: https://reviews.llvm.org/D106408
2021-08-16Prevent machine licm if remattable with a vreg useStanislav Mekhanoshin1-4/+24
Check if a rematerializable instruction does not have any virtual register uses. Even though it is rematerializable, RA might not actually rematerialize it in this scenario. In that case we do not want to hoist such an instruction out of the loop in the belief that RA will sink it back if needed. This already has an impact on the AMDGPU target, which does not check for this condition in its isTriviallyReMaterializable implementation and has instructions with virtual register uses enabled. The other targets are not impacted at this point, although they will be when D106408 lands. Differential Revision: https://reviews.llvm.org/D107677
2021-08-16[TypePromotion] Don't mutate the result type of SwitchInst.Craig Topper1-2/+2
SwitchInst should have a void result type. Add a check to the verifier to catch this error. Reviewed By: samparker Differential Revision: https://reviews.llvm.org/D108084
2021-08-16[DAG] Fold concat_vectors(concat_vectors(x,y),concat_vectors(a,b)) -> concat_vectors(x,y,a,b)Simon Pilgrim1-2/+46
Follow-up to D107068, attempt to fold nested concat_vectors/undefs, as long as both the vector and inner subvector types are legal. This exposed the same issue in ARM's MVE LowerCONCAT_VECTORS_i1 (raised as PR51365) and AArch64's performConcatVectorsCombine which both assumed concat_vectors only took 2 subvector operands. Differential Revision: https://reviews.llvm.org/D107597
2021-08-16Revert 54a61c94f93 and its follow up in 547b712500eJeremy Morse4-40/+20
These were part of D107823, however asan has found something excitingly wrong happening: https://lab.llvm.org/buildbot/#/builders/5/builds/10543/steps/13/logs/stdio
2021-08-16Suppress signedness-comparison warningJeremy Morse1-1/+1
This is a follow-up to 54a61c94f93.
2021-08-16[DebugInfo][InstrRef] Honour too-much-debug-info cutoutsJeremy Morse4-20/+40
VarLoc based LiveDebugValues will abandon variable location propagation if there are too many blocks and variable assignments in the function. If it didn't, and we had (say) 1000 blocks and 1000 variables in scope, we'd end up with 1 million DBG_VALUEs just at the start of blocks. Instruction-referencing LiveDebugValues should honour this limitation too (because the same limitation applies to it). Hoist the relevant command line options into LiveDebugValues.cpp and pass it down into the implementation classes as an argument to ExtendRanges. I've duplicated all the run-lines in live-debug-values-cutoffs.mir to have an instruction-referencing flavour. Differential Revision: https://reviews.llvm.org/D107823
2021-08-15[DAGCombiner] Stop visitEXTRACT_SUBVECTOR creating illegal BITCASTs post legalisation.Paul Walker1-1/+5
visitEXTRACT_SUBVECTOR can sometimes create illegal BITCASTs when removing "redundant" INSERT_SUBVECTOR operations. This patch adds an extra check to ensure such combines only occur after operation legalisation if any resulting BITCAST is itself legal. Differential Revision: https://reviews.llvm.org/D108086
2021-08-15[NFC] Simply update a FIXME commentQiu Chaofan1-2/+2
X86's overridden LowerOperationWrapper was moved to the common implementation in a7eae62.
2021-08-15Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"Dávid Bolvanský1-22/+1
This reverts commit 435785214f73ff0c92e97f2ade6356e3ba3bf661. Still same compile time issues for -O0 -g, eg. +1.3% for sqlite3.
2021-08-14[Remarks] Emit optimization remarks for atomics generating CAS loopAnshil Gandhi1-1/+22
Implements ORE in AtomicExpand pass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891
2021-08-13Revert "[Remarks] Emit optimization remarks for atomics generating CAS loop"Anshil Gandhi1-22/+1
This reverts commit c4e5425aa579d21530ef1766d7144b38a347f247.
2021-08-13[Remarks] Emit optimization remarks for atomics generating CAS loopAnshil Gandhi1-1/+22
Implements ORE in AtomicExpandPass to report atomics generating a compare and swap loop. Differential Revision: https://reviews.llvm.org/D106891
2021-08-13[GlobalISel] Narrow binops feeding into G_AND with a maskJessica Paquette1-0/+91
This is a fairly common pattern:

```
%mask = G_CONSTANT iN <mask val>
%add = G_ADD %lhs, %rhs
%and = G_AND %add, %mask
```

We have combines to eliminate G_AND with a mask that does nothing. If we combined the above to this:

```
%mask = G_CONSTANT iN <mask val>
%narrow_lhs = G_TRUNC %lhs
%narrow_rhs = G_TRUNC %rhs
%narrow_add = G_ADD %narrow_lhs, %narrow_rhs
%ext = G_ZEXT %narrow_add
%and = G_AND %ext, %mask
```

We'd be able to take advantage of those combines using the trunc + zext.

For this to work (or be beneficial in the best case)

- The operation we want to narrow then widen must only be used by the G_AND
- The G_TRUNC + G_ZEXT must be free
- Performing the operation at a narrower width must not produce a different value than performing it at the original width *after masking.*

Example comparison between SDAG + GISel: https://godbolt.org/z/63jzb1Yvj

At -Os for AArch64, this is a 0.2% code size improvement on CTMark/pairlocalign.

Differential Revision: https://reviews.llvm.org/D107929
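The "same value after masking" condition can be checked on a concrete case. The sketch below assumes a 32-bit add feeding an `0xFF` mask and narrows to 8 bits; the function names are hypothetical and the code models only the arithmetic, not the combine itself.

```cpp
#include <cstdint>

// Original pattern: add at full width, then mask.
std::uint32_t wideAddMasked(std::uint32_t lhs, std::uint32_t rhs) {
  return (lhs + rhs) & 0xFFu;
}

// Narrowed pattern: trunc both operands, add at 8 bits, zext, then mask.
std::uint32_t narrowAddMasked(std::uint32_t lhs, std::uint32_t rhs) {
  const std::uint8_t narrowLhs = static_cast<std::uint8_t>(lhs);  // G_TRUNC
  const std::uint8_t narrowRhs = static_cast<std::uint8_t>(rhs);  // G_TRUNC
  const std::uint8_t narrowAdd =
      static_cast<std::uint8_t>(narrowLhs + narrowRhs);           // G_ADD
  const std::uint32_t ext = narrowAdd;                            // G_ZEXT
  return ext & 0xFFu;  // the mask now does nothing and can be removed
}
```

Addition only propagates carries upward, so the low 8 bits of the sum depend only on the low 8 bits of the operands. That is why the two forms agree for add; an operation such as a right shift would not satisfy the condition.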
2021-08-13GlobalISel: Add helper function for getting EVT from LLTMatt Arsenault2-2/+13
This can only give an imperfect approximation, but is enough to avoid crashing in places where we call into EVT functions starting from LLTs.
2021-08-13[NFC] Cleanup calls to AttributeList::getAttribute(FunctionIndex)Arthur Eubanks1-3/+1
getAttribute() is confusing, use a clearer method.
2021-08-13[NFC] Clean up users of AttributeList::hasAttribute()Arthur Eubanks4-15/+11
AttributeList::hasAttribute() is confusing, use clearer methods like hasParamAttr()/hasRetAttr(). Add hasRetAttr() since it was missing from AttributeList.
2021-08-13[NFC] Rename AttributeList::hasFnAttribute() -> hasFnAttr()Arthur Eubanks3-3/+3
This is more consistent with similar methods.
2021-08-13SplitKit: Don't further split subrange mask in buildCopyRuiling Song1-11/+12
We may use several COPY instructions to copy the needed sub-registers during split. But the way we split the lanes during the COPYs may be different from the subranges of the old register. This would fail when we extend the subranges of the new register because the LaneMasks do not match exactly between the subranges of the new register and the old register. Since we are bundling the COPYs, I think there is no need to further refine the subranges of the new register based on the set of LaneMasks of the inserted COPYs. I am not sure if there will be further breaking cases. But as the subranges of the new register are created based on the LaneMasks of the subranges of the old register, it is highly likely that we will always find an exact LaneMask match. We can think about how to make extendPHIKillRanges() work for the subrange mask mismatch case if we meet more such cases in the future. The test case was from D105065 by @arsenm. Differential Revision: https://reviews.llvm.org/D107829
2021-08-11[SampleFDO] Add two passes of MIRAddFSDiscriminatorsPassRong Xu1-0/+9
This patch adds Pass1 of MIRAddFSDiscriminatorsPass before register allocation, and Pass2 of MIRAddFSDiscriminatorsPass before Block-Placement. This is still under the --enable-fs-discriminator option (default false). This would reduce the turn-around time for the FSAFDO transition. Differential Revision: https://reviews.llvm.org/D104579
2021-08-11[LegalizeTypes][NFC] Remove else-after-returnFraser Cormack1-11/+7
Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D107890
2021-08-11[ELF] Don't emit SHF_GNU_RETAIN on SolarisRainer Orth1-4/+7
The introduction of `SHF_GNU_RETAIN` has caused massive problems on Solaris. Initially, as reported in Bug 49437, it caused dozens of testsuite failures on both sparc and x86. The objects were marked as `ELFOSABI_NONE`, but `SHF_GNU_RETAIN` is a GNU extension. In the native Solaris ABI, that flag (in the range for OS-specific values) is `SHF_SUNW_ABSENT` with completely different semantics, which confuses Solaris `ld` very much. Later, the objects became (correctly) marked `ELFOSABI_GNU`, which Solaris `ld` doesn't support, causing it to SEGV and break the build. The linker is currently being hardened to not accept non-native OS ABIs to avoid this. The need for linker support is already documented in `clang/include/clang/Basic/AttrDocs.td`, but not currently checked. This patch avoids all this by not emitting `SHF_GNU_RETAIN` on Solaris at all. Tested on `amd64-pc-solaris2.11`, `sparcv9-sun-solaris2.11`, and `x86_64-pc-linux-gnu`. Differential Revision: https://reviews.llvm.org/D107747
2021-08-11[DAG] Reword comment for EnforceNodeIdInvariant and InvalidateNodeId. NFC.madhur134901-16/+16
Reviewed By: niravd Differential Revision: https://reviews.llvm.org/D107845