riscv-gnu-toolchain/llvm.git

Age	Commit message	Author	Files	Lines
2020-08-10	[PowerPC] Add intrinsic to read or set FPSCR register	Qiu Chaofan	4	-1/+33
	This patch introduces two intrinsics: llvm.ppc.setflm and llvm.ppc.readflm. They read from or write to FPSCR register (floating-point status & control) which contains rounding mode and exception status. To ensure correctness of program, we need to prevent FP operations from being moved across these intrinsics (mffs/mtfsf instruction), so here I set them as scheduling boundaries. We can relax such restriction if FPSCR is modeled well in the future. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D84914
2020-08-10	[ScalarizeMaskedMemIntrin] Scalarize constant mask expandload as shuffle(build_vector,pass_through)	Simon Pilgrim	1	-8/+19
	As noticed on D66004, scalarization of an expandload with a constant mask as a chain of irregular loads+inserts makes it tricky to optimize before lowering, resulting in difficulties in merging loads etc. This patch instead scalarizes the expansion to a build_vector(load0, load1, undef, load2,....) style pattern and then performs a blend shuffle with the pass through vector. This allows us to more easily make use of all the build_vector combines, merging of consecutive loads etc. Differential Revision: https://reviews.llvm.org/D85416
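The build_vector-plus-blend idea in the commit above can be illustrated with a minimal Python sketch. Lists stand in for vectors and `None` stands in for an undef lane; the function names are hypothetical and are not LLVM APIs. It only shows that the two formulations give the same lanes for a constant mask.

```python
UNDEF = None  # stand-in for an undefined lane (assumption of this sketch)


def expandload_reference(memory, mask, pass_through):
    """Lane-by-lane expandload: enabled lanes take consecutive memory elements."""
    result = []
    next_index = 0
    for enabled, fallback in zip(mask, pass_through):
        if enabled:
            result.append(memory[next_index])
            next_index += 1
        else:
            result.append(fallback)
    return result


def expandload_build_vector_blend(memory, mask, pass_through):
    """Same result via build_vector(load0, load1, undef, ...) and a blend shuffle."""
    # Step 1: build a vector of consecutive loads, leaving disabled lanes undef.
    built = []
    next_index = 0
    for enabled in mask:
        if enabled:
            built.append(memory[next_index])
            next_index += 1
        else:
            built.append(UNDEF)
    # Step 2: blend shuffle. Index i picks lane i of the built vector,
    # index n + i picks lane i of the pass-through vector.
    n = len(mask)
    shuffle_mask = [i if mask[i] else n + i for i in range(n)]
    concatenated = built + list(pass_through)
    return [concatenated[j] for j in shuffle_mask]
```

With mask `[True, True, False, True]`, the built vector is `[load0, load1, undef, load2]` and the blend takes only lane 2 from the pass-through vector.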
2020-08-10	[DebugInfo] Fix initialization of DwarfCompileUnit::LabelBegin.	Igor Kudrin	1	-2/+2
	This also fixes the condition in the assertion in DwarfCompileUnit::getLabelBegin() because it checked something unrelated to the returned value. Differential Revision: https://reviews.llvm.org/D85437
2020-08-10	AMDGPU/GlobalISel: Lower G_FREM	Petar Avramovic	2	-0/+30
	Add custom lower for G_FREM. Differential Revision: https://reviews.llvm.org/D84324
2020-08-09	[NFC][StackSafety] Add a couple of early returns	Vitaly Buka	1	-2/+3

2020-08-09	[NFC][StackSafety] Count dataflow inputs	Vitaly Buka	1	-0/+3

2020-08-09	[StackSafety] Fix union which produces wrapped sets	Vitaly Buka	1	-18/+28

2020-08-09	[NFC][StackSafety] Avoid assert in getBaseObject	Vitaly Buka	1	-1/+1

2020-08-10	[BuildLibCalls] Add noundef to standard I/O functions	Juneyoung Lee	1	-0/+81
	This patch adds noundef to return value and arguments of standard I/O functions. With this patch, passing undef or poison to the functions becomes undefined behavior in LLVM IR. Since undef/poison is lowered from operations having UB in C/C++, passing undef to them was already UB in source. With this patch, the functions cannot return undef or poison anymore as well. According to C17 standard, ungetc/ungetwc/fgetpos/ftell can generate unspecified value; 3.19.3 says unspecified value is a valid value of the relevant type, and using unspecified value is unspecified behavior, which is not UB, so it cannot be undef (using undef is UB when e.g. it is used at branch condition). — The value of the file position indicator after a successful call to the ungetc function for a text stream, or the ungetwc function for any stream, until all pushed-back characters are read or discarded (7.21.7.10, 7.29.3.10). — The details of the value stored by the fgetpos function (7.21.9.1). — The details of the value returned by the ftell function for a text stream (7.21.9.4). In the long run, most of the functions listed in BuildLibCalls should have noundefs; to remove redundant diffs which will anyway disappear in the future, I added noundef to a few more non-I/O functions as well. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D85345
2020-08-09	[StackSafety] Don't keep FullSet in index	Vitaly Buka	1	-0/+3
	Optimization. Missing record is interpreted as FullSet anyway.
2020-08-09	[StackSafety] Use getSignedMin() to serialize ranges	Vitaly Buka	1	-1/+1
	Almost NFC as it's important only for full sets which should not be serialized at all.
2020-08-09	Fix 64-bit copy to SCC	Piotr Sobczak	2	-8/+18
	Fix 64-bit copy to SCC by restricting the pattern resulting in such a copy to subtargets supporting 64-bit scalar compare, and mapping the copy to S_CMP_LG_U64. Before introducing the S_CSELECT pattern with explicit SCC (0045786f146e78afee49eee053dc29ebc842fee1), there was no need for handling 64-bit copy to SCC ($scc = COPY sreg_64). The proposed handling to read only the low bits was however based on a false premise that it is only one bit that matters, while in fact the copy source might be a vector of booleans and all bits need to be considered. The practical problem of mapping the 64-bit copy to SCC is that the natural instruction to use (S_CMP_LG_U64) is not available on old hardware. Fix it by restricting the problematic pattern to subtargets supporting the instruction (hasScalarCompareEq64). Differential Revision: https://reviews.llvm.org/D85207
2020-08-09	[InstSimplify/NewGVN] Add option to control the use of undef.	Florian Hahn	2	-31/+33
	Making use of undef is not safe if the simplification result is not used to replace all uses of the result. This leads to problems in NewGVN, which does not replace all uses in the IR directly. See PR33165 for more details. This patch adds an option to SimplifyQuery to disable the use of undef. Note that I've only guarded uses of isa<UndefValue>/m_Undef where SimplifyQuery is currently available. If we agree on the general direction, I'll update the remaining uses. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D84792
2020-08-09	[SCEVExpander] Make sure cast properly dominates Builder's IP.	Florian Hahn	1	-2/+3
	The selected cast must properly dominate the Builder's IP, so we cannot re-use the cast, if it matches the builder's IP.
2020-08-09	[HotColdSplit] Add options for splitting cold functions in separate section	Aditya Kumar	1	-2/+18
	Add support for (if enabled) splitting cold functions into a separate section in order to further boost locality of hot code. Authored by: rjf (Ruijie Fang) Reviewed by: hiraditya,rcorcs,vsk Differential Revision: https://reviews.llvm.org/D85331
2020-08-09	[VectorCombine] try to create vector loads from scalar loads	Sanjay Patel	1	-0/+59
	This patch was adjusted to match the most basic pattern that starts with an insertelement (so there's no extract created here). Hopefully, that removes any concern about interfering with other passes. I.e., the transform should almost always be profitable. We could make an argument that this could be part of canonicalization, but we conservatively try not to create vector ops from scalar ops in passes like instcombine. If the transform is not profitable, the backend should be able to re-scalarize the load. Differential Revision: https://reviews.llvm.org/D81766
2020-08-09	[SCEVExpander] Avoid re-using existing casts if it means updating users.	Florian Hahn	1	-27/+29
	Currently the SCEVExpander tries to re-use existing casts, even if they are not exactly at the insertion point it was asked to create the cast. To do so in some cases, it creates a new cast at the insertion point and updates all users to use the new cast. This behavior is problematic, because it changes the IR outside of the instructions created during the expansion. Therefore we cannot completely undo all changes made during expansion. This re-use should be only an extra optimization, so only using the new cast in the expanded instructions should not be a correctness issue. There are many cases where equivalent instructions are created during expansion. This patch also adjusts findInsertPointAfter to skip instructions inserted during expansion. This enables re-using existing casts without renaming any uses, by picking a better insertion point. Reviewed By: efriedma, lebedev.ri Differential Revision: https://reviews.llvm.org/D84399
2020-08-09	[ARM] Add VADDV and VMLAV patterns for v16i16	David Green	1	-0/+24
	This adds patterns for v16i16's vecreduce, using all the existing code to go via an i32 VADDV/VMLAV and truncating the result. Differential Revision: https://reviews.llvm.org/D85452
2020-08-09	[ARM] Allow vecreduce_add in tail predicated loops	David Green	1	-2/+3
	This allows vecreduce_add in loops so that we can tail-predicate them. Differential Revision: https://reviews.llvm.org/D85454
2020-08-09	[ARM] Some formatting and predicate VRHADD patterns. NFC	David Green	1	-35/+44
	This formats some of the MVE patterns, and adds a missing Predicates = [HasMVEInt] to some VRHADD patterns I noticed as going through. Although I don't believe NEON would ever use the patterns (as it would use ADDL and VSHRN instead) they should ideally be predicated on having MVE instructions.
2020-08-09	[X86][GlobalISel] Remove unneeded code for handling zext i8->16, i8->i64, i16->i64, i32->i64.	Craig Topper	1	-59/+8
	These all seem to be handled by tablegen pattern imports.
2020-08-08	[DAGCombiner] Teach SimplifySetCC SETUGE X, SINTMIN -> SETLT X, 0 and SETULE X, SINTMAX -> SETGT X, -1.	Craig Topper	1	-9/+9
	These aren't the canonical forms we'd get from InstCombine, but we do have X86 tests for them. Recognizing them is pretty cheap. While there, make use of APInt::isSignedMinValue/isSignedMaxValue instead of creating a new APInt to compare with. Also use SelectionDAG::getAllOnesConstant helper to hide the all ones APInt creation.
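The two SetCC rewrites named in the subject above can be checked exhaustively at a small bit width. This is a minimal Python sketch, assuming 8-bit values; `BITS`, `to_signed`, and `check_setcc_folds` are names invented for the illustration and are not part of LLVM.

```python
BITS = 8  # small width so every bit pattern can be checked (assumption)
SINT_MIN_BITS = 1 << (BITS - 1)        # bit pattern of the signed minimum, 0x80
SINT_MAX_BITS = (1 << (BITS - 1)) - 1  # bit pattern of the signed maximum, 0x7f


def to_signed(bits):
    """Reinterpret an unsigned BITS-wide pattern as a two's-complement value."""
    return bits - (1 << BITS) if bits >= SINT_MIN_BITS else bits


def check_setcc_folds():
    """Return True if both rewrites agree on every BITS-wide input."""
    for x in range(1 << BITS):
        signed_x = to_signed(x)
        # SETUGE X, SINTMIN  ->  SETLT X, 0
        if (x >= SINT_MIN_BITS) != (signed_x < 0):
            return False
        # SETULE X, SINTMAX  ->  SETGT X, -1
        if (x <= SINT_MAX_BITS) != (signed_x > -1):
            return False
    return True
```

Both comparisons are ways of testing the sign bit, which is why the unsigned and signed forms coincide.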
2020-08-08	Revert "[CMake] Simplify CMake handling for zlib"	Petr Hosek	3	-33/+8
	This reverts commit ccbc1485b55ff4acd21bcfafbf7aec4ed0fd818d which is still failing on the Windows MLIR bots.
2020-08-08	[CMake] Simplify CMake handling for zlib	Petr Hosek	3	-8/+33
	Rather than handling zlib manually, use find_package from CMake to find zlib properly. Use this to normalize the LLVM_ENABLE_ZLIB, HAVE_ZLIB, HAVE_ZLIB_H. Furthermore, require zlib if LLVM_ENABLE_ZLIB is set to YES, which requires the distributor to explicitly select whether zlib is enabled or not. This simplifies the CMake handling and usage in the rest of the tooling. This is a reland of abb0075 with all followup changes and fixes that should address issues that were reported in PR44780. Differential Revision: https://reviews.llvm.org/D79219
2020-08-08	[WebAssembly] Fix FastISel address calculation bug	Thomas Lively	1	-9/+8
	Fixes PR47040, in which an assertion was improperly triggered during FastISel's address computation. The issue was that an `Address` set to be relative to the FrameIndex with offset zero was incorrectly considered to have an unset base. When the left hand side of an add set the Address to be 0 off the FrameIndex, the right side would not detect that the Address base had already been set and could try to set the Address to be relative to a register instead, triggering an assertion. This patch fixes the issue by explicitly tracking whether an `Address` has been set rather than interpreting an offset of zero to mean the `Address` has not been set. Differential Revision: https://reviews.llvm.org/D85581
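The class of bug described above (inferring "unset" from a zero value instead of tracking it explicitly) can be sketched in Python. The `Address` class here is a hypothetical stand-in written for this illustration; its fields and methods are assumptions and do not mirror the actual WebAssembly FastISel code.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Hypothetical model of an address: a base (register or frame index) plus offset."""
    kind: str = "reg"          # "reg" or "frame_index"
    base: int = 0
    offset: int = 0
    is_base_set: bool = False  # explicit tracking, the approach the fix describes

    def set_frame_index(self, frame_index):
        self.kind = "frame_index"
        self.base = frame_index
        self.is_base_set = True

    def looks_set_by_zero_heuristic(self):
        # Stand-in for the problematic inference: all-zero fields mean "unset".
        return self.base != 0 or self.offset != 0


addr = Address()
addr.set_frame_index(0)  # a valid base: frame index 0 with offset zero
```

After `set_frame_index(0)`, the zero-based heuristic still reports the address as unset, so a later operand could try to install a register base. The explicit flag does not have that ambiguity.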
2020-08-08	[X86] Remove a DCI.isBeforeLegalize() call from combineVSelectWithAllOnesOrZeros.	Craig Topper	1	-1/+1
	This was blocking isTypeLegal call so that we could do a particular transform on illegal types before type legalization. But then we create a target specific node using that type. We shouldn't do that if the type isn't legal. So I think we should just always make sure the type is legal. I suspect that in order to get the condition VT to not be a vector of i1 we already completed type legalization anyway so this probably doesn't matter much in practice.
2020-08-08	[X86] Support matching VPTERNLOG when the root node is X86ISD::ANDNP.	Craig Topper	1	-0/+11

2020-08-08	[AArch64RegisterInfo] Suppress new warning	Dávid Bolvanský	1	-2/+2

2020-08-08	[X86] Remove isSafeToClobberEFLAGS helper and just inline it into the call sites.	Craig Topper	3	-15/+8
	This is just a thin wrapper around computeRegisterLiveness which we can just call directly. The only real difference is that isSafeToClobberEFLAGS returns a bool and computeRegisterLiveness returns an enum. So we need to check for the specific enum value that isSafeToClobberEFLAGS was hiding. I've also adjusted which sites pass an explicit value for Neighborhood since the default for computeRegisterLiveness is 10.
2020-08-08	Recommit "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"	Craig Topper	3	-4/+5
	I messed up the bug numbers in the commit message before. Previously this function searched 4 instructions forwards or backwards to determine if it was ok to clobber eflags. This is called in 3 places: rematerialization, turning 2 operand leas into adds or splitting 3 ops leas into an lea and add on some CPU targets. This patch increases the search limit to 10 instructions for rematerialization and 2 operand lea to add. I've left the old threshold for 3 ops lea splitting as that increases code size. Fixes PR47024 and PR46315.
2020-08-08	Revert "[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places"	Craig Topper	3	-5/+4
	This reverts commit 44b260cb0aab387d85e4d59c16fc7b8866264f5e. I messed up the bug number in the commit message so I'm reverting to fix it.
2020-08-08	[X86][SSE] combineTargetShuffle - use scaleShuffleMask helper to widen shuffle mask. NFCI.	Simon Pilgrim	1	-8/+2
	Use scaleShuffleMask helper for the shuffle(hadd,hadd) canonicalization.
2020-08-08	[X86] Increase the number of instructions searched for isSafeToClobberEFLAGS in a couple places	Craig Topper	3	-4/+5
	Previously this function searched 4 instructions forwards or backwards to determine if it was ok to clobber eflags. This is called in 3 places: rematerialization, turning 2 operand leas into adds or splitting 3 ops leas into an lea and add on some CPU targets. This patch increases the search limit to 10 instructions for rematerialization and 2 operand lea to add. I've left the old threshold for 3 ops lea splitting as that increases code size. Fixes PR47024 and PR43014
2020-08-08	[InstCombine] Use CreateVectorSplat(ElementCount) variant directly	Simon Pilgrim	1	-2/+2
	This was introduced at rGe20223672100, and the CreateVectorSplat(unsigned NumElements) variant calls it internally
2020-08-08	[SimplifyCFG] Fix invoke->call fold w/ multiple invokes in presence of lifetime intrinsics	Roman Lebedev	1	-39/+28
	SimplifyCFG has two main folds for resumes - one when resume is directly using the landingpad, and the other one where resume is using a PHI node. While for the first case, we were already correctly ignoring all the PHI nodes, and both the debug info intrinsics and lifetime intrinsics, in the PHI-based-one, we weren't ignoring PHI's in the resume block, and weren't ignoring lifetime intrinsics. That is clearly a bug. On RawSpeed library, this results in +9.34% (+81) more invoke->call folds, -0.19% (-39) landing pads, -0.24% (-81) invoke instructions but +51 call instructions and -132 basic blocks. Though, the run-time performance impact appears to be within the noise.
2020-08-08	[NFC][SimplifyCFG] Rewrite isCleanupBlockEmpty() to be iterator_range-based	Roman Lebedev	1	-6/+7

2020-08-08	[NFC][SimplifyCFG] Count the number of invokes turned into calls due to empty cleanup blocks	Roman Lebedev	1	-0/+5

2020-08-08	[DAGCombiner] reassociate reciprocal sqrt expression to eliminate FP division, part 2	Sanjay Patel	1	-14/+19
	Follow-up to D82716 / rGea71ba11ab11. We do not have the fabs removal fold in IR yet for the case where the sqrt operand is repeated, so that's another potential improvement.
2020-08-08	lib/CodeGen doesn't depend on lib/Passes.	Benjamin Kramer	1	-1/+1

2020-08-08	[InstCombine] Optimize select(freeze(icmp eq/ne x, y), x, y)	Juneyoung Lee	1	-0/+29
	This patch adds an optimization that folds select(freeze(icmp eq/ne x, y), x, y) to x or y. This was needed to resolve slowdown after D84940 is applied. I tried to bake this logic into foldSelectInstWithICmp, but it wasn't clear. This patch conservatively writes the pattern in a separate function, foldSelectWithFrozenICmp. The output does not need freeze; https://alive2.llvm.org/ce/z/X49hNE (from @nikic) Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D85533
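For fully defined inputs, the fold described above reduces to a simple identity, which a short Python sketch can check. The sketch does not model undef or poison, which is the reason `freeze` appears in the real pattern; `select` and `fold_holds` are illustrative names, not LLVM functions.

```python
def select(cond, true_value, false_value):
    """Model of a select on concrete (fully defined) values."""
    return true_value if cond else false_value


def fold_holds(values):
    """Check select(x == y, x, y) == y and select(x != y, x, y) == x for all pairs."""
    for x in values:
        for y in values:
            if select(x == y, x, y) != y:
                return False
            if select(x != y, x, y) != x:
                return False
    return True
```

In the `eq` case the select either returns `y` directly or returns `x` when `x == y`, so the result is always `y`; the `ne` case is symmetric and always yields `x`.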
2020-08-07	[X86] Limit the scope of the min/max canonicalization in combineSelect	Craig Topper	1	-12/+8
	Previously the transform was doing these two canonicalizations: (x > y) ? x : y -> (x >= y) ? x : y, and (x < y) ? x : y -> (x <= y) ? x : y. But those don't seem to be useful generally. And they actively pessimize the cases in PR47049. This patch limits it to: (x > 0) ? x : 0 -> (x >= 0) ? x : 0, and (x < -1) ? x : -1 -> (x <= -1) ? x : -1. These are the cases mentioned in the comments as the motivation for the canonicalization. These allow the CMOV to use the S flag from the compare thus improving opportunities to use a TEST or the flags from an arithmetic instruction.
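The two canonicalizations that the patch keeps are value-preserving, which can be confirmed with a small exhaustive check. This is a minimal Python sketch over an assumed 8-bit signed range; the function name is invented for the illustration.

```python
def check_select_canonicalization(values=range(-128, 128)):
    """Return True if both retained rewrites preserve the selected value."""
    for x in values:
        # (x > 0) ? x : 0  ->  (x >= 0) ? x : 0
        if (x if x > 0 else 0) != (x if x >= 0 else 0):
            return False
        # (x < -1) ? x : -1  ->  (x <= -1) ? x : -1
        if (x if x < -1 else -1) != (x if x <= -1 else -1):
            return False
    return True
```

The only input where the strict and non-strict predicates differ is the boundary constant itself (0 or -1), and there both arms of the select produce the same value. The rewritten predicates depend only on the sign of `x`, which is what lets the CMOV use the S flag.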
2020-08-07	[X86] Don't produce bad x86andp nodes for i1 vectors	Keno Fischer	1	-4/+8
	In D85499, I attempted to fix this same issue by canonicalizing andnp for i1 vectors, but since there was some opposition to such a change, this commit just fixes the bug by using two different forms depending on which kind of vector type is in use. We can then always decide to switch the canonical forms later. Description of the original bug: We have a DAG combine that tries to fold (vselect cond, 0000..., X) -> (andnp cond, x). However, it does so by attempting to create an i64 vector with the number of elements obtained by truncating division by 64 from the bitwidth. This is bad for mask vectors like v8i1, since that division is just zero. Besides, we don't want i64 vectors anyway. For i1 vectors, switch the pattern to (andnp (not cond), x), which is the canonical form for `kandn` on mask registers. Fixes https://github.com/JuliaLang/julia/issues/36955. Differential Revision: https://reviews.llvm.org/D85553
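The arithmetic behind the original bug, and the bitwise identity the fold relies on, can be shown in a few lines of Python. This is a sketch only: mask vectors are modeled as integers with one bit per i1 lane, and all function names are hypothetical.

```python
def i64_lane_count(num_elements, element_bits):
    """Number of i64 lanes obtained by truncating division of the total bit width."""
    return (num_elements * element_bits) // 64


def vselect_zero_or_x(cond, x, lanes):
    """Per-lane vselect(cond, 0, x) on i1 lanes packed into integers."""
    out = 0
    for i in range(lanes):
        cond_bit = (cond >> i) & 1
        lane = 0 if cond_bit else (x >> i) & 1
        out |= lane << i
    return out


def and_not(cond, x, lanes):
    """(~cond) & x restricted to the low `lanes` bits."""
    return ~cond & x & ((1 << lanes) - 1)
```

For a v8i1 mask, `i64_lane_count(8, 1)` is 0, which is the degenerate element count the commit describes. For a v4i32 vector the same computation gives 2. On the mask bits themselves, `vselect_zero_or_x` and `and_not` agree for every 8-bit input.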
2020-08-07	Reland "Revert "[NewPM][CodeGen] Introduce machine pass and machine pass manager""	Yuanfang Chen	3	-1/+106
	This relands commit 320eab2d558fde0b61437e9b9075bfd301c2c474. The test failed because it was looking for x86-linux target unconditionally. Now it gets the default target.
2020-08-07	AMDGPU: Avoid explicitly listing all the memory nodes	Matt Arsenault	1	-31/+16

2020-08-07	[NFC][StackSafety] Fix statistics	Vitaly Buka	1	-2/+2

2020-08-07	[NewPM] Print 'Skipping pass' as pass instrumentation	Arthur Eubanks	1	-6/+10
	If OptNoneInstrumentation prints it instead, 'Skipping pass' will print for even required passes. Reviewed By: ychen Differential Revision: https://reviews.llvm.org/D85493
2020-08-07	[NFC][MLInliner] Refactor logging implementation	Mircea Trofin	1	-27/+93
	This prepares it for logging externally-specified outputs. Differential Revision: https://reviews.llvm.org/D85451
2020-08-07	Revert "[StackSafety] Skip ambiguous lifetime analysis"	Vitaly Buka	1	-29/+29
	This reverts commit 0b2616a8045cb776ea1514c3401d0a8577de1060. Crashes with safe-stack.
2020-08-07	[StackSafety,NFC] Add Stats counters	Vitaly Buka	1	-0/+18

2020-08-07	Revert "[MSAN] Instrument libatomic load/store calls"	Gui Andrade	1	-113/+0
	Problems with instrumenting atomic_load when the call has no successor, blocking compiler roll. This reverts commit 33d239513c881d8c11c60d5710c55cf56cc309a5.