path: root/llvm/lib/CodeGen
Age | Commit message | Author | Files | Lines
2014-10-31 | PR20557: Fix the bug that bogus cpu parameter crashes llc on AArch64 backend. | Hao Liu | 1 | -1/+5
Initial patch by Oleg Ranevskyy. llvm-svn: 220945
2014-10-30 | [SelectionDAG] When scalarizing trunc, don't assert for legal operands. | Ahmed Bougacha | 1 | -1/+17
r212242 introduced a legalizer hook, originally to let AArch64 widen v1i{32,16,8} rather than scalarize, because the legalizer expected, when scalarizing the result of a conversion operation, to already have scalarized the operands. On AArch64, v1i64 is legal, so that commit ensured operations such as v1i32 = trunc v1i64 wouldn't assert. It did that by choosing to widen v1 types whenever possible. However, v1i1 types, for which there's no legal widened type, would still trigger the assert. This commit fixes that, by only scalarizing a trunc's result when the operand has already been scalarized, and introducing an extract_elt otherwise. This is similar to r205625. Fixes PR20777. llvm-svn: 220937
2014-10-30 | Fix incorrect invariant check in DAG Combine | Louis Gerbarg | 1 | -1/+1
Earlier this summer I fixed an issue where we were incorrectly combining multiple loads that had different constraints such as alignment, invariance, temporality, etc. Apparently in one case I made a copy-paste error and swapped alignment and invariance. Tests included. rdar://18816719 llvm-svn: 220933
2014-10-30 | PR21408: Workaround the appearance of duplicate variables due to problems when inlining two calls to the same function from the same call site. | David Blaikie | 1 | -1/+6
llvm-svn: 220923
2014-10-29 | Whitespace. | NAKAMURA Takumi | 5 | -40/+40
llvm-svn: 220857
2014-10-28 | Minimize the scope of some variables, NFC. | David Blaikie | 1 | -2/+2
llvm-svn: 220759
2014-10-27 | [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrix values in the Interference constraint adder. | Lang Hames | 1 | -29/+50
Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%. llvm-svn: 220688
2014-10-26 | Remove some unnecessary casts. | David Blaikie | 1 | -2/+2
llvm-svn: 220658
2014-10-24 | Sink DwarfUnit::constructImportedEntityDIE into DwarfCompileUnit. | Frederic Riss | 4 | -32/+32
So that it has access to getOrCreateGlobalVariableDIE. If we ever support describing using directives in C++ classes (thus requiring support in type units), it will certainly use another mechanism anyway. Differential Revision: http://reviews.llvm.org/D5975 llvm-svn: 220594
2014-10-24 | Fix copy paste comment | Matt Arsenault | 1 | -2/+2
llvm-svn: 220581
2014-10-24 | DebugInfo: Sink DwarfDebug::ScopeVariables down into DwarfFile | David Blaikie | 5 | -11/+11
(part of refactoring to allow subprogram emission in both the skeleton and main units to enable -gmlt-like data to be included in the skeleton for live inlined backtracing purposes) llvm-svn: 220578
2014-10-24 | Remove DwarfDebug::FirstCU as it has no use | David Blaikie | 2 | -17/+5
It was only being used as a flag to identify the lack of debug info from within endModule - use the section labels for that instead. llvm-svn: 220575
2014-10-24 | Use rsqrt (X86) to speed up reciprocal square root calcs | Sanjay Patel | 1 | -40/+77
This is a first step for generating SSE rsqrt instructions for reciprocal square root calcs when fast-math is allowed. For now, be conservative and only enable this for AMD btver2 where performance improves significantly - for example, 29% on llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c (if we convert the data type to single-precision float). This patch adds a two-constant version of the Newton-Raphson refinement algorithm to DAGCombiner that can be selected by any target via a parameter returned by getRsqrtEstimate(). See PR20900 for more details: http://llvm.org/bugs/show_bug.cgi?id=20900 Differential Revision: http://reviews.llvm.org/D5658 llvm-svn: 220570
2014-10-24 | Added reset of LexicalScope in LiveDebugVariables reset function. | Marcello Maggioni | 1 | -0/+1
llvm-svn: 220545
2014-10-24 | Fix PR21189 -- Emit symbol subsection required to debug LLVM-built binaries with VS2012+ | Timur Iskhodzhanov | 1 | -9/+47
Reviewed at http://reviews.llvm.org/D5772 llvm-svn: 220544
2014-10-24 | DebugInfo: Remove DwarfDebug::addScopeVariable now that it's just a trivial wrapper | David Blaikie | 4 | -13/+6
llvm-svn: 220542
2014-10-23 | [SelectionDAG] Teach the vector scalarizer about FP conversions. | Ahmed Bougacha | 1 | -0/+4
This adds support for legalization of instructions of the form: [fp_conv] <1 x i1> %op to <1 x double> where fp_conv is one of fpto[us]i, [us]itofp. This used to assert because they were simply missing from the vector operand scalarizer. A similar problem arose in r190830, with trunc instead. Fixes PR20778. Differential Revision: http://reviews.llvm.org/D5810 llvm-svn: 220533
2014-10-23 | Update comment and fix typos in assert message. (NFC) | Ahmed Bougacha | 1 | -3/+3
llvm-svn: 220531
2014-10-23 | ScheduleDAG: record PhysReg dependencies represented by CopyFromReg nodes | Tim Northover | 3 | -19/+36
x86's CMPXCHG -> EFLAGS consumer wasn't being recorded as a real EFLAGS dependency because it was represented by a pair of CopyFromReg(EFLAGS) -> CopyToReg(EFLAGS) nodes. ScheduleDAG was expecting the source to be an implicit-def on the instruction, where the result numbers in the DAG and the Uses list in TableGen matched up precisely. The Copy notation seems much more robust, so this patch extends ScheduleDAG rather than refactoring x86. Should fix PR20376. llvm-svn: 220529
2014-10-23 | DebugInfo: Remove DwarfDebug::CurrentFnArguments since we have to handle argument ordering of other arguments (abstract arguments) in the same way and already have code for that too. | David Blaikie | 5 | -51/+8
While refactoring this code I was confused by both the name I had introduced (addNonArgumentVariable... but it has all this logic to handle argument numbering and keep things in order?) and by the redundancy. Seems when I fixed the misordered inlined argument handling, I didn't realize it was mostly redundant with the argument ordering code (which I may've also written, I'm not sure). So let's just rely on the more general case. The only oddity in output this produces is that it means when we emit all the variables for the current function, we don't track when we've finished the argument variables and are about to start the local variables and insert DW_AT_unspecified_parameters (for varargs functions) there. Instead it ends up after the local variables, scopes, etc. But this isn't invalid and doesn't cause DWARF consumers problems that I know of... so we'll just go with that because it makes the code nice & simple. (though, let's see what the buildbots have to say about this - *crosses fingers*) There will be some cleanup commits to follow to remove the now trivial wrappers, etc. llvm-svn: 220527
2014-10-23 | DebugInfo: Sink DwarfDebug::addNonArgumentScopeVariable into DwarfFile. | David Blaikie | 4 | -35/+35
llvm-svn: 220520
2014-10-23 | DebugInfo: Remove DwarfDebug::addCurrentFnArgument declaration now that it's moved to DwarfFile. | David Blaikie | 1 | -4/+0
llvm-svn: 220515
2014-10-23 | DebugInfo: Simplify/tidy/correct global variable decl/def emission handling. | David Blaikie | 1 | -51/+26
This fixes a bug (introduced by fixing the IR emitted from Clang where the definition of a static member would be scoped within the class, rather than within its lexical decl context) where the definition of a static variable would be placed inside a class. It also improves source fidelity by scoping static class member definitions inside the lexical decl context in which they are written (eg: namespace n { class foo { static int i; } int foo::i; } - the definition of 'i' will be within the namespace 'n' in the DWARF output now). Lastly, and the original goal, this reduces debug info size slightly (and makes debug info easier to read, etc) by placing the definitions of non-member global variables within their namespace, rather than using a separate namespace-scoped declaration along with a definition at global scope. Based on patches and discussion with Frédéric. llvm-svn: 220497
2014-10-23 | Remove explicit (void) use of DwarfFile::DD that was accidentally left in r220452. | David Blaikie | 1 | -3/+1
Caught in post-commit review by Frédéric. llvm-svn: 220487
2014-10-23 | [DebugInfo] Sink DwarfDebug::addCurrentFnArgument down into DwarfFile. | David Blaikie | 3 | -24/+28
Variable handling will be sunk into DwarfFile so that abstract variables and the like can be shared across multiple CUs (to handle cross-CU inlining, for example). llvm-svn: 220453
2014-10-23 | [DebugInfo] Add DwarfDebug& to DwarfFile. | David Blaikie | 3 | -10/+17
Use the DwarfDebug in one function that previously took it as a parameter, and lay the foundation for using it in other operations coming soon. llvm-svn: 220452
2014-10-23 | [DebugInfo] Remove LexicalScopes::isCurrentFunctionScope and CSE a use of LexicalScopes::getCurrentFunctionScope | David Blaikie | 2 | -13/+19
Now that we're sure the only root (non-abstract) scope is the current function scope, there's no need for isCurrentFunctionScope, the property can be tested directly instead. llvm-svn: 220451
2014-10-22 | Strength reduce constant-sized vectors into arrays. No functionality change. | Benjamin Kramer | 1 | -2/+2
llvm-svn: 220412
2014-10-22 | Fix typo | Matt Arsenault | 1 | -1/+1
llvm-svn: 220353
2014-10-21 | Add minnum / maxnum codegen | Matt Arsenault | 11 | -0/+188
llvm-svn: 220342
2014-10-21 | Pacify bots and simplify r220321 | Arnaud A. de Grandmaison | 1 | -1/+1
llvm-svn: 220335
2014-10-21 | [PBQP] Teach PassConfig to tell if the default register allocator is used. | Arnaud A. de Grandmaison | 1 | -0/+6
This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary. llvm-svn: 220321
2014-10-21 | [PBQP] Fix coalescing benefits | Arnaud A. de Grandmaison | 1 | -2/+2
As coalescing registers is a benefit, the cost should be improved (i.e. made smaller) when coalescing is possible. llvm-svn: 220302
2014-10-21 | Fix a bit of confusion about .set and produce more readable assembly. | Rafael Espindola | 1 | -34/+24
Every target we support has support for assembly that looks like
    a = b - c
    .long a
What is special about MachO is that the above combination suppresses the production of a relocation. With this change we avoid producing the intermediary labels when they don't add any value. llvm-svn: 220256
2014-10-21 | Make AsmPrinter::EmitLabelOffsetDifference a static helper and simplify. | Rafael Espindola | 2 | -33/+27
It had exactly one caller in a position where we know hasSetDirective is true. llvm-svn: 220250
2014-10-21 | Introduce enum values for previously defined metadata types. (NFC) | Philip Reames | 2 | -5/+5
Our metadata scheme lazily assigns IDs to string metadata, but we have a mechanism to preassign them as well. Using a preassigned ID is helpful since we get compile time type checking, and avoid some (minimal) string construction and comparison. This change adds enum values for three existing metadata types:
+ MD_nontemporal = 9, // "nontemporal"
+ MD_mem_parallel_loop_access = 10, // "llvm.mem.parallel_loop_access"
+ MD_nonnull = 11 // "nonnull"
I went through and updated various uses as well. I made no attempt to get all uses; I focused on the ones which were easily grepable and easy to translate. For example, there were several items in LoopInfo.cpp I chose not to update. llvm-svn: 220248
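The difference between lazily assigned and preassigned IDs can be sketched with a small stand-alone registry. KindRegistry is a hypothetical miniature written for this illustration, not LLVM's actual class; only the three enum values come from the message above, and the starting point for lazily assigned IDs is an assumption.

```cpp
#include <map>
#include <string>

// The three preassigned kinds named in the commit message.
enum FixedKind : unsigned {
  MD_nontemporal = 9,
  MD_mem_parallel_loop_access = 10,
  MD_nonnull = 11
};

// Hypothetical registry mapping metadata names to numeric IDs.
class KindRegistry {
  std::map<std::string, unsigned> Ids;
  unsigned NextId = 12; // assumed: first ID after the preassigned range

public:
  KindRegistry() {
    // Preassigned path: the name/ID pairs are fixed up front, so callers can
    // use the enum constant directly and skip the string lookup.
    Ids["nontemporal"] = MD_nontemporal;
    Ids["llvm.mem.parallel_loop_access"] = MD_mem_parallel_loop_access;
    Ids["nonnull"] = MD_nonnull;
  }

  // Lazy path: look the name up and hand out a fresh ID on first use.
  unsigned getKindID(const std::string &Name) {
    auto It = Ids.find(Name);
    if (It != Ids.end())
      return It->second;
    unsigned Id = NextId++;
    Ids.emplace(Name, Id);
    return Id;
  }
};
```

A caller holding MD_nonnull needs no string construction or comparison, and a misspelled enum name fails at compile time, whereas a misspelled string would quietly register a new kind.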
2014-10-18 | [PBQP] Replace the interference-constraints algorithm with a faster version loosely based on linear scan. | Lang Hames | 1 | -16/+115
On x86-64 this is good for a ~2% drop in compile time on the nightly test suite. llvm-svn: 220143
2014-10-17 | Check for dynamic alloca's when selecting lifetime intrinsics. | Pete Cooper | 1 | -1/+7
TL;DR: Indexing maps with [] creates missing entries. The long version: When selecting lifetime intrinsics, we index the *static* alloca map with the AllocaInst we find for that lifetime. Trouble is, we don't first check to see if this is a dynamic alloca. On the attached example, this causes a dynamic alloca to create an entry in the static map, and returns 0 (the default) as the frame index for that lifetime. 0 was used for the frame index of the stack protector, which given that it now has a lifetime, is coloured, and merged with other stack slots. PEI would later trigger an assert because it expects the stack protector to not be dead. This fix ensures that we only get frame indices for static allocas, ie, those in the map. Dynamic ones are effectively dropped, which is suboptimal, but at least isn't completely broken. rdar://problem/18672951 llvm-svn: 220099
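The pitfall described above can be reproduced with std::map, which shares the relevant behaviour: operator[] inserts a default-constructed value for a missing key. The sketch below is a stand-alone illustration; the map type, the string keys, and both function names are hypothetical stand-ins, not the code touched by the commit.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the static-alloca -> frame-index map.
using FrameIndexMap = std::map<std::string, int>;

// Buggy pattern: for a key that is not in the map, operator[] inserts it
// with a value-initialized int (0) and returns that 0 as if it were a real
// frame index.
int frameIndexUnchecked(FrameIndexMap &StaticAllocas, const std::string &Key) {
  return StaticAllocas[Key];
}

// Safer pattern: find() never inserts. A null result means "not a static
// alloca", and the caller can skip the lifetime marker.
const int *frameIndexChecked(const FrameIndexMap &StaticAllocas,
                             const std::string &Key) {
  auto It = StaticAllocas.find(Key);
  return It == StaticAllocas.end() ? nullptr : &It->second;
}
```

With a map holding only one static entry, frameIndexChecked returns null for an unknown key and leaves the map unchanged, while frameIndexUnchecked returns 0 and grows the map by one entry.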
2014-10-17 | [Stackmaps] Enable invoking the patchpoint intrinsic. | Juergen Ributzka | 2 | -51/+64
Patch by Kevin Modzelewski Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits, reames Differential Revision: http://reviews.llvm.org/D5634 llvm-svn: 220055
2014-10-17 | SelectionDAG: Add sext_inreg optimizations | Jan Vesely | 1 | -0/+22
v2: use dyn_cast fixup comments v3: use cast Reviewed-by: Matt Arsenault <arsenm2@gmail.com> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 220044
2014-10-16 | Reduce code duplication between patchpoint and non-patchpoint lowering. NFC. | Juergen Ributzka | 2 | -44/+58
This is in preparation for another patch that makes patchpoints invokable. Reviewers: atrick, ributzka Reviewed By: ributzka Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D5657 llvm-svn: 219967
2014-10-16 | Erase fence insertion from SelectionDAGBuilder.cpp (NFC) | Robin Morisset | 1 | -67/+20
Summary: Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses is currently living in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it for several reasons:
- There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand.
- The current code in SelectionDAGBuilder does not use any target-hooks, it does the same transformation for every backend that requires it.
- As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331).
- Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic operations with weaker ordering semantics.
```
It can also be seen by the following example (called IRIW in the literature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0: x.store(1);
Thread 1: y.store(1);
Thread 2: r1 = x.load(); r2 = y.load();
Thread 3: r3 = y.load(); r4 = x.load();
```
r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.
This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that):
- it provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences, that mimic the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but the best that can be done without knowing the targets well (and there is a comment warning about this risk).
- it then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (that exactly replicates the logic of SelectionDAGBuilder, so no functional change)
- it finally erases this logic from SelectionDAGBuilder as it is dead-code.
Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory model well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why he should first override emitLeading/TrailingFence.
Test Plan: make check-all, no functional change
Reviewers: jfb, t.p.northover
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D5474
llvm-svn: 219957
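The IRIW example in the message above can be turned into a small litmus test with std::atomic and std::thread. This is a sketch, not part of the patch: IriwResult, runIriwOnce and isForbidden are names made up for the illustration, and it only exercises the seq_cst variant, where the outcome r1 = r3 = 1, r2 = r4 = 0 must never appear.

```cpp
#include <atomic>
#include <thread>

// Values observed by the two reader threads in one run.
struct IriwResult {
  int r1, r2, r3, r4;
};

// Runs the IRIW litmus test once. store() and load() on std::atomic default
// to memory_order_seq_cst, matching the "all seq_cst" case in the message.
IriwResult runIriwOnce() {
  std::atomic<int> x{0}, y{0};
  IriwResult R{0, 0, 0, 0};
  std::thread T0([&] { x.store(1); });
  std::thread T1([&] { y.store(1); });
  // Each reader writes only its own pair of fields, so the plain ints in R
  // are not raced on; join() makes them visible to the caller.
  std::thread T2([&] { R.r1 = x.load(); R.r2 = y.load(); });
  std::thread T3([&] { R.r3 = y.load(); R.r4 = x.load(); });
  T0.join();
  T1.join();
  T2.join();
  T3.join();
  return R;
}

// True for the outcome in which the two readers disagree about the order of
// the two independent writes.
bool isForbidden(const IriwResult &R) {
  return R.r1 == 1 && R.r2 == 0 && R.r3 == 1 && R.r4 == 0;
}
```

Looping over runIriwOnce() and checking isForbidden() should never report a hit with seq_cst accesses. Passing weaker orderings to the loads and stores would make the outcome permitted by the language rules, though whether it shows up in practice depends on the hardware.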
2014-10-15 | Avoid caching the MachineFunction, we don't use it outside of runOnMachineFunction. | Eric Christopher | 1 | -9/+7
llvm-svn: 219847
2014-10-15 | Simplify handling of --noexecstack by using getNonexecutableStackSection. | Rafael Espindola | 2 | -7/+9
llvm-svn: 219799
2014-10-15 | [MachineSink] Use the real post dominator tree | Jingyue Wu | 1 | -21/+14
Summary: Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure. This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source. Thanks Alexey Volkov for reporting PR21115! Test Plan: Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking. Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115. Updated an affected test in X86. Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread. Reviewers: Jiangning, resistor, hfinkel Reviewed By: hfinkel Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski Differential Revision: http://reviews.llvm.org/D5633 llvm-svn: 219773
2014-10-14 | [AAarch64] Optimize CSINC-branch sequence | Gerolf Hoflehner | 1 | -0/+12
Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also the condition code may not be modified between the csinc and the original branch. Examples: 1. Convert csinc w9, wzr, wzr, <CC>;tbnz w9, #0, 0x44 to b.<invCC> 2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC> rdar://problem/18506500 llvm-svn: 219742
2014-10-14 | Remove unused member variable. | Rafael Espindola | 2 | -5/+3
Fixes pr20904. llvm-svn: 219706
2014-10-14 | DebugInfo: Ensure that all debug location scope chains from instructions within a function, lead to the function itself. | David Blaikie | 1 | -2/+7
Let me tell you a tale... Originally committed in r211723 after discovering a nasty case of weird scoping due to inlining, this was reverted in r211724 after it fired in ASan/compiler-rt. (minor diversion where I accidentally committed/reverted again in r211871/r211873) After further testing and fixing bugs in ArgumentPromotion (r211872) and Inlining (r212065) it was recommitted in r212085. Reverted in r212089 after the sanitizer buildbots still showed problems. Fixed another bug in ArgumentPromotion (r212128) found by this assertion. Recommitted in r212205, reverted in r212226 after it crashed some more on sanitizer buildbots. Fix clang some more in r212761. Recommitted in r212776, reverted in r212793. ASan failures. Recommitted in r213391, reverted in r213432, trying to reproduce flakey ASan build failure. Fixed bugs in r213805 (ArgPromo + DebugInfo), r213952 (LiveDebugVariables strips dbg_value intrinsics in functions not described by debug info). Recommitted in r214761, reverted in r214999, flakey failure on Windows buildbot. Fixed DeadArgElimination + DebugInfo bug in r219210. Recommitted in r219215, reverted in r219512, failure on ObjC++ atomic properties in the test-suite on Darwin. Fixed ObjC++ atomic properties issue in Clang in r219690. [This commit is provided 'as is' with no hope that this is the last time I commit this change either expressed or implied] llvm-svn: 219702
2014-10-14 | Revert "Fix stuff... again." | David Blaikie | 1 | -7/+2
Accidental commit. This reverts commit r219693. llvm-svn: 219695
2014-10-14 | Revert some parts of r196288 that were confusing and untested. | David Blaikie | 1 | -8/+2
If we figure out why they should be here, let's add some testing of some kind so we can better demonstrate why it's needed. llvm-svn: 219694