path: root/llvm/lib
Age | Commit message | Author | Files | Lines
2016-09-06 | Avoid using alignas and constexpr. | Rafael Espindola | 1 | -136/+5
This requires removing the custom allocator, since Demangle cannot depend on Support and so cannot use Compiler.h. llvm-svn: 280750
2016-09-06 | [AMDGPU] Wave and register controls | Konstantin Zhuravlyov | 14 | -196/+555
- Implemented amdgpu-flat-work-group-size attribute
- Implemented amdgpu-num-active-waves-per-eu attribute
- Implemented amdgpu-num-sgpr attribute
- Implemented amdgpu-num-vgpr attribute
- Dynamic LDS constraints are in a separate patch

Patch by Tom Stellard and Konstantin Zhuravlyov
Differential Revision: https://reviews.llvm.org/D21562
llvm-svn: 280747
2016-09-06 | Try to fix a circular dependency in the modules build. | Rafael Espindola | 1 | -2/+2
llvm-svn: 280746
2016-09-06 | AMDGPU/SI: Teach SIInstrInfo::FoldImmediate() to fold immediates into copies | Tom Stellard | 2 | -3/+29
Summary: I put this code here because I want to re-use it in a few other places. This supersedes some of the immediate folding code we have in SIFoldOperands. I think the peephole optimizer is probably a better place for folding immediates into copies, since it does some register coalescing at the same time.

This will also make it easier to transition SIFoldOperands into a smarter pass, where it looks at all uses of an instruction at once to determine the optimal way to fold operands. Right now, the pass just considers one operand at a time.

Reviewers: arsenm
Subscribers: wdng, nhaehnle, arsenm, llvm-commits, kzhuravl
Differential Revision: https://reviews.llvm.org/D23402
llvm-svn: 280744
2016-09-06 | AMDGPU: Add XNACK feature to GPUs that support it. | Wei Ding | 1 | -2/+2
Differential Revision: http://reviews.llvm.org/D24276 llvm-svn: 280742
2016-09-06 | Fix ItaniumDemangle.cpp build with MSVC 2013 | Reid Kleckner | 1 | -18/+19
llvm-svn: 280740
2016-09-06 | [AArch64] Adjust the scheduling model for Exynos M1. | Evandro Menezes | 1 | -18/+16
Further refine the model for branches. llvm-svn: 280736
2016-09-06 | [AArch64] Adjust the scheduling model for Exynos M1. | Evandro Menezes | 1 | -2/+7
Further refine the model for stores. llvm-svn: 280735
2016-09-06 | [AArch64] Adjust the scheduling model for Exynos M1. | Evandro Menezes | 1 | -11/+11
Further refine the model for loads. llvm-svn: 280734
2016-09-06 | Add a C++ Itanium demangler to LLVM. | Rafael Espindola | 7 | -8/+4460
This adds a copy of the demangler in libcxxabi. The code also has no dependencies on anything else in LLVM. To enforce that, I added it as another library; that way a BUILD_SHARED_LIBS build will fail if anyone adds a use of StringRef, for example.

The no-LLVM-dependency rule, combined with the fact that this has to build on Linux, OS X and Windows, required a few changes to the code. In particular:
- No constexpr.
- No alignas.

On OS X at least this library has only one global symbol: __ZN4llvm16itanium_demangleEPKcPcPmPi

My current plan is:
- Commit something like this.
- Change lld to use it.
- Change lldb to use it as the fallback.
- Add a few #ifdefs so that exactly the same file can be used in libcxxabi to export abi::__cxa_demangle.

Once the fast demangler in lldb can handle any names, this implementation can be replaced with it and we will have the one true demangler.

llvm-svn: 280732
2016-09-06 | fix formatting; NFC | Sanjay Patel | 1 | -19/+14
llvm-svn: 280727
2016-09-06 | [MCTargetDesc] Delete dead code. Found by GCC7 -Wunused-function. | Davide Italiano | 1 | -17/+0
Also unbreak newer gcc build with -Werror. llvm-svn: 280726
2016-09-06 | [RDF] Ignore undef use operands | Krzysztof Parzyszek | 1 | -1/+1
llvm-svn: 280717
2016-09-06 | Formatting with clang-format patch r280700 | Leny Kholodov | 3 | -49/+47
llvm-svn: 280716
2016-09-06 | [SelectionDAG] Simplify extract_subvector(insert_subvector(Vec, In, Idx), Idx) -> In | Simon Pilgrim | 1 | -0/+6
If we are extracting a subvector that has just been inserted, then we should just use the original inserted subvector. This has come up in several x86 shuffle lowering cases where we are crossing 128-bit lanes. Differential Revision: https://reviews.llvm.org/D24254 llvm-svn: 280715
2016-09-06 | [JumpThreading] Only write back branch-weight MDs for blocks that originally had PGO info | Adam Nemet | 1 | -1/+52
Currently the pass updates branch weights in the IR if the function has any PGO info (entry frequency is set). However, we could still have regions of the CFG that do not have branch weights collected (e.g. a cold region). In this case we'd use static estimates. Since static estimates for branches are determined independently, they are inconsistent. Updating them can "randomly" inflate block frequencies.

I've run into this in a completely cold loop of h264ref from SPEC. -Rpass-with-hotness showed the loop to be completely cold during inlining (before JT) but completely hot during vectorization (after JT).

The new testcase demonstrates the problem. We check array elements against 1, 2 and 3 in a loop. The check against 3 is the loop-exiting check. The block names should be self-explanatory. In this example, jump threading incorrectly updates the weight of the loop-exiting branch to 0, drastically inflating the frequency of the loop (in the range of billions).

There is no run-time profile info for edges inside the loop, so branch probabilities are estimated. These are the resulting branch and block frequencies for the loop body:

            check_1 (16)
      (8) /    |
    eq_1       | (8)
          \    |
            check_2 (16)
      (8) /    |
    eq_2       | (8)
          \    |
            check_3 (16)
      (1) /    |
  (loop exit)  | (15)
               | (back edge)

First we thread eq_1 -> check_2 to check_3. Frequencies are updated to remove the frequency of eq_1 from check_2 and then from the false edge leaving check_2. Changed frequencies are highlighted with * *:

            check_1 (16)
      (8) /    |
    eq_1~      | (8)
     /         |
    /       check_2 (*8*)
   |  (8) /    |
    \   eq_2   | (*0*)
     \     \   |
      ` --- check_3 (16)
      (1) /    |
  (loop exit)  | (15)
               | (back edge)

Next we thread eq_1 -> check_3 and eq_2 -> check_3 to check_1 as new back edges. Frequencies are updated to remove the frequency of eq_1 and eq_2 from check_3 and then from the false edge leaving check_3 (changed frequencies are highlighted with * *):

            check_1 (16)
      (8) /    |
    eq_1~      | (8)
     /         |
    /       check_2 (*8*)
   |  (8) /    |
   | /-- eq_2~ | (*0*)
  (back edge)  |
            check_3 (*0*)
    (*0*) /    |
  (loop exit)  | (*0*)
               | (back edge)

As a result, the loop exit edge ends up with 0 frequency, which in turn makes the loop header have maximum frequency.

There are a few potential problems here:

1. The profile data seems odd. There is a single profile sample of the loop being entered. On the other hand, there are no weights inside the loop.

2. Based on static estimation we shouldn't set edges to "extreme" values, i.e. extremely likely or unlikely.

3. We shouldn't create profile metadata that is calculated from static estimation. I am not sure what the policy is, but it seems to make sense to treat profile metadata as something that is known to originate from profiling. Estimated probabilities should only be reflected in BPI/BFI.

Any one of these would probably fix the immediate problem. I went for 3 because I think it's a good policy to have, and added a FIXME about 2.

Differential Revision: https://reviews.llvm.org/D24118
llvm-svn: 280713
2016-09-06 | [Sparc][Leon] Corrected supported atomics size for processors supporting Leon CASA instruction back to 32 bits. | Chris Dewhurst | 1 | -1/+1
This was erroneously checked in for 64 bits while trying to find out whether there was a way to get 64-bit atomicity in Leon processors. There is not, and this change should not have been checked in. There is no unit test for this, as the existing unit tests test for behaviour to 32 bits, which was the original intention of the code. llvm-svn: 280710
2016-09-06 | [mips] Tighten FastISel restrictions | Simon Dardis | 1 | -1/+17
LLVM PR/29052 highlighted that FastISel for MIPS attempted to lower arguments assuming that it was using paired 32-bit registers to perform operations on f64. This mode of operation is not supported for MIPSR6. This patch resolves the reported issue by adding additional checks for unsupported floating point unit configurations.

Thanks to mike.k for reporting this issue!

Reviewers: seanbruno, vkalintiris
Differential Revision: https://reviews.llvm.org/D23795
llvm-svn: 280706
2016-09-06 | [PPC] Claim stack frame before storing into it, if no red zone is present | Krzysztof Parzyszek | 1 | -25/+91
Unlike PPC64, PPC32/SVR4 does not have a red zone. In its absence, there is no guarantee that this part of the stack will not be modified by an interrupt. To avoid this, make sure to claim the stack frame first before storing into it. This fixes https://llvm.org/bugs/show_bug.cgi?id=26519. Differential Revision: https://reviews.llvm.org/D24093 llvm-svn: 280705
2016-09-06 | DebugInfo: use strongly typed enum for debug info flags | Leny Kholodov | 5 | -64/+70
Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
* Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
* Flags are now strongly typed

Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23766
llvm-svn: 280700
2016-09-06 | [RegisterScavenger] Remove aliasing registers of operands from the candidate set | Silviu Baranga | 1 | -1/+2
Summary: In addition to not including the register operand of the current instruction, also don't include any aliasing registers. We can't consider these as candidates because using them will clobber the corresponding register operand of the current instruction.

This change doesn't include a test case, and it would probably be difficult to produce a stable one, since the bug depends on the results of register allocation.

Reviewers: MatzeB, qcolombet, hfinkel
Subscribers: hfinkel, llvm-commits
Differential Revision: https://reviews.llvm.org/D24130
llvm-svn: 280698
2016-09-06 | [AVX-512] Fix masked VPERMI2PS isel when the index comes from a bitcast. | Craig Topper | 3 | -58/+39
We need to bitcast the index operand to a floating point type so that it matches the result type. If not then the passthru part of the DAG will be a bitcast from the index's original type to the destination type. This makes it very difficult to match. The other option would be to add 5 sets of patterns for every other possible type. llvm-svn: 280696
2016-09-06 | [X86] Remove unused encoding from IntrinsicType enum. | Craig Topper | 2 | -4/+1
llvm-svn: 280694
2016-09-06 | [X86] Fix indentation. NFC | Craig Topper | 1 | -2/+2
llvm-svn: 280693
2016-09-06 | ARM: workaround bundled operation predication | Saleem Abdulrasool | 1 | -0/+3
This is a Windows ARM specific issue. If the code path in the if-conversion ends up using a relocation which will form an IMAGE_REL_ARM_MOV32T, we end up with a bundle to ensure that the mov.w/mov.t pair is not split up. This is normally fine; however, if the branch is also predicated, then we end up trying to predicate the bundle.

For now, report a bundle as being unpredicatable. Although this is false, it would trigger a failure case previously anyway, so this is no worse. That is, there should not be any code which would previously have been if-converted and predicated which would not be now.

Under certain circumstances, it may be possible to "predicate the bundle". This would require scanning all bundle instructions, ensuring that the bundle contains only predicatable instructions, and converting the bundle into an IT block sequence. If the bundle is larger than the maximal IT block length (4 instructions), it would require materializing multiple IT blocks from the single bundle.

llvm-svn: 280689
2016-09-06 | Revert "DebugInfo: use strongly typed enum for debug info flags" | Mehdi Amini | 5 | -104/+101
This reverts commit r280686, bots are broken. llvm-svn: 280688
2016-09-06 | [LTO] Constify (NFC) | Mehdi Amini | 1 | -16/+20
llvm-svn: 280687
2016-09-06 | DebugInfo: use strongly typed enum for debug info flags | Mehdi Amini | 5 | -101/+104
Use ADT/BitmaskEnum for DINode::DIFlags for the following purposes:
* Get rid of unsigned int for flags to avoid problems on platforms with sizeof(int) < 4
* Flags are now strongly typed

Patch by: Victor Leschuk <vleschuk@gmail.com>
Differential Revision: https://reviews.llvm.org/D23766
llvm-svn: 280686
2016-09-06 | [AVX-512] Fix v8i64 shift by immediate lowering on 32-bit targets. | Craig Topper | 1 | -1/+2
llvm-svn: 280684
2016-09-06 | CodeGen: ensure that libcalls are always AAPCS CC | Saleem Abdulrasool | 1 | -7/+6
All of the builtins are designed to be invoked with ARM AAPCS CC even on ARM AAPCS VFP CC hosts. Tweak the default initialisation to ARM AAPCS CC rather than C CC for ARM/thumb targets. The changes to the tests are necessary to ensure that the calling convention for the lowered library calls are honoured. Furthermore, these adjustments cause certain branch invocations to change to branch-and-link since the returned value needs to be moved across registers (d0 -> r0, r1). llvm-svn: 280683
2016-09-05 | [AVX-512] Teach fastisel load/store handling to use EVEX encoded instructions for 128/256-bit vectors and scalar single/double. | Craig Topper | 1 | -42/+81
Still need to fix the register classes to allow the extended range of registers. llvm-svn: 280682
2016-09-05 | [Coroutines] Part12: Handle alloca address-taken | Gor Nishanov | 1 | -1/+46
Summary: Move early uses of spilled variables after CoroBegin. For example, if a parameter had its address taken, we may end up with code like:

  define @f(i32 %n) {
    %n.addr = alloca i32
    store %n, %n.addr
    ...
    call @coro.begin

This patch fixes the problem by moving uses of spilled variables after CoroBegin.

Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24234
llvm-svn: 280678
2016-09-05 | [InstCombine] don't assert that division-by-constant has been folded (PR30281) | Sanjay Patel | 1 | -7/+6
This is effectively a revert of https://reviews.llvm.org/rL280115 and should fix https://llvm.org/bugs/show_bug.cgi?id=30281. llvm-svn: 280677
2016-09-05 | [InstCombine] revert r280637 because it causes test failures on an ARM bot | Sanjay Patel | 1 | -33/+43
http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15/builds/14952/steps/ninja%20check%201/logs/FAIL%3A%20LLVM%3A%3Aicmp.ll llvm-svn: 280676
2016-09-05 | [AVX-512] Integrate mask register copying more completely into X86InstrInfo::copyPhysReg and simplify. No functional change intended. | Craig Topper | 1 | -68/+53
The code is now written in terms of source and dest classes, with feature checks inside each type of copy, instead of having separate functions for each feature set. llvm-svn: 280673
2016-09-05 | [WebAssembly] Unbreak the build. | Benjamin Kramer | 1 | -8/+9
Not sure why ADL isn't working here. llvm-svn: 280656
2016-09-05 | [AMDGPU] Refactor FLAT TD instructions | Valery Pykhtin | 6 | -438/+525
Differential revision: https://reviews.llvm.org/D24072 llvm-svn: 280655
2016-09-05 | [Thumb1] Add relocations for fixups fixup_arm_thumb_{br,bcc} | James Molloy | 1 | -0/+6
These need to be mapped through to R_ARM_THM_JUMP{11,8} respectively. Fixes PR30279. llvm-svn: 280651
2016-09-05 | [AVX512] Fix v8i1/v16i1 zext + bitcast lowering pattern. Explicitly zero upper bits. | Igor Breger | 1 | -4/+4
Differential Revision: http://reviews.llvm.org/D23983 llvm-svn: 280650
2016-09-05 | [X86] Make some static arrays of opcodes const and shrink to uint16_t. NFC | Craig Topper | 1 | -6/+6
llvm-svn: 280649
2016-09-05 | [AVX-512] Simplify X86InstrInfo::copyPhysReg for 128/256-bit vectors with AVX512, but not VLX. | Craig Topper | 3 | -33/+7
We should use the VEX opcodes and trust the register allocator not to use the extended XMM/YMM register space. Previously we were extending to copying the whole ZMM register. The register allocator shouldn't use XMM16-31 or YMM16-31 in this configuration, as the instructions to spill them aren't available. llvm-svn: 280648
2016-09-05 | [Coroutines] Part11: Add final suspend handling. | Gor Nishanov | 3 | -17/+93
Summary: A frontend may designate a particular suspend to be final, by setting the second argument of the coro.suspend intrinsic to true. Such a suspend point has two properties:

* it is possible to check whether a suspended coroutine is at the final suspend point via the coro.done intrinsic;
* a resumption of a coroutine stopped at the final suspend point leads to undefined behavior.

The only possible action for a coroutine at a final suspend point is destroying it via the coro.destroy intrinsic.

This patch adds final suspend handling logic to the CoroEarly and CoroSplit passes. Now, the final suspend point example from docs/Coroutines.rst compiles and produces the expected result (see test/Transform/Coroutines/ex5.ll).

Reviewers: majnemer
Subscribers: mehdi_amini, llvm-commits
Differential Revision: https://reviews.llvm.org/D24068
llvm-svn: 280646
2016-09-06 | [X86] Remove FsVMOVAPSrm/FsVMOVAPDrm/FsMOVAPSrm/FsMOVAPDrm. | Craig Topper | 3 | -40/+0
Due to their placement in the td file they had lower precedence than (V)MOVSS/SD and could almost never be selected. The only way to select them was in AVX512 mode, because EVEX VMOVSS/SD was below them and the patterns weren't qualified properly for AVX only. So if you happened to have an aligned FR32/FR64 load in AVX512, you could get a VEX encoded VMOVAPS/VMOVAPD. I tried to search back through history and it seems like these instructions were probably unselectable for at least 5 years, at least back to the time the VEX versions were added. But I can't prove they ever were. llvm-svn: 280644
2016-09-04 | [InstCombine] allow icmp (and X, C2), C1 folds for splat constant vectors | Sanjay Patel | 1 | -43/+33
The code to calculate 'UsesRemoved' could be simplified. As-is, that code is a victim of PR30273: https://llvm.org/bugs/show_bug.cgi?id=30273 llvm-svn: 280637
2016-09-04 | [AVX-512] Add EVEX encoded scalar FMA intrinsic instructions to isNonFoldablePartialRegisterLoad. | Craig Topper | 1 | -12/+24
llvm-svn: 280636
2016-09-04 | [AVX-512] Remove 128-bit and 256-bit masked floating point add/sub/mul/div intrinsics and upgrade to native IR. | Craig Topper | 2 | -16/+44
llvm-svn: 280633
2016-09-04 | [ORC] Clone module flags metadata into the globals module in the CompileOnDemandLayer. | Lang Hames | 1 | -0/+9
Also contains a tweak to the orc-lazy JIT in LLI to enable the test case. llvm-svn: 280632
2016-09-04 | [InstCombine] recode icmp fold in a vector-friendly way; NFC | Sanjay Patel | 1 | -22/+30
The transform in question:
  icmp (and (trunc W), C2), C1 -> icmp (and W, C2'), C1'
...is still not enabled for vectors, thus no functional change intended. It's not clear to me if this is a good transform for vectors or even scalars in general. Changing that behavior may be a follow-on patch. llvm-svn: 280627
2016-09-04 | [PowerPC] During branch relaxation, recompute padding offsets before each iteration | Hal Finkel | 1 | -7/+39
We used to compute the padding contributions to the block sizes during branch relaxation only at the start of the transformation. As we perform branch relaxation, we change the sizes of the blocks, and so the amount of inter-block padding might change. Accordingly, we need to recompute the (alignment-based) padding in between every iteration on our way toward the fixed point. Unfortunately, I don't have a test case (and none was provided in the bug report), and while this obviously seems needed algorithmically, I don't have any way of generating a small and/or non-fragile regression test. llvm-svn: 280626
2016-09-04 | Revert r279960. | Igor Breger | 2 | -23/+5
https://llvm.org/bugs/show_bug.cgi?id=30249 llvm-svn: 280625