path: root/llvm/lib/Target
2016-09-13  Defer asm errors to post-statement failure  (Nirav Dave; 7 files, -223/+106)

Recommitting after fixing AsmParser initialization.

Allow errors to be deferred and emitted as part of cleanup, to simplify and shorten assembly parser code. This will allow error messages to be emitted in helper functions and modified by the caller, which has better context. As part of this, many minor cleanups to the parser:

* Unify parser cleanup on error.
* Add a workaround for incorrect return values in ParseDirective instances.
* Tighten checks on error-signifying return values for parser functions, and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are now fixed.

These changes should be backwards compatible with current target parsers so long as error statuses are correctly returned by the appropriate functions.

Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281336
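To make the deferral pattern concrete, here is a minimal self-contained sketch using hypothetical names, not the actual MCAsmParser interface: helpers record a pending error and signal failure, and statement-level cleanup emits whatever is pending.

    #include <cstdio>
    #include <string>
    #include <vector>

    // Illustrative stand-in for the parser; names are assumptions, not LLVM's.
    struct ParserSketch {
      struct PendingError {
        unsigned Loc;
        std::string Msg;
      };
      std::vector<PendingError> PendingErrors;

      // Helpers call this instead of printing immediately; the caller may
      // rewrite Msg with better context before it is finally emitted.
      bool deferError(unsigned Loc, std::string Msg) {
        PendingErrors.push_back({Loc, std::move(Msg)});
        return true; // true == failure, mirroring the parser convention
      }

      // Run once per statement during cleanup.
      void emitPendingErrors() {
        for (const auto &E : PendingErrors)
          std::fprintf(stderr, "error at %u: %s\n", E.Loc, E.Msg.c_str());
        PendingErrors.clear();
      }
    };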
2016-09-13  Revert "[ARM] Promote small global constants to constant pools"  (James Molloy; 4 files, -144/+1)

This reverts commit r281314. Speculatively revert, as it's possible this caused linker errors: http://lab.llvm.org:8011/builders/clang-native-arm-lnt/builds/19656

llvm-svn: 281327
2016-09-13  [ARM] Add ".code 32" to functions in the ARM instruction set  (Pablo Barrio; 1 file, -1/+2)

Before, only Thumb functions were marked as ".code 16". These ".code x" directives are effective until the next directive of the same kind is encountered. Therefore, in code with interleaved ARM and Thumb functions, it was possible to declare a function as ARM and end up with a Thumb function after assembly.

A test has been added. An existing test has also been fixed to take this change into account.

Reviewers: aschwaighofer, t.p.northover, jmolloy, rengolin
Subscribers: aemerson, rengolin, llvm-commits
Differential Revision: https://reviews.llvm.org/D24337
llvm-svn: 281324
2016-09-13  [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently  (James Molloy; 2 files, -5/+138)

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)):

1) If the bitmask touches the LSB, we can remove all the upper bits and set the flags with one LSLS.
2) If the bitmask touches the MSB, we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

Cases 1-3 require only one 16-bit instruction and can elide the CMP. Case 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so it is also a win.

llvm-svn: 281323
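As a standalone illustration of the contiguity test quoted above (using C++20 <bit>, not LLVM's own helpers):

    #include <bit>
    #include <cstdint>

    // A 32-bit mask is one consecutive run of set bits exactly when
    // 32 - clz(bm) - ctz(bm) == popcount(bm).
    bool isContiguousBitmask(uint32_t BM) {
      if (BM == 0)
        return false;
      unsigned Width = 32 - std::countl_zero(BM) - std::countr_zero(BM);
      return Width == static_cast<unsigned>(std::popcount(BM));
    }

    // Examples: 0x000000FF (touches the LSB, case 1), 0xFF000000 (touches the
    // MSB, case 2), 0x00FF0000 (interior run, case 4); 0x00F00F00 fails the test.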
2016-09-13  [ARM] Support ldr.w in pseudo instruction ldr rd,=immediate  (Peter Smith; 2 files, -0/+7)

The changes made in r269352, r269353 and r269354 to support the transformation of ldr rd,=immediate to mov introduced a regression from 3.8: ldr.w rd, =immediate was no longer supported. This change puts support back in for ldr.w by means of a t2InstAlias for the .w form. The .w is ignored in ARM state and propagated to the ldr in Thumb2.

llvm-svn: 281319
2016-09-13  [ARM] Promote small global constants to constant pools  (James Molloy; 4 files, -1/+144)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

we can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

llvm-svn: 281314
2016-09-13  Revert r281304 as it is causing build bot failures in Hexagon hwloop regression tests  (Sjoerd Meijer; 5 files, -52/+49)

These tests pass locally; will be investigating where these differences come from.

llvm-svn: 281306
2016-09-13  Add a new field isAdd to MCInstrDesc  (Sjoerd Meijer; 5 files, -49/+52)

The ARM and Hexagon instruction descriptions now tag add instructions, and the Hexagon backend is using this to identify loop induction statements.

Patch by Sam Parker and Sjoerd Meijer.

Differential Revision: https://reviews.llvm.org/D23601
llvm-svn: 281304
2016-09-13  AVX-512: Fix for PR28175 - Scalar code optimization  (Elena Demikhovsky; 2 files, -7/+17)

Optimized (truncate (assertzext x) to i1) and anyext i1 to i8/16/32. Optimizing these patterns is one more step towards i1 optimization on AVX-512.

Differential Revision: https://reviews.llvm.org/D24456
llvm-svn: 281302
2016-09-13  [AArch64] Support stackmap/patchpoint in getInstSizeInBytes  (Diana Picus; 1 file, -4/+21)

We currently return 4 for stackmaps and patchpoints, which is very optimistic and can in rare cases cause the branch relaxation pass to fail to relax certain branches. This patch causes getInstSizeInBytes to return a pessimistic estimate of the size as the number of bytes requested in the stackmap/patchpoint. In the future, we could provide a more accurate estimate by sharing some of the logic in AArch64::LowerSTACKMAP/PATCHPOINT.

Fixes part of https://llvm.org/bugs/show_bug.cgi?id=28750

Differential Revision: https://reviews.llvm.org/D24073
llvm-svn: 281301
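A minimal sketch of the pessimistic estimate with the LLVM types abstracted away; the real patch reads the requested byte count off the stackmap/patchpoint operands:

    #include <cstdint>

    enum class PseudoOp { StackMap, PatchPoint, Other };

    // For stackmaps/patchpoints, trust the number of bytes the intrinsic asked
    // for instead of assuming a single 4-byte instruction, so branch relaxation
    // never underestimates distances.
    uint64_t instSizeInBytes(PseudoOp Op, uint64_t RequestedBytes) {
      switch (Op) {
      case PseudoOp::StackMap:
      case PseudoOp::PatchPoint:
        return RequestedBytes;
      case PseudoOp::Other:
        return 4; // AArch64 instructions are fixed-width
      }
      return 4;
    }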
2016-09-13  [X86] Remove masked shufpd/shufps intrinsics and autoupgrade to native vector shuffles  (Craig Topper; 1 file, -12/+0)

They were removed from clang previously but accidentally left in the backend.

llvm-svn: 281300
2016-09-13  X86: Conditional tail calls should not have isBarrier = 1  (Hans Wennborg; 1 file, -18/+31)

That confuses e.g. machine basic block placement, which then doesn't realize that control can fall through a block that ends with a conditional tail call. Instead, isBranch = 1 should be set. Also, mark EFLAGS as used by these instructions.

llvm-svn: 281281
2016-09-13  Temporarily Revert "[MC] Defer asm errors to post-statement failure" as it's causing errors on the sanitizer bots  (Eric Christopher; 7 files, -106/+223)

This reverts commit r281249.

llvm-svn: 281280
2016-09-12  Revert r281215, it caused PR30358.  (Nico Weber; 2 files, -139/+5)
llvm-svn: 281263
2016-09-12  [MC] Defer asm errors to post-statement failure  (Nirav Dave; 7 files, -223/+106)

Allow errors to be deferred and emitted as part of cleanup, to simplify and shorten assembly parser code. This will allow error messages to be emitted in helper functions and modified by the caller, which has better context. As part of this, many minor cleanups to the parser:

* Unify parser cleanup on error.
* Add a workaround for incorrect return values in ParseDirective instances.
* Tighten checks on error-signifying return values for parser functions, and fix in-tree TargetParsers to be more consistent with the changes.
* Fix AArch64 test cases checking for spurious error messages that are now fixed.

These changes should be backwards compatible with current target parsers so long as error statuses are correctly returned by the appropriate functions.

Reviewers: rnk, majnemer
Subscribers: aemerson, jyknight, llvm-commits
Differential Revision: https://reviews.llvm.org/D24047
llvm-svn: 281249
2016-09-12  AMDGPU: Do not clobber SCC in SIWholeQuadMode  (Nicolai Haehnle; 2 files, -75/+210)

Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D22198
llvm-svn: 281230
2016-09-12  Revert "[ARM] Promote small global constants to constant pools"  (James Molloy; 4 files, -123/+1)

This reverts commit r281213. It made a bot go bang: http://lab.llvm.org:8011/builders/clang-cmake-armv7-a15-full/builds/14625

llvm-svn: 281228
2016-09-12  [X86] Copy imp-uses when folding tailcall into conditional branch  (Ahmed Bougacha; 1 file, -1/+1)

r280832 added 32-bit support for emitting conditional tail calls, but dropped imp-used parameter registers. This went unnoticed until r281113, which added 64-bit support, as this is only exposed with parameter passing via registers. Don't drop the imp-used parameters.

llvm-svn: 281223
2016-09-12  [AMDGPU] Assembler: Move disabled SDWA and DPP instructions into Disabled asm variant  (Sam Kolton; 2 files, -0/+12)

Summary: This removes disabled instructions from match tables so we will not match them at all.

Reviewers: tstellarAMD, vpykhtin, artem.tamazov
Subscribers: wdng, nhaehnle, arsenm
Differential Revision: https://reviews.llvm.org/D24452
llvm-svn: 281216
2016-09-12  [Thumb] Teach ISel how to lower compares of AND bitmasks efficiently  (James Molloy; 2 files, -5/+139)

For the common pattern (CMPZ (AND x, #bitmask), #0), we can do more efficient instruction selection if the bitmask is one consecutive sequence of set bits (32 - clz(bm) - ctz(bm) == popcount(bm)):

1) If the bitmask touches the LSB, we can remove all the upper bits and set the flags with one LSLS.
2) If the bitmask touches the MSB, we can remove all the lower bits and set the flags with one LSRS.
3) If the bitmask has popcount == 1 (only one set bit), we can shift that bit into the sign bit with one LSLS and change the condition query from NE/EQ to MI/PL (we could also implement this by shifting into the carry bit and branching on BCC/BCS).
4) Otherwise, we can emit a sequence of LSLS+LSRS to remove the upper and lower zero bits of the mask.

Cases 1-3 require only one 16-bit instruction and can elide the CMP. Case 4 requires two 16-bit instructions but can elide the CMP and doesn't require materializing a complex immediate, so it is also a win.

llvm-svn: 281215
2016-09-12  [ARM] Promote small global constants to constant pools  (James Molloy; 4 files, -1/+123)

If a constant is unnamed_addr and is only used within one function, we can save on the code size and runtime cost of an indirection by changing the global's storage to inside the constant pool. For example, instead of:

        ldr r0, .CPI0
        bl printf
        bx lr
    .CPI0: &format_string
    format_string: .asciz "hello, world!\n"

we can emit:

        adr r0, .CPI0
        bl printf
        bx lr
    .CPI0: .asciz "hello, world!\n"

This can cause significant code size savings when many small strings are used in one function (4 bytes per string).

llvm-svn: 281213
2016-09-12  Fix WebAssembly broken build related to interface change in r281172  (Eric Liu; 1 file, -2/+1)

Reviewers: bkramer
Subscribers: jfb, llvm-commits, dschuff
Differential Revision: https://reviews.llvm.org/D24449
llvm-svn: 281201
2016-09-11  CodeGen: Give MachineBasicBlock::reverse_iterator a handle to the current MI  (Duncan P. N. Exon Smith; 8 files, -39/+26)

Now that MachineBasicBlock::reverse_instr_iterator knows when it's at the end (since r281168 and r281170), implement MachineBasicBlock::reverse_iterator directly on top of an ilist::reverse_iterator by adding an IsReverse template parameter to MachineInstrBundleIterator. This replaces another hard-to-reason-about use of std::reverse_iterator on list iterators, matching the changes for ilist::reverse_iterator from r280032 (see the "out of scope" section at the end of that commit message).

MachineBasicBlock::reverse_iterator now has a handle to the current node and has obvious invalidation semantics. r280032 has a more detailed explanation of how list-style reverse iterators (invalidated when the pointed-at node is deleted) are different from vector-style reverse iterators like std::reverse_iterator (invalidated on every operation). A great motivating example is this commit's changes to lib/CodeGen/DeadMachineInstructionElim.cpp.

Note: If your out-of-tree backend deletes instructions while iterating on a MachineBasicBlock::reverse_iterator or converts between MachineBasicBlock::iterator and MachineBasicBlock::reverse_iterator, you'll need to update your code in similar ways to r280032. The following table might help:

    [Old]                            ==>  [New]
    delete &*RI, RE = end()               delete &*RI++
    RI->erase(), RE = end()               RI++->erase()
    reverse_iterator(I)                   std::prev(I).getReverse()
    reverse_iterator(I)                   ++I.getReverse()
    --reverse_iterator(I)                 I.getReverse()
    reverse_iterator(std::next(I))        I.getReverse()
    RI.base()                             std::prev(RI).getReverse()
    RI.base()                             ++RI.getReverse()
    --RI.base()                           RI.getReverse()
    std::next(RI).base()                  RI.getReverse()

(For more details, have a look at r280032.)

llvm-svn: 281172
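To illustrate the second row of the table, a sketch of the new deletion idiom; MBB stands for some MachineBasicBlock and isDead() is a hypothetical predicate, so this is illustrative rather than compilable on its own:

    // Before this change one had to re-fetch RE after each erase; with the new
    // reverse_iterator, post-incrementing past the node before erasing suffices:
    for (auto RI = MBB.rbegin(), RE = MBB.rend(); RI != RE;) {
      if (isDead(*RI))
        RI++->eraseFromParent(); // advance RI, then erase the node it passed
      else
        ++RI;
    }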
2016-09-11  [AVX512] Fix pattern for vgetmantsd and all other instructions that use the same class  (Igor Breger; 1 file, -8/+1)

Fix memory operand size, remove unnecessary pattern.

Differential Revision: http://reviews.llvm.org/D24443
llvm-svn: 281164
2016-09-11  [AVX-512] Add VPTERNLOG to load folding tables  (Craig Topper; 1 file, -0/+18)
llvm-svn: 281156
2016-09-11  [X86] Make a helper method into a static function local to the cpp file  (Craig Topper; 2 files, -11/+10)
llvm-svn: 281154
2016-09-11  [NVPTX] Use ldg for explicitly invariant loads  (Justin Lebar; 1 file, -13/+22)

Summary: With this change (plus some changes to prevent !invariant from being clobbered within llvm), clang will be able to model the __ldg CUDA builtin as an invariant load, rather than as a target-specific llvm intrinsic. This will let the optimizer play with these loads -- specifically, we should be able to vectorize them in the load-store vectorizer.

Reviewers: tra
Subscribers: jholewinski, hfinkel, llvm-commits, chandlerc
Differential Revision: https://reviews.llvm.org/D23477
llvm-svn: 281152
2016-09-11  [CodeGen] Split out the notions of MI invariance and MI dereferenceability  (Justin Lebar; 11 files, -41/+67)

Summary: An IR load can be invariant, dereferenceable, neither, or both. But currently, MI's notion of invariance is IR-invariant && IR-dereferenceable. This patch splits up the notions of invariance and dereferenceability at the MI level. It's NFC, so adds some probably-unnecessary "is-dereferenceable" checks, which we can remove later if desired.

Reviewers: chandlerc, tstellarAMD
Subscribers: jholewinski, arsenm, nemanjai, llvm-commits
Differential Revision: https://reviews.llvm.org/D23371
llvm-svn: 281151
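A self-contained sketch of the idea, with the two properties as independent flags (illustrative names, not the actual MachineMemOperand flags):

    #include <cstdint>

    // The two properties become independent bits instead of one conflated
    // "invariant" notion at the MI level.
    enum MemFlags : uint32_t {
      MOInvariant       = 1u << 0, // the loaded value never changes
      MODereferenceable = 1u << 1, // safe to load from speculatively
    };

    // What MI previously called an "invariant load" requires both bits.
    inline bool isDereferenceableInvariantLoad(uint32_t Flags) {
      return (Flags & MOInvariant) && (Flags & MODereferenceable);
    }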
2016-09-10  We also need to pass swifterror in R12 under swiftcc, not only under ccc  (Arnold Schwaighofer; 1 file, -0/+3)

rdar://28190687

llvm-svn: 281138
2016-09-10  [AMDGPU] Refactor MUBUF/MTBUF instructions  (Valery Pykhtin; 6 files, -1168/+1306)

Differential Revision: https://reviews.llvm.org/D24295
llvm-svn: 281137
2016-09-10  [WebAssembly] Fix typos in comments  (Heejin Ahn; 1 file, -11/+14)
llvm-svn: 281131
2016-09-10  AMDGPU: Implement is{LoadFrom|StoreTo}FrameIndex  (Matt Arsenault; 6 files, -21/+90)
llvm-svn: 281128
2016-09-10  AMDGPU: Fix scheduling info for spill pseudos  (Matt Arsenault; 1 file, -2/+3)

These defaulted to Write32Bit. I don't think this actually matters, since these don't exist during scheduling.

llvm-svn: 281127
2016-09-10  [CodeGen] Rename MachineInstr::isInvariantLoad to isDereferenceableInvariantLoad. NFC  (Justin Lebar; 2 files, -2/+2)

Summary: I want to separate out the notions of invariance and dereferenceability at the MI level, so that they correspond to the equivalent concepts at the IR level. (Currently an MI load is MI-invariant iff it's IR-invariant and IR-dereferenceable.) First step is renaming this function.

Reviewers: chandlerc
Subscribers: MatzeB, jfb, llvm-commits
Differential Revision: https://reviews.llvm.org/D23370
llvm-svn: 281125
2016-09-09  AMDGPU: Fix immediate folding logic when shrinking instructions  (Matt Arsenault; 3 files, -16/+10)

If the literal is being folded into src0, it doesn't matter if it's an SGPR, because it's being replaced with the literal. Also fixes initially selecting 32-bit versions of some instructions, which also confused commuting.

llvm-svn: 281117
2016-09-09  X86: Fold tail calls into conditional branches also for 64-bit (PR26302)  (Hans Wennborg; 4 files, -12/+40)

This extends the optimization in r280832 to also work for 64-bit. The only quirk is that we can't do this for 64-bit Windows (yet).

Differential Revision: https://reviews.llvm.org/D24423
llvm-svn: 281113
2016-09-09  AMDGPU: Run LoadStoreVectorizer pass by default  (Matt Arsenault; 2 files, -1/+4)
llvm-svn: 281112
2016-09-09  [X86][XOP] Fix VPERMIL2PD mask creation on 32-bit targets  (Simon Pilgrim; 1 file, -5/+5)

Use the getConstVector helper to correctly create v2i64/v4i64 constants on 32-bit targets.

llvm-svn: 281105
2016-09-09  [Hexagon] Fix disassembler crash after r279255  (Krzysztof Parzyszek; 1 file, -0/+3)

When p0 was added as an explicit operand to the duplex subinstructions, the disassembler was not updated to reflect this.

llvm-svn: 281104
2016-09-09  [NVPTX] Implement llvm.fabs.f32, llvm.max.f32, etc.  (Justin Lebar; 2 files, -16/+132)

Summary: Previously these only worked via NVPTX-specific intrinsics. This change will allow us to convert these target-specific intrinsics into the general LLVM versions, allowing existing LLVM passes to reason about their behavior. It also gets us some minor codegen improvements as-is, from situations where we canonicalize code into one of these llvm intrinsics.

Reviewers: majnemer
Subscribers: llvm-commits, jholewinski, tra
Differential Revision: https://reviews.llvm.org/D24300
llvm-svn: 281092
2016-09-09  ARM: move the builtins libcall CC setup  (Saleem Abdulrasool; 2 files, -0/+169)

Move the target-specific setup into the target-specific lowering setup. As pointed out by Anton, the initial change was moving this too high up the stack, resulting in a violation of the layering (the target-generic code path set up target-specific bits). Sink this into the ARM-specific setup. NFC.

llvm-svn: 281088
2016-09-09  AMDGPU: Fix incorrect data type for the mqsad_u32_u8 instruction  (Wei Ding; 3 files, -9/+17)

Differential Revision: http://reviews.llvm.org/D23700
llvm-svn: 281081
2016-09-09  AMDGPU/SI: Make sure llvm.amdgcn.implicitarg.ptr() is 8-byte aligned for HSA  (Tom Stellard; 2 files, -1/+6)

Reviewers: arsenm
Subscribers: arsenm, wdng, nhaehnle, llvm-commits
Differential Revision: https://reviews.llvm.org/D24405
llvm-svn: 281080
2016-09-09  [AMDGPU] Assembler: better support for immediate literals in assembler  (Sam Kolton; 14 files, -351/+708)

Summary: Previously the assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals. E.g. in the instruction "v_fract_f64 v[0:1], 0.5", the literal 0.5 was encoded as the 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. The correct encoding is the inline constant 240 (optimal) or at least the 32-bit literal 0x3FE00000.

With this change, the way immediate literals are parsed changes: all literals are always parsed as 64-bit values, either integer or floating-point. Then we convert the parsed literals to the correct form, based on the type of the literal parsed (was it floating-point or binary) and the type of the expected instruction operands (is this an f32/64 or b32/64 instruction).

Here are the rules for how we convert literals:

- We parsed an fp literal:
  - Instruction expects a 64-bit operand:
    - If the parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5), we do nothing with this literal.
    - Else if the literal is not inlinable but the instruction requires inlining it (e.g. this is an e64 encoding, v_fract_f64_e64 v[0:1], 1.5), report an error.
    - Else the literal is not inlinable but we can encode it as an additional 32-bit literal constant:
      - If the instruction expects an fp operand type (f64), check whether the low 32 bits of the literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5). If so, do nothing. Else (e.g. v_fract_f64 v[0:1], 3.1415), report a warning that the low 32 bits will be set to zeroes and precision will be lost, and set the low 32 bits of the literal to zeroes.
      - If the instruction expects an integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5), report an error, as it is unclear how to encode this literal.
  - Instruction expects a 32-bit operand:
    - Convert the parsed 64-bit fp literal to 32-bit fp, allowing loss of precision but not overflow or underflow.
    - If we are required to inline the literal (e.g. v_trunc_f32_e64 v0, 0.5): if it is inlinable, do nothing; else report an error.
    - Otherwise do nothing; we can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0).
- We parsed a binary literal:
  - If this literal is inlinable (e.g. v_trunc_f32_e32 v0, 35), do nothing.
  - Else, if we are required to inline this literal (e.g. v_trunc_f32_e64 v0, 35), report an error.
  - Else, the literal is not inlinable and we are not required to inline it:
    - If the high 32 bits of the literal are zeroes or the same as the sign bit (32-bit), do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef).
    - Else, report an error (e.g. v_trunc_f32 v0, 0x123456789abcdef0).

For this change it is required that we know the operand types of an instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set their operand types to the corresponding types:

    enum OperandType {
      OPERAND_REG_IMM32_INT,
      OPERAND_REG_IMM32_FP,
      OPERAND_REG_INLINE_C_INT,
      OPERAND_REG_INLINE_C_FP,
    }

This is not working yet:

- Several tests are failing.
- Problems with predicate methods for inline immediates.
- LLVM-generated assembler parts try to select the e64 encoding before e32. More changes are required for several AsmOperands.

Reviewers: vpykhtin, tstellarAMD
Subscribers: arsenm, kzhuravl, artem.tamazov
Differential Revision: https://reviews.llvm.org/D22922
llvm-svn: 281050
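As a standalone sketch of one rule above, checking whether an f64 literal can be emitted as a 32-bit literal holding its high half (hypothetical helper, not the assembler's actual code):

    #include <cstdint>
    #include <cstring>

    // Per the rule above: an f64 literal fits in a 32-bit literal slot only if
    // its low 32 bits are zero; otherwise truncation loses precision.
    bool f64LiteralFitsIn32(double V) {
      uint64_t Bits;
      std::memcpy(&Bits, &V, sizeof Bits);
      return (Bits & 0xFFFFFFFFull) == 0;
    }

    // f64LiteralFitsIn32(1.5)    -> true  (0x3FF8000000000000; emit 0x3FF80000)
    // f64LiteralFitsIn32(3.1415) -> false (low bits nonzero: warn, then zero them)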
2016-09-09  [Sparc][LEON] Removed the parts of the errata fixes implemented using inline assembly, as this is not the desired behaviour for end-users  (Chris Dewhurst; 1 file, -76/+0)

Small change to a unit test to implement this without requiring the inline assembly.

llvm-svn: 281047
2016-09-09  [ARM] ADD with a negative offset can become SUB for free  (James Molloy; 1 file, -0/+4)

So model that directly in TTI::getIntImmCost().

llvm-svn: 281044
2016-09-09  [ARM] icmp %x, -C can be lowered to a simple ADDS or CMN  (James Molloy; 1 file, -0/+11)

Tell TargetTransformInfo about this so ConstantHoisting is informed.

llvm-svn: 281043
2016-09-09  [Thumb] Select (CMPZ X, -C) -> (CMPZ (ADDS X, C), 0)  (James Molloy; 1 file, -0/+42)

The CMPZ #0 disappears during peepholing, leaving just a tADDi3, tADDi8 or t2ADDri. This avoids having to materialize the expensive negative constant in Thumb-1, and allows a shrinking from a 32-bit CMN to a 16-bit ADDS in Thumb-2.

llvm-svn: 281040
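The identity behind the select, as a tiny standalone sketch (unsigned wraparound matches what the flag-setting ADDS computes):

    #include <cstdint>

    // (x == -C) is exactly (x + C == 0), so CMPZ x, #-C can be re-expressed as
    // ADDS x, #C followed by a compare-with-zero that peepholing later deletes.
    bool equalsNegC(uint32_t X, uint32_t C) {
      return X + C == 0; // two's-complement wraparound, as in hardware
    }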
2016-09-09  GlobalISel: remove G_TYPE and G_PHI  (Tim Northover; 1 file, -10/+0)

These instructions were only necessary when type information was stored in the MachineInstr (because only generic MachineInstrs possessed a type). Now that it's in MachineRegisterInfo, COPY and PHI work fine.

llvm-svn: 281037
2016-09-09  GlobalISel: move type information to MachineRegisterInfo  (Tim Northover; 3 files, -18/+16)

We want each register to have a canonical type, which means the best place to store this is in MachineRegisterInfo rather than on every MachineInstr that happens to use or define that register.

Most changes following from this are pretty simple (you need an MRI anyway if you're going to be doing any transformations, so just check the type there). But legalization doesn't really want to check redundant operands (when, for example, a G_ADD only ever has one type), so I've made use of MCInstrDesc's operand type field to encode these constraints and limit legalization's work.

As an added bonus, more validation is possible, both in MachineVerifier and MachineIRBuilder (coming soon).

llvm-svn: 281035
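A self-contained analogue of the new arrangement (illustrative only, not MachineRegisterInfo's real interface): the canonical type lives in a per-register side table, so instructions like COPY and PHI need not carry one.

    #include <cassert>
    #include <unordered_map>

    using Register = unsigned;
    struct LowLevelType { unsigned SizeInBits = 0; }; // stand-in for LLT

    // One canonical type per virtual register, owned by the register info.
    struct RegInfoSketch {
      std::unordered_map<Register, LowLevelType> Types;

      void setType(Register R, LowLevelType Ty) {
        // Each register gets a type once; it is canonical thereafter.
        assert(!Types.count(R) && "type already set");
        Types[R] = Ty;
      }
      LowLevelType getType(Register R) const { return Types.at(R); }
    };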