|
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.
This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.
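For reference, a minimal IR sketch of a pattern this now covers (the
function name is illustrative):

  declare i32 @llvm.fshr.i32(i32, i32, i32)
  define i32 @fshr_i32(i32 %hi, i32 %lo, i32 %amt) {
    ; concatenates %hi:%lo and shifts right by %amt (mod 32)
    %r = call i32 @llvm.fshr.i32(i32 %hi, i32 %lo, i32 %amt)
    ret i32 %r
  }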
Differential Revision: https://reviews.llvm.org/D76070
|
|
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.
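A sketch of the class of pattern involved (an assumed shape, not the
exact PR45178 reproducer): a non-immediate vector shift on v16i8 whose
unrolling must not create illegal i8 nodes:

  define <16 x i8> @shl_v16i8(<16 x i8> %v, <16 x i8> %amt) {
    %r = shl <16 x i8> %v, %amt
    ret <16 x i8> %r
  }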
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76043
|
|
This replaces the optimization from SIOptimizeExecMaskingPreRA.
We have fewer opportunities in the control flow lowering because many
VGPR copies are still in place and will be removed later, but we know
for sure that an instruction is SI_END_CF and not just an arbitrary
S_OR_B64 with EXEC.
The subsequent change needs to convert s_and_saveexec into s_and and
address the new TODO lines in tests; then the code block guarded by the
-amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer
will be removed.
Differential Revision: https://reviews.llvm.org/D76033
|
|
This patch is the callee-side counterpart of https://reviews.llvm.org/D73209.
It removes the fatal error raised when we pass more formal arguments than
available registers.
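An illustrative sketch (the register count here is an assumption, not
taken from the patch): with eight integer argument registers, the ninth
formal argument below has to come from the stack in the callee rather
than hitting the fatal error:

  define i64 @nine_args(i64 %a1, i64 %a2, i64 %a3, i64 %a4, i64 %a5,
                        i64 %a6, i64 %a7, i64 %a8, i64 %a9) {
    %r = add i64 %a1, %a9
    ret i64 %r
  }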
Differential Revision: https://reviews.llvm.org/D74225
|
|
Summary:
On 32-bit PPC targets [AIX and BE], when we convert an `i64` to an
`f32`, a `setcc` operand expansion is needed. The expansion sets the
result type of the expanded `setcc` operation based on whether the
subtarget uses CRBits. If the subtarget does use CRBits, as AIX and BE
do, it sets the result type to `i1`, which is inconsistent with the
original `setcc` result type [i32].
The underlying crash happens because the `setcc` result type is not
set consistently in those two places.
This patch fixes the problem by also setting the original `setcc`
node's result type via the `getSetCCResultType` interface.
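A minimal IR sketch of the conversion whose expansion emits the
`setcc` in question (the signed variant is shown; unsigned behaves
analogously):

  define float @conv(i64 %x) {
    %r = sitofp i64 %x to float
    ret float %r
  }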
Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75702
|
|
Summary:
De-duplicate isel instruction classes by using RRIm for RRINDm. The latter
becomes obsolete.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D76063
|
|
On AIX, the program counter is denoted by the dollar sign.
Differential Revision: https://reviews.llvm.org/D75627
|
|
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
* aarch64_sve_ldnt1_gather_index
* aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.
As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.
The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
* GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
* SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
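As an IR-level sketch of the new addressing mode (the exact type
mangling and pointer type below are assumptions):

  declare <vscale x 2 x i64>
    @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(<vscale x 2 x i1>,
                                                 i64*,
                                                 <vscale x 2 x i64>)
  define <vscale x 2 x i64> @gather(<vscale x 2 x i1> %pg, i64* %base,
                                    <vscale x 2 x i64> %idx) {
    %r = call <vscale x 2 x i64>
      @llvm.aarch64.sve.ldnt1.gather.index.nxv2i64(
        <vscale x 2 x i1> %pg, i64* %base, <vscale x 2 x i64> %idx)
    ret <vscale x 2 x i64> %r
  }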
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75601
|
|
Lets us remove another SLM proc family flag usage.
This is NFC, but we should probably check whether atom/glm/knl? should be using this flag as well...
|
|
The initialization order was not correct. These bugs were discovered by
valgrind. They appear to work fine in practice but this patch should
unblock switching the AVR backend on by default as now a standard AVR
llc invocation runs without memory errors.
The AVRISelLowering constructor would run before the subtarget boolean
fields were initialized to false. Now, the initialization order is
correct.
|
|
Now that D75114 has landed, DAGCombiner handles this case, so the code is redundant.
|
|
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.
I've represented the unpredicated versions in IR using the
cross-platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.
The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.
In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)
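For instance, a minimal IR sketch of the negated-operand form that
should now select a vector VFMS (which multiply operand carries the
fneg is an isel detail; this just shows the shape being matched):

  declare <4 x float> @llvm.fma.v4f32(<4 x float>, <4 x float>,
                                      <4 x float>)
  define <4 x float> @vfms(<4 x float> %x, <4 x float> %y,
                           <4 x float> %acc) {
    %nx = fneg <4 x float> %x
    %r = call <4 x float> @llvm.fma.v4f32(<4 x float> %nx,
                                          <4 x float> %y,
                                          <4 x float> %acc)
    ret <4 x float> %r
  }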
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75998
|
|
Found by the LLVM MemorySanitizer tests when switching the AVR
backend on by default.
ELFArch must be initialized before the call to
initializeSubtargetDependencies().
The uninitialized read would occur deep within TableGen'd code.
|
|
This patch adds basic strict-fp intrinsics support to PowerPC backend,
including basic arithmetic operations (add/sub/mul/div).
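A minimal IR sketch of one of the newly supported operations:

  declare double @llvm.experimental.constrained.fadd.f64(double, double,
                                                         metadata,
                                                         metadata)
  define double @strict_add(double %a, double %b) strictfp {
    %r = call double @llvm.experimental.constrained.fadd.f64(
             double %a, double %b,
             metadata !"round.dynamic", metadata !"fpexcept.strict")
    ret double %r
  }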
Reviewed By: steven.zhang, andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D63916
|
|
The note section type implies a specific format that this section
does not have, so tools like readelf fail on it. Progbits implies no
particular format, and another pipeline compiler already sets the type
to progbits.
Differential Revision: https://reviews.llvm.org/D75913
|
|
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned in order to truncate the current fragment;
this fragment is unused most of the time. To avoid that, we can insert
a new empty Data fragment instead. Non-relaxable instructions are
usually emitted into Data fragments, so the inserted empty Data
fragment will very likely be reused.
Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: reames, LuoYuanke
Subscribers: llvm-commits, dexonsmith, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75438
|
|
Add the workaround for the X86::MOV16ri when describing call site
parameters.
|
|
This patch implements the missing P8 macro fusion for LLVM according
to the Power8 User's Manual, Section 10.1.12 'Instruction Fusion'.
Differential Revision: https://reviews.llvm.org/D70651
|
|
Summary:
Avoid re-examining operands on the recursive walk looking for CTR.
This was causing huge compile times after an earlier optimization
created a large expression.
The start of the expression (created by IndVarSimplify) looked like:
%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...
with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.
Reviewers: hfinkel
Subscribers: nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75790
|
|
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead, as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
|
|
instructions.
Summary:
The IR intrinsics are mapped to the following SVE2 instructions:
* WHILERW <Pd>.<T>, <Xn>, <Xm>
* WHILEWR <Pd>.<T>, <Xn>, <Xm>
The intrinsics introduced in this patch are the IR counterpart of the
SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type
variants).
Patch by Maciej Gąbka <maciej.gabka@arm.com>.
Reviewers: kmclaughlin, rengolin
Reviewed By: kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75862
|
|
The assumption is that conditional regions are perfectly nested
and a mask restored at the exit from the inner block will be
completely covered by a mask restored in the outer.
It turns out with our current structurizer this is not always
the case.
Disable the optimization for now, but I want to keep it around for a
while, either to try it again after further structurizer changes or to
move it into control flow lowering, where we have more info and can
reuse the test.
Differential Revision: https://reviews.llvm.org/D75958
|
|
Summary:
There's a lot of test case churn, but the overall effect is to
increase the number of back-to-back v_sub,v_subbrev pairs, which can
execute with no delay even on gfx10.
Reviewers: arsenm, rampitec, nhaehnle
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75999
|
|
Reviewers: sdesmalen, efriedma
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75928
|
|
This was failing on any pre-assigned copy to the VCC bank.
This is something of a workaround for the default implementation in
getInstrMappingImpl, and how it treats copy-like operations in
general.
Copy-like operations are considered to only have one result register
bank, rather than separate banks for each source like a normal
instruction. To avoid potentially mishandling reg_sequence with
impossible operand combinations, the generic implementation errors on
impossible costs. If the bank was already assigned, it is treated as
if it were an unsatisfiable REG_SEQUENCE mapping. We really don't get
any value from what getInstrMappingImpl tries to do for copies, so
just directly emit the simple mapping we really want.
|
|
Move some logic around in LowOverheadLoop::ValidateLiveOut
|
|
opcodes (PR39467)
For the i32 and i64 cases, X86ISD::SHLD/SHRD are close enough to
ISD::FSHL/FSHR that we can use them directly; we just need to account
for the operand commutation for SHRD.
The i16 SHLD/SHRD case is annoying, as the shift amount is modulo-32
(vs funnel shift modulo-16), so I've added X86ISD::FSHL/FSHR
equivalents, which match the generic implementation in all other
respects.
Something I'm slightly concerned about is that ISD::FSHL/FSHR legality
is controlled by the Subtarget.isSHLDSlow() feature flag - we don't
normally use non-ISA features for this, but it allows the DAG combines
to continue to operate after legalization in a lot more cases.
The X86 *bits.ll changes are all affected by the same issue - we now
have a "FSHR(-1,-1,amt) -> ROTR(-1,amt) -> (-1)" simplification that
reduces the dependencies enough for the branch fall-through code to
mess up.
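To illustrate the i16 mismatch: the IR intrinsic takes its shift
amount modulo the bit width (16 here), while the count of a 16-bit
SHLD/SHRD is taken modulo 32:

  declare i16 @llvm.fshl.i16(i16, i16, i16)
  define i16 @fshl_i16(i16 %a, i16 %b, i16 %amt) {
    ; %amt is interpreted modulo 16 by the IR semantics
    %r = call i16 @llvm.fshl.i16(i16 %a, i16 %b, i16 %amt)
    ret i16 %r
  }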
Differential Revision: https://reviews.llvm.org/D75748
|
|
Refines the gather/scatter cost model, but also changes the TTI
function getIntrinsicInstrCost to accept an additional parameter
which is needed for the gather/scatter cost evaluation.
This did require trivial changes in some non-ARM backends to
adopt the new parameter.
Extending gathers and truncating scatters are now priced cheaper.
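For reference, a sketch of the kind of IR operation being priced (a
masked gather of four i32 values; alignment and passthru values are
illustrative):

  declare <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(<4 x i32*>, i32,
                                                      <4 x i1>,
                                                      <4 x i32>)
  define <4 x i32> @gather4(<4 x i32*> %ptrs, <4 x i1> %mask) {
    %r = call <4 x i32> @llvm.masked.gather.v4i32.v4p0i32(
             <4 x i32*> %ptrs, i32 4, <4 x i1> %mask,
             <4 x i32> zeroinitializer)
    ret <4 x i32> %r
  }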
Differential Revision: https://reviews.llvm.org/D75525
|
|
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.
These improvements cover architectures implementing ARMv5TE or Thumb-2.
The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.
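A minimal IR sketch of the access that should now become a single
LDRD rather than two LDRs:

  define i64 @load_vol(i64* %p) {
    %v = load volatile i64, i64* %p, align 8
    ret i64 %v
  }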
Reviewers: dmgreen, efriedma, john.brawn, nickdesaulniers
Reviewed By: efriedma, nickdesaulniers
Subscribers: danielkiss, alanphipps, hans, nathanchance, nickdesaulniers, vvereschaka, kristof.beyls, hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D70072
|
|
Scalarize most truncates. Avoid touching cases that could end up in
unresolvable infinite loops.
|
|
Extend fewerElementsVectorBasic to handle operands with different
element types.
|
|
This avoids regressions in a future patch. I'm confused by the gfx9
use of legacy_mad. Was this a pointless instruction rename, or does it
use fmul_legacy handling? Why is regular mac available in that case?
|
|
Summary:
As far as I can tell on gfx10 conversions to/from f32 (that are not
converting f32 to/from f64) are full rate instructions, but they were
marked as quarter rate instructions.
I have fixed this for gfx10 only. I assume the scheduling model was
correct for older architectures, though I don't have any documentation
handy to confirm that.
Reviewers: rampitec, arsenm
Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75392
|
|
Don't use the deprecated single-mode form in tests. Also make sure to
parse the attribute in the case of the deprecated form.
|
|
|
Summary:
This patch extends the TargetMachine to let targets specify the integer size
used by the sjljehprepare pass. This is 64-bit for the VE target and
otherwise defaults to 32-bit for all targets, which was hard-wired
before.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D71337
|
|
INSERT_VECTOR_ELT(EXTRACT_VECTOR_ELT) shuffle pattern
We already do this for PINSRB/PINSRW and SCALAR_TO_VECTOR.
|
|
This causes one minor test change but is mainly necessary for an upcoming patch.
|
|
In case the source value ends up in a VGPR, insert a readfirstlane to
avoid producing an illegal copy later. If it turns out to be
unnecessary, it can be folded out.
|
|
Add four instructions to the whitelist.
Differential Revision: https://reviews.llvm.org/D75902
|
|
Swap the compare operands if the LHS is spilled, while updating the
CCMasks of the CC users. This is relatively straightforward, since the
live-in lists for the CC register can be assumed to be correct during
register allocation (thanks to 659efa2).
Also fold a spilled operand of an LOCR/SELR into an LOC(G).
Review: Ulrich Weigand
Differential Revision: https://reviews.llvm.org/D67437
|
|
Based on llvm-mca reports on codegen in llvm\test\CodeGen\X86\fmaxnum.ll + llvm\test\CodeGen\X86\fminnum.ll
|
|
separate basic block
Summary:
When SI_INDIRECT_DST_V* pseudos have indices in a VGPR, they get expanded into a self-looped basic block that modifies EXEC in a loop.
To keep EXEC consistent, it is saved before the expansion and restored after it.
%95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32
expands to
s_mov_b64 s[6:7], exec
BB0_16:
v_readfirstlane_b32 s8, v28
v_cmp_eq_u32_e32 vcc, s8, v28
s_and_saveexec_b64 vcc, vcc
s_set_gpr_idx_on s8, gpr_idx(DST)
v_mov_b32_e32 v6, v25
s_set_gpr_idx_off
s_xor_b64 exec, exec, vcc
s_cbranch_execnz BB0_16
; %bb.17:
s_mov_b64 exec, s[6:7]
The bug appeared when this expansion occurred in the ELSE block of
the control flow.
Originally
%110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32,
%112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
expanded to
****************** <== here exec has "THEN" context
s_mov_b64 s[6:7], exec
BB0_16:
v_readfirstlane_b32 s8, v28
v_cmp_eq_u32_e32 vcc, s8, v28
s_and_saveexec_b64 vcc, vcc
s_set_gpr_idx_on s8, gpr_idx(DST)
v_mov_b32_e32 v6, v25
s_set_gpr_idx_off
s_xor_b64 exec, exec, vcc
s_cbranch_execnz BB0_16
; %bb.17:
s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten.
s_mov_b64 exec, s[6:7]
The rest of the "ELSE" block is executed not by the workitems which
constitute the "else mask" but by those which constitute the "then
mask". SILowerControlFlow::emitElse always considers the basic block's
begin() as the insertion point for s_or_saveexec.
Proposed fix: the SI_INDIRECT_DST_V* expansion should split the
remainder block to create a landing pad for the EXEC restoration.
Reviewers: rampitec, vpykhtin, nhaehnle
Reviewed By: vpykhtin
Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, hiraditya, kerbowa, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75472
|
|
Summary: Adds the @llvm.aarch64.sve.adr[b|h|w|d] intrinsics
Reviewers: sdesmalen, andwar, efriedma, dancgr, cameron.mcinally, rengolin
Reviewed By: sdesmalen
Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, danielkiss, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75858
|
|
On some Arm cores there is a performance penalty when forwarding from an
S register to a D register. Calculating VMAX in a D register creates
false forwarding hazards, so don't do that unless we're on a core which
specifically asks for it.
Patch by James Greenhalgh
Differential Revision: https://reviews.llvm.org/D75248
|
|
X86ISD::VPERM2X128
For pre-AVX512 targets, combine binary shuffles to X86ISD::VPERM2X128 if possible. This mainly helps optimize the blend(extract_subvector(x,1),y) pattern.
At some point soon we're going to have to make a decision about when
to combine AVX512 shuffles more aggressively - we bail out if there is
any change in element size (to protect predicate mask merging), which
means we miss out on a lot of optimizations.
|
|
Iterate through the loop and check that the observable values
produced are the same whether tail predication happens or not.
We want to find out if the tail-predicated version of this loop will
produce the same values as the loop in its original form. For this to
be true, the newly inserted implicit predication must not change the
(observable) results.
We're doing this because many instructions in the loop will not be
predicated and so the conversion from VPT predication to tail
predication can result in different values being produced, because of
falsely predicated lanes not being updated in the converted form.
A masked load, whether through VPT or tail predication, will write
zeros to any of the falsely predicated bytes. So, from the loads, we
know that the false lanes are zeroed and here we're trying to track
that those false lanes remain zero, or where they change, the
differences are masked away by their user(s).
All MVE loads and stores have to be predicated, so we know that any
load operands, or stored results are equivalent already. Other
explicitly predicated instructions will perform the same operation in
the original loop and the tail-predicated form too. Because of this,
we can insert loads, stores and other predicated instructions into
our KnownFalseZeros set and build from there.
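As an IR-level sketch of the zeroed-false-lanes property (a masked
load whose passthru is zero leaves the disabled lanes at zero):

  declare <8 x i16> @llvm.masked.load.v8i16.p0v8i16(<8 x i16>*, i32,
                                                    <8 x i1>,
                                                    <8 x i16>)
  define <8 x i16> @mload(<8 x i16>* %p, <8 x i1> %mask) {
    ; disabled lanes take the zero passthru value
    %v = call <8 x i16> @llvm.masked.load.v8i16.p0v8i16(
             <8 x i16>* %p, i32 2, <8 x i1> %mask,
             <8 x i16> zeroinitializer)
    ret <8 x i16> %v
  }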
Differential Revision: https://reviews.llvm.org/D75452
|
|
Differential Revision: https://reviews.llvm.org/D73534
|
|
Replace with a DAG combine to form VBROADCAST_LOAD.
isTypeDesirableForOp prevents loads from being shrunk to i16 by DAG
combine. Because of this we can't just match the broadcast and a
scalar load. So look for broadcast+truncate+load and form a
vbroadcast_load during DAG combine. This replaces what was
previously done as an isel pattern and I think fixes it so we
won't change the size of a volatile load. But my main motivation
is just to clean up our isel patterns.
|
|
When expanding scalar packed operations, we should not introduce the
illegal vector casts that LegalizerHelper introduces. We're not in a
legalizer context, and there's no RegBankSelect apply or legalize
worklist.
|