|
Similar to f4370fb801aa, the fp16 tests do not work yet.
|
|
|
|
This is a singleton register class, which is a bad idea,
and it is not actually used.
|
|
These are singleton register classes, which are not a good idea
and are also unused.
|
|
In true16 mode, v_mov_b16_t16 is added as a new foldable copy
instruction, but its src operand is at a different index.
Use the correct src index for v_mov_b16_t16.
|
|
`OpBitCast` instruction. (#161891)
Generate an `OpBitCast` instruction for a pointer cast operation if the
element types differ.
The HLSL for the unit test is
```hlsl
StructuredBuffer<uint2> In : register(t0);
RWStructuredBuffer<double2> Out : register(u2);
[numthreads(1,1,1)]
void main() {
Out[0] = asdouble(In[0], In[1]);
}
```
Resolves https://github.com/llvm/llvm-project/issues/153513
|
|
Fix Issue #160611
|
|
This changes the intrinsic definitions for shifts to use IntArg, which
in turn changes how the shifts are represented in SDAG to use
TargetConstant (and fixes up a number of ISel lowering places too). The
vecshift immediates are changed from ImmLeaf to TImmLeaf to keep them
matching the TargetConstant. On the GISel side the constant shift
amounts are then represented as immediate operands, not separate constants.
The end result is that this allows a few more patterns to match in GISel.
|
|
Resolves instruction selection failure for v64f16 and v32f32 vector
types.
Patch by: Fateme Hosseini
---------
Co-authored-by: Kaushik Kulkarni <quic_kauskulk@quicinc.com>
|
|
Improves codegen diff in an upcoming patch
|
|
functions. (#161319)
This PR stops the attributor pass from inferring
`amdgpu-no-flat-scratch-init` for functions marked with a `sanitize_*`
attribute.
|
|
(#161846)
Allow getV4X86ShuffleImm8ForMask to create a pure splat mask, helping to reduce demanded elts.
|
|
|
|
|
|
This should always be on.
Fixes SWDEV-555931.
|
|
Replace mul and mul_u ops with a neg operation if their second operand
is a splat value of -1.
Also apply the optimization to mul_u ops whose first operand is a splat
value of -1, since they are commutative.
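The underlying identity, as a minimal LLVM IR sketch (the ops in this patch are target-specific mul/mul_u nodes, not plain IR mul; function names here are illustrative):
```llvm
; x * splat(-1) computes the same value as 0 - x, so the multiply can
; become a negate.
define <4 x i32> @mul_by_splat_minus1(<4 x i32> %x) {
  %r = mul <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
  ret <4 x i32> %r
}

; ...is equivalent to:
define <4 x i32> @neg(<4 x i32> %x) {
  %r = sub <4 x i32> zeroinitializer, %x
  ret <4 x i32> %r
}
```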
|
|
The transformation pattern is identical to the uint_to_fp
conversion from v32i1 to v32f32.
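A hedged sketch of the IR-level shape involved (assuming the signed variant, mirroring the existing unsigned one):
```llvm
; Each i1 lane becomes a float: 0.0 or 1.0 for uitofp, 0.0 or -1.0 for
; sitofp.
define <32 x float> @cvt_mask(<32 x i1> %m) {
  %f = sitofp <32 x i1> %m to <32 x float>
  ret <32 x float> %f
}
```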
|
|
Now that #161007 will attempt to fold this back to ADD(x,x) in
X86FixupInstTunings, we can more aggressively create X86ISD::VSHLI nodes
to avoid missed optimisations due to oneuse limits; this avoids
unnecessary freezes and allows AVX512 to fold to the mi memory-folding
variants.
I've currently limited SSE targets to cases where ADD is the only user
of x to prevent extra moves - AVX shift patterns benefit from breaking
the ADD+ADD+ADD chains into shifts, but it's not so beneficial on SSE
with the extra moves.
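The identity being exploited, as a minimal IR sketch (X86ISD::VSHLI is the backend's vector shift-left-by-immediate node):
```llvm
; add %x, %x computes the same value as shifting %x left by one.
define <8 x i32> @add_self(<8 x i32> %x) {
  %a = add <8 x i32> %x, %x
  ret <8 x i32> %a
}

define <8 x i32> @shl_by_one(<8 x i32> %x) {
  %s = shl <8 x i32> %x, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  ret <8 x i32> %s
}
```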
|
|
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into
delay slots, as they don't correspond to real machine instructions.
This should fix crashes when compiling with, for example, `clang -Og`.
|
|
We do not need to reconstrain physical registers. Enables an
additional fold for constant physregs.
|
|
This enables `aarch64-split-sve-objects` by default. Note: This option
only has an effect when used in conjunction with hazard padding
(`aarch64-stack-hazard-size` != 0).
See https://github.com/llvm/llvm-project/pull/142392 for more details.
|
|
The original PR, https://github.com/llvm/llvm-project/pull/160247,
broke in a rebase; continuing here.
This patch adds support for the G_[U|S][MIN|MAX] opcodes in the X86
target. It addresses the review comments:
1. Widening to the next power of 2:
https://github.com/llvm/llvm-project/pull/160247#discussion_r2371655478
2. Clamping scalars:
https://github.com/llvm/llvm-project/pull/160247#discussion_r2374748440
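A minimal IR sketch of inputs that exercise these opcodes (the standard min/max intrinsics translate to the generic min/max opcodes in the IRTranslator; function name illustrative):
```llvm
declare i32 @llvm.smin.i32(i32, i32)
declare i32 @llvm.umax.i32(i32, i32)

; Both calls become G_SMIN / G_UMAX in generic MIR.
define i32 @minmax(i32 %a, i32 %b) {
  %lo = call i32 @llvm.smin.i32(i32 %a, i32 %b)
  %r  = call i32 @llvm.umax.i32(i32 %lo, i32 %b)
  ret i32 %r
}
```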
|
|
X86 GISel supports all the opcodes necessary to expand/lower the
isfpclass intrinsic, so the tests can be enabled ahead of the fpclass
patch. This patch enables the runs for the isel-fpclass.ll tests.
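For reference, the IR shape these tests exercise; the i32 immediate is the class bitmask (3 = signaling NaN | quiet NaN):
```llvm
declare i1 @llvm.is.fpclass.f32(float, i32 immarg)

; Tests whether %x is a (signaling or quiet) NaN.
define i1 @is_nan(float %x) {
  %r = call i1 @llvm.is.fpclass.f32(float %x, i32 3)
  ret i1 %r
}
```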
|
|
Hardware inserts an implicit `S_WAIT_XCNT 0` between
alternating SMEM and VMEM instructions, so there are
never outstanding address translations for both SMEM
and VMEM at the same time.
|
|
Needed for future patch.
|
|
biggest legal type (#158070)
For ARM, we want to do this up to 32 bits. Otherwise the code ends up
bigger and bloated.
|
|
combineBitcastvxi1 is sometimes called pre-legalization, so don't
introduce X86ISD::MOVMSK nodes when vector types aren't legal.
Fixes #161693
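A hedged sketch of the kind of IR combineBitcastvxi1 handles (not the exact reproducer from #161693):
```llvm
; The i1-vector-to-integer bitcast is what X86 can lower to a
; MOVMSK-style sign-mask extraction once the vector type is legal.
define i8 @mask_bits(<8 x i32> %a, <8 x i32> %b) {
  %c = icmp slt <8 x i32> %a, %b
  %m = bitcast <8 x i1> %c to i8
  ret i8 %m
}
```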
|
|
Found this problem when investigating #91207
|
|
Most of the fp16 cases still do not work properly. See #161088.
|
|
Make the test for when additional variables can be added to the struct
allocated at address zero more stringent. Previously, variables could be
added to it (for faster access) even when that increased the LDS
requested by a kernel. This corrects that oversight.
The test case diff shows the change from all variables being allocated
into the module LDS to only some of them, in particular the introduction
of uses of the offset table, and that some kernels now use less LDS than
before.
Alternative to PR 160181
|
|
#153478 made v2i32 legal on newer GPUs, but we cannot lower all
operations yet. Expand the `trunc/ext` operations until we implement
efficient lowering.
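Illustrative IR for the affected shapes (a sketch; the precise set of expanded nodes is in the patch):
```llvm
; trunc and ext touching v2i32 are expanded until efficient lowering
; exists.
define <2 x i16> @trunc_v2i32(<2 x i32> %x) {
  %t = trunc <2 x i32> %x to <2 x i16>
  ret <2 x i16> %t
}

define <2 x i64> @zext_v2i32(<2 x i32> %x) {
  %e = zext <2 x i32> %x to <2 x i64>
  ret <2 x i64> %e
}
```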
|
|
For a while we have supported the `-aarch64-stack-hazard-size=<size>`
option, which adds "hazard padding" between GPRs and FPR/ZPRs. However,
there is currently a hole in this mitigation as PPR and FPR/ZPR accesses
to the same area also cause streaming memory hazards (this is noted by
`-pass-remarks-analysis=sme -aarch64-stack-hazard-remark-size=<val>`),
and the current stack layout places PPRs and ZPRs within the same area,
which looks like:
```
------------------------------------ Higher address
| callee-saved gpr registers |
|---------------------------------- |
| lr,fp (a.k.a. "frame record") |
|-----------------------------------| <- fp(=x29)
| <hazard padding> |
|-----------------------------------|
| callee-saved fp/simd/SVE regs |
|-----------------------------------|
| SVE stack objects |
|-----------------------------------|
| local variables of fixed size |
| <FPR> |
| <hazard padding> |
| <GPR> |
------------------------------------| <- sp
| Lower address
```
With this patch the stack (and hazard padding) is rearranged so that
hazard padding is placed between the PPRs and ZPRs rather than within
the (fixed-size) callee-save region, which looks something like this:
```
------------------------------------ Higher address
| callee-saved gpr registers |
|---------------------------------- |
| lr,fp (a.k.a. "frame record") |
|-----------------------------------| <- fp(=x29)
| callee-saved PPRs |
| PPR stack objects | (These are SVE predicates)
|-----------------------------------|
| <hazard padding> |
|-----------------------------------|
| callee-saved ZPR regs | (These are SVE vectors)
| ZPR stack objects | Note: FPRs are promoted to ZPRs
|-----------------------------------|
| local variables of fixed size |
| <FPR> |
| <hazard padding> |
| <GPR> |
------------------------------------| <- sp
| Lower address
```
This layout is only enabled if:
* SplitSVEObjects are enabled (`-aarch64-split-sve-objects`)
- (This may be enabled by default in a later patch)
* Streaming memory hazards are present
- (`-aarch64-stack-hazard-size=<val>` != 0)
* PPRs and FPRs/ZPRs are on the stack
* There's no stack realignment or variable-sized objects
- This is left as a TODO for now
Additionally, any FPR callee-saves that are present will be promoted to
ZPRs. This is to prevent stack hazards between FPRs and GPRs in the
fixed-size callee-save area (which would otherwise require more hazard
padding, or moving the FPR callee-saves).
This layout should resolve the hole in the hazard padding mitigation,
and is not intended to change codegen for non-SME code.
|
|
Essentially what happened is the following series of events:
1) We rematerialized the vmv.v.x into the loop.
2) As this was the last use of the instruction, we deleted the
instruction, and removed it from the original live range.
3) We split the live range for the remat.
4) We tried to rematerialize the uses of that split interval, and
crashed because the assert about the def being available in
the original live interval does not hold.
|
|
Fix s_quadmask* instruction description so that it defines SCC.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
|
|
Check for a valid offset for the unaligned vector store V6_vS32Ub_npred_ai.
isValidOffset() is updated to evaluate the offset of this instruction.
Fixes #160647
|
|
Previously this took hints from a subregister extract of a physreg,
like %vreg.sub = COPY $physreg
This now also handles the rarer case:
$physreg_sub = COPY %vreg
Also make an accidental bug that existed here explicit; this was
only using the superregister as a hint if it was already
in the copy, and not when using the existing assignment. There are
a handful of regressions in that case, so leave that extension
for a future change.
|
|
This splits out "ScalablePredicateVector" from the "ScalableVector"
StackID. This is primarily to allow easy differentiation between vectors
and predicates (without inspecting instructions).
This new stack ID is not used in many places yet, but will be used in a
later patch to mark stack slots that are known to contain predicates.
Co-authored-by: Kerry McLaughlin <kerry.mclaughlin@arm.com>
|
|
|
|
(#160424)
For subregister copies, do a subregister liveness check instead of
checking the main range. This doesn't do much yet; the split analysis
still does not track live ranges.
|
|
ops. (#160515)
When unable to widen a vector load/store, we can replace the operation
with a masked variant. Support for extending loads largely came for
free, hence its inclusion, but truncating stores require more work.
Fixes https://github.com/llvm/llvm-project/issues/159995
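A hedged sketch of the idea (lane counts illustrative): a load of a width that cannot be widened legally can instead be expressed as a wider masked load with the extra lanes disabled.
```llvm
declare <8 x i16> @llvm.masked.load.v8i16.p0(ptr, i32 immarg, <8 x i1>, <8 x i16>)

; A 6-lane load done as an 8-lane masked load; the last two lanes stay
; off, so no memory past the original access is touched.
define <8 x i16> @load_six_lanes(ptr %p) {
  %v = call <8 x i16> @llvm.masked.load.v8i16.p0(ptr %p, i32 2,
           <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>,
           <8 x i16> poison)
  ret <8 x i16> %v
}
```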
|
|
(#161299)
Prevent adding duplicate instructions for implicit bindings when they
are from the same resource. The fix is to store and check if the binding
number is already assigned for each `OrderId`.
Resolves https://github.com/llvm/llvm-project/issues/160716
|
|
This commit adds the `G_FMODF` opcode to GMIR and enables its
translation, legalization and instruction selection in AArch64.
|
|
(#161384)
When the input to ptest_first is a vector concat and the mask is all active,
performPTestFirstCombine returns a ptest_first using the first operand
of the concat, looking through any reinterpret casts.
This allows optimizePTestInstr to later remove the ptest when the first
operand is a flag setting instruction such as whilelo.
|
|
Previously, the `Chain` was dropped, meaning LUTI4 nodes that only
differed in the chain operand would be incorrectly CSE'd.
Fixes: #161420
|
|
Also removes the command line option to control this feature.
There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
some transformations that were sometimes harmful.
For SWDEV-516125.
|
|
If X is known to never be undef/poison, skip the freeze and return
ComputeNumSignBits(X).
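A minimal sketch: with a noundef source, the freeze cannot change the value, so sign-bit analysis can look through it.
```llvm
; %s has 25 sign bits. Since %x is noundef (and ashr by a constant < 32
; cannot create poison), the freeze is a no-op for ComputeNumSignBits.
define i32 @signbits(i32 noundef %x) {
  %s = ashr i32 %x, 24
  %f = freeze i32 %s
  ret i32 %f
}
```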
|
|
Previously if we had a subregister extract reading from a
full copy, the no-subregister incoming copy would overwrite
the DefSubReg index of the folding context.
There's one ugly rvv regression, but it's a downstream
issue of this; an unnecessary same class reg-to-reg full copy
was avoided.
|
|
This matches what we do for a regular i8 extload due to the lack of c.lb
in Zcb.
This only affects GlobalISel because SelectionDAG won't create an
anyext i8 atomic_load today.
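A rough sketch of the neighbourhood (the patch concerns the anyext case, which this zero-extending form only approximates):
```llvm
; An atomic i8 load whose result is extended to i32.
define i32 @atomic_load_byte(ptr %p) {
  %v = load atomic i8, ptr %p monotonic, align 1
  %e = zext i8 %v to i32
  ret i32 %e
}
```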
|
|
the fixme (#161531)
Move the LowerBufferFatPointers pass to after the CodeGenPrepare and
LoadStoreVectorizer passes, and remove the FIXME about that.
|