path: root/llvm/test/CodeGen
Age  Commit message  Author  Files  Lines
4 hours[ARM] Update and cleanup lround/llround tests. NFCDavid Green2-26/+94
Similar to f4370fb801aa, the fp16 tests do not work yet.
17 hours[AMDGPU] Add another test for missing S_WAIT_XCNT (#161838)Jay Foad1-0/+40
19 hoursAMDGPU: Remove LDS_DIRECT_CLASS register class (#161762)Matt Arsenault10-174/+174
This is a singleton register class, which is a bad idea, and it is not actually used.
21 hoursAMDGPU: Remove m0 classes (#161758)Matt Arsenault26-544/+545
These are singleton register classes, which are not a good idea and also are unused.
26 hours[AMDGPU][True16][CodeGen] fix v_mov_b16_t16 index in folding pass (#161764)Brox Chen1-0/+25
With true16 mode, v_mov_b16_t16 is added as a new foldable copy instruction, but its src operand is at a different index. Use the correct src index for v_mov_b16_t16.
27 hours[SPIR-V] Fix `asdouble` issue in SPIRV codegen to correctly generate `OpBitCast` instruction. (#161891)Lucie Choi1-0/+28
Generate `OpBitCast` instruction for pointer cast operation if the element type is different. The HLSL for the unit test is:
```hlsl
StructuredBuffer<uint2> In : register(t0);
RWStructuredBuffer<double2> Out : register(u2);

[numthreads(1,1,1)]
void main() {
  Out[0] = asdouble(In[0], In[1]);
}
```
Resolves https://github.com/llvm/llvm-project/issues/153513
28 hours[LLVM][CodeGen] Check Non Saturate Case in isSaturatingMinMax (#160637)Yatao Wang10-174/+714
Fix Issue #160611
28 hours[AArch64][GlobalISel] Use TargetConstant for shift immediates (#161527)David Green18-597/+313
This changes the intrinsic definitions for shifts to use IntArg, which in turn changes how the shifts are represented in SDAG to use TargetConstant (and fixes up a number of ISel lowering places too). The vecshift immediates are changed from ImmLeaf to TImmLeaf to keep them matching the TargetConstant. On the GISel side the constant shift amounts are then represented as immediate operands, not separate constants. The end result is that this allows a few more patterns to match in GISel.
29 hours[Hexagon] Support lowering of setuo & seto for vector types in Hexagon (#158740)Fateme Hosseini1-0/+93
Resolves instruction selection failure for v64f16 and v32f32 vector types.
Patch by: Fateme Hosseini
---------
Co-authored-by: Kaushik Kulkarni <quic_kauskulk@quicinc.com>
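Roughly, the previously failing selections correspond to unordered/ordered compares like the sketch below (`setuo`/`seto` are the SDAG condition codes for `fcmp uno`/`fcmp ord`; the function name is illustrative, not from the test):
```llvm
define <32 x i1> @cmp_unordered(<32 x float> %a, <32 x float> %b) {
  ; true in each lane where either input is NaN
  %c = fcmp uno <32 x float> %a, %b
  ret <32 x i1> %c
}
```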
31 hours[Hexagon] isel-fold-shl-zext.ll - regenerate test checks (#161869)Simon Pilgrim1-5/+7
Improves codegen diff in an upcoming patch
31 hours[AMDGPU][Attributor] Stop inferring amdgpu-no-flat-scratch-init in sanitized functions. (#161319)Chaitanya2-2/+26
This PR stops the attributor pass from inferring `amdgpu-no-flat-scratch-init` for functions marked with a `sanitize_*` attribute.
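For illustration (a hypothetical function, not taken from the patch), a kernel like this now keeps its flat-scratch initialization:
```llvm
; Marked sanitize_address, so the attributor no longer infers
; "amdgpu-no-flat-scratch-init" for it.
define amdgpu_kernel void @asan_kernel() sanitize_address {
  ret void
}
```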
32 hours[x86] lowerV4I32Shuffle - don't adjust PSHUFD splat masks to match UNPCK (#161846)Simon Pilgrim12-59/+59
Allow getV4X86ShuffleImm8ForMask to create a pure splat mask, helping to reduce demanded elts.
32 hoursAMDGPU: Fix broken register class IDs in mir tests (#161832)Matt Arsenault5-21/+21
32 hours[RISCV] Support scalar llvm.fmodf intrinsic. (#161743)Craig Topper8-2/+509
32 hours[AMDGPU] Enable XNACK on gfx1250 (#161457)Shilei Tian26-1167/+1252
This should always be on. Fixes SWDEV-555931.
32 hoursFold SVE mul and mul_u to neg during isel (#160828)Martin Wehking1-0/+131
Replace mul and mul_u ops with a neg operation if their second operand is a splat of -1. Also apply the optimization to mul_u ops whose first operand is a splat of -1, since they are commutative.
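A hedged IR sketch of the pattern, assuming the `llvm.aarch64.sve.mul.u` intrinsic (the function name is illustrative): a multiply by a splat of -1 can be selected as a negate.
```llvm
define <vscale x 4 x i32> @mul_negone(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a) {
  ; build a splat of -1
  %ins = insertelement <vscale x 4 x i32> poison, i32 -1, i64 0
  %splat = shufflevector <vscale x 4 x i32> %ins, <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
  ; x * -1 == -x, so this multiply can be selected as a neg
  %r = call <vscale x 4 x i32> @llvm.aarch64.sve.mul.u.nxv4i32(<vscale x 4 x i1> %pg, <vscale x 4 x i32> %a, <vscale x 4 x i32> %splat)
  ret <vscale x 4 x i32> %r
}
declare <vscale x 4 x i32> @llvm.aarch64.sve.mul.u.nxv4i32(<vscale x 4 x i1>, <vscale x 4 x i32>, <vscale x 4 x i32>)
```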
33 hours[Hexagon] Added lowering for sint_to_fp from v32i1 to v32f32 (#159507)pkarveti2-25/+42
The transformation pattern is identical to the uint_to_fp conversion from v32i1 to v32f32.
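In IR terms the newly handled conversion is simply the following (a sketch, not the test file itself):
```llvm
define <32 x float> @s2f(<32 x i1> %m) {
  ; a signed i1 is 0 or -1, so each lane becomes 0.0 or -1.0
  %f = sitofp <32 x i1> %m to <32 x float>
  ret <32 x float> %f
}
```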
33 hours[X86] Fold ADD(x,x) -> X86ISD::VSHLI(x,1) (#161843)Simon Pilgrim16-398/+398
Now that #161007 will attempt to fold this back to ADD(x,x) in X86FixupInstTunings, we can more aggressively create X86ISD::VSHLI nodes to avoid missed optimisations due to oneuse limits, avoid unnecessary freezes, and allow AVX512 to fold to the mi memory-folding variants. I've currently limited SSE targets to cases where ADD is the only user of x to prevent extra moves - AVX shift patterns benefit from breaking the ADD+ADD+ADD chains into shifts, but it's not so beneficial on SSE with the extra moves.
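The underlying identity is just `x + x == x << 1`; a hedged sketch of IR that is now selected via X86ISD::VSHLI (function name illustrative):
```llvm
define <8 x i16> @add_self(<8 x i16> %x) {
  ; equivalent to a shift-left-by-1, which X86 can emit as VSHLI and,
  ; when profitable, fold back to an add in X86FixupInstTunings
  %r = add <8 x i16> %x, %x
  ret <8 x i16> %r
}
```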
35 hours[SPARC] Prevent meta instructions from being inserted into delay slots (#161111)Koakuma1-0/+25
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into delay slots, as they don't correspond to real machine instructions. This should fix crashes when compiling with, for example, `clang -Og`.
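A sketch of the problematic shape, assuming the variadic `llvm.fake.use` intrinsic (function names are illustrative):
```llvm
declare void @g()
declare void @llvm.fake.use(...)

define void @f(i32 %x) {
  call void @g()
  ; FAKE_USE is a meta instruction with no machine encoding, so the
  ; delay-slot filler must not move it into the delay slot of the call
  call void (...) @llvm.fake.use(i32 %x)
  ret void
}
```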
35 hoursAMDGPU: Fix constrain register logic for physregs (#161794)Matt Arsenault5-940/+650
We do not need to reconstrain physical registers. Enables an additional fold for constant physregs.
37 hours[AArch64][SME] Enable `aarch64-split-sve-objects` with hazard padding (#161714)Benjamin Maxwell2-138/+140
This enables `aarch64-split-sve-objects` by default. Note: This option only has an effect when used in conjunction with hazard padding (`aarch64-stack-hazard-size` != 0). See https://github.com/llvm/llvm-project/pull/142392 for more details.
37 hours[X86][GlobalIsel] Adds support for G_UMIN/G_UMAX/G_SMIN/G_SMAX (#161783)Mahesh-Attarde4-328/+648
The original PR (https://github.com/llvm/llvm-project/pull/160247) broke in a rebase; continuing here. This patch adds support for the G_[U|S][MIN|MAX] opcodes to the X86 target. This PR addresses review comments:
1. About widening to the next power of 2: https://github.com/llvm/llvm-project/pull/160247#discussion_r2371655478
2. Clamping scalars: https://github.com/llvm/llvm-project/pull/160247#discussion_r2374748440
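A minimal case that now selects through GlobalISel on X86 (a sketch; the tests also cover the widening and scalar-clamping paths mentioned above):
```llvm
define i32 @umin32(i32 %a, i32 %b) {
  ; translated to G_UMIN and legalized/selected by the X86 GISel path
  %r = call i32 @llvm.umin.i32(i32 %a, i32 %b)
  ret i32 %r
}
declare i32 @llvm.umin.i32(i32, i32)
```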
38 hours[X86][GlobalIsel] Enable gisel run for fpclass isel (#160741)Mahesh-Attarde1-134/+122
X86 GISel has all the necessary opcodes supported to expand/lower the isfpclass intrinsic, enabling this test ahead of the fpclass patch. This patch enables GISel runs for the isel-fpclass.ll tests.
39 hours[AMDGPU] Account for implicit XCNT insertion (#160812)Aaditya2-7/+3
Hardware inserts an implicit `S_WAIT_XCNT 0` between alternate SMEM and VMEM instructions, so there are never outstanding address translations for both SMEM and VMEM at the same time.
40 hours[AMDGPU] Define VS_128*. NFCI (#161798)Stanislav Mekhanoshin6-60/+60
Needed for a future patch.
40 hours[ARM] shouldFoldMaskToVariableShiftPair should be true for scalars up to the biggest legal type (#158070)AZero133-0/+7433
For ARM, we want to do this for scalars up to 32 bits. Otherwise the code ends up bigger and bloated.
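The trade-off the hook controls, sketched in IR (function name illustrative):
```llvm
; Clearing the high y bits of x can be done with a mask,
;   x & (-1 >> y)
; or with a pair of variable shifts,
;   (x << y) >> y          (logical shifts, for y in [0, 31])
; Returning true from shouldFoldMaskToVariableShiftPair picks the shifts.
define i32 @clear_high_bits(i32 %x, i32 %y) {
  %mask = lshr i32 -1, %y
  %r = and i32 %x, %mask
  ret i32 %r
}
```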
40 hours[X86] combineBitcastvxi1 - bail out on soft-float targets (#161704)Simon Pilgrim1-0/+40
combineBitcastvxi1 is sometimes called pre-legalization, so don't introduce X86ISD::MOVMSK nodes when vector types aren't legal. Fixes #161693
46 hours[X86][AMX] Combine constant zero vector and AMX cast to tilezero (#92384)Phoebe Wang1-60/+18
Found this problem when investigating #91207
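Roughly, in IR (a sketch assuming the AMX internal intrinsics; shapes and names are illustrative, not copied from the test):
```llvm
define void @zero_tile(ptr %buf, i64 %stride) {
  ; a constant-zero vector cast to a tile can be combined to tilezero
  ; instead of materializing the zeroes through memory
  %z = call x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32> zeroinitializer)
  call void @llvm.x86.tilestored64.internal(i16 16, i16 64, ptr %buf, i64 %stride, x86_amx %z)
  ret void
}
declare x86_amx @llvm.x86.cast.vector.to.tile.v256i32(<256 x i32>)
declare void @llvm.x86.tilestored64.internal(i16, i16, ptr, i64, x86_amx)
```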
2 days[ARM] Update and cleanup lrint/llrint tests. NFCDavid Green3-59/+67
Most of the fp16 cases still do not work properly. See #161088.
2 days[AMDGPU] Be less optimistic when allocating module scope lds (#161464)Jon Chesterfield1-47/+42
Make the test for when additional variables can be added to the struct allocated at address zero more stringent. Previously, variables could be added to it (for faster access) even when that increased the LDS requested by a kernel. This corrects that oversight. The test case diff shows the change from all variables being allocated into the module LDS to only some being, in particular the introduction of uses of the offset table and that some kernels now use less LDS than before. Alternative to PR 160181.
2 days[NVPTX] expand trunc/ext on v2i32 (#161715)Artem Belevich1-0/+145
#153478 made v2i32 legal on newer GPUs, but we cannot lower all operations yet. Expand the `trunc/ext` operations until we implement efficient lowering.
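For instance, an operation like the following is now expanded instead of failing to select (a sketch; the function name is illustrative):
```llvm
define <2 x i16> @trunc_v2i32(<2 x i32> %v) {
  ; v2i32 is now legal on newer GPUs, but trunc on it is expanded for now
  %t = trunc <2 x i32> %v to <2 x i16>
  ret <2 x i16> %t
}
```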
2 days[AArch64][SME] Support split ZPR and PPR area allocation (#142392)Benjamin Maxwell5-344/+1961
For a while we have supported the `-aarch64-stack-hazard-size=<size>` option, which adds "hazard padding" between GPRs and FPRs/ZPRs. However, there is currently a hole in this mitigation, as PPR and FPR/ZPR accesses to the same area also cause streaming memory hazards (this is noted by `-pass-remarks-analysis=sme -aarch64-stack-hazard-remark-size=<val>`), and the current stack layout places PPRs and ZPRs within the same area. Which looks like:
```
------------------------------------- Higher address
| callee-saved gpr registers        |
|-----------------------------------|
| lr,fp (a.k.a. "frame record")     |
|-----------------------------------| <- fp(=x29)
| <hazard padding>                  |
|-----------------------------------|
| callee-saved fp/simd/SVE regs     |
|-----------------------------------|
| SVE stack objects                 |
|-----------------------------------|
| local variables of fixed size     |
| <FPR>                             |
| <hazard padding>                  |
| <GPR>                             |
------------------------------------- <- sp, Lower address
```
With this patch the stack (and hazard padding) is rearranged so that hazard padding is placed between the PPRs and ZPRs rather than within the (fixed size) callee-save region. Which looks something like this:
```
------------------------------------- Higher address
| callee-saved gpr registers        |
|-----------------------------------|
| lr,fp (a.k.a. "frame record")     |
|-----------------------------------| <- fp(=x29)
| callee-saved PPRs                 |
| PPR stack objects                 | (These are SVE predicates)
|-----------------------------------|
| <hazard padding>                  |
|-----------------------------------|
| callee-saved ZPR regs             | (These are SVE vectors)
| ZPR stack objects                 | Note: FPRs are promoted to ZPRs
|-----------------------------------|
| local variables of fixed size     |
| <FPR>                             |
| <hazard padding>                  |
| <GPR>                             |
------------------------------------- <- sp, Lower address
```
This layout is only enabled if:
* SplitSVEObjects are enabled (`-aarch64-split-sve-objects`)
  - (This may be enabled by default in a later patch)
* Streaming memory hazards are present (`-aarch64-stack-hazard-size=<val>` != 0)
* PPRs and FPRs/ZPRs are on the stack
* There's no stack realignment or variable-sized objects
  - This is left as a TODO for now
Additionally, any FPR callee-saves that are present will be promoted to ZPRs. This is to prevent stack hazards between FPRs and GPRs in the fixed-size callee-save area (which would otherwise require more hazard padding, or moving the FPR callee-saves). This layout should resolve the hole in the hazard padding mitigation, and is not intended to change codegen for non-SME code.
2 days[RegAlloc] Add coverage leading to revert of pr160765 (#161614)Philip Reames1-0/+132
Essentially what happened is the following series of events:
1) We rematerialized the vmv.v.x into the loop.
2) As this was the last use of the instruction, we deleted the instruction, and removed it from the original live range.
3) We split the live range for the remat.
4) We tried to rematerialize the uses of that split interval, and crashed because the assert about the def being available in the original live interval does not hold.
2 days[AMDGPU] s_quadmask* implicitly defines SCC (#161582)LU-JOHN1-2/+90
Fix s_quadmask* instruction description so that it defines SCC.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
2 days[X86] Create special case for (a-b) - (a<b) -> sbb a, b (#161388)AZero131-0/+29
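In IR the special-cased pattern looks roughly like this (a sketch; function name illustrative): the `icmp ult` is the borrow out of `a - b`, so the whole expression maps onto a compare followed by `sbb`:
```llvm
define i32 @sub_with_borrow(i32 %a, i32 %b) {
  %cmp = icmp ult i32 %a, %b    ; the borrow out of a - b
  %borrow = zext i1 %cmp to i32
  %sub = sub i32 %a, %b
  %r = sub i32 %sub, %borrow    ; (a - b) - (a < b) -> sbb a, b
  ret i32 %r
}
```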
2 days[Hexagon] Add opcode V6_vS32Ub_npred_ai for offset validity check (#161618)Ikhlas Ajbar1-0/+23
Check for a valid offset for the unaligned vector store V6_vS32Ub_npred_ai. isValidOffset() is updated to evaluate the offset of this instruction. Fixes #160647
2 daysGreedy: Take hints from copy to physical subreg (#160467)Matt Arsenault1-2/+1
Previously this took hints from a subregister extract of a physreg, like:
  %vreg.sub = COPY $physreg
This now also handles the rarer case:
  $physreg_sub = COPY %vreg
Also make explicit an accidental bug that existed here before: this was only using the superregister as a hint if it was already in the copy, and not if using the existing assignment. There are a handful of regressions in that case, so leave that extension for a future change.
2 days[Codegen] Add a separate stack ID for scalable predicates (#142390)Benjamin Maxwell5-22/+22
This splits out "ScalablePredicateVector" from the "ScalableVector" StackID. This is primarily to allow easy differentiation between vectors and predicates (without inspecting instructions). This new stack ID is not used in many places yet, but will be used in a later patch to mark stack slots that are known to contain predicates.
Co-authored-by: Kerry McLaughlin <kerry.mclaughlin@arm.com>
2 daysAMDGPU: Switch test to generated checks (#161658)Matt Arsenault1-13/+20
2 daysRegAllocGreedy: Check if copied lanes are live in trySplitAroundHintReg (#160424)Matt Arsenault3-88/+87
For subregister copies, do a subregister live check instead of checking the main range. Doesn't do much yet; the split analysis still does not track live ranges.
2 days[LLVM][CodeGen][SVE] Remove failure cases when widening vector load/store ops. (#160515)Paul Walker2-31/+2879
When unable to widen a vector load/store we can replace the operation with a masked variant. Support for extending loads largely came for free, hence its inclusion, but truncating stores require more work. Fixes https://github.com/llvm/llvm-project/issues/159995
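The replacement can be pictured with the generic masked-load intrinsic (a sketch under assumed types; the real lowering chooses the widened type and mask internally):
```llvm
; A load of an awkward type such as <vscale x 3 x i32> cannot simply be
; widened to <vscale x 4 x i32> (that would read extra memory). It can
; instead become a masked load of the wider type, with the mask keeping
; only the original lanes enabled.
define <vscale x 4 x i32> @widened_load(ptr %p, <vscale x 4 x i1> %mask) {
  %v = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr %p, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i32> zeroinitializer)
  ret <vscale x 4 x i32> %v
}
declare <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr, i32, <vscale x 4 x i1>, <vscale x 4 x i32>)
```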
3 days[SPIR-V] Prevent adding duplicate binding instructions for implicit binding (#161299)Lucie Choi2-24/+48
Prevent adding duplicate instructions for implicit bindings when they are from the same resource. The fix is to store and check if the binding number is already assigned for each `OrderId`. Resolves https://github.com/llvm/llvm-project/issues/160716
3 days[AArch64][GlobalISel] Add `G_FMODF` instruction (#160061)Ryan Cowan5-152/+660
This commit adds the `G_FMODF` instruction to GMIR and enables its translation, legalization, and instruction selection in AArch64.
3 days[AArch64] Combine PTEST_FIRST(PTRUE, CONCAT(A, B)) -> PTEST_FIRST(PTRUE, A) (#161384)Kerry McLaughlin1-18/+2
When the input to ptest_first is a vector concat and the mask is all active, performPTestFirstCombine returns a ptest_first using the first operand of the concat, looking through any reinterpret casts. This allows optimizePTestInstr to later remove the ptest when the first operand is a flag setting instruction such as whilelo.
3 days[AArch64][SME] Preserve `Chain` when selecting multi-vector LUT4Is (#161494)Benjamin Maxwell3-7/+16
Previously, the `Chain` was dropped, meaning LUTI4 nodes that only differed in the chain operand would be incorrectly CSE'd. Fixes: #161420
3 days[AMDGPU][SDAG] Enable ISD::PTRADD for 64-bit AS by default (#146076)Fabian Ritter16-357/+322
Also removes the command line option to control this feature. There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled some transformations that were sometimes harmful.
For SWDEV-516125.
3 days[DAG] Add ComputeNumSignBits(FREEZE(X)) handling (#161507)Simon Pilgrim1-6/+0
If X is known never undef/poison then skip the freeze and return ComputeNumSignBits(X).
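A hedged sketch of when this applies (function name illustrative):
```llvm
define i32 @signbits_through_freeze(i32 %x) {
  %fx = freeze i32 %x     ; %fx is never undef or poison
  %s = ashr i32 %fx, 24   ; %s has at least 25 sign bits and stays non-poison
  %f = freeze i32 %s      ; ComputeNumSignBits can now look through this freeze
  ret i32 %f
}
```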
3 daysPeepholeOpt: Fix losing subregister indexes on full copies (#161310)Matt Arsenault33-3126/+2895
Previously if we had a subregister extract reading from a full copy, the no-subregister incoming copy would overwrite the DefSubReg index of the folding context. There's one ugly rvv regression, but it's a downstream issue of this; an unnecessary same class reg-to-reg full copy was avoided.
3 days[RISCV][GISel] Use LBU for anyext i8 atomic_load. (#161588)Craig Topper1-20/+20
This matches what we do for regular i8 extload due to the lack of c.lb in Zcb. This only affects GlobalISel because SelectionDAG won't create an anyext i8 atomic_load today.
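For illustration (a sketch, not the test itself), returning the result of an i8 atomic load produces an any-extend to XLEN, which now selects LBU:
```llvm
define i8 @atomic_load_byte(ptr %p) {
  ; the i8 return value is any-extended to fill the register, so GISel
  ; sees (anyext (atomic_load)) and can now select LBU, as with extload
  %v = load atomic i8, ptr %p monotonic, align 1
  ret i8 %v
}
```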
3 days[AMDGPU] Move LowerBufferFatPointers after LoadStoreVectorizer and remove the fixme (#161531)Gang Chen7-2304/+1815
Move the LowerBufferFatPointers pass after the CodeGenPrepare and LoadStoreVectorizer passes, and remove the fixme about that.