path: root/llvm/test
Age | Commit message | Author | Files | Lines
8 hours | [AMDGPU] Add another test for missing S_WAIT_XCNT (#161838) | Jay Foad | 1 | -0/+40
10 hours | [SimplifyCFG][profcheck] Handle branch weights in `simplifySwitchLookup` (#161739) | Mircea Trofin | 2 | -7/+30
The switch becomes a conditional branch, one edge going to what was the default target of the switch, the other to a BB that performs a lookup in a table. The branch weights are accurately determinable from those of the switch. Issue #147390.
10 hours | AMDGPU: Remove LDS_DIRECT_CLASS register class (#161762) | Matt Arsenault | 11 | -176/+176
This is a singleton register class, which is a bad idea, and it is not actually used.
10 hours | [TableGen][SchedModel] Introduce a new SchedPredicate that checks against SubtargetFeature (#161888) | Min-Yih Hsu | 1 | -0/+46
Introduce a new SchedPredicate, `FeatureSchedPredicate`, that holds true when a certain SubtargetFeature is enabled. This could be useful when we want to configure a scheduling model with subtarget features. I added this as a separate SchedPredicate rather than piggy-backing on the existing `SchedPredicate<[{....}]>` because, first and foremost, `SchedPredicate` is expected to operate only on MachineInstr, so it does _not_ appear in `MCGenSubtargetInfo::resolveVariantSchedClass` but only shows up in `TargetGenSubtargetInfo::resolveSchedClass`; yet I think `FeatureSchedPredicate` will be useful for both MCInst and MachineInstr. There is another subtle difference between `resolveVariantSchedClass` and `resolveSchedClass` regarding how we access the MCSubtargetInfo instance, if we really want to express `FeatureSchedPredicate` using `SchedPredicate<[{.....}]>`. So I thought it would be easier to add a new SchedPredicate for SubtargetFeature.
11 hours | [SimplifyCFG][profcheck] Synthesize profile for `br (X == 0 | X == 1), T, F1 -> switch` (#161549) | Mircea Trofin | 1 | -7/+20
We cannot calculate the weights of the switch precisely, but we do know the probability of the default branch. We then split the remaining probability equally over the rest of the cases. If we did nothing, the static estimation could be considerably poorer. Issue #147390.
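The weight-synthesis scheme described above (known default-edge probability, remainder split equally over the cases) can be sketched in a few lines. This is an illustrative sketch only; `synthesize_switch_weights` and its fixed-point `scale` are hypothetical, not the actual SimplifyCFG code:

```python
def synthesize_switch_weights(default_prob, num_cases, scale=1_000_000):
    """Given only the probability of the switch's default edge, split the
    remaining probability mass equally over the other cases.
    Returns fixed-point branch weights: default edge first."""
    default_w = round(default_prob * scale)
    remaining = scale - default_w
    case_w = remaining // num_cases  # equal split, truncating
    return [default_w] + [case_w] * num_cases

weights = synthesize_switch_weights(0.25, 3)
```

With a 25% default probability and three cases, each case receives a quarter of the total weight as well.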
11 hours | AMDGPU: Remove m0 classes (#161758) | Matt Arsenault | 27 | -546/+547
These are singleton register classes, which are not a good idea, and they are also unused.
13 hours | [TableGen][MC] Pass a MCSubtargetInfo instance into resolveVariantSchedClassImpl (#161886) | Min-Yih Hsu | 1 | -0/+18
`Target_MC::resolveVariantSchedClassImpl` is the implementation function for `TargetGenMCSubtargetInfo::resolveVariantSchedClass`. Despite being called only by `resolveVariantSchedClass`, `resolveVariantSchedClassImpl` is still a standalone function that cannot access a MCSubtargetInfo through `this` (i.e. `TargetGenMCSubtargetInfo`). And having access to a `MCSubtargetInfo` could be useful for some (future) SchedPredicate. This patch modifies TableGen to generate `resolveVariantSchedClassImpl` with an additional `MCSubtargetInfo` argument passed in. Note that this does not change any public interface in either `TargetGenMCSubtargetInfo` or `MCSubtargetInfo`, as `resolveVariantSchedClassImpl` is basically an internal function.
14 hours | [MemProf] Suppress duplicate clones in the LTO backend (#161551) | Teresa Johnson | 2 | -5/+143
In some cases, due to phase-ordering issues with re-cloning during function assignment, we may end up with duplicate clones in the summaries (calling the same set of callee clones and/or allocation hints). Ideally we would fix this in the thin link, but for now, detect and suppress these in the LTO backend. In order to satisfy possibly cross-module references, make each duplicate an alias to the first identical copy, which gets materialized. This reduces ThinLTO backend compile times.
16 hours | [AMDGPU][True16][CodeGen] fix v_mov_b16_t16 index in folding pass (#161764) | Brox Chen | 1 | -0/+25
With true16 mode, v_mov_b16_t16 is added as a new foldable copy instruction, but its src operand is at a different index. Use the correct src index for v_mov_b16_t16.
17 hours | [VPlan] Match legacy CM in ::computeCost if load is used by load/store. | Florian Hahn | 1 | -0/+126
If a load is scalarized because it is used by a load/store address, the legacy cost model does not pass ScalarEvolution to getAddressComputationCost. Match the behavior in VPReplicateRecipe::computeCost.
18 hours | [SPIR-V] Fix `asdouble` issue in SPIRV codegen to correctly generate `OpBitCast` instruction. (#161891) | Lucie Choi | 1 | -0/+28
Generate `OpBitCast` instruction for pointer cast operation if the element type is different. The HLSL for the unit test is:

```hlsl
StructuredBuffer<uint2> In : register(t0);
RWStructuredBuffer<double2> Out : register(u2);

[numthreads(1,1,1)]
void main() {
  Out[0] = asdouble(In[0], In[1]);
}
```

Resolves https://github.com/llvm/llvm-project/issues/153513
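At the bit level, `asdouble` reinterprets a pair of 32-bit words as one IEEE-754 double (low word first). A minimal sketch of that reinterpretation, purely to illustrate the semantics the codegen must preserve — not the SPIR-V lowering itself:

```python
import struct

def asdouble(lowbits, highbits):
    """Reinterpret two 32-bit unsigned ints as one IEEE-754 double,
    low word first -- the bit-level effect of HLSL's asdouble."""
    return struct.unpack("<d", struct.pack("<II", lowbits, highbits))[0]

# High word 0x3FF00000 with a zero low word is the bit pattern of 1.0.
one = asdouble(0, 0x3FF00000)
```

Since only the bit pattern changes, the corresponding SPIR-V operation is a bitcast, not a conversion.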
18 hours | [NewGVN] Remove returned arg simplification (#161865) | ManuelJBrito | 1 | -0/+21
Replacing uses of the return value with the argument is already handled in other passes; additionally, it causes issues with memory value numbering when the call is a memory-defining intrinsic. Fixes #159918.
18 hours | [LLVM][CodeGen] Check Non Saturate Case in isSaturatingMinMax (#160637) | Yatao Wang | 10 | -174/+714
Fixes issue #160611.
19 hours | Reapply "[InstCombine] Preserve profile after folding select instructions with conditionals" (#161885) (#161890) | Alan Zhao | 2 | -39/+55
This reverts commit 572b579632fb79ea6eb562a537c9ff1280b3d4f5. This is a reland of #159666, but with a fix moving the `extern` declaration of the flag under the LLVM namespace, which is needed to fix a linker error caused by #161240.
19 hours | [AArch64][GlobalISel] Use TargetConstant for shift immediates (#161527) | David Green | 18 | -597/+313
This changes the intrinsic definitions for shifts to use IntArg, which in turn changes how the shifts are represented in SDAG to use TargetConstant (and fixes up a number of ISel lowering places too). The vecshift immediates are changed from ImmLeaf to TImmLeaf to keep them matching the TargetConstant. On the GISel side, the constant shift amounts are then represented as immediate operands, not separate constants. The end result is that this allows a few more patterns to match in GISel.
19 hours | [Hexagon] Support lowering of setuo & seto for vector types in Hexagon (#158740) | Fateme Hosseini | 1 | -0/+93
Resolves instruction selection failure for v64f16 and v32f32 vector types. Patch by: Fateme Hosseini. Co-authored-by: Kaushik Kulkarni <quic_kauskulk@quicinc.com>
20 hours | [SLP][NFC] Add udiv/srem test cases, NFC | Alexey Bataev | 1 | -0/+101
20 hours | Revert "[InstCombine] Preserve profile after folding select instructions with conditionals" (#161885) | Mehdi Amini | 2 | -55/+39
Reverts llvm/llvm-project#159666. Many bots are broken right now.
21 hours | [InstCombine] Preserve profile after folding select instructions with conditionals (#159666) | Alan Zhao | 2 | -39/+55
If `select` simplification produces the transform

```
(select A && B, T, F) -> (select A, T, F)
```

or

```
(select A || B, T, F) -> (select A, T, F)
```

it stands to reason that if the branches are the same, then the branch weights remain the same, since the net effect is a simplification of the conditional. There are also cases where InstCombine negates the conditional (and therefore reverses the branches); this PR asserts that the branch weights are reversed in this case. Tracking issue: #147390.
22 hours | [Hexagon] isel-fold-shl-zext.ll - regenerate test checks (#161869) | Simon Pilgrim | 1 | -5/+7
Improves codegen diff in an upcoming patch.
22 hours | [InstComb] Handle undef in simplifyMasked(Store|Scatter) (#161825) | Ramkumar Ramachandra | 2 | -2/+58
22 hours | [AMDGPU][Attributor] Stop inferring amdgpu-no-flat-scratch-init in sanitized functions. (#161319) | Chaitanya | 2 | -2/+26
This PR stops the attributor pass from inferring `amdgpu-no-flat-scratch-init` for functions marked with a `sanitize_*` attribute.
23 hours | [x86] lowerV4I32Shuffle - don't adjust PSHUFD splat masks to match UNPCK (#161846) | Simon Pilgrim | 12 | -59/+59
Allow getV4X86ShuffleImm8ForMask to create a pure splat mask, helping to reduce demanded elts.
23 hours | CodeGen: Do not store RegisterClass copy costs as a signed value (#161786) | Matt Arsenault | 1 | -0/+31
Tolerate setting negative values in TableGen, and store them as a saturated uint8_t value. This will allow naive uses of the copy cost to directly add it as a cost without considering the degenerate negative case. The degenerate negative cases are only used in InstrEmitter / DAG scheduling, so leave the special-case processing there. There are also FIXMEs about this system already there. This is the expedient fix for an out-of-tree target regression after #160084. Currently targets can set a negative copy cost to mark copies as "impossible"; however, essentially all the in-tree uses only use this for non-allocatable condition registers. We probably should replace the InstrEmitter/DAG scheduler uses with a more direct check for a copyable register, but that has test changes.
23 hours | AMDGPU: Fix broken register class IDs in mir tests (#161832) | Matt Arsenault | 5 | -21/+21
23 hours | [RISCV] Support scalar llvm.fmodf intrinsic. (#161743) | Craig Topper | 8 | -2/+509
23 hours | [AMDGPU] Enable XNACK on gfx1250 (#161457) | Shilei Tian | 29 | -1169/+1256
This should be always on. Fixes SWDEV-555931.
23 hours | Fold SVE mul and mul_u to neg during isel (#160828) | Martin Wehking | 1 | -0/+131
Replace mul and mul_u ops with a neg operation if their second operand is a splat value of -1. Also apply the optimization to mul_u ops whose first operand is a splat -1, due to their commutativity.
24 hours | [Hexagon] Added lowering for sint_to_fp from v32i1 to v32f32 (#159507) | pkarveti | 2 | -25/+42
The transformation pattern is identical to the uint_to_fp conversion from v32i1 to v32f32.
24 hours | [X86] Fold ADD(x,x) -> X86ISD::VSHLI(x,1) (#161843) | Simon Pilgrim | 16 | -398/+398
Now that #161007 will attempt to fold this back to ADD(x,x) in X86FixupInstTunings, we can more aggressively create X86ISD::VSHLI nodes to avoid missed optimisations due to oneuse limits, avoid unnecessary freezes, and allow AVX512 to fold to mi memory-folding variants. I've currently limited SSE targets to cases where ADD is the only user of x to prevent extra moves - AVX shift patterns benefit from breaking the ADD+ADD+ADD chains into shifts, but it's not so beneficial on SSE with the extra moves.
24 hours | [LAA] Check if Ptr can be freed between Assume and CtxI. (#161725) | Florian Hahn | 2 | -4/+4
When using information from dereferenceable assumptions, we need to make sure that the memory is not freed between the assume and the specified context instruction. Instead of just checking canBeFreed, check if there are any calls that may free between the assume and the context instruction. This patch introduces willNotFreeBetween to check for calls that may free between an assume and a context instruction, to also be used in https://github.com/llvm/llvm-project/pull/161255. PR: https://github.com/llvm/llvm-project/pull/161725
24 hours | Allow DW_OP_rot, DW_OP_neg, and DW_OP_abs in DIExpression (#160757) | Tom Tromey | 1 | -0/+10
The Ada front end can emit somewhat complicated DWARF expressions for the offset of a field. While working in this area I found that I needed DW_OP_rot (to implement a branch-free computation; it looked more difficult to add support for branching), and DW_OP_neg and DW_OP_abs (just basic functionality).
26 hours | [SPARC] Prevent meta instructions from being inserted into delay slots (#161111) | Koakuma | 1 | -0/+25
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into delay slots, as they don't correspond to real machine instructions. This should fix crashes when compiling with, for example, `clang -Og`.
26 hours | AMDGPU: Fix constrain register logic for physregs (#161794) | Matt Arsenault | 5 | -940/+650
We do not need to reconstrain physical registers. Enables an additional fold for constant physregs.
28 hours | [AArch64][SME] Enable `aarch64-split-sve-objects` with hazard padding (#161714) | Benjamin Maxwell | 2 | -138/+140
This enables `aarch64-split-sve-objects` by default. Note: This option only has an effect when used in conjunction with hazard padding (`aarch64-stack-hazard-size` != 0). See https://github.com/llvm/llvm-project/pull/142392 for more details.
28 hours | [GVN] Teach GVN simple masked load/store forwarding (#157689) | Matthew Devereau | 2 | -2/+210
This patch teaches GVN how to eliminate redundant masked loads and forward previous loads or instructions with a select. This is possible when the same mask is used for masked stores/loads that access the same memory location.
28 hours | [X86][GlobalIsel] Adds support for G_UMIN/G_UMAX/G_SMIN/G_SMAX (#161783) | Mahesh-Attarde | 4 | -328/+648
The original PR broke in rebase (https://github.com/llvm/llvm-project/pull/160247); continuing here. This patch adds support for G_[U|S][MIN|MAX] opcodes in the X86 target. This PR addresses review comments:
1. About widening to the next power of 2: https://github.com/llvm/llvm-project/pull/160247#discussion_r2371655478
2. Clamping scalar: https://github.com/llvm/llvm-project/pull/160247#discussion_r2374748440
28 hours | [X86][GlobalIsel] Enable gisel run for fpclass isel (#160741) | Mahesh-Attarde | 1 | -134/+122
X86 GISel supports all the opcodes necessary to expand/lower the isfpclass intrinsic, enabling the test ahead of the fpclass patch. This patch enables runs for the isel-fpclass.ll tests.
29 hours | [AArch64] Refactor and refine cost-model for partial reductions (#158641) | Sander de Smalen | 8 | -615/+479
This cost-model takes into account any type-legalisation that would happen on vectors, such as splitting and promotion. This results in wider VFs being chosen for loops that can use partial reductions. The cost-model now also assumes that when SVE is available, the SVE dot instructions for i16 -> i64 dot products can be used for fixed-length vectors. In practice this means that loops with non-scalable VFs are vectorized using partial reductions where they wouldn't be before, e.g.

```c
int64_t foo2(int8_t *src1, int8_t *src2, int N) {
  int64_t sum = 0;
  for (int i = 0; i < N; ++i)
    sum += (int64_t)src1[i] * (int64_t)src2[i];
  return sum;
}
```

These changes also fix an issue where previously a partial reduction would be used for mixed sign/zero-extends (USDOT), even when +i8mm was not available.
29 hours | [llvm][ELF] Add Shdr check for getBuildID (#126537) | Ruoyu Qiu | 1 | -0/+1
Add a Section Header check for getBuildID; fix a crash with an invalid Program Header. Fixes: #126418
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Signed-off-by: Ruoyu Qiu <qiuruoyu@xiaomi.com>
Co-authored-by: Ruoyu Qiu <qiuruoyu@xiaomi.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
30 hours | [AMDGPU] Account for implicit XCNT insertion (#160812) | Aaditya | 2 | -7/+3
Hardware inserts an implicit `S_WAIT_XCNT 0` between alternate SMEM and VMEM instructions, so there are never outstanding address translations for both SMEM and VMEM at the same time.
30 hours | [llvm-jitlink] Use MachOObjectFile::getArchTriple for triple identification. (#161799) | Lang Hames | 3 | -0/+46
Replaces a call to ObjectFile::makeTriple (still used for ELF and COFF) with a call to MachOObjectFile::getArchTriple. The latter knows how to build correct triples for different MachO CPU subtypes, e.g. arm64 vs arm64e, which is important for selecting the right slice from universal archives.
31 hours | [AMDGPU] Define VS_128*. NFCI (#161798) | Stanislav Mekhanoshin | 6 | -60/+60
Needed for a future patch.
31 hours | [ARM] shouldFoldMaskToVariableShiftPair should be true for scalars up to the biggest legal type (#158070) | AZero13 | 3 | -0/+7433
For ARM, we want to do this up to 32 bits. Otherwise the code ends up bigger and bloated.
31 hours | [X86] combineBitcastvxi1 - bail out on soft-float targets (#161704) | Simon Pilgrim | 1 | -0/+40
combineBitcastvxi1 is sometimes called pre-legalization, so don't introduce X86ISD::MOVMSK nodes when vector types aren't legal. Fixes #161693.
33 hours | [JITLink] Add LinkGraph name / triple to debugging output. (#161772) | Lang Hames | 3 | -3/+3
Adds the name and triple of the graph to LinkGraph::dump output before the rest of the graph content. Calls from JITLinkGeneric.cpp to dump the graph are updated to avoid redundantly naming the graph.
37 hours | [X86][AMX] Combine constant zero vector and AMX cast to tilezero (#92384) | Phoebe Wang | 1 | -60/+18
Found this problem when investigating #91207.
41 hours | [ARM] Update and cleanup lrint/llrint tests. NFC | David Green | 3 | -59/+67
Most of the fp16 cases still do not work properly. See #161088.
42 hours | [LV] Add tests with multiple F(Max|Min)Num reductions w/o fast-math. | Florian Hahn | 3 | -0/+138
Pre-commits extra test coverage for loops with multiple F(Max|Min)Num reductions w/o fast-math-flags for a follow-up PR.
42 hours | [AMDGPU] Be less optimistic when allocating module scope lds (#161464) | Jon Chesterfield | 1 | -47/+42
Make the test for when additional variables can be added to the struct allocated at address zero more stringent. Previously, variables could be added to it (for faster access) even when that increased the LDS requested by a kernel. This corrects that oversight. The test case diff shows the change from all variables being allocated into the module LDS to only some being, in particular the introduction of uses of the offset table, and that some kernels now use less LDS than before. Alternative to PR 160181.