path: root/llvm/lib
Age | Commit message | Author | Files | Lines
2024-04-03 | [SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions. | Alexey Bataev | 1 | -10/+42
The compiler can improve the analysis for operands of UIToFP/SIToFP instructions and operands of ICmp instructions.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/85966
2024-04-03 | Revert "[SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions." | Alexey Bataev | 1 | -40/+10
This reverts commit 899855d2b11856a44e530fffe854d76be69b9008 to fix the issue reported in https://lab.llvm.org/buildbot/#/builders/165/builds/51659.
2024-04-03 | [SLP]Improve minbitwidth analysis for operands of IToFP and ICmp instructions. | Alexey Bataev | 1 | -10/+40
The compiler can improve the analysis for operands of UIToFP/SIToFP instructions and operands of ICmp instructions.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/85966
2024-04-03 | [AMDGPU] Add a missing COV6 case to getAMDHSACodeObjectVersion() (#87492) | Emma Pilkington | 1 | -0/+2
2024-04-03 | [AMDGPU][MC] Allow VOP3C dpp src1 to be imm or SGPR (#87418) | Joe Nash | 3 | -61/+2
Allows src1 of VOP3-encoded VOPC to be an SGPR or inline immediate on GFX1150Plus. The w32 and w64 _e64_dpp assembler-only real instructions were unused and were erroneously constructed in a way that broke parsing of the new instructions, so they are removed. This patch is a follow-up to PR https://github.com/llvm/llvm-project/pull/87382.
2024-04-03 | AMDGPU: Use PseudoInstr to name SIMCInstr for DSDIR and SOPs, NFC (#87537) | Changpeng Fang | 2 | -40/+40
We should consistently use PseudoInstr instead of Mnemonic to name SIMCInstr, even though the two may be the same in most cases.
2024-04-03 | [SLP]Add support for commutative intrinsics. | Alexey Bataev | 1 | -8/+36
Implemented a long-standing TODO to support commutative intrinsics.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/86316
2024-04-03 | [PseudoProbe] Mix block and call probe ID in lexical order (#75092) | Lei Wang | 1 | -15/+7
Previously, all call probe IDs came after the block IDs. This change mixes call probes and block probes by reordering them in lexical (line-number) order. For example:
```
main():
BB1
if(...)
  BB2 foo(..);
else
  BB3 bar(...);
BB4
```
Before, the profile was:
```
main
1: ..
2: ..
3: ...
4: ...
5: foo ...
6: bar ...
```
Now the new order is:
```
main
1: ..
2: ..
3: foo ...
4: ...
5: bar ...
6: ...
```
This can potentially make the profile more tolerant of mismatches, whether they come from a stale profile or a frontend change. For example, previously, adding a single block, even as the last one, shifted all the call probes and made them mismatch. Moreover, this makes better use of call-anchor-based stale profile matching: blocks are matched based on the closest anchor, so more anchors are used for the matching, which reduces the mismatch scope.
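The renumbering can be illustrated with a small Python model. This is a simplified sketch, not the pass itself: it assumes a function body is just a lexical list of basic blocks and call sites, and the helpers `number_blocks_first` and `number_lexically` are hypothetical names.

```python
# Hypothetical model: a function body as a lexical sequence of
# ("block", name) and ("call", callee) items, mirroring the example above.
BODY = [
    ("block", "BB1"),
    ("block", "BB2"),
    ("call", "foo"),
    ("block", "BB3"),
    ("call", "bar"),
    ("block", "BB4"),
]


def number_blocks_first(body):
    """Old scheme: every block probe gets an ID before any call probe."""
    blocks = [name for kind, name in body if kind == "block"]
    calls = [name for kind, name in body if kind == "call"]
    return {name: i for i, name in enumerate(blocks + calls, start=1)}


def number_lexically(body):
    """New scheme: block and call probes share one lexical ordering."""
    return {name: i for i, (_kind, name) in enumerate(body, start=1)}


# Appending one block at the end shifts every call ID under the old
# scheme but leaves them untouched under the lexical scheme.
extended = BODY + [("block", "BB5")]
print(number_blocks_first(BODY)["foo"], number_blocks_first(extended)["foo"])  # 5 6
print(number_lexically(BODY)["foo"], number_lexically(extended)["foo"])        # 3 3
```

In this toy model, the lexical scheme keeps `foo` at ID 3 no matter how many blocks follow it, which is the tolerance property described above.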
2024-04-03 | [SLP]Fix PR87133: crash because of different altopcodes for cmps after reordering. | Alexey Bataev | 1 | -1/+24
If the node has cmp instructions with 3 or more different but swappable predicates, we need to keep the same kind of main/alternate opcodes to avoid incorrect detection of opcodes after reordering. Reordering changes the order, and we may erroneously consider swappable opcodes as non-compatible/alternate, which may lead to a later compiler crash.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/87267
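"Swappable" here means two predicates that agree once the compare operands are exchanged (sgt(a, b) is slt(b, a)). A Python sketch of that relation, similar in spirit to LLVM's `CmpInst::getSwappedPredicate`; the table, the 8-bit evaluator `icmp`, and `swap_class` are illustrative assumptions, not SLP vectorizer code.

```python
import operator

# Predicate names follow LLVM IR's icmp spelling; SWAPPED maps each one to
# the predicate that gives the same result with the operands exchanged.
SWAPPED = {
    "eq": "eq", "ne": "ne",
    "sgt": "slt", "slt": "sgt", "sge": "sle", "sle": "sge",
    "ugt": "ult", "ult": "ugt", "uge": "ule", "ule": "uge",
}
_OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
        "ge": operator.ge, "lt": operator.lt, "le": operator.le}


def icmp(pred, a, b):
    """Evaluate an 8-bit icmp; a and b are signed values in [-128, 127]."""
    if pred.startswith("u"):
        a, b = a & 0xFF, b & 0xFF  # reinterpret the same bits as unsigned
    base = pred[1:] if pred[0] in "su" else pred
    return _OPS[base](a, b)


def swap_class(pred):
    """Canonical representative of the pair {pred, swapped(pred)}."""
    return min(pred, SWAPPED[pred])


# Three different predicates, but sgt and slt fall into one swap class,
# so only two genuinely different kinds of compare remain.
bundle = ["sgt", "slt", "eq"]
print(len(set(bundle)), len({swap_class(p) for p in bundle}))  # 3 2
```

A bundle like this one is exactly where operand reordering can make sgt and slt look like main and alternate opcodes even though they are the same kind of compare.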
2024-04-03 | [SLP]Fix PR87477: fix alternate node cast cost/codegen. | Alexey Bataev | 1 | -25/+40
We have to compare the actual type sizes to pick the proper cast operation opcode.
2024-04-03 | [SamplePGO] Support -salvage-stale-profile without probes too (#86116) | Krzysztof Pszeniczny | 1 | -4/+4
Currently, -salvage-stale-profile is a no-op if the profile is not probe-based. We observed that it can help for regular, non-probe-based profiles too: some of our internal benchmarks show a 0.2-0.3% QPS improvement. There seems to be no good reason to limit this flag to probe-based profiles.
2024-04-03 | [AMDGPU][MC] Enables sgpr or imm src1 for float VOP3 DPP, but excluding VOPC. (#87382) | Joe Nash | 4 | -9/+33
Fixes support on GFX1150 and GFX12, where src1 of e64_dpp instructions should allow sgpr and imm operands. PR #67461 added support for this with int operands, but it was missing a piece for float. Changing VOPC e64_dpp will be in a different patch because there is a bug preventing that change.
2024-04-03 | [SelectionDAG] Dump convergencectrl_glue DAG node (#87487) | Jay Foad | 1 | -0/+1
2024-04-03 | [DAG] visitADDLikeCommutative - convert (add x, shl(0 - y, n)) fold to SDPatternMatch. NFC. | Simon Pilgrim | 1 | -6/+4
2024-04-03 | [X86] getEffectiveX86CodeModel - take a Triple argument instead of just a Is64Bit flag. NFC. (#87479) | Simon Pilgrim | 1 | -3/+4
Matches what most other targets do and makes it easier to specify the code model based on other triple settings in the future.
2024-04-03 | [DAG] SimplifyDemandedVectorElts - add ISD::AVGCEILS/AVGCEILU/AVGFLOORS/AVGFLOORU nodes (#86284) | aniplcc | 1 | -0/+4
Fixes #84768
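These nodes are lane-wise: result lane i depends only on lane i of each operand, which is why demanded-element information can pass straight through to the operands. A Python sketch of the semantics, assuming the usual definitions (average computed in a wider type; FLOOR variants round down, CEIL variants round up). The function names are hypothetical and this is not SelectionDAG code.

```python
# Python integers do not overflow, so "extend, add, shift" is modelled exactly.
def avgflooru(a, b):
    return (a + b) >> 1          # floor((a + b) / 2) on unsigned inputs


def avgceilu(a, b):
    return (a + b + 1) >> 1      # ceil((a + b) / 2) on unsigned inputs


def avgfloors(a, b):
    return (a + b) >> 1          # >> on negative ints rounds toward -infinity


def avgceils(a, b):
    return (a + b + 1) >> 1


def lanewise(op, xs, ys):
    """Apply a scalar average node to each vector lane independently."""
    return [op(x, y) for x, y in zip(xs, ys)]


# Only lane 1 is "demanded": changing lane 0 of the inputs cannot affect it.
print(lanewise(avgceilu, [1, 200], [2, 100]))   # [2, 150]
print(lanewise(avgceilu, [99, 200], [7, 100]))  # [53, 150]
```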
2024-04-03 | [SLP][NFC]Simplify common analysis of instructions in BoUpSLP::collectValuesToDemote by outlining common code, NFC. | Alexey Bataev | 1 | -107/+78
2024-04-03 | [VPlan] Factor out logic to check if recipe is dead (NFCI). | Florian Hahn | 1 | -16/+22
In preparation for using the helper in more places.
2024-04-03 | [VP][DAGCombine] Use `simplifySelect` when combining vp.select. (#87342) | AinsleySnow | 1 | -0/+7
Hi all, this patch is a follow-up to #79101. It migrates logic from `visitVSELECT` to `visitVP_SELECT` to simplify `vp.select`. With this patch we can do the following combinations:
```
vp.select undef, T, F --> T (if T is a constant), F otherwise
vp.select <condition>, undef, F --> F
vp.select <condition>, T, undef --> T
vp.select false, T, F --> F
vp.select <condition>, T, T --> T
```
I'm a total newbie to LLVM and I'm sure there's room for improvement in this patch. Please let me know if you have any advice. Thank you in advance!
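The five folds listed above can be written as a small decision procedure. This is a toy Python sketch, not DAGCombiner code: operands are modelled as the string `UNDEF`, the bool `False` for an all-false condition, plain ints for constants, and arbitrary strings for everything else; the explicit vector length operand of `vp.select` is omitted because none of these folds depend on it.

```python
UNDEF = "undef"


def is_constant(v):
    # Toy convention: plain ints stand in for constant vectors.
    return isinstance(v, int) and not isinstance(v, bool)


def simplify_vp_select(cond, t, f):
    """Return the folded operand, or None if no listed rule applies."""
    if cond == UNDEF:
        return t if is_constant(t) else f   # undef, T, F --> T or F
    if t == UNDEF:
        return f                            # c, undef, F --> F
    if f == UNDEF:
        return t                            # c, T, undef --> T
    if cond is False:
        return f                            # false, T, F --> F
    if t == f:
        return t                            # c, T, T --> T
    return None


print(simplify_vp_select(UNDEF, 3, "F"))    # 3
print(simplify_vp_select("c", "T", "F"))    # None
```

The rules are checked in the order listed, so when several apply (for example an undef condition together with an undef true operand) the first one wins in this sketch.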
2024-04-03 | [X86] Haswell/Broadwell/Skylake DPPS folded instructions use an extra port06 resource | Simon Pilgrim | 4 | -8/+16
This is an extension to 07151f0241d3f893cb36eb2dbc395d4098f74a87, which handled SandyBridge, so that we at least model the regression identified in #14640. Confirmed by Agner + uops.info/uica (SkylakeServer also had an incorrect use of Port015 instead of just Port01). I raised #86669 as a proposal for an 'x86 unfold' pass, driven by the scheduler model, that can unfold these (if we have the free registers).
2024-04-03 | [Object][COFF][NFC] Introduce getMachineArchType helper. (#87370) | Jacek Caban | 2 | -37/+18
It's a common pattern that we have a machine type, but we don't care which ARM64* platform we're dealing with. We already have isAnyArm64 for that, but it does not fit cases where we use a switch statement. With this helper, it's easy to simplify such cases by using Triple::ArchType instead of machine type.
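The idea of collapsing every ARM64 flavour to one architecture before switching can be sketched in Python. The `ArchType` enum and `get_machine_arch_type` below are hypothetical stand-ins for `llvm::Triple::ArchType` and the new helper; the machine constants are the values from the Microsoft PE/COFF specification, and the exact set of cases the real helper handles is not shown in this log.

```python
from enum import Enum


class ArchType(Enum):
    """Hypothetical stand-in for llvm::Triple::ArchType."""
    X86 = "x86"
    X86_64 = "x86_64"
    THUMB = "thumb"
    AARCH64 = "aarch64"
    UNKNOWN = "unknown"


# Machine field values from the Microsoft PE/COFF format specification.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664
IMAGE_FILE_MACHINE_ARMNT = 0x01C4
IMAGE_FILE_MACHINE_ARM64 = 0xAA64
IMAGE_FILE_MACHINE_ARM64EC = 0xA641
IMAGE_FILE_MACHINE_ARM64X = 0xA64E

_MACHINE_TO_ARCH = {
    IMAGE_FILE_MACHINE_I386: ArchType.X86,
    IMAGE_FILE_MACHINE_AMD64: ArchType.X86_64,
    IMAGE_FILE_MACHINE_ARMNT: ArchType.THUMB,  # 32-bit Windows on ARM is Thumb-2
    # All three ARM64 flavours collapse to one architecture, so callers can
    # dispatch on the result without caring which ARM64* variant they have.
    IMAGE_FILE_MACHINE_ARM64: ArchType.AARCH64,
    IMAGE_FILE_MACHINE_ARM64EC: ArchType.AARCH64,
    IMAGE_FILE_MACHINE_ARM64X: ArchType.AARCH64,
}


def get_machine_arch_type(machine):
    return _MACHINE_TO_ARCH.get(machine, ArchType.UNKNOWN)
```

With this mapping, a caller's switch needs a single `AARCH64` case instead of three machine-type cases.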
2024-04-03 | Print more descriptive error message when trying to link a global with appending linkage (#69613) | Gleb Popov | 1 | -1/+1
This is a proper fix for https://github.com/llvm/llvm-project/issues/40308
2024-04-03 | [SLP] Use isValidElementType instead of (#87469) | Han-Kuan Chen | 1 | -1/+1
FixedVectorType::isValidElementType for consistency.
2024-04-03 | [AMDGPU] Remove useless aliases for FLAT instructions. NFC. (#87462) | Jay Foad | 1 | -2/+2
We were generating "" (the empty string) as an alias for a bunch of FLAT instructions, which had no effect except to cause tablegen to generate some very long if-else chains in the generated AsmMatcher.
2024-04-03 | [VPlan] Remove VPTransformState::addMetadata with ArrayRef arg (NFCI). | Florian Hahn | 2 | -18/+5
addMetadata is only ever called with a single element; clean up the variant that takes multiple values.
2024-04-03 | [CodeGen][ShrinkWrap] Clarify StackAddressUsedBlockInfo meaning (#80679) | Elizaveta Noskova | 1 | -3/+8
2024-04-03 | [AArch64][GlobalISel] Basic add_sat and sub_sat vector handling. (#80650) | David Green | 1 | -5/+8
This tries to fill in the basic vector handling for sadd_sat/uadd_sat and ssub_sat/usub_sat. It just handles the basics, marking legal types and clamping illegally sized vectors to legal ones.
2024-04-03 | [ExpandLargeFpConvert] Scalarize vector types. (#86954) | Bevin Hansson | 1 | -8/+41
expand-large-fp-convert cannot handle vector types. If overly large vector element types survive into isel, they will likely be scalarized there, but since isel cannot handle scalar integer types of that size, it will assert. Handle vector types in expand-large-fp-convert by scalarizing them and then expanding the scalar type operation. For large vectors, this results in a *massive* code expansion, but it's better than asserting.
2024-04-03 | [InstCombine] Simplify select if it combinated and/or/xor (#73362) | hanbeom | 1 | -0/+106
Each of the `and`/`or`/`xor` operations can be rewritten as a combination of logical operations that includes operators other than itself:
`x&y -> (x|y) ^ (x^y)`
`x|y -> (x&y) | (x^y)`
`x^y -> (x|y) ^ (x&y)`
If the left operand of the condition of a `SelectInst` is an `and`/`or`/`xor` logical operation, the right operand is `0`, `-1`, or a constant, and `TrueVal` consists of an `and`/`or`/`xor` logical operation, then we can optimize this case. This patch implements this combination.
Proof: https://alive2.llvm.org/ce/z/WW8iRR
Fixes https://github.com/llvm/llvm-project/issues/71792.
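Because the three rewrites act on each bit independently, checking every pair of small-width values is enough to sanity-check them; the Alive2 link above remains the actual proof. A Python sketch; the final check is an illustrative select consequence of our own choosing, not necessarily one of the patterns the patch matches.

```python
def identities_hold(width):
    """Exhaustively check the three rewrites for all width-bit operands."""
    for x in range(1 << width):
        for y in range(1 << width):
            if x & y != (x | y) ^ (x ^ y):
                return False
            if x | y != (x & y) | (x ^ y):
                return False
            if x ^ y != (x | y) ^ (x & y):
                return False
            # Example consequence for a select: when (x & y) == 0, x | y
            # equals x ^ y, so select((x & y) == 0, x | y, x ^ y) is x ^ y.
            if ((x | y) if (x & y) == 0 else (x ^ y)) != x ^ y:
                return False
    return True


print(identities_hold(8))  # True
```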
2024-04-03 | [MIPS] Fix the opcode of max.fmt and mina.fmt (#85609) | Cinhi Young | 1 | -4/+4
- The opcodes of mina.fmt and max.fmt are documented incorrectly: object code assembled by LLVM from the same assembly behaves differently from object code assembled by GCC and Binutils.
- Modify the opcodes to match Binutils.
The actual opcodes are as follows:

    bits {5,3} | bits {2,0} of func:
               | ... | 100 | 101  | 110 | 111
    -----------+-----+-----+------+-----+-----
    010        | ... | min | mina | max | maxa
2024-04-02 | [RISCV] Slightly simplify RVVArgDispatcher::constructArgInfos. NFC (#87308) | Craig Topper | 1 | -4/+2
Use a single insert for the non-mask case instead of a push_back followed by an insert that may contain 0 registers.
2024-04-03 | Reapply "[CodeGen] Fix register pressure computation in MachinePipeliner (#87030)" (#87312) | Ryotaro KASUGA | 1 | -1/+1
Fix broken test. This reverts commit b8ead2198f27924f91b90b6c104c1234ccc8972e.
2024-04-02 | AMDGPU: Use PseudoInstr instead of Pseudo Mnemonic for SIMCInstr, NFC (#87420) | Changpeng Fang | 1 | -2/+2
The pseudo mnemonic could have other uses.
2024-04-02 | [VPlan] Make sure OR VPInstructions are treated as disjoint ops. | Florian Hahn | 2 | -4/+23
Make sure that VPInstructions with OR opcodes are properly registered as disjoint ops. Fixes https://github.com/llvm/llvm-project/issues/87378.
2024-04-02 | [RISCV][NFC] Delete some unused pseudo multiclasses (#87401) | Michael Maitland | 1 | -26/+0
We only use the `RM` equivalents now.
2024-04-02 | MachineScheduler: Simplify usage of TargetInstrInfo | Matt Arsenault | 1 | -12/+4
2024-04-02 | [CallSiteInfo][NFC] CallSiteInfo -> CallSiteInfo.ArgRegPairs (#86842) | Prabhuk | 8 | -15/+17
CallSiteInfo was originally used only for argument-register pairs. Make it a struct in which we can store additional data for call sites. Also, the variables/methods used for CallSiteInfo are named for its original use case, e.g., CallFwdRegsInfo. Refactor these for the upcoming use, e.g. addCallArgsForwardingRegs() -> addCallSiteInfo(). An upcoming patch will add type IDs for indirect calls to propagate them from the middle end to the back end. The type IDs will then be used to emit the call graph section.
Original RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151044.html
Updated RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-July/151739.html
Differential Revision: https://reviews.llvm.org/D107109?id=362888
Co-authored-by: Necip Fazil Yildiran <necip@google.com>
2024-04-02 | [RISCV] Lower (vector_interleave X, undef) to (vzext_vl X). (#87283) | Craig Topper | 1 | -1/+10
If the odd vector is undef or poison, the widening add and multiply trick doesn't work unless we freeze the odd vector. Unfortunately, freezing doesn't work when the operand is provably undef/poison. MIR doesn't have a representation for freeze so it just becomes a COPY from IMPLICIT_DEF which freely propagates undef to each operand independently. To work around this, check for undef explicitly and lower to a VZEXT_VL of the even vector. This produces better code than we'd get from a freeze anyway. I've left a FIXME for adding a freeze. I'll do that as a separate patch as it affects other tests and doesn't help with the new test.
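A Python model of the arithmetic behind the widening trick and of why an undef odd vector breaks it. It assumes 8-bit elements widened to 16-bit lanes with a little-endian lane layout, and uses the formulation "widening add, then widening multiply-accumulate by 2**W - 1"; it is not the RISC-V lowering code, and the helper names are hypothetical.

```python
W = 8                      # element width of X and Y
ALL_ONES = (1 << W) - 1    # the multiplier 2**W - 1


def split(lane):
    """Little-endian split of a 2*W-bit lane into (even, odd) elements."""
    return [lane & ALL_ONES, lane >> W]


def widening_interleave(xs, ys):
    """Each lane is x + y + y * (2**W - 1) == x + (y << W)."""
    out = []
    for x, y in zip(xs, ys):
        out += split(x + y + ALL_ONES * y)
    return out


def lane_with_independent_undef(x, y_in_add, y_in_mul):
    """If each use of an undef Y may take a different value, the low half
    of the lane is no longer guaranteed to be x."""
    return (x + y_in_add + ALL_ONES * y_in_mul) & ((1 << (2 * W)) - 1)


def zext_interleave_with_undef(xs):
    """The new lowering: zero-extend X, so odd elements simply become 0."""
    out = []
    for x in xs:
        out += split(x)
    return out


print(widening_interleave([1, 2], [10, 20]))  # [1, 10, 2, 20]
print(zext_interleave_with_undef([7, 9]))     # [7, 0, 9, 0]
```

Zero is a valid choice for each undef odd element, so the zero-extension keeps the even elements exact without needing a freeze.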
2024-04-02 | [SLP]Fix PR87384: check for fixed vector type before using. | Alexey Bataev | 1 | -1/+3
If we have mixed extractelement instructions (fixed and scalable ones), we need to check that the compiler estimates the cost only for fixed-vector extractelements, not scalable ones, to avoid a compiler crash.
2024-04-02 | [WebAssembly] Allocate MCSymbolWasm data on MCContext (#85866) | Tim Neumann | 7 | -74/+44
Fixes #85578, a use-after-free caused by some `MCSymbolWasm` data being freed too early. Previously, `WebAssemblyAsmParser` owned the data that this PR moves to `MCContext`, which caused problems when handling module ASM: the ASM parser was destroyed after parsing the module ASM, but the symbols persisted. The added test passes locally with an LLVM build with AddressSanitizer enabled. Implementation notes:
* I've called the added method `allocateGenericString` (with the emphasis on ***Generic***) and added the second paragraph of its documentation to maybe guide people a bit on when to use this method (based on my (limited) understanding of the `MCContext` class). We could also just call it `allocateString` and remove that second paragraph.
* The added `createWasmSignature` method does not support taking the return and parameter types as arguments: specifying them afterwards is barely any longer and prevents them from being accidentally specified in the wrong order.
* This removes a _"TODO: Do the uniquing of Signatures here instead of ObjectFileWriter?"_ since the field it's attached to is also removed. Let me know if you think that TODO should be preserved somewhere.
2024-04-02 | [X86] canonicalizeShuffleWithOp - don't fold VPERMI(BINOP(X,Y)) -> BINOP(VPERMI(X),VPERMI(Y)) | Simon Pilgrim | 1 | -5/+9
VPERMI (VPERMQ/PD) is nearly always lane-crossing and poorly merges with target shuffles (other than itself). For now, I've restricted VPERMI to only merge with itself, constants, loads and splats. We might be able to merge with a few other special cases (AND/ANDNP with constant?), which could help the shuffle-vs-trunc-256.ll AVX512VL regression, but since that now gives similar codegen to the other AVX512 variants, I'd prefer to improve the shuffle lowering for that properly.
2024-04-02 | [SLP]Fix PR80027: handle case when ext is not reduced but its operand is. | Alexey Bataev | 1 | -0/+7
We need to handle the case where the resize operation itself is not reduced but its operand is. In this case, we need to perform extra analysis for the operand, not the instruction itself.
2024-04-02 | [RISCV][GISEL] Legalize G_BITCAST for scalable vectors (#85970) | Michael Maitland | 1 | -0/+6
SelectionDAG marks ISD::BITCAST as legal between scalable vector types, and ISelDAGToDAG deletes them. We mark G_BITCAST between scalable vectors as legal in GISel. A future patch will handle what to do with them after the legalizer (likely either drop them in an isel-preprocess step or convert them to COPYs). BITCAST is needed for legalization of G_INSERT and G_EXTRACT. This is a precommit for legalization of G_INSERT and G_EXTRACT.
2024-04-02 | [ADT] Add signed and unsigned mulh to APInt (#84719) | Atousa Duprat | 2 | -13/+20
Fixes #84207
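"mulh" is the high half of a double-width product. A Python sketch of the signed and unsigned variants on width-bit values, independent of the exact APInt signatures (which this log does not show); `to_signed`, `mulhu` and `mulhs` are hypothetical helper names.

```python
def to_signed(v, width):
    """Reinterpret a width-bit pattern as a two's-complement value."""
    v &= (1 << width) - 1
    return v - (1 << width) if v >> (width - 1) else v


def mulhu(a, b, width):
    """High half of the full 2*width-bit unsigned product."""
    mask = (1 << width) - 1
    return ((a & mask) * (b & mask)) >> width


def mulhs(a, b, width):
    """High half of the full 2*width-bit signed product, as a bit pattern."""
    product = to_signed(a, width) * to_signed(b, width)
    return (product >> width) & ((1 << width) - 1)


print(hex(mulhu(0xFF, 0xFF, 8)))  # 0xfe  (255 * 255 = 0xFE01)
print(hex(mulhs(0xFF, 0xFF, 8)))  # 0x0   ((-1) * (-1) = 1)
```

The same bit patterns give different high halves depending on signedness, which is why both variants are needed.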
2024-04-02 | Move format internal code from llvm::detail to llvm::support::detail. (#87288) | Chenguang Wang | 1 | -1/+1
Some support code, e.g. llvm/Support/Endian.h, uses llvm::support::detail, but the format-related code uses llvm::detail. On VS2019, when a C++ file includes both headers, a `detail::` from `namespace llvm { ... }` becomes ambiguous. 44253a9c broke the TensorFlow and [JAX](https://github.com/google/jax/actions/runs/8507773013/job/23300219405) builds because of this. Since llvm::X::detail seems like a cleaner solution and is used in other places as well (e.g. llvm::yaml::detail), we should probably migrate all llvm::detail usages to llvm::X::detail.
2024-04-02 | [SLP]Fix PR87329: crash on alternate cast vectorization. | Alexey Bataev | 1 | -0/+21
We need to fix the analysis for alternate instructions based on integer extension operations. If the alternate extension node is resized but its operand is not, we need to resize the node and not shuffle the final result, since we end up with only a trunc instruction.
2024-04-02 | [RISCV] Fix and refactor Zvk sched classes (#86519) | Michael Maitland | 2 | -41/+22
* VPseudoVALU_V_NoMask_Zvk, VPseudoVALU_S_NoMask_Zvk, VPseudoVALU_VV_NoMask_Zvk, and VPseudoVALU_VI_NoMask_Zvk do not read a merge op.
* VPseudoUnaryV_V is a unary read instead of a binary read.
* Convert all other `Sched<[...]>` cases to the equivalent SchedUnary, SchedBinary, or SchedTernary.
2024-04-02 | [ExpandLargeDivRem] Scalarize vector types. (#86959) | Bevin Hansson | 1 | -3/+39
expand-large-divrem cannot handle vector types. If overly large vector element types survive into isel, they will likely be scalarized there, but since isel cannot handle scalar integer types of that size, it will assert. Handle vector types in expand-large-divrem by scalarizing them and then expanding the scalar type operation. For large vectors, this results in a *massive* code expansion, but it's better than asserting.
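The two steps (scalarize the vector operation, then expand each scalar operation) can be sketched in Python. A textbook restoring shift-subtract division stands in for the pass's expansion of one scalar division; `udiv_expanded` and `scalarize` are hypothetical names, and nothing here models LLVM IR.

```python
def udiv_expanded(n, d, width):
    """Restoring shift-subtract division on width-bit unsigned values.

    Stands in for the expansion of one scalar division; returns (q, r).
    """
    if d == 0:
        raise ZeroDivisionError("division by zero is undefined")
    q = r = 0
    for i in reversed(range(width)):
        r = (r << 1) | ((n >> i) & 1)  # bring down the next dividend bit
        if r >= d:
            r -= d
            q |= 1 << i
    return q, r


def scalarize(scalar_op, xs, ys, width):
    """Split a vector operation into one scalar operation per element."""
    return [scalar_op(x, y, width) for x, y in zip(xs, ys)]


# Two lanes of a hypothetical <2 x i129> udiv/urem.
print(scalarize(udiv_expanded, [(1 << 128) + 5, 1000], [3, 7], 129)[1])  # (142, 6)
```

Each lane gets its own copy of the expansion, which is where the *massive* code growth for large vectors comes from.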
2024-04-02 | [SelectionDAG][Statepoint] Fix truncation of `gc.statepoint` ID argument (#85908) | Il-Capitano | 1 | -1/+1
The ID argument of `gc.statepoint` gets incorrectly truncated to 32 bits during code generation. This is fixed by using `uint64_t` instead of `unsigned` for the `ID` member in `SelectionDAGBuilder::StatepointLoweringInfo`, and a `patchpoint` test case is extended to check for 64-bit ID generation in stackmaps.
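What the truncation does to an ID can be modelled in a few lines of Python. This sketch assumes a 32-bit `unsigned`, as on the common hosts LLVM targets; the helper names are hypothetical and only mimic assignment into a field of each type.

```python
UINT32_MASK = (1 << 32) - 1  # assumes a 32-bit `unsigned`
UINT64_MASK = (1 << 64) - 1


def store_as_unsigned(value):
    """Model assigning to a 32-bit unsigned field: high bits are dropped."""
    return value & UINT32_MASK


def store_as_uint64(value):
    """Model assigning to a uint64_t field: all 64 bits survive."""
    return value & UINT64_MASK


statepoint_id = 0x1_0000_002A
print(hex(store_as_unsigned(statepoint_id)))  # 0x2a
print(hex(store_as_uint64(statepoint_id)))    # 0x10000002a
```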
2024-04-02 | [SLP][NFC]Do not lookup in MinBWs, reuse previously used iterator. | Alexey Bataev | 1 | -3/+3