aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib
AgeCommit message (Collapse)AuthorFilesLines
4 days[ConstantFPRange] Add support for flushDenormals (#163074)Yingwei Zheng1-0/+22
This patch provides a helper function to handle non-IEEE denormal flushing behaviours. For the dynamic mode, it returns a union of all possible results.
4 days[AMDGPU] Add register usage debug printing the point of maximum register ↵Valery Pykhtin3-6/+188
pressure. (#161850) Basically this allows to analyze "why so many VGPRs used?". It prints all live registers at the point of maximum register pressure and for each register its defs/uses are dumped. Currently can be run before and after the scheduler but would be nice if it can be ran inbetween any passes (not sure this is possible with legacy pass-manager). Requires debug or built with asserts compiler. Highly recommended to run with debug info to have debug locations for instructions. Example output: ``` *** Register pressure info (VGPRs) for _ZN7ck_tile6ken.... *** Max pressure is 256 VGPRs at 41780e@BB.18 (LoopHdr BB.16, Depth 1): %9858:vreg_512_align2 = contract V_MFMA_F32_32X32X16_BF16_mac_vgprcd_e64 %10137:vreg_128_align2, %10141:vreg_128_align2, %9858:vreg_512_align2(tied-def 0), 0, 0, 0, implicit $mode, implicit $exec Live registers with single definition (123 VGPRs): %10126:VReg_128_Align2, LiveMask 00000000000000FF (4 VGPRs) def 41600r@BB.18 (LoopHdr BB.16, Depth 1): undef %10126.sub0_sub1:vreg_128_align2 = DS_READ_B64_TR_B16 %478:vgpr_32, 15232, 0, implicit $exec :: (load (s64) from %ir.sunkaddr1314, !noalias !60, addrspace 3) def 41608r@BB.18 (LoopHdr BB.16, Depth 1): %10126.sub2_sub3:vreg_128_align2 = DS_READ_B64_TR_B16 %478:vgpr_32, 16320, 0, implicit $exec :: (load (s64) from %ir.sunkaddr1315, !noalias !60, addrspace 3) use 41848r@BB.18 (LoopHdr BB.16, Depth 1): %9856:vreg_512_align2 = contract V_MFMA_F32_32X32X16_BF16_mac_vgprcd_e64 %10126:vreg_128_align2, %10138:vreg_128_align2, %9856:vreg_512_align2(tied-def 0), 0, 0, 0, implicit $mode, implicit $exec %10136:VReg_128_Align2, LiveMask 00000000000000FF (4 VGPRs) def 41264r@BB.18 (LoopHdr BB.16, Depth 1): undef %10136.sub0_sub1:vreg_128_align2 = DS_READ_B64_TR_B16 %478:vgpr_32, 2176, 0, implicit $exec :: (load (s64) from %ir.sunkaddr1294, !noalias !60, addrspace 3) def 41272r@BB.18 (LoopHdr BB.16, Depth 1): %10136.sub2_sub3:vreg_128_align2 = DS_READ_B64_TR_B16 %478:vgpr_32, 3264, 0, implicit $exec :: (load (s64) from %ir.sunkaddr1295, !noalias !60, addrspace 3) use 41788r@BB.18 (LoopHdr BB.16, Depth 1): %9858:vreg_512_align2 = contract V_MFMA_F32_32X32X16_BF16_mac_vgprcd_e64 %10136:vreg_128_align2, %10140:vreg_128_align2, %9858:vreg_512_align2(tied-def 0), 0, 0, 0, implicit $mode, implicit $exec %10129:VReg_128_Align2, LiveMask 00000000000000FF (4 VGPRs) ... Live registers with multiple definitions (133 VGPRs): %9856:VReg_512_Align2, LiveMask 00000000FFFFFFFF (16 VGPRs) def 16544r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def undef %9856.sub0_sub1:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16592r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub2_sub3:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16608r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub4_sub5:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16656r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub6_sub7:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16672r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub8_sub9:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16720r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub10_sub11:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16736r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub12_sub13:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def 16784r@BB.8: INLINEASM &"v_pk_mul_f32 $0, $1, $2" [sideeffect] [isconvergent] [attdialect], $0:[regdef:VReg_64_Align2], def %9856.sub14_sub15:vreg_512_align2, $1:[reguse:VReg_64_Align2], %4069:vreg_64_align2, $2:[reguse:VReg_64_Align2], %10159:vreg_64_align2, !52 def use 41828r@BB.18 (LoopHdr BB.16, Depth 1): %9856:vreg_512_align2 = contract V_MFMA_F32_... ... ********** INTERVALS ********** ... ********** MACHINEINSTRS ********** # Machine code for function _ZN7ck_tile6kentr... ```
4 days[NFC][Verifier] Fix typo initalizer->initializer (#163193)Juan Manuel Martinez Caamaño1-1/+1
4 days[NVPTX] Update architecture support checks for tcgen05 intrinsics (#161519)Rajat Bajpai5-16/+110
This change updates architecture support checks for tcgen05 intrinsics (except tcgen05.mma.*). The newer checks will support family-specific architecture variants as well. After this change, the arch checks will be accurate and match with PTX ISA. Intrinsics affected: - tcgen05.ld/st - tcgen05.alloc/dealloc/relinquish - tcgen05.cp - tcgen05.fence/wait - tcgen05.commit - tcgen05.shift
4 days[Constants] Handle ptrtoaddr in getRelocationInfo()Nikita Popov1-2/+5
Treat it the same way as ptrtoint.
4 days[NFC][LLVM] Code cleanup in CloneFunction.cpp (#162875)Rahul Joshi1-36/+31
- Use explicit types instead of auto when type is not obvious. - Move local function out of anonymous namespace and make them static. - Use structured bindings in some range for loops. - Simplify PHI handling code a bit.
4 days[SSAUpdaterBulk] Add PHI simplification pass. (#150936)Valery Pykhtin1-6/+100
This optimization is performed as a separate pass over newly inserted PHI nodes to simplify and deduplicate them. By processing PHIs separately, we avoid the complexity of tracking reference bookkeeping needed to update BBValueInfo structures during insertion.
4 days[AArch64] Support commuted operands in performFlagSettingCombine (#162496)Cullen Rhodes2-9/+23
Fixes #122624. Assisted-by: gpt-5-codex
4 days[SLP]Enable support for logical ops in copyables (#162945)Alexey Bataev1-1/+5
Allows to use And, Or and Xor instructions as base for copyables.
4 days[Matrix] Use data layout index type for lowering matrix intrinsics (#162646)Nathan Corbyn1-12/+31
To properly support the matrix intrinsics on, e.g., 32-bit platforms (without the need to emit `libc` calls), `LowerMatrixIntrinsics` pass should generate code that performs strided index calculations using the same pointer bit-width as the matrix pointers, as determined by the data layout. This patch updates the `LowerMatrixInstrics` transform to make this the case. PR: https://github.com/llvm/llvm-project/pull/162646
4 days[AMDGPU] Remove NoInfsFPMath uses (#163028)paperchalice2-8/+5
Only `ninf` should be used.
4 daysRevert "[WebAssembly] Lower fmuladd to madd and nmadd" (#163171)Sam Parker9-88/+11
Reverts llvm/llvm-project#161355 Looks like I've broken some intrinsic code generation.
4 days[GlobalISel] Add G_SUB for computeNumSignBits (#158384)Yatao Wang1-0/+38
This patch ports the ISD::SUB handling from SelectionDAG’s ComputeNumSignBits to GlobalISel. Related to https://github.com/llvm/llvm-project/issues/150515. --------- Co-authored-by: Matt Arsenault <arsenm2@gmail.com> Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
4 days[VPlan] Assign custom opcodes to recipes not mapping to IR opcodes. (#162267)Florian Hahn2-5/+12
We can perform CSE on recipes that do not directly map to Instruction opcodes. One example is VPVectorPointerRecipe. Currently this is handled by supporting them in ::canHandle, but currently that means that we return std::nullopt from getOpcodeOrIntrinsicID() for it. This currently only works, because the only case we return std::nullopt and perform CSE is VPVectorPointerRecipe. But that does not work if we support more such recipes, like VPPredInstPHIRecipe (https://github.com/llvm/llvm-project/pull/162110). To fix this, return a custom opcode from getOpcodeOrIntrinsicID for recipes like VPVectorPointerRecipe, using the VPDefID after all regular instruction opcodes. PR: https://github.com/llvm/llvm-project/pull/162267
4 days[VPlan] Strip VPDT's default constructor (NFC) (#162692)Ramkumar Ramachandra4-13/+6
4 days[WebAssembly] Lower fmuladd to madd and nmadd (#161355)Sam Parker9-11/+88
Lower v4f32 and v2f64 fmuladd calls to relaxed_madd instructions. If we have FP16, then lower v8f16 fmuladds to FMA. I've introduced an ISD node for fmuladd to maintain the rounding ambiguity through legalization / combine / isel.
4 days[IR] Handle trunc for ptrtoaddr(inttoptr) cast pair (#162842)Nikita Popov1-6/+10
For ptrtoint(inttoptr) and ptrtoaddr(inttoptr), handle the case where the source and destination size do not match and convert to either zext or trunc. We can't do this if the middle size is smaller than both src/dest, because we'd have to perform an additional masking operation in that case. Most of these cases are handled by dint of ptrtoint/inttoptr size canonicalization (so I added some unit tests instead). However, the ptrtoaddr(inttoptr) case where the pointer size and address size differ is relevant, as in that case the mismatch in integer sizes is canonical.
4 days[DebugCounter] Add -print-debug-counter-queries option (#162827)Nikita Popov1-20/+36
Add a `-print-debug-counter-queries` option which prints the current value of the counter and whether it is executed/skipped each time it is queried. This is useful when interleaving the output with the usual transform debug output, in order to find the correct counter value to use to hit a specific point in the transform.
4 days[LLVM-C] Allow `LLVMGetVolatile` to work with any kind of Instruction (#163060)AMS211-9/+2
Allow LLVMGetVolatile() to work with any kind of Instruction, rather than only memory instructions that accept a volatile flag. For instructions that can never be volatile, the function now return false instead of asserting. This matches the behavior of `Instruction::isVolatile()` in the C++ API.
4 days[InstCombine] Skip foldFBinOpOfIntCastsFromSign for vector ops (#162804)Nikita Popov1-0/+5
Converting a vector float op into a vector int op may be non-profitable, especially for targets where the float op for a given type is legal, but the integer op is not. We could of course also try to address this via a reverse transform in the backend, but I don't think it's worth the bother, given that vectors were never the intended use case for this transform in the first place. Fixes https://github.com/llvm/llvm-project/issues/162749.
4 days[AArch64][SVE] Avoid extra pop of "FixedObject" with ↵Benjamin Maxwell1-29/+41
CalleeSavesAboveFrameRecord (#156452) Previously, we would pop `FixedObject`-bytes after deallocating the SVE area, then again as part of the "AfterCSRPopSize". This could be seen in the tests `@f6` and `@f9`. This patch removes the erroneous pop, and refactors `CalleeSavesAboveFrameRecord` to reuse more of the existing GPR deallocation logic, which allows for post-decrements.
4 days[VPlan] Allow zero-operand m_BranchOn(Cond|Count) (NFC) (#162721)Ramkumar Ramachandra6-18/+20
4 days[AMDGPU] expand-fp: unify scalarization (NFC) (#158588)Frederik Harwath1-80/+55
Extend the existing "scalarize" function which is used for the fp-integer conversion instruction expansion to BinaryOperator instructions and reuse it for the frem expansion; a similar function for scalarizing BinaryOperator instructions exists in the ExpandLargeDivRem pass and this change is a step towards merging that pass with ExpandFp. Further refactoring: Scalarize directly instead of using the "ReplaceVector" as a worklist, rename "Replace" vector to "Worklist", and hoist a check for unsupported scalable vectors to the top of the instruction visiting loop.
5 days[ORC] Add LinkGraph& argument to MemoryMapper::prepare. (#163121)Lang Hames2-3/+5
This gives MemoryMapper implementations a chance to allocate working memory using the LinkGraph's allocator.
5 days[AMDGPU][NFC] Use `getScoreUB` for XCNT insertion. (#162448)Aaditya1-1/+1
5 days[AMDGPU][True16][CodeGen] Add patterns to reduce intermediates (#162047)Carl Ritson1-0/+17
Add patterns which reduce or operations to register sequences when combining i16 values to i32. This removes many intermediate VGPRs and reduces registers pressure.
5 days[llvm] Use [[fallthrough]] instead of LLVM_FALLTHROUGH (NFC) (#163086)Kazu Hirata8-20/+20
[[fallthrough]] is now part of C++17, so we don't need to use LLVM_FALLTHROUGH.
5 daysAMDGPU: Use ELF mangling in data layout (#163011)Matt Arsenault1-2/+2
Closes #95219
5 days[llvm][LoongArch] Replace unnecessary ZERO_EXTEND to ANY_EXTEND (#162593)Zhaoxin Yang1-1/+1
5 days[X86][GlobalISel] Improve carry value selection (#146586)Evgenii Kudriashov2-22/+23
Generally G_UADDE, G_UADDO, G_USUBE, G_USUBO are used together and it was enough to simply define EFLAGS. But if extractvalue is used, we end up with a copy of EFLAGS into GPR. Always generate SETB instruction to put the carry bit on GPR and CMP to set the carry bit back. It gives the correct lowering in all the cases. Closes #120029
5 days[SLP]INsert postponed vector value after all uses, if the parent node is PHIAlexey Bataev1-1/+3
Need to insert the vector value for the postponed gather/buildvector node after all uses non only if the vector value of the user node is phi, but also if the user node itself is PHI node, which may produce vector phi + shuffle. Fixes #162799
5 days[LV] Bail out on loops with switch as latch terminator.Florian Hahn1-0/+13
Currently we cannot vectorize loops with latch blocks terminated by a switch. In the future this could be handled by materializing appropriate compares. Fixes https://github.com/llvm/llvm-project/issues/156894.
5 daysFix typo: IsGlobaLinkage -> IsGlobalLinkage in XCOFF (#161960)小钟2-2/+2
Corrects the spelling of 'IsGlobaLinkage' to 'IsGlobalLinkage' in XCOFF-related code, comments, and tests across the codebase.
5 days[SCEV] Use APInt for DividesBy when collecting loop guard info (NFC). (#163017)Florian Hahn1-43/+32
Follow-up as suggested in https://github.com/llvm/llvm-project/pull/162617. Just use an APInt for DividesBy, as the existing code already operates on APInt and thus handles the case of DividesBy being 1. PR: https://github.com/llvm/llvm-project/pull/163017
5 days[SLP]Support non-ordered copyable argument in non-commutative instructionsAlexey Bataev1-28/+33
If the non-commutative user has several same operands and at least one of them (but not the first) is copyable, need to consider this opportunity when calculating the number of dependencies. Otherwise, the schedule bundle might be not scheduled correctly and cause a compiler crash Fixes #162925
5 days[VPlan] Set flags when constructing truncs using VPWidenCastRecipe.Florian Hahn3-27/+35
VPWidenCastRecipes with Trunc opcodes where missing the correct OpType for IR flags. Update createWidenCast to set the correct flags for truncs, and use it consistenly. Fixes https://github.com/llvm/llvm-project/issues/162374.
5 days[AArch64][NFC] Use member variable RI instead getRegisterInfo in copyPhysReg ↵Tomer Shafir1-48/+38
(#162826) This patch uses the RI member variable directly in the member function AArch64InstrInfo::copyPhysReg, instead of redundant calls to the public API.
5 days[AArch64] Protect against scalable vectors in performUADDVAddCombine.David Green1-1/+2
Fixes #162983
6 days[ConstantFPRange] Add support for add/sub (#162962)Yingwei Zheng1-4/+66
This patch adds support for fadd/fsub operations. I only tested this patch with some special ranges because the exhaustive check is too expensive.
6 days[SelectionDAG] Remove NoInfsFPMath uses (#162788)paperchalice2-17/+5
Users should use fast-math flags instead.
6 days[VPlan] Don't reset canonical IV start value. (#161589)Florian Hahn2-7/+24
Instead of re-setting the start value of the canonical IV when vectorizing the epilogue we can emit an Add VPInstruction to provide canonical IV value, adjusted by the resume value from the main loop. This is in preparation to make the canonical IV a VPValue defined by loop regions. It ensures that the canonical IV always starts at 0. PR: https://github.com/llvm/llvm-project/pull/161589
6 days[SPIR-V] Support `nonuniformindex` intrsinsic in SPIRV CodeGen. (#162540)Lucie Choi1-21/+61
Support `@llvm.spv.resource.nonuniformindex` in SPIRV Codegen. - Add `NonUniformEXT` decoration to the registers marked as `nonuniformindex`, and recursively decorate its child registers (e.g. Copy, AccessChain, Load) that access such index. - `OpCapability ShaderNonUniformEXT` is already added in the code. - [SPV_EXT_descriptor_indexing](https://github.khronos.org/SPIRV-Registry/extensions/EXT/SPV_EXT_descriptor_indexing.html) is skipped because it's added to SPIRV Core in 1.5. ## Unit test - The unit test checks that the register being used in the final Store/Load/Write instruction is decorated, as required by the spec. - The implementation follows [DXC](https://godbolt.org/z/zhqGThcaf) in that it recursively decorates all the child elements until the end. ```hlsl RWStructuredBuffer<uint4> StructuredOut[64]; RWBuffer<uint> UnStructuredOut[64]; [numthreads(64,1,1)] void main(uint3 GTID: SV_GroupThreadID) { StructuredOut[(NonUniformResourceIndex(GTID.x + 1))][98][0] = 99; UnStructuredOut[(NonUniformResourceIndex(GTID.x))][96] = 95; } ``` Resolves https://github.com/llvm/llvm-project/issues/160231, https://github.com/llvm/llvm-project/issues/161852. Verified [offload-test-suite](https://github.com/llvm/offload-test-suite/blob/cfc37840c8ad0d9c08ee900ecbc0b02cc56478ae/test/Feature/ResourceArrays/unbounded-array-nuri.test) started passing for clang.
6 days[VPlan] Return invalid for scalable VF in VPReplicateRecipe::computeCostFlorian Hahn2-6/+6
Replication is currently not supported for scalable VFs. Make sure VPReplicateRecipe::computeCost returns an invalid cost early, for scalable VFs if the recipe is not a single-scalar. Note that this moves the existing invalid-costs.ll out of the AArch64 subdirectory, as it does not use a target triple. Fixes https://github.com/llvm/llvm-project/issues/160792.
6 days[RISCV] Use i32 in more RV32 only patterns. NFCCraig Topper5-40/+43
This reduces RISCVGenDAGISel.inc by about 750 bytes.
6 days[llvm][DebugInfo] Support versioned source language names in DwarfUnit (#162625)Michael Buch1-2/+10
Depends on: * https://github.com/llvm/llvm-project/pull/162621 Now we can start emitting `DW_AT_language_name`, make sure `DwarfUnit::getSourceLanguage` is equipped to handle this. Otherwise the new test-case would assert.
7 days[RISCV] Explicitly use i32 in RV32 Zdinx patterns to prune tablegen ↵Craig Topper4-49/+60
patterns. NFC (#162790) Tablegen was creating unnecessary patterns for the RV64 HwMode.
7 days[PowerPC] Lower ucmp using subtractions (#146446)AZero132-0/+34
Source: Hacker's delight, page 21. Using the carry, we can use contractions to use the ucmp.
7 days[ConstantFPRange] Add support for cast operations (#162686)Yingwei Zheng1-0/+19
This patch adds support for fpext/fptrunc operations. I noticed that finite-only semantics are not supported by the current representation of constant FP ranges. It should be okay for now, as we don't expose these types in the IR.
7 days[NFC] Simplify checks using isDebugOrPseudoInstr API (#145127)Lei Wang2-2/+2
Merge the two checks using the existing API, NFC.
7 daysFix legalizing `FNEG` and `FABS` with `TypeSoftPromoteHalf` (#156343)beetrees2-2/+26
Based on top of #157211. `FNEG` and `FABS` must preserve signalling NaNs, meaning they should not convert to f32 to perform the operation. Instead legalize to `XOR` and `AND`. Fixes almost all of #104915