path: root/llvm/lib/CodeGen
Age | Commit message | Author | Files | Lines
2024-11-27 | [SjLjEHPrepare] Configure call sites correctly (#117656) | Sergei Barannikov | 1 | -6/+5
After 9fe78db4, the pass inserts `store volatile i32 -1, ptr %call_site` before all invoke instructions except the one in the entry block, which has the effect of bypassing landing pads on exceptions. When configuring the call site for a potentially throwing instruction, check that it is not an `InvokeInst`; invokes are handled by earlier code.
2024-11-26 | [SelectionDAG] Add generic implementation for @llvm.expect.with.probability when optimizations are disabled (#117459) | antangelo | 3 | -4/+9
Handle @llvm.expect.with.probability in SelectionDAGBuilder, FastISel, and IntrinsicLowering in the same way @llvm.expect is handled, where the value is passed through as-is. This can be reached if the intrinsic is used without optimizations, where it would otherwise be properly transformed out. Fixes #115411 for SelectionDAG. A similar patch is likely needed for GlobalISel.
2024-11-27 | [MachineLateInstrsCleanup] Minor fixing (NFC). (#117816) | Jonas Paulsson | 1 | -11/+7
With cb57b7a7, MachineLateInstrsCleanup switched to using a map to keep track of kill flags to remedy compile time regressions seen with huge functions. It seems that the comment above clearKillsForDef() became stale with that commit, and also that one of the arguments to it became unused, both of which this patch fixes.
2024-11-26 | [RISCV][GISel] Use libcalls for f32/f64 G_FCMP without F/D extensions. (#117660) | Craig Topper | 1 | -22/+42
LegalizerHelper only supported f128 libcalls and incorrectly assumed that the destination register of the G_FCMP was s32.
2024-11-26 | [DebugInfo] Handle trailing empty blocks when seeking prologue_end spot (#117320) | Jeremy Morse | 1 | -24/+38
The optimiser will produce empty blocks that are unconditionally executed according to the CFG -- while it may not be meaningful code, and won't get a prologue_end position, we need to not crash on this input. The fault comes from assuming that there's always a next block with some instructions in it, that will eventually produce some meaningful control flow to stop at -- in the given reproducer in issue #117206 this isn't true, because the function terminates with `unreachable`. Thus, I've refactored the "get next instruction logic" into a helper that'll step through all blocks and terminate if there aren't any more.

Reproducer from aeubanks
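The "step through all blocks and terminate if there aren't any more" logic can be sketched on a toy model. This is not the DwarfDebug code: a function here is just a vector of blocks, each block a vector of hypothetical instruction IDs, and `findNextInstr` is an illustrative name. The point is that the search returns "nothing" instead of assuming a non-empty successor exists.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified model: a function is a list of blocks in layout
// order, and each block is a list of instruction IDs (it may be empty).
using Block = std::vector<int>;

// Return the first instruction at or after (BlockIdx, InstIdx), stepping over
// empty blocks. Returns std::nullopt when no block with instructions remains,
// instead of assuming that a non-empty successor always exists.
std::optional<int> findNextInstr(const std::vector<Block> &Blocks,
                                 std::size_t BlockIdx, std::size_t InstIdx) {
  while (BlockIdx < Blocks.size()) {
    const Block &B = Blocks[BlockIdx];
    if (InstIdx < B.size())
      return B[InstIdx];
    // This block is exhausted (or empty): continue at the next block's start.
    ++BlockIdx;
    InstIdx = 0;
  }
  return std::nullopt; // Ran off the end of the function.
}
```

For a layout like `{{1}, {}, {}}` (a function whose trailing blocks are empty, as after an `unreachable`), asking for the instruction after `1` yields `std::nullopt` rather than reading past the end.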
2024-11-26 | [SDAG] Don't allow implicit trunc in getConstant() (#117558) | Nikita Popov | 1 | -8/+1
Assert that the passed value is a valid unsigned integer value for the specified type. For signed values getSignedConstant() / getSignedTargetConstant() should be used instead.
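The new invariant can be modelled with a small standalone predicate. `fitsAsUnsigned` is a hypothetical name, and the real assertion inside `SelectionDAG::getConstant()` may be phrased differently (for example in terms of `APInt`); the sketch only shows what "a valid unsigned integer value for the specified type" means for a `uint64_t` argument.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a uint64_t passed as an *unsigned* constant of a
// BitWidth-bit type must already be representable in that width, i.e. no
// implicit truncation of high bits is allowed.
bool fitsAsUnsigned(uint64_t Val, unsigned BitWidth) {
  assert(BitWidth >= 1 && BitWidth <= 64 && "unsupported width");
  if (BitWidth == 64)
    return true;                 // Every uint64_t fits.
  return (Val >> BitWidth) == 0; // All bits above the width must be zero.
}
```

Under this rule, `-1` for an 8-bit type (passed as `0xFFFFFFFFFFFFFFFF`) no longer fits as an unsigned value, which is why such call sites should move to `getSignedConstant()` / `getSignedTargetConstant()`.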
2024-11-25 | [SelectionDAG] Require last operand of (STRICT_)FP_ROUND to be a TargetConstant. (#117639) | Craig Topper | 2 | -12/+15
Fix all the places I could find that didn't do this. We were already mostly correct for FP_ROUND after 9a976f36615dbe15e76c12b22f711b2e597a8e51, but not for STRICT_FP_ROUND.
2024-11-25 | [TTI][RISCV] Unconditionally break critical edges to sink ADDI (#108889) | Philip Reames | 1 | -1/+3
This looks like a rather weird change, so let me explain why this isn't as unreasonable as it looks. Let's start with the problem it's solving.

```
define signext i32 @overlap_live_ranges(ptr %arg, i32 signext %arg1) {
bb:
  %i = icmp eq i32 %arg1, 1
  br i1 %i, label %bb2, label %bb5

bb2:                                              ; preds = %bb
  %i3 = getelementptr inbounds nuw i8, ptr %arg, i64 4
  %i4 = load i32, ptr %i3, align 4
  br label %bb5

bb5:                                              ; preds = %bb2, %bb
  %i6 = phi i32 [ %i4, %bb2 ], [ 13, %bb ]
  ret i32 %i6
}
```

Right now, we codegen this as:

```
        li      a3, 1
        li      a2, 13
        bne     a1, a3, .LBB0_2
        lw      a2, 4(a0)
.LBB0_2:
        mv      a0, a2
        ret
```

In this example, we have two values which must be assigned to a0 per the ABI (%arg, and the return value). SelectionDAG ensures that all values used in a successor phi are defined before exiting the predecessor block. This creates an ADDI to materialize the immediate in the entry block.

Currently, this ADDI is not sunk into the tail block because we'd have to split a critical edge to do so. Note that if our immediate were large enough to require two instructions, we *would* split this critical edge.

Looking at other targets, we notice that they don't seem to have this problem. They perform the sinking and tail duplication that we don't. Why? Well, it turns out for AArch64 that this is entirely an accident of the existence of the gpr32all register class. The immediate is materialized into the gpr32 class, and then copied into the gpr32all register class. The existence of that copy puts us right back into the two-instruction case noted above.

This change essentially just bypasses this emergent aspect of the AArch64 behavior, and implements the same "always sink immediates" behavior for RISCV as well.
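Whether an edge is critical follows the standard definition: its source has more than one successor and its destination has more than one predecessor, so code cannot be placed "on" the edge without inserting a new block. In `@overlap_live_ranges`, the edge `%bb -> %bb5` is critical (`%bb` branches to both `%bb2` and `%bb5`, and `%bb5` has both as predecessors). A minimal sketch over a hypothetical adjacency-list CFG, not LLVM's `MachineBasicBlock` API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified CFG: Succs[B] lists the successor blocks of block B.
using CFG = std::vector<std::vector<int>>;

// An edge From->To is critical when From has more than one successor and
// To has more than one predecessor. Sinking an instruction onto such an edge
// first requires splitting it with a new block.
bool isCriticalEdge(const CFG &Succs, int From, int To) {
  if (Succs[From].size() < 2)
    return false;
  std::size_t NumPreds = 0;
  for (const auto &S : Succs)
    for (int Dst : S)
      if (Dst == To)
        ++NumPreds;
  return NumPreds > 1;
}
```

Numbering `bb = 0`, `bb2 = 1`, `bb5 = 2`, only the `0 -> 2` edge is critical; the `li a2, 13` that feeds the phi would have to be sunk onto exactly that edge.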
2024-11-25 | [GISel] #undef macros when they are no longer needed. NFC (#117652) | Craig Topper | 1 | -0/+2
These macros are created inside a function. They should be undefined before the end of the function.
2024-11-25 | [SelectionDAG][RISCV][AArch64] Allow f16 STRICT_FLDEXP to be promoted. Fix integer promotion of STRICT_FLDEXP in type legalizer. (#117633) | Craig Topper | 2 | -3/+17
A special case in type legalization wasn't accounting for different operand numbering between FLDEXP and STRICT_FLDEXP. AArch64 already asked STRICT_FLDEXP to be promoted, but had no test for it.
2024-11-25 | Reland [CGData][GMF] Skip No Params (#116548) | Kyungwoo Lee | 1 | -5/+6
This update follows up on change #112671 and is mostly NFC, with the following exceptions:
- Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
- Parameter count is now calculated based on the unique hash count.
- Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
- Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
- Moved a sorting operation outside of the loop.

This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-25 | Revert "[CGData][GMF] Skip No Params (#116548)" | Kyungwoo Lee | 1 | -6/+5
This reverts commit fdf1f69c57ac3667d27c35e097040284edb1f574.
2024-11-25 | [CGData][GMF] Skip No Params (#116548) | Kyungwoo Lee | 1 | -5/+6
This update follows up on change #112671 and is mostly NFC, with the following exceptions:
- Introduced `-global-merging-skip-no-params` to bypass merging when no parameters are required.
- Parameter count is now calculated based on the unique hash count.
- Added `-global-merging-inst-overhead` to adjust the instruction overhead, reflecting the machine instruction size.
- Costs and benefits are now computed using the double data type. Since the finalization process occurs offline, this should not significantly impact build time.
- Moved a sorting operation outside of the loop.

This is a patch for https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-11-25 | Revert "[SelectOpt] Refactor to prepare for support more select-like operations (#115745)" | Igor Kirillov | 1 | -261/+220
This reverts commit b5a11d378db4b39ceb085ebd59c941e9369d9596.
2024-11-25 | Revert "[DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031)" (#117556) | David Sherwood | 1 | -25/+16
This reverts commit 22ec44f509ff266b581dbb490d7b040473b7c31a.
2024-11-25 | [SelectOpt] Refactor to prepare for support more select-like operations (#115745) | Igor Kirillov | 1 | -220/+261
* Enables conversion of several select-like instructions within one group
* Any number of auxiliary instructions depending on the same condition can be in between select-like instructions
* After splitting the basic block, move select-like instructions into the relevant basic blocks and optimise them
* Make it easier to add support for shift-based select-like instructions and also any mixture of zext/sext/not instructions
2024-11-25 | [DAGCombiner] Add support for scalarising extracts of a vector setcc (#116031) | David Sherwood | 1 | -16/+25
For IR like this:

    %icmp = icmp ult <4 x i32> %a, splat (i32 5)
    %res = extractelement <4 x i1> %icmp, i32 1

where there is only one use of %icmp, we can take a similar approach to what we already do for binary ops such as add, sub, etc. and convert this into

    %ext = extractelement <4 x i32> %a, i32 1
    %res = icmp ult i32 %ext, 5

For AArch64 targets at least, the scalar boolean result will almost certainly need to be in a GPR anyway, since it will probably be used by branches for control flow. I've tried to reuse existing code in scalarizeExtractedBinOp to also work for setcc.

NOTE: The optimisations don't apply for tests such as extract_icmp_v4i32_splat_rhs in the file CodeGen/AArch64/extract-vector-cmp.ll because scalarizeExtractedBinOp only works if one of the input operands is a constant.
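The rewrite is valid because comparing lane-wise and then extracting lane `Idx` gives the same bit as extracting lane `Idx` and comparing once. The following plain C++ models only the IR semantics of the example (`icmp ult` against `splat (i32 5)`), not DAGCombiner code; `vectorThenExtract` and `extractThenScalar` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V4 = std::array<uint32_t, 4>;

// Vector form: compute the full <4 x i1> "icmp ult %a, splat (i32 5)" and then
// extract lane Idx.
bool vectorThenExtract(const V4 &A, unsigned Idx) {
  std::array<bool, 4> Cmp{};
  for (unsigned I = 0; I < 4; ++I)
    Cmp[I] = A[I] < 5u;
  return Cmp[Idx];
}

// Scalarised form: extract lane Idx first, then do a single scalar compare.
bool extractThenScalar(const V4 &A, unsigned Idx) {
  uint32_t Ext = A[Idx];
  return Ext < 5u;
}
```

The scalar form does one compare instead of four, and its result is already a scalar boolean, which matches the observation that the result will likely end up in a GPR anyway.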
2024-11-25 | [AMDGPU] Use getSignedConstant() where necessary (#117328) | Nikita Popov | 1 | -6/+6
Create signed constant using getSignedConstant(), to avoid future assertion failures when we disable implicit truncation in getConstant(). This also touches some generic legalization code, which apparently only AMDGPU tests.
2024-11-25 | [RISCV][MachineVerifier] Use RegUnit for register liveness checking (#115980) | Piyou Chen | 1 | -1/+5
For the RISC-V target, V14_V15 are not subregisters of v14m4, even though they share some registers. Currently, the MachineVerifier reports an error when checking register liveness for segment load/store operations. This patch adds additional register liveness checking, using RegUnit instead of subregisters, to prevent this error.
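The difference between a subregister test and a register-unit test can be shown with a toy model in which each register is simply a set of unit numbers. The unit numbers below are invented, and `unitsOverlap` is a hypothetical helper, not the `MachineVerifier` or `TargetRegisterInfo` API.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

// Hypothetical model: each register is described by the set of register
// units it covers. Unit numbers are invented for illustration only.
using RegUnits = std::set<unsigned>;

// Two registers overlap (and therefore interact for liveness) if they share
// any unit, even when neither is a subregister of the other.
bool unitsOverlap(const RegUnits &A, const RegUnits &B) {
  return std::any_of(A.begin(), A.end(),
                     [&](unsigned U) { return B.count(U) != 0; });
}
```

A segment-tuple register covering units `{14, 15}` and a register group covering `{12, 13, 14, 15}` have no subregister relationship in this model, yet they overlap, so a liveness check based only on subregisters would miss the interaction.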
2024-11-25 | [VP] Refactoring some functions in ExpandVectorPredication.NFC (#115840) | LiqinWeng | 1 | -90/+26
Build VP intrinsic calls through a unified interface in the expandPredicationToIntCall/expandPredicationToFPCall/expandPredicationToCastIntrinsic functions.
2024-11-23 | [AArch64][GlobalISel] Legalize ptr shuffle vector to s64 (#116013) | David Green | 3 | -3/+47
This converts all ptr element shuffle vectors to s64, so that the existing vector legalization handling can lower them as needed. This prevents a lot of fallbacks that currently try to generate things like `<2 x ptr> G_EXT`. I'm not sure if bitcast/inttoptr/ptrtoint is intended to be necessary for vectors of pointers, but it uses buildCast for the casts, which now generates a ptrtoint/inttoptr.
2024-11-22 | [BasicBlockSections] Allow mixing of -basic-block-sections with MFS. (#117076) | Rahman Lavaee | 2 | -8/+22
This PR allows mixing `-basic-block-sections` with `-enable-machine-function-splitter`. The strategy is to let `-basic-block-sections` take precedence over functions with profiles.
2024-11-22 | [Clang] Attribute NoFPClass should not prevent tail call optimization. (#116741) | Félix-Antoine Constantin | 2 | -8/+8
Fixes #111950
2024-11-22 | [SHT_LLVM_BB_ADDR_MAP] Add an option to skip emitting bb entries (#114447) | Lei Wang | 1 | -18/+36
Sometimes we want to use a `PgoAnalysisMap` feature that doesn't require the BB entries info, e.g. only the `FuncEntryCount`, but the BB entries are emitted by default, so I'm adding an option to skip that info in this case to save binary size (this can save ~90% of the section size). For the implementation, it adds a new field (`OmitBBEntries`) to `BBAddrMap::Features`, controlled by the switch `--basic-block-address-map-skip-bb-entries`. Note that this naturally supports backwards compatibility: the field is zero in the old version, which matches the decoding in the new LLVM version.
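The backwards-compatibility argument (old producers never set the new bit, so it decodes as "off") can be illustrated with a feature byte. The bit positions and the `FeatureFlags` type below are hypothetical, chosen for illustration; the real `BBAddrMap::Features` encoding may place these flags differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, for illustration only; the real
// BBAddrMap::Features encoding may place these flags differently.
constexpr uint8_t kFuncEntryCountBit = 1u << 0;
constexpr uint8_t kOmitBBEntriesBit = 1u << 1;

struct FeatureFlags {
  bool FuncEntryCount = false;
  bool OmitBBEntries = false; // newly added flag

  uint8_t encode() const {
    uint8_t Byte = 0;
    if (FuncEntryCount)
      Byte |= kFuncEntryCountBit;
    if (OmitBBEntries)
      Byte |= kOmitBBEntriesBit;
    return Byte;
  }

  static FeatureFlags decode(uint8_t Byte) {
    FeatureFlags F;
    F.FuncEntryCount = (Byte & kFuncEntryCountBit) != 0;
    // Producers that predate the new flag never set this bit, so it decodes
    // as false and BB entries are expected, exactly as before.
    F.OmitBBEntries = (Byte & kOmitBBEntriesBit) != 0;
    return F;
  }
};
```

Because the new flag defaults to zero, a reader built after the change interprets sections from older producers exactly as before, while newer producers can opt in to omitting BB entries.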
2024-11-22 | Revert "[RegisterCoalescer] Fix up subreg lanemasks after rematerializing. (#116191)" (#117367) | Vitaly Buka | 1 | -22/+0
To pass tests with #117307 revert. This reverts commit 3093b29b597b9a936a3e4d1c8bc4a7ccba8fc848.
2024-11-22 | Revert "[InitUndef] handleSubReg should skip artificial subregs. (#116248)" (#117365) | Vitaly Buka | 1 | -8/+0
Maybe not needed, but to avoid conflicts with #117307. Without revert of this one, but reverting #117307, the regenerated init-undef.mir became empty. This reverts commit be15fd5085680cc5ed9ec4f4f2258b504cdd55db.
2024-11-22 | [CodeGen][NewPM] Port EdgeBundles analysis to NPM (#116616) | Akshat Oke | 3 | -15/+36
2024-11-21 | [GlobalISel] Correct comment about type vs register class (#116083) | Daniel Sanders | 1 | -3/+2
Type and register class aren't mutually exclusive in gMIR but there's also no target-independent requirement (yet?) to have both on target instructions.
2024-11-21 | [NFC][VectorUtils][TargetTransformInfo] Add `isVectorIntrinsicWithOverloadTypeAtArg` api (#114849) | Finn Plummer | 1 | -2/+4
This change allows target intrinsics to specify and overwrite overloaded types.
- Updates `ReplaceWithVecLib` to not provide TTI as there most probably won't be a use-case
- Updates `SLPVectorizer` to use available TTI
- Updates `VPTransformState` to pass down TTI
- Updates `VPlanRecipe` to use passed-down TTI

This change will let us add scalarization for `asdouble`: #114847
2024-11-21 | [MachineLICM] Don't allow hoisting invariant loads across mem barrier. (#116987) | Florian Hahn | 1 | -1/+1
The improvements in 63917e1 / #70796 do not check for memory barriers/unmodelled side effects, which means we may incorrectly hoist loads across memory barriers. Fix this by checking whether any machine instruction in the loop is a load-fold barrier. PR: https://github.com/llvm/llvm-project/pull/116987
2024-11-21 | [DAGCombiner] Limit steps in shouldCombineToPostInc (#116030) | Jonathan Cohen | 1 | -3/+6
Currently the function will walk the entire DAG to find other candidates to perform a post-inc store. This leads to very long compilation times on large functions. Added a MaxSteps limit to avoid this, which is also aligned to how hasPredecessorHelper is used elsewhere in the code.
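A step-limited predecessor search can be sketched as a worklist walk that gives up after a budget is spent. This is a hypothetical model, not `SDNode::hasPredecessorHelper`: the graph is a plain adjacency list of operands, and because the commit does not describe what the real code reports when the limit is hit, the sketch returns a separate `GaveUp` result that a caller would treat conservatively.

```cpp
#include <cassert>
#include <vector>

// Simplified DAG: Operands[N] lists the nodes that node N uses.
using Graph = std::vector<std::vector<int>>;

enum class SearchResult { Found, NotFound, GaveUp };

// Is Target a (transitive) operand of Start? Visits at most MaxSteps nodes,
// so the cost is bounded even on very large DAGs.
SearchResult isPredecessorLimited(const Graph &Operands, int Start, int Target,
                                  unsigned MaxSteps) {
  std::vector<bool> Visited(Operands.size(), false);
  std::vector<int> Worklist{Start};
  unsigned Steps = 0;
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    if (Visited[N])
      continue;
    Visited[N] = true;
    if (++Steps > MaxSteps)
      return SearchResult::GaveUp; // Budget exhausted: answer is unknown.
    for (int Op : Operands[N]) {
      if (Op == Target)
        return SearchResult::Found;
      Worklist.push_back(Op);
    }
  }
  return SearchResult::NotFound;
}
```

On a chain `0 -> 1 -> 2 -> 3`, a generous budget finds node 3 from node 0, while a budget of one node gives up instead of walking the whole chain.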
2024-11-21 | [SDAG] [X86] Extend SplitVecOp_VSETCC for STRICT_FSETCCS (#116768) | abhishek-kaushik22 | 1 | -8/+8
Closes #116767
2024-11-21 | [DebugInfo][InstrRef][MIR][GlobalIsel][MachineLICM] NFC Use std::move to avoid copying (#116935) | abhishek-kaushik22 | 4 | -7/+7
2024-11-20 | [CFIFixup] Add frame info to the first block of each section (#113626) | Daniel Hoekwater | 1 | -11/+58
Now that `-fbasic-block-sections=list` is enabled for Arm, functions may be split across multiple sections, and CFI information must be handled independently for each section. On x86, this is handled in `llvm/lib/CodeGen/CFIInstrInserter.cpp`. However, this pass does not run on Arm, so we must add logic for it to `llvm/lib/CodeGen/CFIFixup.cpp`.
2024-11-20 | Fix GCC Wparentheses warning in assert condition / message. NFC. | Simon Pilgrim | 1 | -2/+2
2024-11-20 | [SDAG] Generalize FSINCOS type legalization (NFC) (#116848) | Benjamin Maxwell | 2 | -18/+25
There's nothing that specific to FSINCOS about these; they could be used for similar nodes in the future.
2024-11-20 | IR: de-duplicate two CmpInst routines (NFC) (#116866) | Ramkumar Ramachandra | 1 | -1/+1
De-duplicate the functions getSignedPredicate and getUnsignedPredicate, nearly identical versions of which were present in CmpInst and ICmpInst, creating less confusion.
2024-11-19 | [MachineSink] Fix stable sort comparator (#116705) | Ellis Hoag | 1 | -1/+2
Fix the comparator in `stable_sort()` to satisfy the strict weak ordering requirement. In https://github.com/llvm/llvm-project/pull/115367 this comparator was changed to use `getCycleDepth()` when `shouldOptimizeForSize()` is true. However, I mistakenly changed the logic so that we use `LHSFreq < RHSFreq` if **either** of them is zero. This causes us to fail the last requirement (https://en.cppreference.com/w/cpp/named_req/Compare).

> if comp(a, b) == true and comp(b, c) == true then comp(a, c) == true
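The transitivity requirement can be made concrete with a brute-force check. The comparator below is not the MachineSink comparator (its exact code is not shown in this log); it is a hypothetical example of the same class of bug, a comparator that chooses a different key depending on the pair being compared, next to a lexicographic comparator that is a valid strict weak ordering.

```cpp
#include <cassert>
#include <vector>

// Hypothetical successor record; not the MachineSink data structure.
struct Succ {
  unsigned Freq;
  unsigned Depth;
};

// Picks its key per pair (frequency only when both are non-zero, depth
// otherwise). This can break transitivity, so it is not a valid comparator
// for std::sort / std::stable_sort.
bool mixedKeyLess(const Succ &L, const Succ &R) {
  if (L.Freq != 0 && R.Freq != 0)
    return L.Freq < R.Freq;
  return L.Depth < R.Depth;
}

// Always compares the same lexicographic key (Freq, then Depth), which is a
// strict weak ordering.
bool lexLess(const Succ &L, const Succ &R) {
  if (L.Freq != R.Freq)
    return L.Freq < R.Freq;
  return L.Depth < R.Depth;
}

// Brute-force check of the transitivity requirement on a sample:
// comp(a, b) && comp(b, c) must imply comp(a, c).
template <typename Cmp>
bool isTransitiveOn(const std::vector<Succ> &S, Cmp Comp) {
  for (const Succ &A : S)
    for (const Succ &B : S)
      for (const Succ &C : S)
        if (Comp(A, B) && Comp(B, C) && !Comp(A, C))
          return false;
  return true;
}
```

With `a = {1, 3}`, `b = {2, 1}`, `c = {0, 2}`, `mixedKeyLess` says `a < b` (by frequency) and `b < c` (by depth, since `c` has zero frequency) but not `a < c`, which violates the quoted rule; `lexLess` has no such counterexample.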
2024-11-19 | [RDF] Fix cover check when linking refs to defs (#113888) | Yashas Andaluri | 1 | -8/+6
During RDF graph construction, linkRefUp method links a register ref to its upward reaching defs until all RegUnits of the ref have been covered by defs. However, when a sub-register def covers some, but not all, of the RegUnits of a previous super-register def, a super-register ref is not linked to the super-register def. This can result in certain super register defs being dead code eliminated. This patch fixes the cover check for a register ref. A def must be skipped only when all RegUnits of that def have already been covered by a previously seen def.
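The corrected rule ("skip a def only when all of its RegUnits are already covered") can be sketched with bitsets. This is a hypothetical model, not the RDF `linkRefUp` code: registers are bitsets over an invented number of units, and reaching defs are given nearest first.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

constexpr unsigned NumUnits = 8; // hypothetical number of register units
using Units = std::bitset<NumUnits>;

// Hypothetical sketch: walk the defs reaching a ref, nearest first, and
// return the indices of the defs the ref gets linked to.
std::vector<unsigned> linkReachingDefs(const Units &RefUnits,
                                       const std::vector<Units> &DefsUpward) {
  std::vector<unsigned> Linked;
  Units Covered; // Units already covered by previously seen (nearer) defs.
  for (unsigned I = 0; I < DefsUpward.size(); ++I) {
    const Units &Def = DefsUpward[I];
    if ((Def & RefUnits).none())
      continue; // Def does not alias the ref at all.
    // Skip a def only if *all* of its units are already covered.
    if ((Def & ~Covered).none())
      continue;
    Linked.push_back(I);
    Covered |= Def;
    if ((RefUnits & ~Covered).none())
      break; // Every unit of the ref is now covered; stop walking upward.
  }
  return Linked;
}
```

When the nearest def is a sub-register def (unit 0) and an older def is the super-register def (units 0 and 1), a super-register ref is linked to both, so the older super-register def stays live instead of being treated as dead.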
2024-11-19 | [AsmPrinter] Fix handling in emitGlobalConstantImpl for AIX (#116255) | Zaara Syeda | 1 | -1/+23
When GlobalMerge creates a MergedGlobal of statics all initialized to zero, emitGlobalConstantImpl sees a ConstantAggregateZero. This results in just emitting zeros followed by labels for the aliases. We need to handle it more like how emitGlobalConstantStruct does by emitting each global inside the aggregate.

Co-authored-by: Hubert Tong <hubert.reinterpretcast@gmail.com>
2024-11-19 | [RegisterCoalescer] Fix up subreg lanemasks after rematerializing. (#116191) | Sander de Smalen | 1 | -0/+22
In a situation like the following:

```
undef %2.subreg = INST %1 ; DefMI (rematerializable),
                          ; DefSubIdx = subreg
%3 = COPY %2              ; SrcIdx = DstIdx = 0
.... = SOMEINSTR %3, %2
```

there are no subranges for `%3` because the entire register is copied, but after rematerialization the subrange of the rematerialized value must be fixed up with the appropriate subranges for `.subreg`.

(To me this issue seemed a bit similar to the issue fixed by #96839, but then related to rematerialization)
2024-11-18 | Revert "[NFC] Move DroppedVariableStats to its own file and redesign it to be extensible. (#115563)" | Shubham Sandeep Rastogi | 2 | -195/+0
This reverts commit 2de78815604e9027efd93cac27c517bf732587d2. Reverted due to buildbot failure:
unittests/IR/CMakeFiles/IRTests.dir/DroppedVariableStatsIRTest.cpp.o:DroppedVariableStatsIRTest.cpp:function llvm::DroppedVariableStatsIR::runAfterPass(llvm::StringRef, llvm::Any): error: undefined reference to 'llvm::DroppedVariableStatsIR::runOnModule(llvm::Module const*, bool)'
2024-11-18 | Revert "Add a pass to collect dropped var stats for MIR. (#115566)" | Shubham Sandeep Rastogi | 2 | -76/+2
This reverts commit 6e2b77d4696d4a672635c0ba1ead4824e2158a7d. Reverting due to buildbot failure: unittests/IR/CMakeFiles/IRTests.dir/DroppedVariableStatsIRTest.cpp.o:DroppedVariableStatsIRTest.cpp:function llvm::DroppedVariableStatsIR::runAfterPass(llvm::StringRef, llvm::Any): error: undefined reference to 'llvm::DroppedVariableStatsIR::runOnModule(llvm::Module const*, bool)'
2024-11-18 | Add a pass to collect dropped var stats for MIR. (#115566) | Shubham Sandeep Rastogi | 2 | -2/+76
This patch uses the DroppedVariableStats class to add dropped variable statistics for MIR passes.
2024-11-18 | [NFC] Move DroppedVariableStats to its own file and redesign it to be extensible. (#115563) | Shubham Sandeep Rastogi | 2 | -0/+195
Move DroppedVariableStats code to its own file and change the class to have an extensible design so that we can use it to add dropped statistics to MIR passes and the instruction selector.
2024-11-18 | [GlobalISel] Move DemandedElt's APInt size assert after isValid() check (#115979) | Daniel Sanders | 1 | -9/+9
This prevents the assertion from wrongly triggering on invalid LLTs.
2024-11-18 | [GlobalISel] Combine [S,U]SUBO (#116489) | Thorsten Schütt | 1 | -0/+75
We import the llvm.ssub.with.overflow.* Intrinsics, but the Legalizer also builds them while legalizing other opcodes, see narrowScalarAddSub.
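The commit message does not say which [S,U]SUBO combines were added. As background only, the semantics of the operations being combined, following the LangRef definitions of `llvm.usub.with.overflow` and `llvm.ssub.with.overflow` (which G_USUBO / G_SSUBO mirror), can be written in plain C++; `usubo32` and `ssubo32` are hypothetical names for the i32 case.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

// usub.with.overflow.i32: wrapped difference plus a flag that is set when the
// unsigned subtraction borrows (A < B).
std::pair<uint32_t, bool> usubo32(uint32_t A, uint32_t B) {
  return {A - B, A < B};
}

// ssub.with.overflow.i32: wrapped difference plus a flag that is set when the
// exact difference does not fit in int32_t.
std::pair<int32_t, bool> ssubo32(int32_t A, int32_t B) {
  int64_t Wide = int64_t(A) - int64_t(B); // exact in 64 bits
  bool Overflow = Wide < std::numeric_limits<int32_t>::min() ||
                  Wide > std::numeric_limits<int32_t>::max();
  // Truncate modulo 2^32 to get the wrapped result.
  return {static_cast<int32_t>(static_cast<uint32_t>(Wide)), Overflow};
}
```

For example, `ssubo32(INT32_MIN, 1)` wraps to `INT32_MAX` with the overflow flag set, while `usubo32(3, 5)` wraps to `0xFFFFFFFE` and reports a borrow.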
2024-11-18 | [SelectionDAG] Support integer promotion for VP_LOAD and VP_STORE (#81299) | Lei Huang | 2 | -0/+38
Add integer promotion support for VP_LOAD and VP_STORE via legalization of extend and truncate of each form. Patch commandeered from: https://reviews.llvm.org/D109377
2024-11-18 | [CodeLayout] Do not rebuild chains with -apply-ext-tsp-for-size (#115934) | Ellis Hoag | 1 | -5/+7
https://github.com/llvm/llvm-project/pull/109711 disables `buildCFGChains()` when `-apply-ext-tsp-for-size` is used to improve codesize. Tail merging can change the layout and normally requires `buildCFGChains()` to be called again, but we want to prevent this when optimizing for codesize. We saw slight size improvement on large binaries with this change. If `-apply-ext-tsp-for-size` is not used, this should be NFC.
2024-11-18 | [CodeGen][NewPM] Port PeepholeOptimizer to NPM (#116326) | Akshat Oke | 3 | -37/+70
With this, all machine SSA optimization passes are available in the new codegen pipeline.