aboutsummaryrefslogtreecommitdiff
path: root/llvm/lib/CodeGen/MachineCombiner.cpp
AgeCommit message (Collapse)AuthorFilesLines
2023-02-17[MachineCombiner] Support local strategy for tracesAnton Sidorenko1-5/+10
For in-order cores MachineCombiner makes better decisions when the critical path is calculated only for the current basic block and does not take into account other blocks from the trace. This patch adds a virtual method to TargetInstrInfo to allow each target decide which strategy to use. Depends on D140541 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D140542
2023-02-16[MachineCombiner][NFC] Rename `MinInstr` to `TraceEnsemble`Anton Sidorenko1-22/+21
We are about to allow different trace strategies for MachineCombiner. Make the name of the ensemble strategy-neutral. Depends on D140540 Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D140541
2023-02-14[MachineTraceMetrics][NFC] Move Strategy enum out of the classAnton Sidorenko1-1/+1
Make forward declaration possible to reduce amount of dependencies and reduce re-compilation burden caused by further patches. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D140539
2023-01-20[MachineCombiner] Use default latency model when no detailed model availablePhilip Reames1-15/+0
This change adjusts the cost modeling used when the target does not have a schedule model with individual instruction latencies. After this change, we use the default latency information available from TargetSchedule. The default latency information essentially ends up treating most instructions as latency 1, with a few "expensive" ones getting a higher cost. Previously, we unconditionally applied the first legal pattern - without any consideration of profitability. As a result, this change both prevents some patterns being applied, and changes which patterns are exercised. (i.e. previously the first pattern was applied, afterwards, maybe the second one is because the first wasn't profitable.) The motivation here is two fold. First, this brings the default behavior in line with the behavior when -mcpu or -mtune is specified. This improves test coverage, and generally makes it less likely we will have bad surprises when providing more information to the compiler. Second, this enables some reassociation for ILP by default. Despite being unconditionally enabled, the prior code tended to "reassociate" repeatedly through an entire chain and simply moving the first operand to the end. The result was still a serial chain, just a different one. With this change, one of the intermediate transforms is unprofitable and we end up with a partially flattened tree. Note that the resulting code diffs show significant room for improvement in the basic algorithm. I am intentionally excluding those from this patch. For the test diffs, I don't seen any concerning regressions. I took a fairly close look at the RISCV ones, but only skimmed the x86 (particularly vector x86) changes. Differential Revision: https://reviews.llvm.org/D141017
2023-01-13[CodeGen] Remove uses of Register::isPhysicalRegister/isVirtualRegister. NFCCraig Topper1-3/+3
Use isPhysical/isVirtual methods. Reviewed By: foad Differential Revision: https://reviews.llvm.org/D141715
2023-01-05Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ partserge-sans-paille1-2/+2
Use deduction guides instead of helper functions. The only non-automatic changes have been: 1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*)) 2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase. 3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated. 4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that). Per reviewers' comment, some useless makeArrayRef have been removed in the process. This is a follow-up to https://reviews.llvm.org/D140896 that introduced the deduction guides. Differential Revision: https://reviews.llvm.org/D140955
2023-01-04[MachineCombine] Reorganize code for readability and tracing [nfc]Philip Reames1-26/+20
2022-11-17[MachineCombiner][RISCV] Add fmadd/fmsub/fnmsub instructions patternsAnton Sidorenko1-0/+4
This patch adds tranformation of fmul+fadd/fsub chains to fused multiply instructions: * fmul+fadd->fmadd * fmul+fsub->fmsub/fnmsub We also will try to combine these instructions if the fmul has more than one use and cannot be deleted. However, removing the dependence between fmul and fadd can still be profitable, and we rely on machine combiner approximations of scheduling. Differential Revision: https://reviews.llvm.org/D136764
2022-10-13[NFC] Use forward decl of MachineCombinerPattern enum to reduce dependenciesAnton Sidorenko1-0/+1
Differential Revision: https://reviews.llvm.org/D135776
2022-07-17[CodeGen] Qualify auto variables in for loops (NFC)Kazu Hirata1-2/+2
2022-07-14[MachineCombiner] Don't compute the latency of transient instructionsGuozhi Wei1-3/+42
If an MI will not generate a target instruction, we should not compute its latency. Then we can compute more precise instruction sequence cost, and get better result. Differential Revision: https://reviews.llvm.org/D129615
2022-06-28[MachineCombiner, AArch64] Add a new pattern A-(B+C) => (A-B)-C to reduce ↵Guozhi Wei1-0/+2
latency Add a new pattern A - (B + C) ==> (A - B) - C to give machine combiner a chance to evaluate which instruction sequence has lower latency. Differential Revision: https://reviews.llvm.org/D124564
2022-03-16Cleanup codegen includesserge-sans-paille1-1/+0
This is a (fixed) recommit of https://reviews.llvm.org/D121169 after: 1061034926 before: 1063332844 Discourse thread: https://discourse.llvm.org/t/include-what-you-use-include-cleanup Differential Revision: https://reviews.llvm.org/D121681
2022-03-10Revert "Cleanup codegen includes"Nico Weber1-0/+1
This reverts commit 7f230feeeac8a67b335f52bd2e900a05c6098f20. Breaks CodeGenCUDA/link-device-bitcode.cu in check-clang, and many LLVM tests, see comments on https://reviews.llvm.org/D121169
2022-03-10Cleanup codegen includesserge-sans-paille1-1/+0
after: 1061034926 before: 1063332844 Differential Revision: https://reviews.llvm.org/D121169
2021-12-08[NFC] Rename MachineFunction::deleteMachineInstr (coding style)Mircea Trofin1-1/+1
2021-12-05[GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.Jack Andersen1-1/+1
Expanding on D109750. Since `DBG_VALUE` instructions have final register validity determined in `LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune unused register operands as their defs are erased. Consequently, this renders `MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a substantial performance improvement. The only necessary changes involve making relevant passes consider invalid DBG_VALUE vregs uses as valid. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D112852
2021-01-24[PowerPC] support register pressure reduction in machine combiner.Chen Zheng1-0/+3
Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071
2021-01-18Revert "[PowerPC] support register pressure reduction in machine combiner."Tres Popp1-3/+0
This reverts commit 26a396c4ef481cb159bba631982841736a125a9c. See https://reviews.llvm.org/D92071 for a description of the issue.
2021-01-17[PowerPC] support register pressure reduction in machine combiner.Chen Zheng1-0/+3
Reassociating some patterns to generate more fma instructions to reduce register pressure. Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D92071
2020-12-14[MachineCombiner][NFC] Add MustReduceRegisterPressure goalChen Zheng1-6/+63
add a new goal MustReduceRegisterPressure for machine combiner pass. PowerPC will use this new goal to do some register pressure related optimization. Reviewed By: spatel Differential Revision: https://reviews.llvm.org/D92068
2020-06-15[PowerPC] fma chain break to expose more ILPChen Zheng1-0/+2
This patch tries to reassociate two patterns related to FMA to expose more ILP on PowerPC. // Pattern 1: // A = FADD X, Y (Leaf) // B = FMA A, M21, M22 (Prev) // C = FMA B, M31, M32 (Root) // --> // A = FMA X, M21, M22 // B = FMA Y, M31, M32 // C = FADD A, B // Pattern 2: // A = FMA X, M11, M12 (Leaf) // B = FMA A, M21, M22 (Prev) // C = FMA B, M31, M32 (Root) // --> // A = FMUL M11, M12 // B = FMA X, M21, M22 // D = FMA A, M31, M32 // C = FADD B, D Reviewed By: jsji Differential Revision: https://reviews.llvm.org/D80175
2020-05-31[MachineCombine] add a hook for resource length limitChen Zheng1-2/+4
2019-12-09[PGO][PGSO] Instrument the code gen / target passes.Hiroshi Yamauchi1-4/+19
Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). A second try after reverted D71072. Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71149
2019-12-06Revert "[PGO][PGSO] Instrument the code gen / target passes."Hiroshi Yamauchi1-19/+4
This reverts commit 9a0b5e14075a1f42a72eedb66fd4fde7985d37ac. This seems to break buildbots.
2019-12-06[PGO][PGSO] Instrument the code gen / target passes.Hiroshi Yamauchi1-4/+19
Summary: Split off of D67120. Add the profile guided size optimization instrumentation / queries in the code gen or target passes. This doesn't enable the size optimizations in those passes yet as they are currently disabled in shouldOptimizeForSize (for non-IR pass queries). Reviewers: davidxl Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D71072
2019-11-13Sink all InitializePasses.h includesReid Kleckner1-0/+1
This file lists every pass in LLVM, and is included by Pass.h, which is very popular. Every time we add, remove, or rename a pass in LLVM, it caused lots of recompilation. I found this fact by looking at this table, which is sorted by the number of times a file was changed over the last 100,000 git commits multiplied by the number of object files that depend on it in the current checkout: recompiles touches affected_files header 342380 95 3604 llvm/include/llvm/ADT/STLExtras.h 314730 234 1345 llvm/include/llvm/InitializePasses.h 307036 118 2602 llvm/include/llvm/ADT/APInt.h 213049 59 3611 llvm/include/llvm/Support/MathExtras.h 170422 47 3626 llvm/include/llvm/Support/Compiler.h 162225 45 3605 llvm/include/llvm/ADT/Optional.h 158319 63 2513 llvm/include/llvm/ADT/Triple.h 140322 39 3598 llvm/include/llvm/ADT/StringRef.h 137647 59 2333 llvm/include/llvm/Support/Error.h 131619 73 1803 llvm/include/llvm/Support/FileSystem.h Before this change, touching InitializePasses.h would cause 1345 files to recompile. After this change, touching it only causes 550 compiles in an incremental rebuild. Reviewers: bkramer, asbirlea, bollu, jdoerfert Differential Revision: https://reviews.llvm.org/D70211
2019-08-01Finish moving TargetRegisterInfo::isVirtualRegister() and friends to ↵Daniel Sanders1-3/+3
llvm::Register as started by r367614. NFC llvm-svn: 367633
2019-06-02[X86] Fix several places that weren't passing what they though they were to ↵Craig Topper1-2/+4
MachineInstr::print Over a year ago, MachineInstr gained a fourth boolean parameter that occurs before the TII pointer. When this happened, several places started accidentally passing TII into this boolean parameter instead of the TII parameter. llvm-svn: 362312
2019-04-04[IR] Refactor attribute methods in Function class (NFC)Evandro Menezes1-1/+1
Rename the functions that query the optimization kind attributes. Differential revision: https://reviews.llvm.org/D60287 llvm-svn: 357731
2019-02-04[AsmPrinter] Remove hidden flag -print-schedule.Andrea Di Biagio1-7/+4
This patch removes hidden codegen flag -print-schedule effectively reverting the logic originally committed as r300311 (https://llvm.org/viewvc/llvm-project?view=revision&revision=300311). Flag -print-schedule was originally introduced by r300311 to address PR32216 (https://bugs.llvm.org/show_bug.cgi?id=32216). That bug was about adding "Better testing of schedule model instruction latencies/throughputs". These days, we can use llvm-mca to test scheduling models. So there is no longer a need for flag -print-schedule in LLVM. The main use case for PR32216 is now addressed by llvm-mca. Flag -print-schedule is mainly used for debugging purposes, and it is only actually used by x86 specific tests. We already have extensive (latency and throughput) tests under "test/tools/llvm-mca" for X86 processor models. That means, most (if not all) existing -print-schedule tests for X86 are redundant. When flag -print-schedule was first added to LLVM, several files had to be modified; a few APIs gained new arguments (see for example method MCAsmStreamer::EmitInstruction), and MCSubtargetInfo/TargetSubtargetInfo gained a couple of getSchedInfoStr() methods. Method getSchedInfoStr() had to originally work for both MCInst and MachineInstr. The original implmentation of getSchedInfoStr() introduced a subtle layering violation (reported as PR37160 and then fixed/worked-around by r330615). In retrospect, that new API could have been designed more optimally. We can always query MCSchedModel to get the latency and throughput. More importantly, the "sched-info" string should not have been generated by the subtarget. Note, r317782 fixed an issue where "print-schedule" didn't work very well in the presence of inline assembly. That commit is also reverted by this change. Differential Revision: https://reviews.llvm.org/D57244 llvm-svn: 353043
2019-01-19Update the file headers across all of the LLVM projects in the monorepoChandler Carruth1-4/+3
to reflect the new license. We understand that people may be surprised that we're moving the header entirely to discuss the new license. We checked this carefully with the Foundation's lawyer and we believe this is the correct approach. Essentially, all code in the project is now made available by the LLVM project under our new license, so you will see that the license headers include that license only. Some of our contributors have contributed code under our old license, and accordingly, we have retained a copy of our old license notice in the top-level files in each project and repository. llvm-svn: 351636
2019-01-10[MachineCombiner][NFC] Prevent dereferencing past-the-end object in an MRI ↵Gerolf Hoflehner1-0/+2
container llvm-svn: 350896
2018-05-14Rename DEBUG macro to LLVM_DEBUG.Nicola Zaghen1-24/+27
The DEBUG() macro is very generic so it might clash with other projects. The renaming was done as follows: - git grep -l 'DEBUG' | xargs sed -i 's/\bDEBUG\s\?(/LLVM_DEBUG(/g' - git diff -U0 master | ../clang/tools/clang-format/clang-format-diff.py -i -p1 -style LLVM - Manual change to APInt - Manually chage DOCS as regex doesn't match it. In the transition period the DEBUG() macro is still present and aliased to the LLVM_DEBUG() one. Differential Revision: https://reviews.llvm.org/D43624 llvm-svn: 332240
2018-04-08[TargetSchedule] shrink interface for init(); NFCISanjay Patel1-1/+1
The TargetSchedModel is always initialized using the TargetSubtargetInfo's MCSchedModel and TargetInstrInfo, so we don't need to extract those and pass 3 parameters to init(). Differential Revision: https://reviews.llvm.org/D44789 llvm-svn: 329540
2018-03-16Revert r327721 "This patch fixes the invalid usage of OptSize in Machine ↵Reid Kleckner1-3/+3
Combiner." It causes asserts when compiling Chromium on Win32 with optimizations. We compile many things with -Os. llvm-svn: 327733
2018-03-16This patch fixes the invalid usage of OptSize in Machine Combiner.Andrew V. Tischenko1-3/+3
Differential Revision: https://reviews.llvm.org/D43813 llvm-svn: 327721
2018-02-26The final step to close D41278 [MachineCombiner] Improve debug output (NFC).Andrew V. Tischenko1-4/+2
Differential Revision: https://reviews.llvm.org/D41278 llvm-svn: 326074
2018-02-15(NFC)[MachineCombiner] Improve debug output.Andrew V. Tischenko1-28/+53
llvm-svn: 325217
2018-02-06Fix unused variable warning in release mode. NFC.Alexander Ivchenko1-0/+1
llvm-svn: 324330
2018-01-31[MachineCombiner] Add check for optimal pattern order.Florian Hahn1-16/+82
In D41587, @mssimpso discovered that the order of some patterns for AArch64 was sub-optimal. I thought a bit about how we could avoid that case in the future. I do not think there is a need for evaluating all patterns for now. But this patch adds an extra (expensive) check, that evaluates the latencies of all patterns, and ensures that the latency saved decreases for subsequent patterns. This catches the sub-optimal order fixed in D41587, but I am not entirely happy with the check, as it only applies to sub-optimal patterns seen while building with EXPENSIVE_CHECKS on. It did not discover any other sub-optimal pattern ordering. Reviewers: Gerolf, spatel, mssimpso Reviewed By: Gerolf, mssimpso Differential Revision: https://reviews.llvm.org/D41766 llvm-svn: 323873
2017-12-15MachineFunction: Return reference from getFunction(); NFCMatthias Braun1-1/+1
The Function can never be nullptr so we can return a reference. llvm-svn: 320884
2017-12-13Remove redundant includes from lib/CodeGen.Michael Zolotukhin1-1/+0
llvm-svn: 320619
2017-12-06[MachineCombiner] Add up latencies of all instructions in new pattern.Florian Hahn1-2/+9
Summary: When calculating the RootLatency, we add up all the latencies of the deleted instructions. But for NewRootLatency we only add the latency of the new root instructions, ignoring the latencies of the other instructions inserted. This leads the combiner to underestimate the cost of patterns which add multiple instructions. This patch fixes that by summing up the latencies of all new instructions. For NewRootNode, the more complex getLatency function is used. Note that we may be slightly more precise than just summing up all latencies. For example, consider a pattern like r1 = INS1 .. r2 = INS2 .. r3 = INS3 r1, r2 I think in some other places, the total latency of the pattern would be estimated as lat(INS3) + max(lat(INS1), lat(INS2)). If you consider that worth changing, I think it would be best to do in a follow-up patch. Reviewers: Gerolf, sebpop, spop, fhahn Reviewed By: fhahn Subscribers: evandro, llvm-commits Differential Revision: https://reviews.llvm.org/D40307 llvm-svn: 319951
2017-11-17Fix a bunch more layering of CodeGen headers that are in TargetDavid Blaikie1-2/+2
All these headers already depend on CodeGen headers so moving them into CodeGen fixes the layering (since CodeGen depends on Target, not the other way around). llvm-svn: 318490
2017-11-08Target/TargetInstrInfo.h -> CodeGen/TargetInstrInfo.h to match layeringDavid Blaikie1-1/+1
This header includes CodeGen headers, and is not, itself, included by any Target headers, so move it into CodeGen to match the layering of its implementation. llvm-svn: 317647
2017-10-30[MC] Split out register def/use idx calls to make debugging simpler. NFCI.Simon Pilgrim1-3/+4
llvm-svn: 316927
2017-10-11[MachineCombiner] Fix initialisation of LastUpdate for incremental update.Florian Hahn1-2/+4
Summary: Fixes a bogus iterator resulting from the removal of a block's first instruction at the point that incremental update is enabled. Patch by Paul Walker. Reviewers: fhahn, Gerolf, efriedma, MatzeB Reviewed By: fhahn Subscribers: aemerson, javed.absar, llvm-commits Differential Revision: https://reviews.llvm.org/D38734 llvm-svn: 315502
2017-09-20Recommit [MachineCombiner] Update instruction depths incrementally for large ↵Florian Hahn1-23/+82
BBs. This version of the patch fixes an off-by-one error causing PR34596. We do not need to use std::next(BlockIter) when calling updateDepths, as BlockIter already points to the next element. Original commit message: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and it's relevant successors/predecessors. > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313751
2017-09-13Revert r312719 "[MachineCombiner] Update instruction depths incrementally ↵Hans Wennborg1-82/+23
for large BBs." This caused PR34596. > [MachineCombiner] Update instruction depths incrementally for large BBs. > > Summary: > For large basic blocks with lots of combinable instructions, the > MachineTraceMetrics computations in MachineCombiner can dominate the compile > time, as computing the trace information is quadratic in the number of > instructions in a BB and it's relevant successors/predecessors. > > In most cases, knowing the instruction depth should be enough to make > combination decisions. As we already iterate over all instructions in a basic > block, the instruction depth can be computed incrementally. This reduces the > cost of machine-combine drastically in cases where lots of instructions > are combined. The major drawback is that AFAIK, computing the critical path > length cannot be done incrementally. Therefore we only compute > instruction depths incrementally, for basic blocks with more > instructions than inc_threshold. The -machine-combiner-inc-threshold > option can be used to set the threshold and allows for easier > experimenting and checking if using incremental updates for all basic > blocks has any impact on the performance. > > Reviewers: sanjoy, Gerolf, MatzeB, efriedma, fhahn > > Reviewed By: fhahn > > Subscribers: kiranchandramohan, javed.absar, efriedma, llvm-commits > > Differential Revision: https://reviews.llvm.org/D36619 llvm-svn: 313213