path: root/llvm/lib/Target/ARM/MVETailPredication.cpp
Age | Commit message | Author | Files, Lines
2025-03-31 | [IRBuilder] Add new overload for CreateIntrinsic (#131942) | Rahul Joshi | 1 file, -1/+1
Add a new `CreateIntrinsic` overload with no `Types`, useful for creating calls to non-overloaded intrinsics that don't need additional mangling.
2024-11-12 | [ARM] Remove unused includes (NFC) (#115995) | Kazu Hirata | 1 file, -2/+0
Identified with misc-include-cleaner.
2024-10-17 | [LLVM] Make more use of IRBuilder::CreateIntrinsic. NFC. (#112706) | Jay Foad | 1 file, -2/+1
Convert many instances of: Fn = Intrinsic::getOrInsertDeclaration(...); CreateCall(Fn, ...) to the equivalent CreateIntrinsic call.
2024-10-11 | [NFC] Rename `Intrinsic::getDeclaration` to `getOrInsertDeclaration` (#111752) | Rahul Joshi | 1 file, -1/+1
Rename the function to reflect its correct behavior and to be consistent with `Module::getOrInsertFunction`. This is also in preparation of adding a new `Intrinsic::getDeclaration` that will have behavior similar to `Module::getFunction` (i.e, just lookup, no creation).
2024-08-03 | [SCEV] Use const SCEV * explicitly in more places. | Florian Hahn | 1 file, -7/+9
Use const SCEV * explicitly in more places to prepare for https://github.com/llvm/llvm-project/pull/91961. Split off as suggested.
2024-06-27 | [IR] Add getDataLayout() helpers to BasicBlock and Instruction (#96902) | Nikita Popov | 1 file, -2/+2
This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.
2024-06-24 | Revert "[IR][NFC] Update IRBuilder to use InsertPosition (#96497)" | Stephen Tozer | 1 file, -1/+1
Reverts the above commit, as it updates a common header function and did not update all callsites: https://lab.llvm.org/buildbot/#/builders/29/builds/382 This reverts commit 6481dc57612671ebe77fe9c34214fba94e1b3b27.
2024-06-24 | [IR][NFC] Update IRBuilder to use InsertPosition (#96497) | Stephen Tozer | 1 file, -1/+1
Uses the new InsertPosition class (added in #94226) to simplify some of the IRBuilder interface, and removes the need to pass a BasicBlock alongside a BasicBlock::iterator, using the fact that we can now get the parent basic block from the iterator even if it points to the sentinel. This patch removes the BasicBlock argument from each constructor or call to setInsertPoint. This has no functional effect, but later on as we look to remove the `Instruction *InsertBefore` argument from instruction-creation (discussed [here](https://discourse.llvm.org/t/psa-instruction-constructors-changing-to-iterator-only-insertion/77845)), this will simplify the process by allowing us to deprecate the InsertPosition constructor directly and catch all the cases where we use instructions rather than iterators.
2023-09-11 | [NFC][RemoveDIs] Use iterators over inst-pointers when using IRBuilder | Jeremy Morse | 1 file, -1/+1
This patch adds a two-argument SetInsertPoint method to IRBuilder that takes a block/iterator instead of an instruction, and updates many call sites to use it. The motivating reason for doing this is given here [0], we'd like to pass around more information about the position of debug-info in the iterator object. That necessitates passing iterators around most of the time. [0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939 Differential Revision: https://reviews.llvm.org/D152468
2023-05-06 | [ARM] Remove unused declaration RematerializeIterCount | Kazu Hirata | 1 file, -5/+0
The corresponding function definition was removed by: commit af45907653fd312264632b616eff0fad1ae1eb2e Author: Sjoerd Meijer <sjoerd.meijer@arm.com> Date: Mon Jun 29 15:40:03 2020 +0100
2023-03-29 | [ARM] Convert active.lane.masks to vctp with non-zero starts | David Green | 1 file, -62/+96
This attempts to expand the logic in the MVETailPredication pass to convert active lane masks that the vectorizer produces into vctp instructions that the backend can later turn into tail predicated loops, especially for addrecs with non-zero starts such as those created by epilogue vectorization. There is some adjustment to the logic to handle this, moving some of the code that checks the addrec earlier so that we can get the start value. This start value is then incorporated into the logic checking that the new vctp is valid, and there is a newly added check that it is known to be a multiple of the VF, as we expect. Differential Revision: https://reviews.llvm.org/D146517
2022-08-28 | [Target] Qualify auto in range-based for loops (NFC) | Kazu Hirata | 1 file, -1/+1
2021-12-03 | [ARM] Use v2i1 for MVE and CDE intrinsics | David Green | 1 file, -7/+3
This adjusts all the MVE and CDE intrinsics now that v2i1 is a legal type, to use a <2 x i1> as opposed to emulating the predicate with a <4 x i1>. The v4i1 workarounds have been removed leaving the natural v2i1 types, notably in vctp64 which now generates a v2i1 type. AutoUpgrade code has been added to upgrade old IR, which needs to convert the old v4i1 to a v2i1 be converting it back and forth to an integer with arm.mve.v2i and arm.mve.i2v intrinsics. These should be optimized away in the final assembly. Differential Revision: https://reviews.llvm.org/D114455
2021-09-01 | [SCEV] If max BTC is zero, then so is the exact BTC [2 of 2] | Philip Reames | 1 file, -8/+12
This extends D108921 into a generic rule applied to constructing ExitLimits along all paths. The remaining paths (primarily howFarToZero) don't have the same reasoning about UB sensitivity as the howManyLessThan ones did. Instead, the remaining cause for max counts being more precise than exact counts is that we apply context sensitive loop guards on the max path, and not on the exact path. That choice is mildly suspect, but out of scope of this patch. The MVETailPredication.cpp change deserves a bit of explanation. We were previously figuring out that two SCEVs happened to be equal because they happened to be identical. When we optimized one with context sensitive information, but not the other, we lost the ability to prove them equal. So, cover this case by subtracting and then applying loop guards again. Without this, we see changes in test/CodeGen/Thumb2/mve-blockplacement.ll Differential Revision: https://reviews.llvm.org/D109015
2021-04-26 | [ARM] Ensure loop invariant active.lane.mask operands | David Green | 1 file, -0/+4
CGP can move instructions like a ptrtoint into a loop, but the MVETailPredication when converting them will currently assume invariant trip counts. This tries to ensure the operands are loop invariant, and bails if not. Differential Revision: https://reviews.llvm.org/D100550
2021-03-11 | [ARM] Improve WLS lowering | David Green | 1 file, -1/+1
Recently we improved the lowering of low overhead loops and tail predicated loops, but concentrated first on the DLS do style loops. This extends those improvements over to the WLS while loops, improving the chance of lowering them successfully. To do this the lowering has to change a little, as the instructions are terminators that produce a value, something that needs to be treated carefully. Lowering starts at the Hardware Loop pass, inserting a new llvm.test.start.loop.iterations that produces both an i1 to control the loop entry and an i32 similar to the llvm.start.loop.iterations intrinsic added for do loops. This feeds into the loop phi, properly gluing the values together:

  %wls = call { i32, i1 } @llvm.test.start.loop.iterations.i32(i32 %div)
  %wls0 = extractvalue { i32, i1 } %wls, 0
  %wls1 = extractvalue { i32, i1 } %wls, 1
  br i1 %wls1, label %loop.ph, label %loop.exit
  ...
  loop:
  %lsr.iv = phi i32 [ %wls0, %loop.ph ], [ %iv.next, %loop ]
  ..
  %iv.next = call i32 @llvm.loop.decrement.reg.i32(i32 %lsr.iv, i32 1)
  %cmp = icmp ne i32 %iv.next, 0
  br i1 %cmp, label %loop, label %loop.exit

The llvm.test.start.loop.iterations needs to be lowered through ISel lowering as a pair of WLS and WLSSETUP nodes, which each get converted to t2WhileLoopSetup and t2WhileLoopStart pseudos. This helps prevent t2WhileLoopStart from being a terminator that produces a value, something difficult to control at that stage in the pipeline. Instead the t2WhileLoopSetup produces the value of LR (essentially acting as a lr = subs rn, 0), and t2WhileLoopStart consumes that lr value (the Bcc). These are then converted into a single t2WhileLoopStartLR at the same point as t2DoLoopStartTP and t2LoopEndDec. Otherwise we revert the loop to prevent them from progressing further in the pipeline. The t2WhileLoopStartLR is a single instruction that takes a GPR and produces LR, similar to the WLS instruction.

  %1:gprlr = t2WhileLoopStartLR %0:rgpr, %bb.3
  t2B %bb.1
  ...
  bb.2.loop:
  %2:gprlr = PHI %1:gprlr, %bb.1, %3:gprlr, %bb.2
  ...
  %3:gprlr = t2LoopEndDec %2:gprlr, %bb.2
  t2B %bb.3

The t2WhileLoopStartLR can then be treated similarly to the other low overhead loop pseudos, eventually being lowered to a WLS provided the branches are within range. Differential Revision: https://reviews.llvm.org/D97729
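The { i32, i1 } pair produced by llvm.test.start.loop.iterations can be modelled in plain C++. This is an illustrative sketch of the semantics described above, not the intrinsic's implementation: the pair carries the iteration count together with an i1 that guards loop entry, and the loop counts the i32 down the way the phi/decrement chain does.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Model of llvm.test.start.loop.iterations: the count, plus a flag
// saying whether the loop body should be entered at all.
std::pair<uint32_t, bool> testStartLoopIterations(uint32_t N) {
  return {N, N != 0};
}

// A while loop lowered in this style: guard entry on the i1, then
// count the i32 down to zero (the llvm.loop.decrement.reg step).
uint32_t runWhileLoop(uint32_t N) {
  auto [Count, Enter] = testStartLoopIterations(N);
  uint32_t BodyExecutions = 0;
  if (Enter) {
    uint32_t IV = Count;
    do {
      ++BodyExecutions; // loop body would go here
      IV -= 1;          // llvm.loop.decrement.reg(%lsr.iv, 1)
    } while (IV != 0);  // icmp ne %iv.next, 0
  }
  return BodyExecutions;
}
```

The i1 is what distinguishes a WLS while loop from a DLS do loop: a zero trip count skips the body entirely rather than executing it once.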
2021-01-15 | [ARM] Tail predication with constant loop bounds | David Green | 1 file, -12/+10
The TripCount for a predicated vector loop body will be ceil(ElementCount/Width). This alters the conversion of an active.lane.mask to a VCPT intrinsics to match. Differential Revision: https://reviews.llvm.org/D94608
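The ceil(ElementCount/Width) relation can be checked with a few lines of C++. This is a sketch of the arithmetic only, not of the pass's conversion logic:

```cpp
#include <cassert>
#include <cstdint>

// Trip count of a tail-predicated vector loop: the number of vector
// iterations needed to cover ElementCount elements, Width at a time.
uint32_t vectorTripCount(uint32_t ElementCount, uint32_t Width) {
  return (ElementCount + Width - 1) / Width; // ceil(ElementCount / Width)
}
```

Note that the `ElementCount + Width - 1` form can wrap for element counts near UINT32_MAX, which is exactly the kind of overflow the checks elsewhere in this log have to rule out before the expression can be trusted.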
2020-11-26 | [ARM] Cleanup for the MVETailPrediction pass | David Green | 1 file, -223/+44
This strips out a lot of the code that should no longer be needed from the MVETailPredictionPass, leaving the important part - find active lane mask instructions and convert them to VCTP operations. Differential Revision: https://reviews.llvm.org/D91866
2020-11-10 | [ARM] Alter t2DoLoopStart to define lr | David Green | 1 file, -1/+1
This changes the definition of t2DoLoopStart from

  t2DoLoopStart rGPR

to

  GPRlr = t2DoLoopStart rGPR

This will hopefully mean that low overhead loops are more tied together, and we can more reliably generate loops without reverting or being at the whims of the register allocator. This is a fairly simple change in itself, but leads to a number of other required alterations.
- The hardware loop pass, if UsePhi is set, now generates loops of the form:

    %start = llvm.start.loop.iterations(%N)
  loop:
    %p = phi [%start], [%dec]
    %dec = llvm.loop.decrement.reg(%p, 1)
    %c = icmp ne %dec, 0
    br %c, loop, exit

- For this a new llvm.start.loop.iterations intrinsic was added, identical to llvm.set.loop.iterations but producing a value as seen above, gluing the loop together more through def-use chains.
- This new intrinsic conceptually produces the same output as its input, which is taught to SCEV so that the checks in MVETailPredication are not affected.
- Some minor changes are needed to the ARMLowOverheadLoops pass, but it has been left mostly as before. We should now more reliably be able to tell that the t2DoLoopStart is correct without having to prove it, but t2WhileLoopStart and tail-predicated loops will remain the same.
- And all the tests have been updated. There are a lot of them!
This patch on its own might cause more trouble than it helps, with more tail-predicated loops being reverted, but some additional patches can hopefully improve upon that to get to something that is better overall. Differential Revision: https://reviews.llvm.org/D89881
2020-10-07 | [llvm][mlir] Promote the experimental reduction intrinsics to be first class intrinsics. | Amara Emerson | 1 file, -1/+1
This change renames the intrinsics to not have "experimental" in the name. The autoupgrader will handle legacy intrinsics. Relevant ML thread: http://lists.llvm.org/pipermail/llvm-dev/2020-April/140729.html Differential Revision: https://reviews.llvm.org/D88787
2020-09-28 | [llvm] Fix unused variable in non-debug configurations | Tres Popp | 1 file, -0/+2
2020-09-28 | [ARM][MVE] Enable tail-predication by default | Sjoerd Meijer | 1 file, -1/+1
We have been running tests/benchmarks downstream with tail-predication enabled for some time now and this behaves as expected: we are not aware of any correctness issues, and this performs better across the board than with tail-predication disabled. Time to flip the switch! Differential Revision: https://reviews.llvm.org/D88093
2020-09-28 | [ARM][MVE] tail-predication: overflow checks for elementcount, cont'd | Sjoerd Meijer | 1 file, -97/+61
This is a reimplementation of the overflow checks for the elementcount, i.e. the 2nd argument of intrinsic get.active.lane.mask. The element count is lowered in each iteration of the tail-predicated loop, and we must prove that this expression doesn't overflow. Many thanks to Eli Friedman and Sam Parker for all their help with this work. Differential Revision: https://reviews.llvm.org/D88086
2020-09-24 | [ARM] LowoverheadLoops: add an option to disable tail-predication | Sjoerd Meijer | 1 file, -1/+1
This might be useful for testing. We already have an option -tail-predication but that controls the MVETailPredication pass. This -arm-loloops-disable-tail-pred is just for disabling it in the LowoverheadLoops pass. Differential Revision: https://reviews.llvm.org/D88212
2020-09-16 | [ARM][MVE] Tail-predication: predicate new elementcount checks on force-enabled | Sjoerd Meijer | 1 file, -1/+1
Additional sanity checks were added to get.active.lane.mask's second argument, the loop tripcount/elementcount, in rG635b87511ec3. Like the other (overflow) checks, skip this if tail-predication is forced. Differential Revision: https://reviews.llvm.org/D87769
2020-09-15 | [ARM][MVE] Tail-predication: use unsigned SCEV ranges for tripcount | Sjoerd Meijer | 1 file, -8/+5
Loop tripcount expressions have a positive range, so use unsigned SCEV ranges for them. Differential Revision: https://reviews.llvm.org/D87608
2020-09-15 | [MVE] fix typo in llvm debug message. NFC. | Sjoerd Meijer | 1 file, -2/+2
2020-09-14 | [ARM][MVE] Tail-predication: check get.active.lane.mask's TC value | Sjoerd Meijer | 1 file, -10/+71
This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask's second argument, and performs several sanity checks to see if it is of the form that we expect, similar to what we already do for the IV, which is the first argument of get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D86074
2020-08-28 | [ARM] Correct predicate operand for offset gather/scatter | David Green | 1 file, -3/+10
These arm_mve_vldr_gather_offset_predicated and arm_mve_vstr_scatter_offset_predicated have some extra parameters meaning the predicate is at a later operand. If a loop contains _only_ those masked instructions, we would miss transforming the active lane mask. Differential Revision: https://reviews.llvm.org/D86791
2020-08-25 | [ARM][MVE] Tail-predication: remove the BTC + 1 overflow checks | Sjoerd Meijer | 1 file, -78/+19
This adapts tail-predication to the new semantics of get.active.lane.mask as defined in D86147. This means that: - we can remove the BTC + 1 overflow checks because now the loop tripcount is passed in to the intrinsic, - we can immediately use that value to setup a counter for the number of elements processed by the loop and don't need to materialize BTC + 1. Differential Revision: https://reviews.llvm.org/D86303
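The element counter described here can be sketched as a standalone model: start a counter at the element count, subtract the vectorization factor each iteration, and let VCTP-style predication keep lane i alive while i is below the remaining count. This is a hypothetical illustration of the semantics, not the pass's code:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of active lanes in each iteration of a tail-predicated loop.
// The counter starts at ElementCount and drops by VF per iteration;
// VCTP-style predication activates lane I while I < Remaining, so all
// iterations run VF lanes except possibly the last.
std::vector<uint32_t> activeLanesPerIteration(uint32_t ElementCount,
                                              uint32_t VF) {
  std::vector<uint32_t> Active;
  for (uint32_t Remaining = ElementCount; Remaining > 0;
       Remaining -= (Remaining < VF ? Remaining : VF))
    Active.push_back(Remaining < VF ? Remaining : VF);
  return Active;
}
```

For 10 elements at VF = 4 this gives three iterations with 4, 4 and 2 active lanes, which is how the tail is folded into the loop without a scalar epilogue.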
2020-08-13 | [ARM][MVE] Fix for tail predication for loops containing MVE gather/scatters | Anna Welker | 1 file, -1/+2
Fix to include non-predicated version of write-back gather in special case treatment for deducting the instruction type. (This is fixing https://reviews.llvm.org/D85138 for corner cases) Differential Revision: https://reviews.llvm.org/D85889
2020-08-12 | [ARM][MVE] Enable tail predication for loops containing MVE gather/scatters | Anna Welker | 1 file, -7/+17
Widen the scope of memory operations that are allowed to be tail predicated to include gathers and scatters, such that loops that are auto-vectorized with the option -enable-arm-maskedgatscat (and actually end up containing an MVE gather or scatter) can be tail predicated. Differential Revision: https://reviews.llvm.org/D85138
2020-08-12 | [ARM][MVE] tail-predication: overflow checks for backedge taken count. | Sjoerd Meijer | 1 file, -8/+16
This picks up the work on the overflow checks for get.active.lane.mask, which ensure that it is safe to insert the VCTP intrinsic that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j:

  M = Size - i;
  for (j = 0; j < M; j++)
    Sum += Input[j] * Input[j+i];

For this inner loop, the SCEV backedge taken count (BTC) expression is:

  {(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>

and LoopUtils' cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case. This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount, which is able to determine the range of the BTC and thus can determine it is safe, so that we no longer need to force tail-predication, as reflected in the changed test cases. Differential Revision: https://reviews.llvm.org/D85737
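The hazard being guarded against is easy to demonstrate: with a 32-bit backedge taken count, BTC + 1 wraps to 0 precisely when BTC is UINT32_MAX, which is why a bound on the BTC's range is needed before the tripcount expression can be trusted. A minimal sketch of just the wrap check, using the GCC/Clang `__builtin_add_overflow` builtin rather than anything from the pass itself:

```cpp
#include <cassert>
#include <cstdint>

// Compute TripCount = BTC + 1, reporting whether the addition wrapped.
// The expression is only meaningful when BTC can be bounded away from
// the maximum value, which is what the SCEV-based checks establish.
bool tripCountFromBTC(uint32_t BTC, uint32_t &TripCount) {
  return !__builtin_add_overflow(BTC, 1u, &TripCount);
}
```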
2020-08-09 | [ARM] Allow vecreduce_add in tail predicated loops | David Green | 1 file, -2/+3
This allows vecreduce_add in loops so that we can tailpredicate them. Differential Revision: https://reviews.llvm.org/D85454
2020-07-13 | [ARM][MVE] Refactor option -disable-mve-tail-predication | Sjoerd Meijer | 1 file, -10/+25
This refactors option -disable-mve-tail-predication to take different arguments so that we have 1 option to control tail-predication rather than several different ones. This is also a prep step for D82953, in which we want to reject reductions unless that is requested with this option. Differential Revision: https://reviews.llvm.org/D83133
2020-06-30 | [ARM] Allow the fabs intrinsic to be tail predicated | Samuel Tebbs | 1 file, -0/+1
This patch stops the fabs intrinsic from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82570
2020-06-30 | [ARM] Allow the usub_sat and ssub_sat intrinsics to be tail predicated | Samuel Tebbs | 1 file, -0/+2
This patch stops the usub_sat and ssub_sat intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82571
2020-06-30 | [ARM][MVE] Tail-predication: clean-up of unused code | Sjoerd Meijer | 1 file, -60/+6
After the rewrite of this pass (D79175) I missed one thing: the inserted VCTP intrinsic can be cloned to exit blocks if there are instructions present in them that perform the same operation, but this wasn't triggering anymore. However, it turns out that for handling reductions, see D75533, it's actually easier not to have the VCTP in exit blocks, so this removes that code. This was possible because it turned out that some other code that depended on this, rematerialization of the trip count enabling more dead code removal later, wasn't doing much anymore due to more aggressive dead code removal that was added to the low-overhead loops pass. Differential Revision: https://reviews.llvm.org/D82773
2020-06-30 | [ARM] Allow rounding intrinsics to be tail predicated | Samuel Tebbs | 1 file, -2/+11
This patch stops the trunc, rint, round, floor and ceil intrinsics from blocking tail predication. Differential Revision: https://reviews.llvm.org/D82553
2020-06-26 | [ARM] Don't revert get.active.lane.mask in ARM Tail-Predication pass | Sjoerd Meijer | 1 file, -116/+48
Don't revert intrinsic get.active.lane.mask here, this is moved to isel legalization in D82292. Differential Revision: https://reviews.llvm.org/D82105
2020-06-25 | [ARM] Allow tail predication on sadd_sat and uadd_sat intrinsics | Sam Tebbs | 1 file, -2/+8
This patch stops the sadd_sat and uadd_sat intrinsics from blocking tail predication. Differential revision: https://reviews.llvm.org/D82377
2020-06-24 | LoopUtils.h - reduce AliasAnalysis.h include to forward declarations. NFC. | Simon Pilgrim | 1 file, -0/+1
Fix implicit include dependencies in source files and replace legacy AliasAnalysis typedef with AAResults where necessary.
2020-06-19 | [ARM][MVE] tail-predication: renamed internal option. | Sjoerd Meijer | 1 file, -2/+2
Renamed -force-tail-predication to -force-mve-tail-predication because that's more descriptive and consistent.
2020-06-17 | [ARM] Reimplement MVE Tail-Predication pass using @llvm.get.active.lane.mask | Sjoerd Meijer | 1 file, -329/+320
To set up a tail-predicated loop, we need to calculate the number of elements processed by the loop. We can now use intrinsic @llvm.get.active.lane.mask() to do this, which is emitted by the vectoriser in D79100. This intrinsic generates a predicate for the masked loads/stores, and consumes the Backedge Taken Count (BTC) as its second argument. We can now use that to reconstruct the loop tripcount, instead of the IR pattern match approach we were using before. Many thanks to Eli Friedman and Sam Parker for all their help with this work. This also adds overflow checks for the different, new expressions that we create: the loop tripcount, and the sub expression that calculates the remaining elements to be processed. For the latter, SCEV is not able to calculate precise enough bounds, so we work around that at the moment, but this is not entirely correct yet; it is conservative. The overflow checks can be overruled with a force flag, which is thus potentially unsafe (but not really, because the vectoriser is the only place where this intrinsic is emitted at the moment). It's also good to mention that the tail-predication pass is not yet enabled by default. We will follow up to see if we can implement these overflow checks better, either by a change in SCEV, or we may want to revise the definition of llvm.get.active.lane.mask. Differential Revision: https://reviews.llvm.org/D79175
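The predicate that @llvm.get.active.lane.mask() produces can be modelled directly: lane i of the result is active iff base + i compares below the bound, unsigned. Note this sketch uses the later trip-count semantics from the current LangRef (lane i active iff base + i < n); at the time of the entry above the second argument was still the backedge taken count, so treat this as a model of the intrinsic's shape rather than this commit's exact definition.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Model of @llvm.get.active.lane.mask(Base, N) for a fixed number of
// lanes: lane I is active iff Base + I < N (unsigned comparison).
std::vector<bool> getActiveLaneMask(uint64_t Base, uint64_t N,
                                    unsigned Lanes) {
  std::vector<bool> Mask(Lanes);
  for (unsigned I = 0; I < Lanes; ++I)
    Mask[I] = Base + I < N;
  return Mask;
}
```

In a loop over 10 elements with 4 lanes, the iteration starting at index 8 gets the mask {1, 1, 0, 0}: exactly the predicate the masked loads/stores need for the tail.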
2020-05-24 | [PatternMatch] abbreviate vector inst matchers; NFC | Sanjay Patel | 1 file, -4/+4
Readability is not reduced with these opcodes/match lines, so reduce odds of awkward wrapping from 80-col limit.
2020-05-20 | [SCEV] Move ScalarEvolutionExpander.cpp to Transforms/Utils (NFC). | Florian Hahn | 1 file, -2/+2
SCEVExpander modifies the underlying function so it is more suitable in Transforms/Utils, rather than Analysis. This allows using other transform utils in SCEVExpander. This patch was originally committed as b8a3c34eee06, but broke the modules build, as LoopAccessAnalysis was using the Expander. The code-gen part of LAA was moved to lib/Transforms recently, so this patch can be landed again. Reviewers: sanjoy.google, efriedma, reames Reviewed By: sanjoy.google Differential Revision: https://reviews.llvm.org/D71537
2020-05-15 | [SVE] Remove usages of VectorType::getNumElements() from ARM | Christopher Tetreault | 1 file, -7/+7
Reviewers: efriedma, fpetrogalli, kmclaughlin, grosbach, dmgreen Reviewed By: dmgreen Subscribers: tschuett, kristof.beyls, hiraditya, rkruppe, psnobl, dmgreen, danielkiss, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D79816
2020-04-27 | [ARM] Allow fma in tail predicated loops | David Green | 1 file, -0/+2
There are some intrinsics like this that currently block tail predication, but should be fine. This allows fma through, as the one that I ran into. There may be others that need the same treatment but I've only done this one here. Differential Revision: https://reviews.llvm.org/D78385
2020-04-22 | [ARM][MVE] Tail-predication: some more comments and debug messages. NFC. | Sjoerd Meijer | 1 file, -17/+52
Finding the loop tripcount is the first crucial step in preparing a loop for tail-predication, and this adds a debug message if a tripcount cannot be found. And while I was at it, I added some more comments here and there. Differential Revision: https://reviews.llvm.org/D78485
2020-03-31 | Remove "mask" operand from shufflevector. | Eli Friedman | 1 file, -2/+2
Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. Differential Revision: https://reviews.llvm.org/D72467
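Whatever its in-memory representation, the mask's meaning is unchanged: element i of the result is taken from the concatenation of the two input vectors, with a negative index standing in for an undefined lane. A small standalone model over fixed-width int vectors, not LLVM's implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of shufflevector: Mask[I] indexes into the concatenation of
// Lhs and Rhs; a negative mask element yields an undef lane (0 here).
std::vector<int> shuffleVector(const std::vector<int> &Lhs,
                               const std::vector<int> &Rhs,
                               const std::vector<int> &Mask) {
  std::vector<int> Result;
  for (int Idx : Mask) {
    if (Idx < 0)
      Result.push_back(0); // stand-in for an undef lane
    else if (static_cast<size_t>(Idx) < Lhs.size())
      Result.push_back(Lhs[Idx]);
    else
      Result.push_back(Rhs[static_cast<size_t>(Idx) - Lhs.size()]);
  }
  return Result;
}
```

Because the mask is pure data rather than a vector operand, moving it out of line does not change any of this; it only changes how the instruction stores it.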