Age | Commit message (Collapse) | Author | Files | Lines |
|
legal (#109570)
If SETCC or VSELECT is not legal for vector, we should not expand it,
instead we can split the vectors.
So that, some simple scale instructions can be emitted instead of
some pairs of comparation+selection.
|
|
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
|
|
Many structs in this class have the wrong indentation. To generate this
diff, I touched the first line of each struct and then ran `git
clang-format`. This will make blaming more difficult, but this
autoformatting is difficult to avoid triggering. I think it's best to
push this as one NFC PR.
|
|
This allows selecting the addressing mode for stack instructions
in cases where we need to prove the sign bit is zero.
|
|
This patch handles virtual registers in the VarLocBasedImpl of the
LiveDebugVariables pass, allowing it to be used on architectures that
depend on virtual registers in debugging, like NVPTX. It enables the
pass for NVPTX.
|
|
getShiftAmountConstant helper.
We're assuming shift amount type matches the result type - which is true for vectors, but I'm hoping to generalize this fold in the future.
|
|
SSAIfConv::init (#111500)
|
|
I noticed this while following
https://github.com/llvm/llvm-project/pull/111269. It makes little sense
that FCOPYSIGN would look at the sign of `x`, right? Surely this must be
`y`. Also fix the inconsistency where it's sometimes `x` and sometimes
`X`.
|
|
(#107390)" (#111385)
This reverts commit 09a4c23eb410d4be52202bed21c967a3653c3544.
|
|
(#108519)" (#111372)
This reverts commit 9e7315912656628b606e884e39cdeb261b476f16.
|
|
This reverts commit 3c83102f0615c7d66f6df698ca472ddbf0e9483d.
|
|
Specifically:
fabs, fadd, fceil, fdiv, ffloor, fma, fmax, fmaxnm, fmin, fminnm,
fmul, fnearbyint, fneg, frint, fround, froundeven, fsub, fsqrt &
ftrunc
|
|
fabs and fneg are similar nodes in that they can always be expanded to
integer ops, but currently they diverge when widened.
If the widened vector fabs is marked as expand (and the corresponding
scalar type is too), LegalizeVectorTypes thinks that it may be turned
into a libcall and so will unroll it to avoid the overhead on the undef
elements.
However unlike the other ops in that list like fsin, fround, flog etc.,
an fabs marked as expand will never be legalized into a libcall. Like
fneg, it can always be expanded into an integer op.
This moves it below unrollExpandedOp to bring it in line with fneg,
which fixes an issue on RISC-V with f16 fabs being unexpectedly
scalarized when there's no zfhmin.
|
|
(#111297)
When widening some FP ops, LegalizeVectorTypes will check to see if the
widened op may be scalarized and then turned into a bunch of libcalls,
and if so unroll early to avoid unnecessary libcalls of the padded undef
elements.
It checks if the widened op is legal or custom to see if it will be
scalarized, but promoted ops will also avoid scalarization.
This relaxes the check to account for this which fixes some illegal
vector types on RISC-V from being scalarized when they could be widened.
|
|
|
|
(#111230)
mul and shl have different meanings for the nsw flag. We need to drop it
when converting a multiply by the minimum negative value.
|
|
(NFC)" (#111168)
Reverts llvm/llvm-project#111066
This seems to be breaking some builds:
- https://lab.llvm.org/buildbot/#/builders/51/builds/4732
- https://lab.llvm.org/buildbot/#/builders/41/builds/2534
- https://lab.llvm.org/buildbot/#/builders/73/builds/6601
|
|
Inserting a remember/restore pair is a very clean abstraction, so we can
split the logic out into a helper function. Additionally, cleaning this up
will make it easier to add logic for handling functions that are split across
multiple sections.
|
|
A previous commit d826b0c9 accidentally added a new MachineFunctionProperty,
HasFakeUses, that was unused by the commit (and results in an
uncovered-switch warning, which was fixed by a separate followup 1811e872);
this patch removes that enum value.
|
|
llvm-project/llvm/lib/CodeGen/MachineFunction.cpp:95:10:
error: enumeration value 'HasFakeUses' not handled in switch [-Werror,-Wswitch]
switch(Prop) {
^~~~
1 error generated.
|
|
Following the addition of the llvm.fake.use intrinsic and corresponding
MIR instruction, two further changes are planned: to add an
-fextend-lifetimes flag to Clang that emits these intrinsics, and to
have -Og enable this flag by default. Currently, some logic for handling
fake uses is gated by the optdebug attribute, which is intended to be
switched on by -fextend-lifetimes (and by extension -Og later on).
However, the decision was made that a general optdebug attribute should
be incompatible with other opt_ attributes (e.g. optsize, optnone),
since they all express different intents for how to optimize the
program. We would still like to allow -fextend-lifetimes with optsize
however (i.e. -Os -fextend-lifetimes should be legal), since it may be a
useful configuration and there is no technical reason to not allow it.
This patch resolves this by tracking MachineFunctions that have fake
uses, allowing us to run passes that interact with them and skip passes
that clash with them.
|
|
Enable .debug_loc section for NVPTX backend.
This commit makes NVPTX omit DW_AT_low_pc (and DW_AT_high_pc) for
DW_TAG_compile_unit. This is because cuda-gdb uses the compile unit's
low_pc as a base address, and adds the addresses in the debug_loc
section to it. Removing low_pc is equivalent to setting that base
address to zero, so addition doesn't break the location ranges.
Additionally, this patch forces debug_loc label emission to emit single
labels with no subtraction or base. This would not be necessary if we
could emit `label1 - label2` expressions in PTX. The PTX documentation
at
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives-section
makes it seem like this is supported, but it doesn't actually work. I
believe when that documentation says that you can subtract “label
addresses between labels in the same dwarf section”, it doesn't merely
mean that the labels need to be in the same section as each other, but
in fact they need to be in the same section as the use. If support for
label subtraction is supported such that in the debug_loc section you
can subtract labels from the main code section, then we can remove the
workarounds added in this PR.
Also, since this now emits valid .debug_loc sections, it replaces the
empty .debug_loc to force existence of at least one debug section with
an empty .debug_macinfo section, which matches what nvcc does.
|
|
In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.
However I don't believe it's correct to promote f16 fadd or fmul
reductions to f32 since we need to round the intermediate results.
Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two
different results depending on whether we compiled with +zvfh or
+zvfhmin, for example with a 3 element reduction:
; v9 = [0.1563, 5.97e-8, 0.00006104]
; zvfh
vsetivli x0, 3, e16, m1, ta, ma
vmv.v.i v8, 0
vfredosum.vs v8, v9, v8
vfmv.f.s fa0, v8
; fa0 = 0.1563
; zvfhmin
vsetivli x0, 3, e16, m1, ta, ma
vfwcvt.f.f.v v10, v9
vsetivli x0, 3, e32, m1, ta, ma
vmv.v.i v8, 0
vfredosum.vs v8, v10, v8
vfmv.f.s fa0, v8
fcvt.h.s fa0, fa0
; fa0 = 0.1564
This same thing happens with reassociative reductions e.g. vfredusum.vs,
and this also applies for bf16.
I couldn't find anything in the LangRef for reductions that suggest the
excess precision is allowed. There may be something we can do in Clang
with -fexcess-precision=fast, but I haven't looked into this yet.
I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum.
I can't think of another way of lowering these other than scalarizing,
and we can't scalarize scalable vectors, so this just removes the
promotion and adjusts the cost model to return an invalid cost. (It
looks like we also don't currently cost fmul reductions, so presumably
they also have an invalid cost?)
I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
|
|
(#110938)
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.
Re-apply #110185 with more fixes for debug build with the ABI breaking
checks disabled.
|
|
(#110923)
…ead of #ifdef (#110883)"
This reverts commit 1905cdbf4ef15565504036c52725cb0622ee64ef, which
causes lots of failures where LLVM doesn't have the right header guards.
The errors can be seen on
[BuildKite](https://buildkite.com/llvm-project/upstream-bazel/builds/112362#01924eae-231c-4d06-ba87-2c538cf40e04),
where the source uses `#ifndef NDEBUG`, but the content in question is
defined when `LLVM_ENABLE_ABI_BREAKING_CHECKS == 1`.
For example, `llvm/include/llvm/Support/GenericDomTreeConstruction.h`
has the following:
```cpp
// Helper struct used during edge insertions.
struct InsertionInfo {
// ...
#ifdef LLVM_ENABLE_ABI_BREAKING_CHECKS
SmallVector<TreeNodePtr, 8> VisitedUnaffected;
#endif
};
// ...
InsertionInfo II;
// ...
#ifndef NDEBUG
II.VisitedUnaffected.push_back(SuccTN);
#endif
```
|
|
For MI bundles, the instruction count remark doesn't count the
instructions inside the bundle.
|
|
This is an implementation of a new "size-aware" machine block placement.
The
idea is to reorder blocks so that the number of fall-through jumps is
maximized.
Observe that profile data is ignored for the optimization, and it is
applied only
for instances with hasOptSize()=true.
This strategy has two benefits:
(i) it eliminates jump instructions, which results in smaller text size;
(ii) we avoid using profile data while reordering blocks, which yields
more
"uniform" functions, thus helping ICF and machine outliner/merger.
For large (mobile) apps, the size benefits of (i) and (ii) are roughly
the same,
combined providing up to 0.5% uncompressed and up to 1% compressed
savings size
on top of the current solution.
The optimization is turned off by default.
|
|
(#110883)
This macros is always defined: either 0 or 1. The correct pattern is to
use #if.
Reapply https://github.com/llvm/llvm-project/pull/110185 with fixes.
|
|
|
|
InlineSpiller. (#109962)
RAGreedy invokes InlineSpiller to spill a particular virtreg inline.
When the spiller does this, it also identifies small, adjacent liveranges called
snippets. These are also spilled or rematerialized in the process.
However, the spiller does not inform RA that it has spilled these regs.
This means that debug variable locations referencing these regs/ranges
are lost.
Mark any spilled regs which do not have a stack slot assigned to them as
allocated to the slot being spilled to to tell LDV that those regs are
located in that slot, even though the regs might no longer exist in the
program after regalloc is finished. Also, inform RA about all of the
regs which were replaced (spilled or rematted), not just the one that was
requested so that it can properly manage the ranges of the debug vars.
|
|
This is heavily based on the SelectionDAG lowerEXTRACT_SUBVECTOR code.
|
|
overlapping Def/Use (#109875)
The current RP handling for uses of an MI that overlap with defs is
confusing and unnecessary. Moreover, the lane masks do not accurately
model the liveness behavior of the subregs. This cleans things up a bit
and more accurately models subreg lane liveness by sinking the use
handling into subsent Uses loop.
The effect of this PR is to replace
A. `increaseRegPressure(Reg, LiveAfter, ~LiveAfter & LiveBefore)`
with
B. `increaseRegPressure(Reg, LiveAfter, LiveBefore)`
Note that A (Defs loop) and B (Uses loop) have different definitions of
LiveBefore
A. `LiveBefore = (LiveAfter & ~DefLanes) | UseLanes`
and
B. `LiveBefore = LiveAfter | UseLanes`
Also note, `increaseRegPressure` will exit if `PrevMask` (`LiveAfter`
for both A/B) has any active lanes, thus these calls will only have an
effect if `LiveAfter` is 0.
A. NewMask = ~LiveAfter & ((LiveAfter & ~DefLanes) | UseLanes) => (1 &
UseLanes) => UseLanes = (0 | UseLanes) => (LiveAfter | UseLanes) =
NewMask B.
|
|
handle zext nneg as sext"
Reverting until I can investigate a miscompilation reported by @mstorsjo
|
|
This replaces some of the most frequent offenders of using a DenseMap that
cause a malloc, where the typical element-count is small enough to fit in
an initial stack allocation.
Most of these are fairly obvious, one to highlight is the collectOffset
method of GEP instructions: if there's a GEP, of course it's going to have
at least one offset, but every time we've called collectOffset we end up
calling malloc as well for the DenseMap in the MapVector.
|
|
Test: AArch64/GlobalISel/irtranslator-subvector.ll
Reference:
https://llvm.org/docs/LangRef.html#llvm-vector-extract-intrinsic
https://llvm.org/docs/LangRef.html#llvm-vector-insert-intrinsic
|
|
vectors in some cases. (#109232)
Copy the same FSUB check from ExpandFNEG to avoid breaking AArch64 and
ARM.
|
|
Move static functions `Function::lookupIntrinsicID` and
`Function::isTargetIntrinsic` to Intrinsic namespace.
|
|
|
|
Fixes poor codegen on AVX512 targets for a test case from #109790
|
|
This was using the default address space instead of the
correct one.
Fixes #56055
|
|
Follow the same patterns as the other min/max variants.
|
|
(#110432)
I'm trying to speed up the reaching def analysis by changing the
underlying data structure. Turning MBBReachingDefsInfo into a proper
class decouples the data structure and its users. This patch does not
change the existing three-dimensional vector structure.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
Note that we must use insert_or_assign because operator[] would
require DbgValue to have the default constructor.
|
|
|
|
|
|
|
|
As a side-effect of PR #101222, GlobalMerge started making transforms
which are unsafe on Mach-O platforms.
Two issues, in particular, are fixed here:
1. We must never merge symbols in the `__cfstring` section, as the
linker assumes each object in this section is only ever referenced
directly, and that it can split the section as it likes.
Previously, we avoided this problem because CFString literals are
identified by private-linkage symbols. This patch adds a list of
section-names with special behavior, to avoid merging under Mach-O.
2. When GlobalMerge code was originally written, it had to be careful
about emitting symbol aliases, due to issues with Mach-O's subsection
splitting in the linker with `-dead_strip` enabled. The underlying cause
of this problem was fixed in 2016, via creation of the `.alt_entry`
assembler directive, which allows a symbol to not also imply the start
of a new subsection. GlobalMerge's workaround for that issue was never
removed.
In the meantime, Apple's new ld-prime linker was written, and has a bug
in `.alt_entry` handling. Therefore, even though the original issue was
fixed, we must _continue_ to be careful not to emit any such symbol
aliases. The existing workaround avoided it for InternalLinkage symbols,
but after the above-mentioned PR, we also must avoid emitting aliases
for PrivateLinkage symbols.
I will file an Apple bug-report about this issue, so that it can be
fixed in a future version of ld-prime. But, in the meantime, the
workaround is sufficient for GlobalMerge, unless
`-global-merge-on-externals` is enabled (which it is already not by
default, on MachO platforms, due to the original issue).
Fixes #104625
|
|
Currently, we recompute block offsets after each relaxation. This causes
the complexity to be O(n^2) in the number of instructions, inflating
compile
time.
If we instead recompute block offsets after each iteration of the outer
loop, the complexity is O(n). Recomputing offsets in the outer loop will
cause some out-of-range branches to be missed in the inner loop, but
they will be relaxed in the next iteration of the outer loop.
This change may introduce unnecessary relaxations for an architecture
where the relaxed branch is smaller than the unrelaxed branch, but AFAIK
there is no such architecture.
|
|
This changes the existing promote logic to lower, so that it can use
normal integer operations. A minor change was needed to fneg lower code
to handle vectors.
|
|
Modify other legacy dwarf versions to align with the dwarf4 handling
approach when determining whether to generate DWARF5 or GNU extensions.
|