Age | Commit message (Collapse) | Author | Files | Lines |
|
Summary:
Support ConstantInt::get() and Constant::getAllOnesValue() for scalable
vector type, this requires ConstantVector::getSplat() to take in 'ElementCount',
instead of 'unsigned' number of element count.
This change is needed for D73753.
Reviewers: sdesmalen, efriedma, apazos, spatel, huntergr, willlovett
Reviewed By: efriedma
Subscribers: tschuett, hiraditya, rkruppe, psnobl, cfe-commits, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D74386
|
|
This patch allows ISD::FSHR(i32) patterns to lower to ALIGNBIT instructions.
This improves test coverage of ISD::FSHR matching - x86 has both FSHL/FSHR instructions and we prefer FSHL by default.
Differential Revision: https://reviews.llvm.org/D76070
|
|
Fix bug identified by https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=21167, foldVSelectOfConstants must ensure that the 2 build vectors have scalars of the same type before trying to compare APInt values.
|
|
Summary:
Using the default DAG.UnrollVectorOp on v16i8 and v8i16 vectors
results in i8 or i16 nodes being inserted into the SelectionDAG. Since
those are illegal types, this causes a legalization assertion failure
for some code patterns, as uncovered by PR45178. This change unrolls
shifts manually to avoid this issue by adding and using a new optional
EVT argument to DAG.ExtractVectorElements to control the type of the
extract_element nodes.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, hiraditya, sunfish, zzheng, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76043
|
|
This commit is likely causing clang-with-lto-ubuntu to fail
http://lab.llvm.org:8011/builders/clang-with-lto-ubuntu/builds/16052
Also causes PR45185.
This reverts commit f1ac5d2263f8419b865cc78ba1f5c8694970fb6b.
|
|
This is to replace the optimization from the SIOptimizeExecMaskingPreRA.
We have less opportunities in the control flow lowering because many
VGPR copies are still in place and will be removed later, but we know
for sure an instruction is SI_END_CF and not just an arbitrary S_OR_B64
with EXEC.
The subsequent change needs to convert s_and_saveexec into s_and and
address new TODO lines in tests, then code block guarded by the
-amdgpu-remove-redundant-endcf option in the pre-RA exec mask optimizer
will be removed.
Differential Revision: https://reviews.llvm.org/D76033
|
|
This patch is the callee side counterpart for https://reviews.llvm.org/D73209.
It removes the fatal error when we pass more formal arguments than available
registers.
Differential Revision: https://reviews.llvm.org/D74225
|
|
|
|
|
|
Summary:
On 32-bit PPC target[AIX and BE], when we convert an `i64` to `f32`, a `setcc` operand expansion is needed. The expansion will set the result type of expanded `setcc` operation based on if the subtarget use CRBits or not. If the subtarget does use the CRBits, like AIX and BE, then it will set the result type to `i1`, leading to an inconsistency with original `setcc` result type[i32].
And the reason why it crashed underneath is because we don't set result type of setcc consistent in those two places.
This patch fixes this problem by setting original setcc opnode result type also with `getSetCCResultType` interface.
Reviewers: sfertile, cebowleratibm, hubert.reinterpretcast, Xiangling_L
Reviewed By: sfertile
Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75702
|
|
Summary:
De-duplicate isel instruction classes by using RRIm for RRINDm. The latter
becomes obsolete.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D76063
|
|
Program counter on AIX is the dollar-sign.
Differential Revision:https://reviews.llvm.org/D75627
|
|
Summary:
This patch adds the following intrinsics for non-temporal gather loads
and scatter stores:
* aarch64_sve_ldnt1_gather_index
* aarch64_sve_stnt1_scatter_index
These intrinsics implement the "scalar + vector of indices" addressing
mode.
As opposed to regular and first-faulting gathers/scatters, there's no
instruction that would take indices and then scale them. Instead, the
indices for non-temporal gathers/scatters are scaled before the
intrinsics are lowered to `ldnt1` instructions.
The new ISD nodes, GLDNT1_INDEX and SSTNT1_INDEX, are only used as
placeholders so that we can easily identify the cases implemented in
this patch in performGatherLoadCombine and performScatterStoreCombined.
Once encountered, they are replaced with:
* GLDNT1_INDEX -> SPLAT_VECTOR + SHL + GLDNT1
* SSTNT1_INDEX -> SPLAT_VECTOR + SHL + SSTNT1
The patterns for lowering ISD::SHL for scalable vectors (required by
this patch) were missing, so these are added too.
Reviewed By: sdesmalen
Differential Revision: https://reviews.llvm.org/D75601
|
|
This is part of the IR sibling for:
D75576
Related transform committed with:
rG8ec71585719d
|
|
This is part of the IR sibling for:
D75576
(I'm splitting part of the transform as a separate commit
to reduce risk. I don't know of any bugs that might be
exposed by this improved folding, but it's hard to see
those in advance...)
|
|
Lets us remove another SLM proc family flag usage.
This is NFC, but we should probably check whether atom/glm/knl? should be using this flag as well...
|
|
|
|
This patch switches SCCP to use ValueLatticeElement for lattice values,
instead of the local LatticeVal, as first step to enable integer range support.
This patch does not make use of constant ranges for additional operations
and the only difference for now is that integer constants are represented by
single element ranges. To preserve the existing behavior, the following helpers
are used
* isConstant(LV): returns true when LV is either a constant or a constant range with a single element. This should return true in the same cases where LV.isConstant() returned true previously.
* getConstant(LV): returns a constant if LV is either a constant or a constant range with a single element. This should return a constant in the same cases as LV.getConstant() previously.
* getConstantInt(LV): same as getConstant, but additionally casted to ConstantInt.
Reviewers: davide, efriedma, mssimpso
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D60582
|
|
The initialization order was not correct. These bugs were discovered by
valgrind. They appear to work fine in practice but this patch should
unblock switching the AVR backend on by default as now a standard AVR
llc invocation runs without memory errors.
The AVRISelLowering constructor would run before the subtarget boolean
fields were initialized to false. Now, the initialization order is
correct.
|
|
Now that D75114 has landed, DAGCombiner handles this case so the code is redundant.
|
|
Summary:
This adds the ACLE intrinsic family for the VFMA and VFMS
instructions, which perform fused multiply-add on vectors of floats.
I've represented the unpredicated versions in IR using the cross-
platform `@llvm.fma` IR intrinsic. We already had isel rules to
convert one of those into a vector VFMA in the simplest possible way;
but we didn't have rules to detect a negated argument and turn it into
VFMS, or rules to detect a splat argument and turn it into one of the
two vector/scalar forms of the instruction. Now we have all of those.
The predicated form uses a target-specific intrinsic as usual, but
I've stuck to just one, for a predicated FMA. The subtraction and
splat versions are code-generated by passing an fneg or a splat as one
of its operands, the same way as the unpredicated version.
In arm_mve_defs.h, I've had to introduce a tiny extra piece of
infrastructure: a record `id` for use in codegen dags which implements
the identity function. (Just because you can't declare a Tablegen
value of type dag which is //only// a `$varname`: you have to wrap it
in something. Now I can write `(id $varname)` to get the same effect.)
Reviewers: dmgreen, MarkMurrayARM, miyuki, ostannard
Reviewed By: dmgreen
Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits
Tags: #clang, #llvm
Differential Revision: https://reviews.llvm.org/D75998
|
|
Found by the LLVM MemorySanitizer tests when switching AVR to a default
backend.
ELFArch must be initialized before the call to
initializeSubtargetDependencies().
The uninitialized read would occur deep within TableGen'd code.
|
|
Summary:
This patch replaces incorrectt assert with a check. Previously it asserts that
if SCEV cannot prove `isKnownPredicate(A != B)`, then it should be able to prove
`isKnownPredicate(A == B)`.
Both these fact may be not provable. It is shown in the provided test:
Could not prove: `{-294,+,-2}<%bb1> != 0`
Asserting: `{-294,+,-2}<%bb1> == 0`
Obviously, this SCEV is not equal to zero, but 0 is in its range so we cannot
also prove that it is not zero.
Instead of assert, we should be checking the required conditions explicitly.
Reviewers: lebedev.ri, fhahn, sanjoy, fedor.sergeev
Reviewed By: lebedev.ri
Subscribers: hiraditya, zzheng, javed.absar, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D76050
|
|
Summary: This patch adds the basic utilities to deal with dropable uses. dropable uses are uses that we rather drop than prevent transformations, for now they are limited to uses in llvm.assume.
Reviewers: jdoerfert, sstefan1
Reviewed By: jdoerfert
Subscribers: uenoku, lebedev.ri, mgorny, hiraditya, dexonsmith, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D73404
|
|
This patch adds basic strict-fp intrinsics support to PowerPC backend,
including basic arithmetic operations (add/sub/mul/div).
Reviewed By: steven.zhang, andrew.w.kaylor
Differential Revision: https://reviews.llvm.org/D63916
|
|
Summary:
Cost modelling strikes again.
In PR44668 <https://bugs.llvm.org/show_bug.cgi?id=44668> patch series,
i've made the same mistake of always using generic `getOperationCost()`
that i missed in reviewing D73480/D74495 which was later fixed
in 62dd44d76da9aa596fb199bda8b1e8768bb41033.
We should be using more specific hooks instead - `getCastInstrCost()`,
`getArithmeticInstrCost()`, `getCmpSelInstrCost()`.
Evidently, this does not have an effect on the existing testcases,
with unchanged default cost budget. But if it *does* have an effect
on some target, we'll have to segregate tests that use this function
per-target, much like we already do with other TTI-aware transform tests.
There's also an issue that @samparker has brought up in post-commit-review:
>>! In D73501#1905171, @samparker wrote:
> Hi,
> Did you get performance numbers for these patches? We track the performance
> of our (Arm) open source DSP library and the cost model fixes were generally
> a notable improvement, so many thanks for that! But the final patch
> for rewriting exit values has generally been bad, especially considering
> the gains from the modelling improvements. I need to look into it further,
> but on my current test case I'm seeing +30% increase in stack accesses
> with a similar decrease in performance.
> I'm just wondering if you observed any negative effects yourself?
I don't know if this addresses that, or we need D66450 for that.
Reviewers: samparker, spatel, mkazantsev, reames, wmi
Reviewed By: reames
Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits, samparker
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75908
|
|
Summary: When narrowing a scalar G_EXTRACT where the destination lines up perfectly with a single result of the emitted G_UNMERGE_VALUES a COPY should be emitted instead of unconditionally trying to emit a G_MERGE_VALUES.
Reviewers: arsenm, dsanders
Reviewed By: arsenm
Subscribers: wdng, rovka, hiraditya, volkan, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75743
|
|
The note section type implies a specific format that this section does
not have thus tools like readelf fail here. Progbits has no format and
another pipeline compiler already sets the type to progbits.
Differential Revision: https://reviews.llvm.org/D75913
|
|
Delete dead code from 8fffa40400e8719222e7f67152c12738521fa9fb.
|
|
Summary:
Currently, a BoundaryAlign fragment may be inserted after the branch
that needs to be aligned to truncate the current fragment, this fragment is
unused at most of time. To avoid that, we can insert a new empty Data
fragment instead. Non-relaxable instruction is usually emitted into Data
fragment, so the inserted empty Data fragment will be reused at a high
possibility.
Reviewers: annita.zhang, reames, MaskRay, craig.topper, LuoYuanke, jyknight
Reviewed By: reames, LuoYuanke
Subscribers: llvm-commits, dexonsmith, hiraditya
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75438
|
|
Add the workaround for the X86::MOV16ri when describing call site
parameters.
|
|
This patch is intend to implement the missing P8 MacroFusion for LLVM
according to Power8 User's Manual Section 10.1.12 Instruction Fusion
Differential Revision: https://reviews.llvm.org/D70651
|
|
This is a reimplementation of the optimization removed in D75964. The actual spill/fill optimization is handled by D76013, this one just worries about reducing the stackmap section size itself by eliminating redundant entries. As noted in the comments, we could go a lot further here, but avoiding the degenerate invoke case as we did before is probably "enough" in practice.
Differential Revision: https://reviews.llvm.org/D76021
|
|
Summary:
callbr's indirect branches aren't expected to be taken, so reduce their
probabilities to 0 while increasing the default destination to 1. This
allows some code improvements through block placement.
Reviewers: nickdesaulniers
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D72656
|
|
MachO symbol linkage is described by the desc field of the nlist entry, not the
type field.
|
|
It is ok to add dependencies on symbols that are ready, they should just be
skipped.
|
|
|
|
In order for dsymutil to collect .apinotes files (which capture
attributes such as nullability, Swift import names, and availability),
I want to propose adding an apinotes: field to DIModule that gets
translated into a DW_AT_LLVM_apinotes (path) nested inside
DW_TAG_module. This will be primarily used by LLDB to indirectly
extract the Swift names of Clang declarations that were deserialized
from DWARF.
<rdar://problem/59514626>
Differential Revision: https://reviews.llvm.org/D75585
|
|
Summary:
Avoid re-examining operands on recursive walk looking for CTR.
This was causing huge compile time after some earlier optimization
created a large expression.
The start of the expression (created by IndVarSimplify) looked like:
%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...
with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.
Reviewers: hfinkel
Subscribers: nemanjai, hiraditya, kbarton, jsji, shchenz, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75790
|
|
Summary: Add verification that operand bundles on an llvm.assume are well formed to the verify pass.
Reviewers: jdoerfert
Reviewed By: jdoerfert
Subscribers: hiraditya, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75269
|
|
$ diff -u <(sort thedeps-before.txt) <(sort thedeps-after.txt) \
| grep '^[-+] ' | sort | uniq -c | sort -nr
231 - llvm/include/llvm/ADT/StringMap.h
171 - llvm/include/llvm/Support/AllocatorBase.h
142 - llvm/include/llvm/Support/PointerLikeTypeTraits.h
|
|
Summary:
For scalable vector, index out-of-bound can not be determined at compile-time.
The same apply for VectorUtil findScalarElement().
Add test cases to check the functionality of SimplifyInsert/ExtractElementInst for scalable vector.
Reviewers: sdesmalen, efriedma, spatel, apazos
Reviewed By: efriedma
Subscribers: cameron.mcinally, tschuett, hiraditya, rkruppe, psnobl, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75782
|
|
This is part of PR44213 https://bugs.llvm.org/show_bug.cgi?id=44213
When importing (system) Clang modules, LLDB needs to know which SDK
(e.g., MacOSX, iPhoneSimulator, ...) they came from. While the sysroot
attribute contains the absolute path to the SDK, this doesn't work
well when the debugger is run on a different machine than the
compiler, and the SDKs are installed in different directories. It thus
makes sense to just store the name of the SDK instead of the absolute
path, so it can be found relative to LLDB.
rdar://problem/51645582
Differential Revision: https://reviews.llvm.org/D75646
|
|
of swift and Objective-C bitcode
Summary:
The change is to fix conflict value for metadata "Objective-C Garbage Collection" in the mix of swift and Objective-C bitcode.
The purpose is to provide the support of LTO for swift and Objective-C mixed project.
Reviewers: rjmccall, ahatanak, steven_wu
Reviewed By: rjmccall, steven_wu
Subscribers: manmanren, mehdi_amini, hiraditya, dexonsmith, llvm-commits, jinlin
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D71219
|
|
The cmp math test is inspired by memcmp() patterns seen in D75840.
I know there's at least 1 related fold we can do here if both
values are sext'd, but I'm not seeing a way to generalize further.
We have some other bool math patterns that we want to reduce, but
that might require fixing the bogus transforms noted in D72396.
Alive proof translations of the regression tests:
https://rise4fun.com/Alive/zGWi
Name: demand add 1
%xz = zext i1 %x to i32
%ys = sext i1 %y to i32
%sub = add i32 %xz, %ys
%r = lshr i32 %sub, 31
=>
%notx = xor i1 %x, 1
%and = and i1 %y, %notx
%r = zext i1 %and to i32
Name: demand add 2
%xz = zext i1 %x to i5
%ys = sext i1 %y to i5
%sub = add i5 %xz, %ys
%r = and i5 %sub, 16
=>
%notx = xor i1 %x, 1
%and = and i1 %y, %notx
%r = select i1 %and, i5 -16, i5 0
Name: demand add 3
%xz = zext i1 %x to i8
%ys = sext i1 %y to i8
%a = add i8 %ys, %xz
%r = ashr i8 %a, 7
=>
%notx = xor i1 %x, 1
%and = and i1 %y, %notx
%r = sext i1 %and to i8
Name: cmp math
%gt = icmp ugt i32 %x, %y
%lt = icmp ult i32 %x, %y
%xz = zext i1 %gt to i32
%yz = zext i1 %lt to i32
%s = sub i32 %xz, %yz
%r = lshr i32 %s, 31
=>
%r = zext i1 %lt to i32
Differential Revision: https://reviews.llvm.org/D75961
|
|
Instead, emit a trap and a warning. We force inlining of this
situation, so any function where this happens should be dead as
indirect or external calls are not yet supported. This should avoid
erroring on dead code.
|
|
We just removed a broken duplicate elimination algorithm in D75964, and after landed that it occurred to me that duplicate elimination is simply CSE. SelectionDAG has a build in CSE, so why wasn't that triggering? Well, it turns out we were overly conservative in the memory states for our reloads and CSE (rightly) considers the incoming memory state for a load part of the identity of the load.
By loosening the chain and allowing reordering, we also allow CSE. As shown in the test case, doing iterative CSE as we go is enough to eliminate duplicate stores in later statepoints as well. We key our (block local) slot map by SDValue, so commoning a previous pair of loads at construction time means we also common following stores.
Differential Revision: https://reviews.llvm.org/D76013
|
|
This patch reuses the existing MatchRotate ROTL/ROTR rotation pattern code to also recognize the more general FSHL/FSHR funnel shift patterns when we have constant shift amounts.
Differential Revision: https://reviews.llvm.org/D75114
|
|
instructions.
Summary:
The IR intrinsics are mapped to the following SVE2 instructions:
* WHILERW <Pd>.<T>, <Xn>, <Xm>
* WHILEWR <Pd>.<T>, <Xn>, <Xm>
The intrinsics introduced in this patch are the IR counterpart of the
SVE ACLE functions `svwhilerw` and `svwhilewr` (all data type
variants).
Patch by Maciej Gąbka <maciej.gabka@arm.com>.
Reviewers: kmclaughlin, rengolin
Reviewed By: kmclaughlin
Subscribers: tschuett, kristof.beyls, hiraditya, danielkiss, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D75862
|
|
The assumption is that conditional regions are perfectly nested
and a mask restored at the exit from the inner block will be
completely covered by a mask restored in the outer.
It turns out with our current structurizer this is not always
the case.
Disable the optimization for now, but I want to keep it around
for a while to either try after further structurizer changes or
to move it into control flow lowering where we have more info
and reuse the test.
Differential Revision: https://reviews.llvm.org/D75958
|