|
types after type legalization. (#148970)
Fixes #148949
|
|
This essentially merges the handling for VPLoad - currently in
lowerInterleavedVPLoad which is shared between shuffle and intrinsic
based interleaves - into the existing dedicated routine.
My plan is that, if we like this factoring, I'll do the same for
the intrinsic store paths, and then remove the excess generality from
the shuffle paths since we don't need to support both modes in the
shared VPLoad/Store callbacks. We can probably even fold the VP versions
into the non-VP shuffle variants in the analogous way.
|
|
|
|
|
|
There are no longer debug-info instructions, thus we don't need this
skipping. Hooray!
|
|
Start using RuntimeLibcalls in the base implementation of
getSafeStackPointerLocation instead of hardcoding the function
names.
|
|
calculateByteProvider only cares about scalars or a single element
within a vector. For the latter there is the VectorIndex parameter to
identify the element. All other properties, and specifically Index, are
related to the underlying scalar type, and thus when taking the size of a
type it's the scalar size that matters.
Fixes https://github.com/llvm/llvm-project/issues/148387
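As a standalone illustration of the distinction (a hand-written sketch, not the DAGCombiner code): the element is selected by VectorIndex, while Index does byte arithmetic purely within the scalar element, so the bound on Index is the scalar size.
```cpp
#include <cstddef>
#include <cstdint>

// Little-endian byte extraction: VectorIndex picks the element, Index
// picks a byte *within* that element, so Index is bounded by
// sizeof(uint32_t), not by the size of the whole vector.
uint8_t byteOfElement(const uint32_t *Vec, size_t VectorIndex,
                      unsigned Index) {
  uint32_t Elt = Vec[VectorIndex];
  return (Elt >> (8 * Index)) & 0xFF;
}
```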
|
|
(#147916)
Stop using hardcoded function names and check availability. This only fixes
the forced usage via command line in the pass itself; the implementations
inside of TargetLoweringBase hide additional call emission.
|
|
This PR is part of #123870.
The pseudo probe desc emission code can be reused by other targets.
|
|
Avoid hardcoding the function name, and query if it's really
supported or not.
|
|
The compiler should not introduce calls to arbitrary strings
that aren't defined in RuntimeLibcalls. Previously OpenBSD was
disabling the default __stack_chk_fail, but there was no record
of the alternative __stack_smash_handler function it emits instead.
This also avoids a random triple check in the pass.
|
|
Unlike the abs intrinsic, the ISD::ABS node defines ABS(INT_MIN) -> INT_MIN, so no undef/poison is created by the node itself.
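A quick standalone model of that definition (the intrinsic's INT_MIN case can be poison when its flag says so; the node's cannot):
```cpp
#include <cstdint>

// Two's complement negation of INT32_MIN wraps back to INT32_MIN,
// which is exactly the behavior ISD::ABS is defined to have.
int32_t wrappingAbs(int32_t X) {
  // Negate through unsigned arithmetic to keep the C++ model UB-free.
  return X < 0 ? (int32_t)(0u - (uint32_t)X) : X;
}
// wrappingAbs(INT32_MIN) == INT32_MIN, since 0u - 0x80000000u == 0x80000000u.
```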
|
|
These are identified by misc-include-cleaner. I've filtered out those
that break builds. Also, I'm staying away from llvm-config.h,
config.h, and Compiler.h, which likely cause platform- or
compiler-specific build failures.
|
|
getExpression() already returns DIExpression *.
|
|
getNode has logic to intersect flags correctly if the new node happens
to CSE with an existing node. Setting node flags after getNode bypasses
this logic and may change the node for other uses where the flags don't
hold.
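A minimal sketch of the intersection idea (mirroring what SDNodeFlags::intersectWith does; the struct here is illustrative, not the real type):
```cpp
// When two call sites CSE to one node, only the flags that *both*
// can guarantee remain valid, so each flag is AND-ed together.
struct NodeFlags {
  bool NoSignedWrap = false;
  bool NoUnsignedWrap = false;
  void intersectWith(const NodeFlags &Other) {
    NoSignedWrap &= Other.NoSignedWrap;
    NoUnsignedWrap &= Other.NoUnsignedWrap;
  }
};
```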
|
|
|
|
This is handled in CallBase, so it is valid for both call and invoke.
|
|
returned. (#148733)
|
|
factors (#148689)
Factoring out and combining `isInterleaveIntrinsic`,
`isDeinterleaveIntrinsic`, and `getIntrinsicFactor` into
`getInterleaveIntrinsicFactor` and `getDeinterleaveIntrinsicFactor`
inside VectorUtils.
NFC.
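The rough shape of the combined helper, as a hedged sketch (the real version lives in VectorUtils and covers every supported factor):
```cpp
#include "llvm/IR/Intrinsics.h"

// Map an interleave intrinsic ID to its factor, or 0 if the ID is not
// an interleave intrinsic; higher factors map the same way.
unsigned getInterleaveIntrinsicFactorSketch(llvm::Intrinsic::ID ID) {
  switch (ID) {
  case llvm::Intrinsic::vector_interleave2:
    return 2;
  // ...cases for the remaining supported factors...
  default:
    return 0;
  }
}
```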
|
|
### Summary
This PR resolves https://github.com/llvm/llvm-project/issues/147694
|
|
Allows expansion of the sdiv->mul by constant combine in the general case.
Previously this only occurred in the exact case. This is part of
the resolution to issue #118090.
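For intuition, here is the exact case as a standalone sketch (hand-written, not the DAGCombiner code): when n is known to be an exact multiple of an odd constant d, sdiv n, d is just a multiply by d's multiplicative inverse mod 2^32; the general case additionally needs the usual magic-number multiply with shift and correction.
```cpp
#include <cassert>
#include <cstdint>

// Multiplicative inverse of an odd constant modulo 2^32 via Newton's
// method: x0 = d is correct to 3 bits (d*d == 1 mod 8 for odd d), and
// each iteration doubles the number of correct low bits.
uint32_t inverseMod2_32(uint32_t d) {
  assert((d & 1) && "inverse exists only for odd d");
  uint32_t x = d;
  for (int i = 0; i < 5; ++i)
    x *= 2 - d * x;
  return x;
}

int main() {
  // Exact division: 84 = 7 * 12, so multiplying by inv(7) recovers 12.
  assert(84u * inverseMod2_32(7u) == 12u);
}
```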
|
|
Same as https://github.com/llvm/llvm-project/pull/138660.
Co-authored-by: Oke, Akshat <Akshat.Oke@amd.com>
|
|
There isn't any way to encode a variable in an SVE register, and there
isn't any way to encode a scalable offset, and as far as I know that's
unlikely to change in the near future. So suppress any debug info which
would require those encodings.
This isn't ideal, but we need to ship something which doesn't crash.
Alternatively, for Z registers, we could emit debug info assuming the
vector length is 128 bits, but that seems like it would lead to
unintuitive results.
The change to AArch64FrameLowering is needed to avoid a crash. But we
can't actually test that the returned offset is correct: LiveDebugValues
performs the query, then discards the result.
|
|
Hexagon currently has an untested global flag to control fast
math variants of libcalls. Add fast variants as explicit libcall
options so this can be a flag-based lowering decision, and implement
it. I have no idea what fast math flags the Hexagon case requires,
so I picked the maximally potentially relevant set of flags although
this probably is refinable per call. Looking in compiler-rt, I'm not
sure if the fast variants are anything more than aliases.
|
|
It seems `substituteRegister` checks `FromReg == ToReg` instead of
`TRI->isSubRegisterEq`.
This PR simply reverts the original PR
(https://github.com/llvm/llvm-project/pull/131361) to its initial
implementation (without using `substituteRegister`).
Not sure whether this is the desired fix (and by no means am I an
expert on the LLVM backend), but it does fix a numeric error on our
internal workload.
Original author: @sdesmalen-arm
|
|
- This implementation is adapted from SDAG
X86TargetLowering::LowerGET_ROUNDING.
- llvm.set.rounding will be added later because it involves MXCSR
updates that are currently unsupported.
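For reference, the mapping this lowering has to produce (standalone sketch; the in-tree code reads MXCSR and translates its rounding-control field into the llvm.get.rounding / FLT_ROUNDS encoding):
```cpp
#include <cstdint>

// MXCSR RC (bits 13-14): 0 = nearest, 1 = toward -inf,
//                        2 = toward +inf, 3 = toward zero.
// llvm.get.rounding:     1 = nearest, 3 = toward -inf,
//                        2 = toward +inf, 0 = toward zero.
int mxcsrToFltRounds(uint32_t MXCSR) {
  static const int Table[4] = {1, 3, 2, 0};
  return Table[(MXCSR >> 13) & 3];
}
```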
|
|
Vselect (#147305)
Change isBuildVectorAll* -> isConstantSplatVectorAll* in VSelect, in case
the fold happens after the BuildVector has been canonically transformed to
a Splat, or if the Splat is already present in the vselect.
- Fixes #73454
- Update related test cases, add extra tests in wasm
---------
Co-authored-by: Simon Pilgrim <llvm-dev@redking.me.uk>
|
|
|
|
|
|
This patch adds an additional validation step to ensure that the
generated schedule does not violate loop-carried memory dependencies.
Prior to this patch, incorrect schedules could be produced due to the
lack of checks for the following types of dependencies:
- load-to-store backward (from bottom to top within the BB) dependencies
- store-to-load dependencies
- store-to-store dependencies
One possible solution to this issue is to add these dependencies
directly to the dependency graph, although doing so may lead to
performance degradation. In addition, no known cases of incorrect code
generation caused by these missing dependencies have been observed in
practice. Given these factors, this patch introduces a post-scheduling
validation phase to check for such previously missed dependencies,
instead of adding them to the graph before searching for a schedule.
Since no actual problems have been identified so far, it is likely that
most generated schedules are already valid. Therefore, this additional
validation is not expected to cause performance degradation in practice.
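The core of the check can be pictured with a hand-written sketch (illustrative types, not the MachinePipeliner API; assumes a pure memory-ordering dependence, where any positive cycle separation suffices):
```cpp
// A loop-carried dependence Src -> Dst with iteration distance D is
// respected by a modulo schedule with initiation interval II when Dst,
// offset by D * II cycles, still executes after Src.
struct SchedInfo {
  int Cycle; // absolute cycle assigned by the scheduler
};

bool loopCarriedDepRespected(const SchedInfo &Src, const SchedInfo &Dst,
                             int Distance, int II) {
  return Dst.Cycle + Distance * II > Src.Cycle;
}
```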
Split off from #135148.
The remaining tasks are as follows:
- Address other missing loop-carried dependencies (e.g., output
dependencies between physical registers, barrier instructions, and
instructions that may raise floating-point exceptions)
- Remove code that is currently retained to maintain the existing
behavior but is probably unnecessary.
- Eliminate `SwingSchedulerDAG::isLoopCarriedDep` and use
`SwingSchedulerDDG` to traverse edges after the dependency analysis part.
|
|
Stop emitting these calls by name in PreISelIntrinsicLowering. This
is still kind of a hack. We should be going through the abstract
RTLIB::Libcall, and then checking if the call is really supported in
this module. Do this as a placeholder until RuntimeLibcalls is a
module analysis.
|
|
This allows truncated splats / buildvectors in isBoolConstant, so that
certain not instructions can be recognized post-legalization and
vselect can optimize.
An override for x86 avx512 predicated vectors is required to avoid an
infinite recursion from the code that detects zero vectors. From:
```
// Check if the first operand is all zeros and Cond type is vXi1.
// If this an avx512 target we can improve the use of zero masking by
// swapping the operands and inverting the condition.
```
|
|
## Purpose
Export a small number of private LLVM symbols so that unit tests can
still build/run when LLVM is built as a Windows DLL or a shared library
with default hidden symbol visibility.
## Background
The effort to build LLVM as a Windows DLL is tracked in #109483.
Additional context is provided in [this
discourse](https://discourse.llvm.org/t/psa-annotating-llvm-public-interface/85307).
Some LLVM unit tests use internal/private symbols that are not part of
LLVM's public interface. When building LLVM as a DLL or shared library
with default hidden symbol visibility, the symbols are not available
when the unit test links against the DLL or shared library.
This problem can be solved in one of two ways:
1. Export the private symbols from the DLL.
2. Link the unit tests against the intermediate static libraries instead
of the final LLVM DLL.
This PR applies option 1. Based on the discussion of option 2 in
#145448, this option is preferable.
## Overview
* Adds a new `LLVM_ABI_FOR_TEST` export macro, which is currently just
an alias for `LLVM_ABI` (see the sketch below).
* Annotates the sub-set of symbols under `llvm/lib` that are required to
get unit tests building using the new macro.
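A minimal sketch of the approach (the macro is, per the bullet above, just an alias today; the annotated function here is hypothetical):
```cpp
// Alias in the ABI annotation header: exports exactly like LLVM_ABI.
#define LLVM_ABI_FOR_TEST LLVM_ABI

// Hypothetical internal helper a unit test needs across the DLL boundary.
LLVM_ABI_FOR_TEST void collectInternalStats();
```
Keeping a distinct macro name records that a symbol is exported only for testing, so these exports can be revisited independently of the real public interface.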
|
|
When the RegisterCoalescer adds an implicit-def when coalescing
a SUBREG_TO_REG (#123632), this causes issues when removing other
COPY nodes by commuting the instruction because it doesn't take
the implicit-def into consideration. This PR fixes that.
|
|
|
|
Add missing dependencies to the unittest target.
The original patch broke BUILD_SHARED bots and required revert #147947.
|
|
Reverts llvm/llvm-project#139059
This broke
https://lab.llvm.org/buildbot/#/builders/10/builds/9125/steps/8/logs/stdio
The bot does a SHARED_LIBS=ON build. I can reproduce locally with the
CMake cache file in offload/cmake/caches/AMDGPUBot.cmake as the build
config.
|
|
This PR exposes the backend pass config to plugins via a callback.
Plugin authors can register a callback that is triggered before
the target backend adds their passes to the pipeline. In the callback
they then get access to the `TargetMachine`, the `PassManager`, and the
`TargetPassConfig`. This allows plugins to call
`TargetPassConfig::insertPass`, which is honored in the subsequent
`addPass` of the main backend. We implemented this using the legacy pass
manager since backends still use it as the default.
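A hedged sketch of such a callback (the registration entry point and the plugin pass ID are hypothetical; TargetPassConfig::insertPass is the existing legacy-PM hook being reached):
```cpp
#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"

extern char MyPluginPassID; // hypothetical plugin pass

void onBackendPassConfig(llvm::TargetPassConfig &TPC) {
  // Request our pass at a standard insertion point; the backend's
  // subsequent addPass calls honor this.
  TPC.insertPass(&llvm::MachineSchedulerID, &MyPluginPassID);
}
```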
|
|
Use the default globals address space
|
|
|
|
take LLVM Context (#147664)
Add an LLVMContext parameter to getOptimalMemOpType and
findOptimalMemOpLowering, so that we can use EVT::getVectorVT to
generate an EVT type in getOptimalMemOpType.
Related to [#146673](https://github.com/llvm/llvm-project/pull/146673).
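Illustrative use of the threaded-through context (the function below is hypothetical; EVT::getVectorVT is the existing API that motivates the change):
```cpp
#include "llvm/CodeGen/ValueTypes.h"

// Building a vector EVT requires an LLVMContext, which is why these
// hooks now receive one.
llvm::EVT pickWideVT(llvm::LLVMContext &Ctx) {
  return llvm::EVT::getVectorVT(Ctx, llvm::MVT::i32, 4);
}
```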
|
|
|
|
Anatoly Trosinenko found that when hasSideEffect was set to 0 in the
definition of LOADgotAUTH, the MultiSource/Benchmarks/Ptrdist/ks/ks test
from llvm-test-suite started to crash. The issue was traced down to
the MachineLICM pass hoisting LOADgotAUTH right after an unrelated copy to
x16, rewriting this code:
````
bb.0:
renamable $x16 = COPY renamable $x12
B %bb.1
bb.1:
...
/* use $x16 */
...
renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
/* use $x20 */
...
````
into the following:
````
bb.0:
renamable $x16 = COPY renamable $x12
renamable $x20 = LOADgotAUTH target-flags(aarch64-got) @some_variable, implicit-def dead $x16, implicit-def dead $x17, implicit-def dead $nzcv
B %bb.1
bb.1:
...
/* use $x16 */
...
/* use $x20 */
...
````
The issue was caused by inconsistent logic between implicit and explicit
operand definitions, where the implicit side incorrectly skipped
the check of RUDefs for dead operands, leading to RuledOut not being set
for the X16 operand.
Because there isn't really a semantic difference between implicit and
explicit operands at this point, let's remove the isImplicit check and
adjust the logic to do the same thing in both cases:
- For implicit operands, we now check and update RUDefs in the same way
as explicit operands.
- For explicit operands, we now allow dead operands to be skipped.
Reviewers: arsenm, s-barannikov, atrosinenko
Reviewed By: arsenm, s-barannikov
Pull Request: https://github.com/llvm/llvm-project/pull/147624
|
|
As noted in post commit review, the API change here was not required.
I'd apparently confused myself when teasing apart patches from my
development branch.
|
|
section. (#146563)
Callsite offsets will help map addresses to the right position in the
basic block (before or after a callsite).
This PR also bumps the BBAddrMap version to 3.
The encoding/decoding support was already pushed upstream in
8d7a8fcc3ab9f6d4c4a7e4312876fe94ed3d6c4f.
|
|
Introduces saturated truncate instructions to Global ISel:
G_TRUNC_SSAT_S, G_TRUNC_SSAT_U, G_TRUNC_USAT_U. These were previously
introduced to SDAG to reduce redundant code.
This patch initially only introduces the instructions; a later patch will
follow to add combines and legalization for each instruction.
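Scalar models of the three opcodes, with semantics assumed to mirror the SDAG TRUNCATE_*SAT_* nodes this commit references:
```cpp
#include <algorithm>
#include <cstdint>

// G_TRUNC_SSAT_S: signed input saturated to the signed result range.
int8_t truncSsatS(int32_t X) {
  return (int8_t)std::clamp<int32_t>(X, INT8_MIN, INT8_MAX);
}
// G_TRUNC_SSAT_U: signed input saturated to the unsigned result range.
uint8_t truncSsatU(int32_t X) {
  return (uint8_t)std::clamp<int32_t>(X, 0, UINT8_MAX);
}
// G_TRUNC_USAT_U: unsigned input saturated to the unsigned result range.
uint8_t truncUsatU(uint32_t X) {
  return (uint8_t)std::min<uint32_t>(X, (uint32_t)UINT8_MAX);
}
```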
|
|
For the fixed vector cases, we already support this, but the
deinterleave intrinsic cases (primarily used by scalable vectors) didn't.
Supporting it requires plumbing through the Factor separately from the
extracts, as there can now be fewer extracts than the Factor. Note that
the fixed vector path handles this slightly differently - it uses the
shuffle and indices scheme to achieve the same thing.
|
|
|
|
|
|
Previously we had a table with an entry for every Libcall giving the
comparison to use against integer 0 when the call was a soft-float
compare function. This was only relevant to a handful of opcodes, so
it was wasteful. Now that we can distinguish the abstract libcall for
the compare from the concrete implementation, we can just directly
hardcode the comparison against the libcall impl without this
configuration system.
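By example, what one of those table entries encoded (compiler-rt's soft-float comparison helpers return an int tested against 0; for __ltdf2 the result is negative iff a < b):
```cpp
extern "C" int __ltdf2(double A, double B); // provided by compiler-rt

// A soft-float `A < B` lowers to the libcall plus this fixed predicate;
// the "< 0" part is what the old per-libcall table stored.
bool softFloatLess(double A, double B) { return __ltdf2(A, B) < 0; }
```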
|