Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
Currently `AAAddressSpace` relies on identifying the address spaces of
all underlying objects. However, it might infer sub-optimal address
space when the underlying object is a function argument. In
`AMDGPUPromoteKernelArgumentsPass`, the promotion of a pointer kernel
argument is by adding a series of `addrspacecast` instructions (as shown
below), and hoping `InferAddressSpacePass` can pick it up and do the
rewriting accordingly.
Before promotion:
```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
%val = load i32, ptr %to_be_promoted
...
ret void
}
```
After promotion:
```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
%ptr.cast.0 = addrspace cast ptr % to_be_promoted to ptr addrspace(1)
%ptr.cast.1 = addrspace cast ptr addrspace(1) %ptr.cast.0 to ptr
# all the use of %to_be_promoted will use %ptr.cast.1
%val = load i32, ptr %ptr.cast.1
...
ret void
}
```
When `AAAddressSpace` analyzes the code after promotion, it will take
`%to_be_promoted` as the underlying object of `%ptr.cast.1`, and use its
address space (which is 0) as its final address space, thus simply do
nothing in `manifest`. The attributor framework will them eliminate the
address space cast from 0 to 1 and back to 0, and replace `%ptr.cast.1`
with `%to_be_promoted`, which basically reverts all changes by
`AMDGPUPromoteKernelArgumentsPass`.
IMHO I'm not sure if `AMDGPUPromoteKernelArgumentsPass` promotes the
argument in a proper way. To improve the handling of this case, this PR
adds an extra handling when iterating over all underlying objects. If an
underlying object is a function argument, it means it reaches a terminal
such that we can't futher deduce its underlying object further. In this
case, we check all uses of the argument. If they are all `addrspacecast`
instructions and their destination address spaces are same, we take the
destination address space.
Fixes: SWDEV-482640.
|
|
Since #103867, the nullness of the concept declaration has been turned
to represent a state in which the concept definition is being parsed and
used for self-reference checking.
However, PR missed a case where such a definition could be invalid, and
we shall inhibit making it into evaluation.
Fixes https://github.com/llvm/llvm-project/issues/109780
|
|
`SBDebugger().Create()` returns a debugger with only the host platform
in its platform list. If the test suite is running for a remote
platform, it should be explicitly added and selected in the new debugger
created within the test, otherwise, the test will fail because the host
platform may not be able to launch the built binary.
|
|
passes after BottomUpVec. (#111223)" (#111772)
https://github.com/llvm/llvm-project/pull/111223 was reverted because of
a build failure with `-DBUILD_SHARED_LIBS=on`.
The Passes component depends on Vectorizer (because PassBuilder needs to
be able to instantiate SandboxVectorizerPass). This resulted in CMake
doing this
1. when it builds lib/libLLVMVectorize.so.20.0git it adds
lib/libLLVMSandboxIR.so.20.0git to the command line, because it's listed
as a dependency (as expected)
2. when it's trying to build lib/libLLVMPasses.so.20.0git it adds
lib/libLLVMVectorize.so.20.0git to the command line, because it's listed
as a dependency (also as expected). But not libLLVMSandboxIR.so.
When SandboxVectorizerPass has its ctors/dtors defined inline, this
caused "undefined reference to vtable" linker errors. This change works
around that by moving ctors/dtors out of line.
Also fix a bazel build problem by adding the new
`llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/PassRegistry.def`
as a textual header in the Vectorizer target.
|
|
legal (#109570)
If SETCC or VSELECT is not legal for vector, we should not expand it,
instead we can split the vectors.
So that, some simple scale instructions can be emitted instead of
some pairs of comparation+selection.
|
|
constant value. (#111232)
This patch adds support for constant folding for the `logb` and `logbf`
libc functions.
|
|
## Summary
This PR improves `SymbolFileDWARF::FindTypes()` by utilizing the newly
added parent chain `DW_IDX_parent` in `.debug_names`. The proposal was
originally discussed in [this
RFC](https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151).
## Implementation
To leverage the parent chain for `SymbolFileDWARF::FindTypes()`, this PR
adds a new API: `GetTypesWithQuery` in `DWARFIndex` base class. The API
performs the same function as `GetTypes` with additional filtering using
`TypeQuery`. Since this only introduces filtering, the callback
mechanisms at all call sites remain unchanged. A default implementation
is given in `DWARFIndex` class which parses debug info and performs the
matching. In the `DebugNameDWARFIndex` override, the parent_contexts in
the `TypeQuery` is cross checked with parent chain in `.debug_names` for
for much faster filtering before fallback to base implementation for
final filtering.
Unlike the `GetFullyQualifiedType` API, which fully consumes the
`DW_IDX_parent` parent chain for exact matching, these new APIs perform
partial subset matching for type/namespace queries. This is necessary to
support queries involving anonymous or inline namespaces. For instance,
a user might request `NS1::NS2::NS3::Foo`, while the index table's
parent chain might contain `NS1::inline_NS2::NS3::Foo`, which would fail
exact matching.
## Performance Results
In one of our internal target using `.debug_names` + split dwarf.
Expanding a "this" pointer in locals view in VSCode:
94s => 48s. (Not sure why I got 94s this time instead of 70s last week).
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
|
|
Change-Id: I77e72d23d41095b8fcc47996d8004f9e264968de
|
|
Rounds (#90933)
|
|
`shouldTagGlobal` doesn't sound like it should modify anything, so don't
do that. Remove unused code. Use SmallVector over std::vector
|
|
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which
effectively runs the `-codegen-data-generate` and `-codegen-data-use` in
two rounds to enable global outlining with ThinLTO.
1. The first round: Run both optimization + codegen with a scratch
output.
Before running codegen, we serialize the optimized bitcode modules to a
temporary path.
2. From the scratch object files, we merge them into the codegen data.
3. The second round: Read the optimized bitcode modules and start the
codegen only this time.
Using the codegen data, the machine outliner effectively performs the
global outlining.
Depends on #90934, #110461 and #110463.
This is a patch for
https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
|
|
```
> python3 -m mypy --strict clang-tools-extra/clang-tidy/rename_check.py
Success: no issues found in 1 source file
```
|
|
definition (#111472)
Fix #111450.
|
|
(#110714)" v2 (#111708)
This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test
lines to prevent behavior mismatches between debug and release builds
The first attempted reapply was #111059
This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.
|
|
argument lists (#106585)" (#111173)" (#111766)
This reverts commit 4da8ac34f76e707ab94380b94f616457cfd2cb83.
|
|
The dependency scanner uses the `llvm::json` library for outputting the
dependency information. Until now, it created an in-memory
representation of the dependency graph using the `llvm::json::Object`
hierarchy. This not only creates unnecessary copies of the data, but
also forces lexicographical ordering of attributes in the output, both
of which I'd like to avoid. This patch adopts the `llvm::json::OStream`
API instead and reorders the attribute printing logic such that the
existing lexicographical ordering is preserved (for now).
|
|
(#110387)" (#111764)
This reverts commit 4336f00f2156970cc0af2816331387a0a4039317.
|
|
SBLanguages.h uses a uint16_t but is missing the include for
`<cstdint>`, if any file includes this without including that it will
cause a build error so this commit adds this include.
|
|
We should not fold one of add/sub operands into a load/store's offset
when `nuw` (no unsigned wrap) is not present, because the address
calculation, which adds the offset with the operand, does not wrap.
This is handled correctly in the normal ISel:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp#L328-L332
but not in FastISel.
This positivity check in FastISel is not sufficient to avoid this case
fully:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L348-L352
because
1. Even if RHS is within signed int range, depending on the value of the
LHS, the resulting value can exceed uint32 max.
2. When one of the operands is a label, `Address` can contain a
`GlobalValue` and a `Reg` at the same time, so the `GlobalValue` becomes
incorrectly an offset:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L53-L69
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L409-L417
Both cases are in the newly added test.
We should handle `SUB` too because `SUB` is the same as `ADD` when RHS's
sign changes. I checked why our current normal ISel only handles `ADD`,
and the reason it's OK for the normal ISel to handle only `ADD` seems
that DAGCombiner replaces `SUB` with `ADD` here:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907
Fixes #111018.
|
|
Porting to TTI provides direct access to the instruction cost model,
which can enable instruction cost based sinking without introducing code
duplication.
|
|
This changes the PR label to match the name of the subscriber team.
Fixes #111485.
|
|
Operands with device_type support need the corresponding attribute but
this was not catches in the verifier if it was missing. The custom
parser usually constructs it but creating the op from python could lead
to a segfault in the printer. This patch updates the verifier so we
catch this early on.
|
|
AsmPrinter::EmitToStreamer. NFC (#111714)
This allows us to pass the STI we already have cached instead of
AsmPrinter::EmitToStreamer looking it up from the MachineFunction again.
My plan is to make EmitHwasanMemaccessSymbols use
RISCVAsmPrinter::EmitToStreamer instead of calling
MCStreamer::emitInstruction. To do that I need control of the
MCSubtargetInfo.
|
|
This disabled the build of `MLIRExecutionEngineShared` because this causes linkage issues in windows for currently unknown reasons.
Related issue: https://github.com/llvm/llvm-project/issues/106859.
|
|
Allows non-power-of-2 vectorization for stores, but still requires, that
vectorized number of elements forms full vector registers.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/111194
|
|
Follow-up to 418920b3fbdefec5b56ee2b9db96884d0ada7329, which started
diagnosing the legality of objects in OpenMP clauses (and caused some
test failures).
|
|
This is an internal API and the name should reflect that.
This is a reland of #108887.
|
|
libflangPasses.so was not linked against the correct libraries which
caused a build failure with -DBUILD_SHARED_LIBS=On. Fixes #110425
|
|
That isn't used anymore since we now run backdeployment testing
on the target system directly instead of using pre-packaged roots.
|
|
Based on a comment in #99473, it seems like `export *` may be overkill.
|
|
Plugin libraries that use coroutines can do so right now, however, to
provide their own ABI they need to be able to use various headers, some
of which such are required (such as the ABI header). This change exposes
the coro utils and required headers by moving them to
include/llvm/Transforms/Coroutines. My experience with our out-of-tree
plugin ABI has been that at least these headers are needed. The headers
moved are:
* ABI.h (ABI object)
* CoroInstr.h (helpers)
* Coroshape.h (Shape object)
* MaterializationUtils.h (helpers)
* SpillingUtils.h (helpers)
* SuspendCrossingInfo.h (analysis)
This has no code changes other than those required to move the headers
and these are:
* include guard name changes
* include path changes
* minor clang-format induced changes
* removal of LLVM_LIBRARY_VISIBILITY
|
|
Use SEND_ERROR (continue processing, but skip generation) instead of
FATAL_ERROR (stop processing and generation). This means that developers
get to see all errors at once, instead of seeing just the first error
and having to reconfigure to discover the next one.
|
|
Build out the necessary infrastructure for the main entry point into
ClangIR generation -- CIRGenModule. A set of boilerplate classes exist
to facilitate this -- CIRGenerator, CIRGenAction, EmitCIRAction and
CIRGenConsumer. These all mirror the corresponding types from LLVM
generation by Clang's CodeGen.
The main entry point to CIR generation is
`CIRGenModule::buildTopLevelDecl`. It is currently just an empty
function. We've added a test to ensure that the pipeline reaches this
point and doesn't fail, but does nothing else. This will be removed in
one of the subsequent patches that'll add basic `cir.func` emission.
This patch also re-adds `-emit-cir` to the driver. lib/Driver/Driver.cpp
requires that a driver flag exists to facilirate the selection of the
right actions for the driver to create. Without a driver flag you get
the standard behaviors of `-S`, `-c`, etc. If we want to emit CIR IR
and, eventually, bytecode we'll need a driver flag to force this. This
is why `-emit-llvm` is a driver flag. Notably, `-emit-llvm-bc` as a cc1
flag doesn't ever do the right thing. Without a driver flag it is
incorrectly ignored and an executable is emitted. With `-S` a file named
`something.s` is emitted which actually contains bitcode.
Reviewers: AaronBallman, MaskRay, bcardosolopes
Reviewed By: bcardosolopes, AaronBallman
Pull Request: https://github.com/llvm/llvm-project/pull/91007
|
|
Make this emit the same source range as the current interpreter.
|
|
Region passes after BottomUpVec." (#111727)
Reverts llvm/llvm-project#111223
It broke one of the build bots:
LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx
running on linaro-flang-aarch64-libcxx while building llvm at step 5
"build-unified-tree".
Full details are available at:
https://lab.llvm.org/buildbot/#/builders/89/builds/8127
|
|
profile/instrprof-write-file-atexit-explicitly.c
|
|
Trying to codegen these on targets without the instructions should
fail to select. Not sure if all the predicates are correct. We had
a fake one disconnected to a feature which was always true.
Fixes: SWDEV-482274
|
|
passes after BottomUpVec. (#111223)
The main change is that the main SandboxVectorizer pass no longer has a
pipeline of function passes. Now it is a wrapper that creates sandbox IR
from functions before calling BottomUpVec.
BottomUpVec now builds its own RegionPassManager from the `sbvec-passes`
flag, using a PassRegistry.def file. For now, these region passes are
not run (BottomUpVec doesn't create Regions yet), and only a null pass
for testing exists.
This commit also changes the ownership model for sandboxir::PassManager:
instead of having a PassRegistry that owns passes, and PassManagers that
contain non-owning pointers to the passes, now PassManager owns (via
unique pointers) the passes it contains.
PassRegistry is now deleted, and the logic to parse and create a pass
pipeline is now in PassManager::setPassPipeline.
|
|
|
|
#109201
|
|
Change-Id: I0b26d5db6d3da8936ab25ee2b1e9002840b9853e
|
|
Clang-offload-packager allows packaging of images based on an arbitrary
list of key-value pairs where only triple-key is mandatory.
Using target features as a key during packaging is not correct, as clang
does not allow packaging multiple images in one binary which only differ
in a target feature.
TargetID features (xnack and sramecc) anyways are handled using arch-key
and not as target features.
|
|
TryToSimplifyUncondBranchFromEmptyBlock optimization. (#110715)
In some pathological cases this optimization can spend an unreasonable
amount of time populating the set for predecessors of the successor
block. This change sinks some of that initializing to the point where
it's actually necessary so we can take advantage of the existing
early-exits.
rdar://137063034
|
|
Relaxes vector.transfer_write lowering to allow out-of-bound writes.
This aligns lowering with the current hardware specification which does
not update bytes in out-of-bound locations during block stores.
|
|
This adds the ability to use the GCNRPTrackers during scheduling. These trackers have several advantages over the generic trackers: 1. global live-thru trackers, 2. subregister based RP deltas, and 3. flexible vreg -> PressureSet mappings.
This feature is off-by-default to ease with the roll-out process. In particular, when using the optional trackers, the scheduler will still maintain the generic trackers leading to unnecessary compile time.
|
|
At present, `__builtin_available` is really restrictive with its use.
Overall, this seems like a good thing, since the analyses behind it are
not very expensive.
That said, it's very straightforward to support these two cases:
```
if ((__builtin_available(foo, *))) {
// ...
}
```
and
```
if (!__builtin_available(foo, *)) {
// ...
} else {
// ...
}
```
Seems nice to do so.
|
|
It's breaking the bots, e.g. http://45.33.8.238/linux/149792/step_3.txt
|
|
When -mno-movt is passed to Clang, the ARM codegen correctly avoids
movt/movw pairs to take the address of __stack_chk_guard in the stack
protector code emitted into the function pro- and epilogues. However,
the Thumb2 codegen fails to do so, and happily emits movw/movt pairs
unless it is generating an ELF binary and the symbol might be in a
different DSO. Let's incorporate a check for useMovt() in the logic
here, so movt/movw are never emitted when -mno-movt is specified.
Suggestions welcome for how/where to add a test case for this.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
|
|
|