Age | Commit message | Author | Files | Lines |
|
- Added to CoreTests in BUILD.gn
- Hiding DataAggregator stdout/stderr output
|
|
Large binaries get two text segments mapped when loaded in memory.
BOLT processes only the first, which does not have the correct BaseAddress,
so the size of the BinaryMMapInfo is computed incorrectly.
Consequently, BOLT wrongly thinks that many of the samples fall outside
the binary and ignores them. As a result, the computed heatmap is
incomplete, and the section hotness statistics are wrong.
This bug is present in both the AArch64 and x86 backends.
---
This patch introduces the flag 'perf-script-events', which allows passing
perf events without BOLT having to parse them using 'perf script'.
The flag is used to pass a mock perf profile that has two memory
mappings for a mock binary that has two text segments. The size of the
mapping is updated as `parseMMapEvents` now processes all text segments.
---
Example used in unit tests:
From `/proc/<BINARY PID>/maps`, we have 2 text mappings, say A and B.
```
abc0000000-abc1000000 r-xp 011c0000 103:01 1573523 BINARY
abc2000000-abca000000 r-xp 031d0000 103:01 1573523 BINARY
```
Size of text mappings:
| Mapping | Size |
| ------- | ------ |
| A | 16 MiB |
| B | 128 MiB |
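As a rough illustration of why processing only the first text segment loses samples, the sketch below parses the two mock `/proc/<BINARY PID>/maps` lines from the example and compares the size of the first executable mapping with the total. It is not BOLT code: `text_mappings` is a hypothetical helper, and the way BOLT actually derives a `BinaryMMapInfo` size is not shown here. Taken at face value, the two ranges span 16 MiB and 128 MiB.

```python
# Hypothetical helper for illustration only; this is not how BOLT is implemented.
MAPS = """\
abc0000000-abc1000000 r-xp 011c0000 103:01 1573523 BINARY
abc2000000-abca000000 r-xp 031d0000 103:01 1573523 BINARY
"""

def text_mappings(maps_text, binary):
    """Return (start, end) address pairs for executable mappings of `binary`."""
    result = []
    for line in maps_text.splitlines():
        fields = line.split()
        # Expect: range, perms, offset, device, inode, path.
        if len(fields) < 6 or fields[-1] != binary:
            continue
        addr_range, perms = fields[0], fields[1]
        if "x" not in perms:
            continue  # keep only executable (text) mappings
        start, end = (int(part, 16) for part in addr_range.split("-"))
        result.append((start, end))
    return result

mappings = text_mappings(MAPS, "BINARY")
sizes = [end - start for start, end in mappings]
first_only = sizes[0]      # what is covered if only mapping A is considered
all_segments = sum(sizes)  # what is covered if both A and B are considered

MIB = 1024 * 1024
print(f"A: {sizes[0] // MIB} MiB, B: {sizes[1] // MIB} MiB")
print(f"first only: {first_only // MIB} MiB, all: {all_segments // MIB} MiB")
```

Any sample whose address falls in mapping B lies outside the range covered by mapping A alone, which matches the symptom described above.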
---
Example on a real program:
```
2f7200000-2fabca000 r--p 00000000 bolted-binary
2fabd9000-2fe47c000 r-xp 039c9000 bolted-binary <- 1st txt segment
2fe48b000-2fe61d000 r--p 0727b000 bolted-binary
2fe62c000-2fe660000 rw-p 0740c000 bolted-binary
2fe660000-2fea4c000 rw-p 00000000
2fec00000-303dad000 r-xp 07a00000 bolted-binary <- 2nd (appears only on the bolted binary)
```
|
|
|
|
Mostly copied from the AArch64 coverage for the same, but also added
a couple of tests for reductions which aren't currently supported.
|
|
in true16/fake16 format (#106089)
This is an NFC change to add tests for the true16/fake16 flow. We need
two sets of asm/disasm tests, one for the true16 flow and one for the
fake16 flow, and this patch adds the missing one. The naming convention
is that the true16 filename is the default one, while the fake16
filename has "fake16" attached to it.
This patch
1. adds true16 and fake16 versions of the vop3_from_vop1 test file
2. renames a test file to keep a consistent naming pattern
The true16 test file will be updated when more true16 commands are
supported in upcoming patches.
|
|
in true16/fake16 format (#106093)
This is an NFC change to add tests for the true16/fake16 flow. We need
two sets of asm/disasm tests, one for the true16 flow and one for the
fake16 flow, and this patch adds the missing one. The naming convention
is that the true16 filename is the default one, while the fake16
filename has "fake16" attached to it.
This patch
1. adds true16 and fake16 versions of the vop1 test files
2. renames a test file to keep a consistent naming pattern
The true16 test file will be updated when more true16 commands are
supported in upcoming patches.
|
|
With this commit, we also hide the implementation details of
`std::invoke`. To do so, the `LibCXXFrameRecognizer` got a couple more
regular expressions.
The regular expression passed into `AddRecognizer` became problematic,
as it was evaluated on the demangled name. Those names also included
result types for C++ symbols. For `std::__invoke` the return type is a
huge `decltype(...)`, making the regular expression really hard to
write.
Instead, I added support to `AddRecognizer` for matching on the
demangled names without result type and argument types.
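To illustrate why matching on the full demangled name is awkward, here is a small sketch. The pattern and both symbol strings are made up for illustration; they are not the regular expressions or symbols used by `LibCXXFrameRecognizer`.

```python
import re

# Hypothetical pattern: anchored at the start of the function name.
PATTERN = re.compile(r"^std::__[^:]+::__invoke")

# Hypothetical full demangled name, including result type and argument types.
full_name = (
    "decltype(std::declval<int (&)(int)>()(std::declval<int>())) "
    "std::__1::__invoke<int (&)(int), int>(int (&)(int), int&&)"
)

# The same hypothetical symbol with result type and argument types stripped.
name_only = "std::__1::__invoke<int (&)(int), int>"

# The anchored pattern fails on the full name because it starts with decltype(...).
matches_full = PATTERN.match(full_name) is not None
# It succeeds once the result type is no longer part of the matched string.
matches_name_only = PATTERN.match(name_only) is not None

print(matches_full, matches_name_only)
```

A pattern written against the full name would have to skip an arbitrarily nested `decltype(...)` first, which is the difficulty described above.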
Hiding the implementation details of `invoke` also makes the backtraces
for `std::function` even nicer, because `std::function` uses
`__invoke` internally.
Co-authored-by: Adrian Prantl <aprantl@apple.com>
|
|
value bitwidth
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.
Fixes #106202
|
|
When possible, avoid using descriptors and use the reference and the
shape for data_transfer.
|
|
Without calling `dyld_process_dispose`, LLDB will leak the memory
associated with the `dyld_process_t`.
rdar://134738265
|
|
uses, NFC.
|
|
This patch implements sandboxir::ResumeInst mirroring llvm::ResumeInst.
|
|
This reverts commit c46b41aaa6eaa787f808738d14c61a2f8b6d839f.
Multiple tests time out, either due to a performance hit (see comment) or
a cycle.
|
|
The previous semantics of `llvm.experimental.get.vector.length` were so
permissive that they gave optimizers a hard time on anything related
to the number of iterations of VP-vectorized loops.
This patch tries to address this by assigning it a set of stricter
semantics similar to that of RVV's VSETVLI instructions, while not being
too RISC-V specific and leaving room for other (future) targets.
---------
Co-authored-by: Craig Topper <craig.topper@sifive.com>
|
|
specialization callback (#106177)
This patch removes the conservative uniformity check in the indirect call
specialization callback, as whether the function pointer is uniform doesn't
matter too much. Instead, we add an argument to control specialization.
|
|
**Description**
This PR adds a new option for `convert-to-spirv` pass to clone and
convert only GPU kernel modules for integration testing. The reason for
using pass options instead of two separate passes is that they both
consist of `memref` type conversion and individual dialect patterns,
except they run on different scopes. The PR also replaces the
`gpu-to-spirv` pass with the `convert-to-spirv` pass (with the new
option) in `mlir-vulkan-runner`.
**Future Plan**
Use nesting pass pipelines in `mlir-vulkan-runner` instead of adding
this option.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
|
|
This particular variable name is shadowed by another lower in the
function, so reducing its scope to its single use removes the
shadowing and makes the code much less error prone.
|
|
This reduces the number of allocations inside the loop.
Partially addresses #105836
|
|
SMUL_LOHI and UMUL_LOHI are different operations because the high part
of the result is different, so it is not OK to optimize the signed
version to MUL_U24/MULHI_U24 or the unsigned version to
MUL_I24/MULHI_I24.
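A small arithmetic sketch of the difference: for the same operand bit patterns, the low half of a signed and an unsigned widening multiply agrees, but the high half does not. `mul_lohi` is a hypothetical model function, and the 32-bit width is chosen only for illustration.

```python
def mul_lohi(a_bits, b_bits, signed, width=32):
    """Model a widening multiply; return (lo, hi) as unsigned bit patterns."""
    mask = (1 << width) - 1

    def to_int(bits):
        bits &= mask
        # Reinterpret the bit pattern as two's complement when signed.
        if signed and bits >> (width - 1):
            return bits - (1 << width)
        return bits

    product = to_int(a_bits) * to_int(b_bits)
    lo = product & mask
    hi = (product >> width) & mask
    return lo, hi

a, b = 0xFFFFFFFF, 2  # 0xFFFFFFFF is -1 when signed, 4294967295 when unsigned

signed_lo, signed_hi = mul_lohi(a, b, signed=True)
unsigned_lo, unsigned_hi = mul_lohi(a, b, signed=False)

print(hex(signed_lo), hex(signed_hi))      # 0xfffffffe 0xffffffff
print(hex(unsigned_lo), hex(unsigned_hi))  # 0xfffffffe 0x1
```

The low parts are identical, so only the high part exposes the signedness mismatch.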
|
|
Use hasPhys instead of MCRegister::isPhysicalRegister.
I think the MCRegister returned from getPhys can only contain a physical
register or 0. hasPhys checks that the register returned from getPhys is non-zero.
So I think they are equivalent in this usage.
|
|
When a pattern is matched in TableGen, a check called
isObviouslySafeToFold() is run. One of the conditions it checks is
whether the instructions that are being matched are consecutive, so the
instruction's insertion point does not change.
This patch allows the movement of the insertion point of a load
instruction if none of the intervening instructions are stores or have
side-effects.
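The relaxed condition can be sketched roughly as follows. This is a simplified model, not the actual implementation: `Instr` and `can_move_load` are hypothetical names, and a basic block is modeled as a plain list.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    """Hypothetical stand-in for a machine instruction."""
    name: str
    is_store: bool = False
    has_side_effects: bool = False

def can_move_load(block, load_index, user_index):
    """Return True if the load at load_index may be moved down to user_index.

    The move is allowed only if no instruction strictly between the two
    is a store or has side effects.
    """
    between = block[load_index + 1:user_index]
    return not any(i.is_store or i.has_side_effects for i in between)

safe_block = [Instr("load"), Instr("add"), Instr("sext")]
unsafe_block = [Instr("load"), Instr("store", is_store=True), Instr("sext")]

print(can_move_load(safe_block, 0, 2))    # True
print(can_move_load(unsafe_block, 0, 2))  # False
```

When the load and its user are adjacent, the slice is empty and the check trivially passes, which corresponds to the original "consecutive instructions" case.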
|
|
getSCEV will assert unless the operand is SCEVable. Replace an instance
of the implementation of ScalarEvolution::isSCEVable (which checks that
the operand is either integer or pointer type) with a call to the
function, to make it clear that the subsequent use of getSCEV will not
fail.
|
|
This patch introduces a helper function collectImportStatistics. The
new function computes statistics of imports for
ComputeCrossModuleImport and dumpImportListForModule with no
functional change.
The background is as follows. I'm planning to reduce the memory
footprint of ThinLTO indexing by changing ImportMapTy, the data
structure used for an import list. The new list will be a hash set of
tuples (SourceModule, GUID, ImportType) represented in a space
efficient manner. That means that obtaining statistics like the
number of definitions per source module requires us to go through the
entire import list (for a given destination module).
Introducing a helper function now makes the callers more independent
of the underlying data structures used in ImportMapTy.
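A minimal sketch of the idea, assuming the import list is a flat set of `(SourceModule, GUID, ImportType)` tuples as described above. The module names, GUID values, and import-type strings are invented, and this Python `collect_import_statistics` only mimics the role of the helper, not its real signature.

```python
from collections import Counter

# Hypothetical import list for one destination module.
import_list = {
    ("a.o", 0x1001, "definition"),
    ("a.o", 0x1002, "declaration"),
    ("a.o", 0x1003, "definition"),
    ("b.o", 0x2001, "definition"),
}

def collect_import_statistics(imports):
    """Walk the whole import list once and count entries per source module."""
    definitions = Counter()
    declarations = Counter()
    for source_module, _guid, import_type in imports:
        if import_type == "definition":
            definitions[source_module] += 1
        else:
            declarations[source_module] += 1
    return definitions, declarations

definitions, declarations = collect_import_statistics(import_list)
print(dict(definitions))   # per-module definition counts
print(dict(declarations))  # per-module declaration counts
```

Because the set is not keyed by source module, the per-module counts require a full pass. Keeping that pass in one helper means callers do not need to know the layout.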
|
|
This dump, if it is ever executed, is not actionable by the user and
might produce unwanted noise on stderr.
The original intention behind this dump, to provide maximum information
in an unexpected situation, does not outweigh the potential annoyance
caused to users who might not even realize that they witnessed an
unexpected situation.
|
|
Matches uops.info + Agner
|
|
These are performed on SKLPort01 (+ SKLPort5/SKLPort23 for rr/rm shuffles/loads)
Also, clean up some MMX CVT overrides that match the SSE equivalents.
Matches uops.info + Agner
|
|
Matches uops.info + Agner
|
|
|
|
This patch adds an argument to `gpu::TargetAttrInterface::createObject`
to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a
property dict for metadata, hence the module can be used for extracting
things like the symbol table and adding it to the property dict.
---------
Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
|
|
This does a few things to canonicalize the library a bit. Specifically:
- use `__desugars_to_v` instead of the custom `__is_simple_comparator`
- make `__use_branchless_sort` an inline variable
- remove the `_maybe_branchless` versions of the `__sortN` functions and
overload based on whether we can do branchless sorting instead.
|
|
`ReplaceBlockArgRewrite` (#105963)
There was a redundant `appendRewrite<ReplaceBlockArgRewrite>(block,
origArg);` in `ConversionPatternRewriterImpl::applySignatureConversion`
that had no effect.
|
|
Fix a bug I introduced in 721fdf1c9a73269280a504cbba847f4979512b66.
|
|
This patch adds the option to specify specific memory ranges to be
included in a given core file. The current implementation lets
user-specified ranges either be added on top of a given save style or be
used independently of it via the newly added custom enum.
To support combining ranges with a save style, I've moved from a
std::vector of ranges to a RangeDataVector, and overlapping ranges are
joined to prevent duplication of memory ranges in the core file.
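Joining overlapping ranges can be sketched as a standard interval merge. This is a generic illustration with a hypothetical `merge_ranges` function over half-open `(start, end)` pairs; it does not reproduce the RangeDataVector API.

```python
def merge_ranges(ranges):
    """Merge overlapping or touching half-open [start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged

# Hypothetical ranges: two user-specified ranges overlap, one is separate.
requested = [(0x5000, 0x6000), (0x1000, 0x2000), (0x1800, 0x3000)]
result = merge_ranges(requested)
print([(hex(s), hex(e)) for s, e in result])
```

After merging, each address is covered by at most one range, so no memory would be written to the core file twice.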
As a non-functional bonus: when SBSavecore was initially created, the
header was included in the lldb-private interfaces by oversight. I've
fixed that and moved it to the forward declarations. CC @bulbazord in case
we need to include that into swift.
|
|
interceptors (#106161)
|
|
FunctionPropertiesAnalysis (#104867)"
This seems to cause asserts in our builds:
llvm/include/llvm/Support/GenericDomTreeConstruction.h:927:
static void llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<BasicBlock, false>>::DeleteEdge(DomTreeT &, const BatchUpdatePtr, const NodePtr, const NodePtr) [DomTreeT = llvm::DominatorTreeBase<BasicBlock, false>]:
Assertion `!IsSuccessor(To, From) && "Deleted edge still exists in the CFG!"' failed.
and
llvm/lib/Analysis/FunctionPropertiesAnalysis.cpp:390:
DominatorTree &llvm::FunctionPropertiesUpdater::getUpdatedDominatorTree(FunctionAnalysisManager &) const:
Assertion `DT.getNode(BB)' failed.
See comment on the PR.
> We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result.
>
> This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case)
>
> The loop info change might be more involved and would follow in a subsequent PR.
This reverts commit a2a5508bdae7d115b6c3ace461beb7a987a44407 and the
follow-up commit cdd11d694a406a98a16d6265168ee2fbe1b6a87c.
|
|
Also consider allocas non-null (subject to the usual caveats),
and consider nonnull/dereferenceable metadata on calls.
|
|
This is faster than checking for a SCEVConstant getMinusSCEV()
result. The results should be the same for non-degenerate cases.
|
|
|
|
S.substr(N, M) is simpler than S.slice(N, N + M). Also, substr is
probably more recognizable than slice thanks to
std::string_view::substr.
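The equivalence can be modeled in a few lines. The two functions below are hypothetical Python stand-ins that only mimic the (start, end) versus (start, count) calling conventions; they are not the StringRef implementation.

```python
def slice_(s, start, end):
    """Model of S.slice(Start, End): characters in [start, end)."""
    return s[start:end]

def substr(s, start, count):
    """Model of S.substr(Start, N): up to `count` characters from `start`."""
    return s[start:start + count]

text = "hello world"
n, m = 6, 5

via_slice = slice_(text, n, n + m)
via_substr = substr(text, n, m)
print(via_slice, via_substr)
```

The `substr` form states the length directly instead of repeating `n` in the end position.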
|
|
|
|
Add nuw attribute to inbounds GEPs where the expression used to form the
GEP is an addition of unsigned indices.
|
|
|
|
|
|
|
|
arguments of Calls
|
|
Extension SGFs require the module system to be enabled in order to discover which module defines the extended external type.
This patch ensures the following:
- Symbols are associated with their top-level module name, and only top-level modules are considered as modules for emitting extension SGFs.
- Macro definitions that came from a submodule are not dropped. To this end, look at all defined macros in `PPCallbacks::EndOfMainFile` instead of relying on `PPCallbacks::MacroDefined` being called to detect a macro definition.
|
|
Make sure code respects the GNU extension `__attribute__((returns_nonnull))`.
Extend the NullabilityChecker to check that a function marked
`returns_nonnull` does not return a nullptr.
This commit also reverts an old hack introduced by
49bd58f1ebe28d97e4949e9c757bc5dfd8b2d72f
because it is no longer needed.
CPP-4741
|
|
This patch updates MLIR tests for `omp.parallel` + `omp.wsloop`
reductions to move the reduction clause into `omp.wsloop` rather than
the parent `omp.parallel`, as mandated by the spec for these cases and
also to match what Flang is already producing for `parallel do
reduction(...)` combined constructs.
From the OpenMP Spec version 5.2, section 17.2:
> The effect of the reduction clause is as if it is applied to all leaf
constructs that permit the clause, except for the following constructs:
> - The `parallel` construct, when combined with the `sections`,
worksharing-loop, `loop`, or `taskloop` construct; [...]
|
|
Enable support for query functions - including transform.dlti.query - to
take types as keys. As the data layout specific attributes already
supported types as keys, this change enables querying such attributes in
the expected way.
|
|
For the most part I've tried to maintain the use of ISD::BITCAST
wherever possible so as to keep access to more DAG combines.
|