Age | Commit message | Author | Files | Lines
2024-08-29 | Attempt to fix build bot failures. [users/paschalis-mpeis/bolt-heatmap-fix] | Paschalis Mpeis | 2 | -0/+5
- Added to CoreTests in BUILD.gn
- Hid DataAggregator stdout/stderr outputs
2024-08-28 | [BOLT] Fix heatmaps on large BOLTE'd binaries. | Paschalis Mpeis (aws-mem-aarch64) | 1 | -11/+8
Large binaries get two text segments mapped when loaded in memory. BOLT processes only the first, which does not have a correct BaseAddress, causing a wrong computation of a BinaryMMapInfo's size. Consequently, BOLT wrongly concludes that many of the samples fall outside the binary and ignores them. As a result, the computed heatmap is incomplete, and the section hotness statistics are wrong. This bug is present in both the AArch64 and x86 backends.

---

This patch introduces the flag 'perf-script-events', which allows passing perf events without BOLT having to parse them using 'perf script'. The flag is used to pass a mock perf profile that has two memory mappings for a mock binary that has two text segments. The size of the mapping is updated, as `parseMMapEvents` now processes all text segments.

---

Example used in unit tests:

From `/proc/<BINARY PID>/maps`, we have 2 text mappings, say A and B.

```
abc0000000-abc1000000 r-xp 011c0000 103:01 1573523 BINARY
abc2000000-abca000000 r-xp 031d0000 103:01 1573523 BINARY
```

Size of text mappings:

| Mapping | Size   |
| ------- | ------ |
| A       | ~15MB  |
| B       | ~135MB |

---

Example on a real program:

```
2f7200000-2fabca000 r--p 00000000 bolted-binary
2fabd9000-2fe47c000 r-xp 039c9000 bolted-binary <- 1st txt segment
2fe48b000-2fe61d000 r--p 0727b000 bolted-binary
2fe62c000-2fe660000 rw-p 0740c000 bolted-binary
2fe660000-2fea4c000 rw-p 00000000
2fec00000-303dad000 r-xp 07a00000 bolted-binary <- 2nd (appears only on the bolted binary)
```
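A minimal C++ sketch of the idea behind the fix, assuming `/proc/<pid>/maps`-style input: collect every executable (`r-xp`) mapping that belongs to the binary instead of stopping at the first one. `TextMapping` and `collectTextMappings` are hypothetical names for illustration; they are not BOLT's `parseMMapEvents` or `BinaryMMapInfo`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One executable mapping parsed from a /proc/<pid>/maps-style line.
struct TextMapping {
  uint64_t Start = 0;
  uint64_t End = 0;
  uint64_t FileOffset = 0;
  uint64_t size() const { return End - Start; }
};

// Hypothetical helper: return *every* executable mapping whose path ends with
// BinaryName, instead of only the first one.
std::vector<TextMapping> collectTextMappings(const std::vector<std::string> &Lines,
                                             const std::string &BinaryName) {
  std::vector<TextMapping> Result;
  for (const std::string &Line : Lines) {
    std::istringstream In(Line);
    std::string Range, Perms, Offset;
    if (!(In >> Range >> Perms >> Offset))
      continue;
    // The path, if present, is the last whitespace-separated field.
    std::string Field, Path;
    while (In >> Field)
      Path = Field;
    if (Perms.size() < 3 || Perms[2] != 'x')
      continue; // not an executable (text) mapping
    if (Path.size() < BinaryName.size() ||
        Path.compare(Path.size() - BinaryName.size(), BinaryName.size(),
                     BinaryName) != 0)
      continue; // belongs to another file, or is anonymous
    std::size_t Dash = Range.find('-');
    if (Dash == std::string::npos)
      continue;
    TextMapping M;
    M.Start = std::stoull(Range.substr(0, Dash), nullptr, 16);
    M.End = std::stoull(Range.substr(Dash + 1), nullptr, 16);
    M.FileOffset = std::stoull(Offset, nullptr, 16);
    Result.push_back(M);
  }
  return Result;
}
```

Fed the two unit-test mappings above, the sketch reports sizes of 0x1000000 and 0x8000000 bytes. A real implementation would also have to relate each segment's file offset to its load address; the sketch only shows that all text segments are visited.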
2024-08-27 | BOLT fails to read correctly the size of multi-segment mmaps. | Paschalis Mpeis | 4 | -0/+134
2024-08-27 | [RISCV][SLP] Add test coverage for 2^N-1 vector sizes | Philip Reames | 2 | -0/+644
Mostly copied from the AArch64 coverage for the same, but also added a couple of tests for reductions, which aren't currently supported.
2024-08-27 | [AMDGPU][CodeGen][True16][Test] add test files for gfx11 vop1 instructions in true16/fake16 format (#106089) | Brox Chen | 2 | -1/+3509
This is an NFC change to add tests for the true16/fake16 flow. We need two sets of asm/disasm tests for the true16 and fake16 flows, and this patch adds the missing one. The naming convention is that the true16 filename is the default one, while the fake16 filename has "fake16" attached to it.

This patch:
1. adds true16 and fake16 versions of the vop3_from_vop1 test file
2. renames a test file to keep a consistent naming pattern

The true16 test file will be updated when more true16 commands are supported in upcoming patches.
2024-08-27 | [AMDGPU][CodeGen][True16][Test] add test files for gfx12 vop1 instructions in true16/fake16 format (#106093) | Brox Chen | 6 | -3/+7113
This is an NFC change to add tests for the true16/fake16 flow. We need two sets of asm/disasm tests for the true16 and fake16 flows, and this patch adds the missing one. The naming convention is that the true16 filename is the default one, while the fake16 filename has "fake16" attached to it.

This patch:
1. adds true16 and fake16 versions of the vop1 test files
2. renames a test file to keep a consistent naming pattern

The true16 test file will be updated when more true16 commands are supported in upcoming patches.
2024-08-27 | [lldb] Add frame recognizers for libc++ `std::invoke` (#105695) | Adrian Vogelsgesang | 16 | -63/+234
With this commit, we also hide the implementation details of `std::invoke`. To do so, the `LibCXXFrameRecognizer` got a couple more regular expressions.

The regular expression passed into `AddRecognizer` became problematic, as it was evaluated on the demangled name. Those names also included result types for C++ symbols. For `std::__invoke` the return type is a huge `decltype(...)`, making the regular expression really hard to write. Instead, I added support to `AddRecognizer` for matching on the demangled names without result type and argument types.

By hiding the implementation details of `invoke`, the backtraces for `std::function` also become nicer, because `std::function` uses `__invoke` internally.

Co-authored-by: Adrian Prantl <aprantl@apple.com>
2024-08-27 | [DAG] Handle cases where a shift amount is larger than the pre-extended value bitwidth | Simon Pilgrim | 2 | -5/+33
In the (zext (shl (zext x), cst)) -> (shl (zext x), cst) fold, don't use a bitmask / MaskedValueIsZero as we can't guarantee that the shift amount is in bounds.

Fixes #106202
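A hedged C++ sketch of why the bounds matter, assuming `x` is i8 and the inner `zext`/`shl` produce i16: the fold is only sound when the inner shift cannot push a set bit of `x` past bit 15. `isZextShlFoldSafe` is a hypothetical, conservative check that relies only on the zero bits introduced by the inner `zext`; it is not the DAGCombiner code, which can use known-bits analysis. It rejects out-of-range shift amounts first, so no mask is ever built from a shift amount that may exceed the width, which is the concern the commit message raises.

```cpp
#include <cassert>
#include <cstdint>

// Conservative legality check for
//   (zext (shl (zext x), c)) -> (shl (zext x), c)
// where x has NarrowBits and the inner zext/shl operate on WideBits.
bool isZextShlFoldSafe(unsigned NarrowBits, unsigned WideBits, unsigned ShAmt) {
  assert(NarrowBits <= WideBits);
  // Out-of-range shift amounts are never folded; this is checked before any
  // reasoning about bit positions.
  if (ShAmt >= WideBits)
    return false;
  // The inner zext guarantees WideBits - NarrowBits known-zero high bits, so a
  // shift by at most that many cannot drop a set bit of x.
  return ShAmt <= WideBits - NarrowBits;
}

// Reference semantics for an i8 -> i16 -> i32 chain (ShAmt must be < 16).
uint32_t zextShlZext(uint8_t X, unsigned ShAmt) {
  uint16_t Inner = static_cast<uint16_t>(static_cast<uint16_t>(X) << ShAmt);
  return Inner; // outer zext to 32 bits
}

// The folded form: a single wide shift.
uint32_t shlZext(uint8_t X, unsigned ShAmt) {
  return static_cast<uint32_t>(X) << ShAmt;
}
```

For example, with `x = 0xFF` and a shift of 9 (larger than the 8-bit pre-extended width), the narrow form yields 0xFE00 while the folded form yields 0x1FE00, so the check must reject it.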
2024-08-27 | [flang][cuda] Simplify data transfer when possible (#106120) | Valentin Clement (バレンタイン クレメン) | 3 | -16/+52
When possible, avoid using descriptors and use the reference and the shape for data_transfer.
2024-08-27 | [lldb] Cleanup dyld_process_t after constructing SharedCacheInfo (#106157) | Alex Langford | 1 | -1/+4
Without calling `dyld_process_dispose`, LLDB will leak the memory associated with the `dyld_process_t`. rdar://134738265
2024-08-27 | [SLP][NFC]Assert total number of scalar uses not less than number of scalar uses, NFC. | Alexey Bataev | 1 | -9/+19
2024-08-27 | [SandboxIR] Implement ResumeInst (#106152) | vporpo | 4 | -0/+86
This patch implements sandboxir::ResumeInst mirroring llvm::ResumeInst.
2024-08-27 | Revert "LSV: forbid load-cycles when vectorizing; fix bug (#104815)" (#106245) | Danial Klimkin | 2 | -98/+5
This reverts commit c46b41aaa6eaa787f808738d14c61a2f8b6d839f.

Multiple tests time out, either due to a performance hit (see comment) or a cycle.
2024-08-27 | [LangRef] Update the semantic of `experimental.get.vector.length` (#104475) | Min-Yih Hsu | 1 | -7/+17
The previous semantics of `llvm.experimental.get.vector.length` were too permissive, giving optimizers a hard time with anything related to the number of iterations of VP-vectorized loops. This patch tries to address this by assigning it a set of stricter semantics similar to those of RVV's VSETVLI instructions, while not being too RISC-V specific and leaving room for other (future) targets.

---------

Co-authored-by: Craig Topper <craig.topper@sifive.com>
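The commit describes the new semantics only as "similar to" RVV's VSETVLI, so the following C++ sketch encodes the RVV 1.0 rules for choosing `vl` from the application vector length (AVL) and VLMAX, not the LangRef wording itself. `isLegalRVVVL`, `pickVL`, and `countIterations` are hypothetical names; `pickVL` is just one permitted choice, `min(AVL, VLMAX)`, used to drive a strip-mined loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// True if VL is a value the RVV 1.0 vsetvli rules allow for this AVL/VLMAX:
//   AVL <= VLMAX             -> vl == AVL
//   VLMAX < AVL < 2 * VLMAX  -> ceil(AVL / 2) <= vl <= VLMAX
//   AVL >= 2 * VLMAX         -> vl == VLMAX
bool isLegalRVVVL(uint64_t AVL, uint64_t VLMAX, uint64_t VL) {
  if (AVL <= VLMAX)
    return VL == AVL;
  if (AVL < 2 * VLMAX)
    return (AVL + 1) / 2 <= VL && VL <= VLMAX;
  return VL == VLMAX;
}

// One legal strategy: process as many elements as fit.
uint64_t pickVL(uint64_t AVL, uint64_t VLMAX) { return std::min(AVL, VLMAX); }

// Strip-mined loop: each iteration handles VL elements until none remain.
uint64_t countIterations(uint64_t N, uint64_t VLMAX) {
  assert(VLMAX > 0 && "a zero VLMAX would never make progress");
  uint64_t Iters = 0;
  for (uint64_t Remaining = N; Remaining != 0; ++Iters) {
    uint64_t VL = pickVL(Remaining, VLMAX);
    assert(isLegalRVVVL(Remaining, VLMAX, VL));
    Remaining -= VL;
  }
  return Iters;
}
```

Because the rules pin `vl` exactly whenever `AVL <= VLMAX` or `AVL >= 2 * VLMAX`, an optimizer gets usable bounds on the trip count of such a loop, which is the kind of reasoning the commit says the old, looser semantics made hard.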
2024-08-27 | [AMDGPU][Attributor] Remove uniformity check in the indirect call specialization callback (#106177) | Shilei Tian | 3 | -20/+78
This patch removes the conservative uniformity check in the indirect call specialization callback, as whether the function pointer is uniform doesn't matter too much. Instead, we add an argument to control specialization.
2024-08-27 | [mlir][spirv] Integrate `convert-to-spirv` into `mlir-vulkan-runner` (#106082) | Angel Zhang | 6 | -33/+189
**Description**
This PR adds a new option to the `convert-to-spirv` pass to clone and convert only GPU kernel modules for integration testing. The reason for using pass options instead of two separate passes is that both consist of `memref` type conversion and individual dialect patterns, except that they run on different scopes.

The PR also replaces the `gpu-to-spirv` pass with the `convert-to-spirv` pass (with the new option) in `mlir-vulkan-runner`.

**Future Plan**
Use nested pass pipelines in `mlir-vulkan-runner` instead of adding this option.

---------

Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
2024-08-27 | [SLP] Reduce scope of variable using if clause [NFC] | Philip Reames | 1 | -8/+8
This particular variable name is shadowed by another one lower in the function, so reducing its scope to its single use removes the shadowing and makes the code much less error-prone.
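One way to get this effect in C++17 is an `if` statement with an initializer, sketched below on a hypothetical `totalCost` function (not the SLP code the commit touches). The iterator declared in the `if` is destroyed at the end of that statement, so the later loop can reuse the name without shadowing anything.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical function: adds the cost stored under Key (if any) plus one
// per entry in the table.
int totalCost(const std::map<std::string, int> &Costs, const std::string &Key) {
  int Total = 0;
  // `It` is scoped to this if-statement only (C++17 if with initializer).
  if (auto It = Costs.find(Key); It != Costs.end())
    Total += It->second;
  // A fresh `It`; with a function-scope `It` above, this would shadow it.
  for (auto It = Costs.begin(); It != Costs.end(); ++It)
    Total += 1;
  return Total;
}
```

Had the first `It` been declared at function scope, the loop variable would shadow it, and an accidental use of the outer `It` after the loop would still compile.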
2024-08-27 | [NFC] Reserve the number of operands before push_back (#106234) | Nabeel Omer | 1 | -0/+1
This reduces the number of allocations inside the loop. Partially addresses #105836
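A small C++ sketch of the pattern, using `std::vector` as a stand-in for whatever container the patch touches (the helper `collectOperands` is hypothetical). Reserving the final size up front means `push_back` never has to grow the buffer inside the loop; the sketch records whether the data pointer ever moved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: builds Count operands, reporting whether any
// push_back had to reallocate the buffer.
std::vector<int> collectOperands(std::size_t Count, bool &Reallocated) {
  std::vector<int> Ops;
  Ops.reserve(Count); // one allocation up front
  const int *Initial = Ops.data();
  Reallocated = false;
  for (std::size_t I = 0; I < Count; ++I) {
    Ops.push_back(static_cast<int>(I));
    if (Ops.data() != Initial)
      Reallocated = true; // never happens while size() <= capacity()
  }
  return Ops;
}
```

Without the `reserve` call, a typical growth policy would reallocate and copy the buffer several times as the vector grows.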
2024-08-27 | [AMDGPU] Fix sign confusion in performMulLoHiCombine (#105831) | Jay Foad | 3 | -64/+172
SMUL_LOHI and UMUL_LOHI are different operations because the high part of the result is different, so it is not OK to optimize the signed version to MUL_U24/MULHI_U24 or the unsigned version to MUL_I24/MULHI_I24.
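A C++ sketch of the underlying fact, using plain 32-bit values rather than AMDGPU's 24-bit multiply nodes: for the same input bit patterns, signed and unsigned widening multiplies agree on the low half but can disagree on the high half. `umulLoHi` and `smulLoHi` are hypothetical helpers that model the two node semantics.

```cpp
#include <cassert>
#include <cstdint>

struct LoHi {
  uint32_t Lo;
  uint32_t Hi;
};

// Unsigned 32x32 -> 64 multiply, split into halves (UMUL_LOHI-like).
LoHi umulLoHi(uint32_t A, uint32_t B) {
  uint64_t P = static_cast<uint64_t>(A) * B;
  return {static_cast<uint32_t>(P), static_cast<uint32_t>(P >> 32)};
}

// Signed 32x32 -> 64 multiply on the same bit patterns (SMUL_LOHI-like).
LoHi smulLoHi(uint32_t A, uint32_t B) {
  int64_t P = static_cast<int64_t>(static_cast<int32_t>(A)) *
              static_cast<int64_t>(static_cast<int32_t>(B));
  uint64_t U = static_cast<uint64_t>(P);
  return {static_cast<uint32_t>(U), static_cast<uint32_t>(U >> 32)};
}
```

For `A = 0xFFFFFFFF` and `B = 2`, both low halves are 0xFFFFFFFE, but the unsigned high half is 1 (the product is 0x1FFFFFFFE) while the signed high half is 0xFFFFFFFF (the product is -2). That is why a signed node cannot be lowered to the unsigned helpers, or vice versa.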
2024-08-27 | [LiveDebugVariables] Use VirtRegMap::hasPhys. NFC (#106186) | Craig Topper | 1 | -4/+2
Use hasPhys instead of MCRegister::isPhysicalRegister. I think the MCRegister returned from getPhys can only contain a physical register or 0. hasPhys checks that the register returned from getPhys is non-zero. So I think they are equivalent in this usage.
2024-08-27 | [GlobalISel] Look between instructions to be matched (#101675) | chuongg3 | 4 | -160/+218
When a pattern is matched in TableGen, a check called isObviouslySafeToFold() is run. One of the conditions it checks is whether the instructions being matched are consecutive, so that the instruction's insertion point does not change. This patch allows the insertion point of a load instruction to move if none of the intervening instructions are stores or have side effects.
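The relaxed condition can be sketched in C++ over a hypothetical, simplified instruction model (`Inst` and `canSinkLoad` are illustrative names, not GlobalISel's `MachineInstr` API or the actual `isObviouslySafeToFold` code): a load may be moved to a later insertion point only if nothing strictly between the two positions is a store or has side effects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified instruction model.
struct Inst {
  bool MayStore = false;
  bool HasSideEffects = false;
};

// True if the load at LoadIdx can be moved down to InsertIdx (exclusive),
// i.e. no instruction strictly between them stores or has side effects.
bool canSinkLoad(const std::vector<Inst> &Block, std::size_t LoadIdx,
                 std::size_t InsertIdx) {
  assert(LoadIdx < InsertIdx && InsertIdx <= Block.size());
  for (std::size_t I = LoadIdx + 1; I < InsertIdx; ++I)
    if (Block[I].MayStore || Block[I].HasSideEffects)
      return false;
  return true;
}
```

Consecutive instructions (`InsertIdx == LoadIdx + 1`) trivially pass, which corresponds to the original "must be adjacent" behavior; the scan only adds the cases where the gap contains harmless instructions.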
2024-08-27 | IVDescriptors: clarify getSCEV use in a function (NFC) (#106222) | Ramkumar Ramachandra | 1 | -2/+2
getSCEV will assert unless the operand is SCEVable. Replace an instance of the implementation of ScalarEvolution::isSCEVable (which checks that the operand is either integer or pointer type) with a call to the function, to make it clear that the subsequent use of getSCEV will not fail.
2024-08-27 | [LTO] Introduce a helper function collectImportStatistics (NFC) (#106179) | Kazu Hirata | 1 | -31/+37
This patch introduces a helper function collectImportStatistics. The new function computes statistics of imports for ComputeCrossModuleImport and dumpImportListForModule with no functional change.

The background is as follows. I'm planning to reduce the memory footprint of ThinLTO indexing by changing ImportMapTy, the data structure used for an import list. The new list will be a hash set of tuples (SourceModule, GUID, ImportType) represented in a space-efficient manner. That means that obtaining statistics like the number of definitions per source module requires us to go through the entire import list (for a given destination module). Introducing a helper function now makes the callers more independent of the underlying data structures used in ImportMapTy.
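A C++ sketch of the planned representation, with hypothetical stand-in types (`ImportEntry`, `ImportKind`, `ImportSet`, `ImportCounts`) rather than the real ThinLTO classes; only the name `collectImportStatistics` comes from the commit, and its signature here is invented. With a flat hash set of (SourceModule, GUID, ImportType) tuples, per-module counts are no longer stored anywhere and have to be computed by one pass over the whole set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <unordered_set>

enum class ImportKind { Definition, Declaration };

// One (SourceModule, GUID, ImportType) tuple.
struct ImportEntry {
  std::string SourceModule;
  uint64_t GUID;
  ImportKind Kind;
  bool operator==(const ImportEntry &O) const {
    return std::tie(SourceModule, GUID, Kind) ==
           std::tie(O.SourceModule, O.GUID, O.Kind);
  }
};

// Combine the three fields into one hash (boost-style mixing).
struct ImportEntryHash {
  std::size_t operator()(const ImportEntry &E) const {
    std::size_t H = std::hash<std::string>{}(E.SourceModule);
    H ^= std::hash<uint64_t>{}(E.GUID) + 0x9e3779b9u + (H << 6) + (H >> 2);
    H ^= static_cast<std::size_t>(E.Kind);
    return H;
  }
};

using ImportSet = std::unordered_set<ImportEntry, ImportEntryHash>;

struct ImportCounts {
  unsigned Defs = 0;
  unsigned Decls = 0;
};

// Per-source-module statistics need a full pass over the flat set.
std::map<std::string, ImportCounts> collectImportStatistics(const ImportSet &Imports) {
  std::map<std::string, ImportCounts> Stats;
  for (const ImportEntry &E : Imports) {
    ImportCounts &C = Stats[E.SourceModule];
    if (E.Kind == ImportKind::Definition)
      ++C.Defs;
    else
      ++C.Decls;
  }
  return Stats;
}
```

Routing every caller through one such helper is what lets the underlying container change later without touching the callers.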
2024-08-27 | [analyzer][NFC] Remove a non-actionable dump (#106232) | Arseniy Zaostrovnykh | 1 | -1/+0
This dump, if it is ever executed, is not actionable by the user and might produce unwanted noise in the stderr. The original intention behind this dump, to provide maximum information in an unexpected situation, does not outweigh the potential annoyance caused to users who might not even realize that they witnessed an unexpected situation.
2024-08-27 | [X86] Fix Skylake/Icelake port usage for MMX PACK instructions | Simon Pilgrim | 6 | -33/+33
Matches uops.info + Agner
2024-08-27 | [X86] Fix SkylakeClient ports for int-to-double conversions | Simon Pilgrim | 4 | -44/+32
These are performed on SKLPort01 (+ SKLPort5/SKLPort23 for rr/rm shuffles/loads).

Also, clean up some MMX CVT overrides that match the SSE equivalents.

Matches uops.info + Agner
2024-08-27 | [X86] Fix Skylake/Icelake uops for masked stored | Simon Pilgrim | 9 | -48/+48
Matches uops.info + Agner
2024-08-27 | [SLP][NFC]Use has_single_bit instead of isPowerOf2 functions, NFC. | Alexey Bataev | 1 | -18/+19
2024-08-27 | [mlir][gpu] Pass GPU module to `TargetAttrInterface::createObject`. (#94910) | Fabian Mora | 6 | -14/+89
This patch adds an argument to `gpu::TargetAttrInterface::createObject` to pass the GPU module. This is useful as `gpu::ObjectAttr` contains a property dict for metadata, hence the module can be used for extracting things like the symbol table and adding it to the property dict. --------- Co-authored-by: Oleksandr "Alex" Zinenko <ftynse@gmail.com>
2024-08-27 | [libc++] Simplify the implementation of std::sort a bit (#104902) | Nikolas Klauser | 8 | -171/+150
This does a few things to canonicalize the library a bit. Specifically:
- use `__desugars_to_v` instead of the custom `__is_simple_comparator`
- make `__use_branchless_sort` an inline variable
- remove the `_maybe_branchless` versions of the `__sortN` functions and overload based on whether we can do branchless sorting instead.
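A simplified C++17 sketch of the shape described above: an inline variable template decides whether branchless sorting applies, and `sort2` is overloaded on it instead of having separate `_maybe_branchless` entry points. The names `use_branchless_sort_v`, `sort2`, and `sort3` are hypothetical, and the trait is a deliberately crude stand-in for libc++'s `__desugars_to_v`-based check.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical trait: branchless only for arithmetic types with std::less.
template <class Comp, class T>
inline constexpr bool use_branchless_sort_v =
    std::is_arithmetic_v<T> && std::is_same_v<Comp, std::less<T>>;

// Branchless compare-exchange: min/max usually lower to conditional moves.
template <class Comp, class T,
          std::enable_if_t<use_branchless_sort_v<Comp, T>, int> = 0>
void sort2(T &A, T &B, Comp) {
  T Lo = std::min(A, B);
  T Hi = std::max(A, B);
  A = Lo;
  B = Hi;
}

// Generic fallback: an ordinary conditional swap.
template <class Comp, class T,
          std::enable_if_t<!use_branchless_sort_v<Comp, T>, int> = 0>
void sort2(T &A, T &B, Comp C) {
  if (C(B, A))
    std::swap(A, B);
}

// Three-element sorting network; overload resolution picks the sort2 variant.
template <class Comp, class T>
void sort3(T &A, T &B, T &C, Comp Cmp) {
  sort2(A, B, Cmp);
  sort2(B, C, Cmp);
  sort2(A, B, Cmp);
}
```

Callers such as `sort3` stay oblivious to which variant runs, which is the point of overloading rather than keeping two parallel function families.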
2024-08-27 | [mlir][Transforms][NFC] Dialect conversion: Remove redundant `ReplaceBlockArgRewrite` (#105963) | Matthias Springer | 1 | -1/+0
There was a redundant `appendRewrite<ReplaceBlockArgRewrite>(block, origArg);` in `ConversionPatternRewriterImpl::applySignatureConversion` that had no effect.
2024-08-27 | [LoopUnrollAnalyzer] Fix icmp simplification | Nikita Popov | 1 | -3/+6
Fix a bug I introduced in 721fdf1c9a73269280a504cbba847f4979512b66.
2024-08-27 | [LLDB][SBSaveCore] Add selectable memory regions to SBSaveCore (#105442) | Jacob Lalonde | 20 | -32/+302
This patch adds the option to specify specific memory ranges to be included in a given core file. The current implementation lets user-specified ranges either be in addition to a certain save style, or independent of them via the newly added custom enum.

To be inclusive of the save style, I've moved from a std::vector of ranges to a RangeDataVector, and overlapping ranges are joined to prevent duplicate memory ranges in the core file.

As a non-functional bonus: when SBSaveCore was initially created, the header was included in the lldb-private interfaces. I've fixed that oversight and replaced it with a forward declaration.

CC @bulbazord in case we need to include that into swift.
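The "join overlapping ranges" step can be sketched in C++ with a plain `std::vector` and `std::sort`; this is a hypothetical illustration (`MemRange`, `coalesceRanges`), not LLDB's `RangeDataVector` API. Ranges are sorted by base address, and each one that starts at or before the end of the previous merged range extends it instead of being emitted separately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open address range [Base, Base + Size).
struct MemRange {
  uint64_t Base;
  uint64_t Size;
  uint64_t end() const { return Base + Size; }
};

// Sort by base and coalesce overlapping or touching ranges so no byte would
// be written to the core file twice.
std::vector<MemRange> coalesceRanges(std::vector<MemRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const MemRange &L, const MemRange &R) { return L.Base < R.Base; });
  std::vector<MemRange> Out;
  for (const MemRange &R : Ranges) {
    if (!Out.empty() && R.Base <= Out.back().end()) {
      uint64_t NewEnd = std::max(Out.back().end(), R.end());
      Out.back().Size = NewEnd - Out.back().Base;
    } else {
      Out.push_back(R);
    }
  }
  return Out;
}
```

Merging touching (not just overlapping) ranges is a design choice of this sketch; it yields fewer, larger regions, whereas keeping them separate would preserve the user's original boundaries.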
2024-08-27 | [rtsan][compiler-rt] Add read, write, pread, pwrite, readv, and writev interceptors (#106161) | Chris Apple | 2 | -17/+129
2024-08-27 | Revert "[nfc][mlgo] Incrementally update DominatorTreeAnalysis in FunctionPropertiesAnalysis (#104867)" | Hans Wennborg | 3 | -47/+3
This seems to cause asserts in our builds:

    llvm/include/llvm/Support/GenericDomTreeConstruction.h:927: static void llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<BasicBlock, false>>::DeleteEdge(DomTreeT &, const BatchUpdatePtr, const NodePtr, const NodePtr) [DomTreeT = llvm::DominatorTreeBase<BasicBlock, false>]: Assertion `!IsSuccessor(To, From) && "Deleted edge still exists in the CFG!"' failed.

and

    llvm/lib/Analysis/FunctionPropertiesAnalysis.cpp:390: DominatorTree &llvm::FunctionPropertiesUpdater::getUpdatedDominatorTree(FunctionAnalysisManager &) const: Assertion `DT.getNode(BB)' failed.

See comment on the PR.

> We need the dominator tree analysis for loop info analysis, which we need to get features like most nested loop and number of top level loops. Invalidating and recomputing these from scratch after each successful inlining can sometimes lead to lengthy compile times. We don't need to recompute from scratch, though, since we have some boundary information about where the changes to the CFG happen; moreover, for dom tree, the API supports incrementally updating the analysis result.
>
> This change addresses the dom tree part. The loop info is still recomputed from scratch. This does reduce the compile time quite significantly already, though (~5x in a specific case)
>
> The loop info change might be more involved and would follow in a subsequent PR.

This reverts commit a2a5508bdae7d115b6c3ace461beb7a987a44407 and the follow-up commit cdd11d694a406a98a16d6265168ee2fbe1b6a87c.
2024-08-27 | [SCCP] Add more non-null roots | Nikita Popov | 2 | -12/+22
Also consider allocas non-null (subject to the usual caveats), and consider nonnull/dereferenceable metadata on calls.
2024-08-27 | [LoopUnrollAnalyzer] Use computeConstantDifference() | Nikita Popov | 1 | -3/+3
This is faster than checking for a SCEVConstant getMinusSCEV() result. The results should be the same for non-degenerate cases.
2024-08-27 | [LoopUnrollAnalyzer] Store SimplifiedAddress offset as APInt (NFC) | Nikita Popov | 2 | -9/+8
2024-08-27 | [llvm] Prefer StringRef::substr to StringRef::slice (NFC) (#106190) | Kazu Hirata | 6 | -16/+14
S.substr(N, M) is simpler than S.slice(N, N + M). Also, substr is probably better recognizable than slice thanks to std::string_view::substr.
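To check the equivalence `S.substr(N, M) == S.slice(N, N + M)` without depending on LLVM, the C++ sketch below reimplements both operations on `std::string_view` with the clamping behavior that, to my understanding, `StringRef` uses (out-of-range starts and ends are clamped instead of throwing). `sliceLike` and `substrLike` are hypothetical helpers, not the `StringRef` methods themselves.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string_view>

// slice(Start, End): half-open [Start, End), both clamped to the string.
std::string_view sliceLike(std::string_view S, std::size_t Start, std::size_t End) {
  Start = std::min(Start, S.size());
  End = std::clamp(End, Start, S.size());
  return S.substr(Start, End - Start);
}

// substr(Start, N): N characters starting at Start. Start is clamped first,
// because std::string_view::substr throws when Start > size().
std::string_view substrLike(std::string_view S, std::size_t Start, std::size_t N) {
  Start = std::min(Start, S.size());
  return S.substr(Start, N); // string_view::substr already clamps N
}
```

With clamping on both sides, the two forms agree for every `N` and `M` as long as `N + M` does not overflow; `substr` simply avoids spelling out the end index.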
2024-08-27 | [AMDGPU] Use range-based for loops (NFC) (#106184) | Kazu Hirata | 3 | -17/+8
2024-08-27 | [clang] Add nuw attribute to GEPs (#105496) | Hari Limaye | 91 | -1097/+1110
Add nuw attribute to inbounds GEPs where the expression used to form the GEP is an addition of unsigned indices.
2024-08-27 | [SCCP] Add tests for more non-null roots (NFC) | Nikita Popov | 1 | -1/+73
2024-08-27 | [SLP][NFC]Improve auto types, NFC. | Alexey Bataev | 1 | -4/+4
2024-08-27 | [docs] Fix a documentation link (#105795) | Piotr Fusik | 1 | -1/+1
2024-08-27 | NFC: precommit test for [ArgPromotion] Perform alias analysis on actual arguments of Calls | Hari Limaye | 1 | -0/+231
2024-08-27 | [clang][ExtractAPI] Fix quirks in interaction with submodules (#105868) | Daniel Grumberg | 10 | -784/+425
Extension SGFs require the module system to be enabled in order to discover which module defines the extended external type. This patch ensures the following:
- Symbols are associated with their top-level module name, and only top-level modules are considered as modules for emitting extension SGFs.
- Macro definitions that came from a submodule are not dropped. To this end, look at all defined macros in `PPCallbacks::EndOfMainFile` instead of relying on `PPCallbacks::MacroDefined` being called to detect a macro definition.
2024-08-27 | [analyzer] Report violations of the "returns_nonnull" attribute (#106048) | Arseniy Zaostrovnykh | 4 | -8/+85
Make sure code respects the GNU extension `__attribute__((returns_nonnull))`.

Extend the NullabilityChecker to check that a function marked returns_nonnull does not return a nullptr.

This commit also reverts an old hack introduced by 49bd58f1ebe28d97e4949e9c757bc5dfd8b2d72f because it is no longer needed.

CPP-4741
2024-08-27 | [MLIR][OpenMP] NFC: Update parallel workshare loop reduction tests (#105835) | Sergio Afonso | 2 | -10/+10
This patch updates MLIR tests for `omp.parallel` + `omp.wsloop` reductions to move the reduction clause into `omp.wsloop` rather than the parent `omp.parallel`, as mandated by the spec for these cases and also to match what Flang is already producing for `parallel do reduction(...)` combined constructs.

From the OpenMP Spec version 5.2, section 17.2:

> The effect of the reduction clause is as if it is applied to all leaf constructs that permit the clause, except for the following constructs:
> - The `parallel` construct, when combined with the `sections`, worksharing-loop, `loop`, or `taskloop` construct; [...]
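For orientation, here is a C++ analogue of the Fortran `parallel do reduction(...)` combined construct the tests model; `sumOf` is a hypothetical function. Under the quoted rule, the `reduction` clause on this combined `parallel for` behaves as if placed on the worksharing loop rather than on `parallel`, which is the placement the updated MLIR tests use (`omp.wsloop` carrying the reduction). Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <vector>

// Combined parallel worksharing-loop with a + reduction on Sum.
long long sumOf(const std::vector<int> &V) {
  long long Sum = 0;
#pragma omp parallel for reduction(+ : Sum)
  for (long long I = 0; I < static_cast<long long>(V.size()); ++I)
    Sum += V[I];
  return Sum;
}
```

Each thread accumulates into a private copy of `Sum`, and the copies are combined when the worksharing loop ends, so the result matches the serial sum.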
2024-08-27 | [MLIR][DLTI] Enable types as keys in DLTI-query utils (#105995) | Rolf Morel | 7 | -10/+187
Enable support for query functions - including transform.dlti.query - to take types as keys. As the data layout specific attributes already supported types as keys, this change enables querying such attributes in the expected way.
2024-08-27 | [LLVM][AArch64] Improve big endian code generation for SVE BITCASTs. (#104769) | Paul Walker | 3 | -893/+345
For the most part I've tried to maintain the use of ISD::BITCAST wherever possible so as to keep access to more DAG combines.