AgeCommit message (Collapse)AuthorFilesLines
24 hours[BOLT] Avoid repeated hash lookups (NFC) (#111782)Kazu Hirata1-5/+1
24 hours[Attributor] Take the address space from addrspacecast directly (#108258)Shilei Tian2-14/+80
Currently `AAAddressSpace` relies on identifying the address spaces of all underlying objects. However, it might infer a sub-optimal address space when the underlying object is a function argument. In `AMDGPUPromoteKernelArgumentsPass`, a pointer kernel argument is promoted by adding a series of `addrspacecast` instructions (as shown below), in the hope that `InferAddressSpacePass` can pick them up and do the rewriting accordingly.

Before promotion:

```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
  %val = load i32, ptr %to_be_promoted
  ...
  ret void
}
```

After promotion:

```
define amdgpu_kernel void @kernel(ptr %to_be_promoted) {
  %ptr.cast.0 = addrspacecast ptr %to_be_promoted to ptr addrspace(1)
  %ptr.cast.1 = addrspacecast ptr addrspace(1) %ptr.cast.0 to ptr
  ; all uses of %to_be_promoted will use %ptr.cast.1
  %val = load i32, ptr %ptr.cast.1
  ...
  ret void
}
```

When `AAAddressSpace` analyzes the code after promotion, it takes `%to_be_promoted` as the underlying object of `%ptr.cast.1` and uses its address space (which is 0) as the final address space, so `manifest` simply does nothing. The attributor framework will then eliminate the address space cast from 0 to 1 and back to 0, and replace `%ptr.cast.1` with `%to_be_promoted`, which basically reverts all changes made by `AMDGPUPromoteKernelArgumentsPass`.

IMHO I'm not sure if `AMDGPUPromoteKernelArgumentsPass` promotes the argument in a proper way. To improve the handling of this case, this PR adds extra handling when iterating over all underlying objects. If an underlying object is a function argument, we have reached a terminal: its underlying object cannot be deduced any further. In this case, we check all uses of the argument. If they are all `addrspacecast` instructions and their destination address spaces are the same, we take the destination address space.

Fixes: SWDEV-482640.
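The rule for function arguments can be sketched outside of LLVM. The snippet below is only an illustration under stated assumptions: `ArgUse` and `commonCastAddrSpace` are hypothetical names, not the Attributor API, and a use is reduced to "is it an `addrspacecast`, and to which address space".

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for one use of a function argument.
struct ArgUse {
  bool IsAddrSpaceCast;   // true if the user is an addrspacecast
  unsigned DestAddrSpace; // destination address space of that cast
};

// Returns the shared destination address space if every use is an
// addrspacecast and all casts agree; otherwise returns std::nullopt.
// An argument without uses also yields std::nullopt.
std::optional<unsigned> commonCastAddrSpace(const std::vector<ArgUse> &Uses) {
  std::optional<unsigned> Result;
  for (const ArgUse &U : Uses) {
    if (!U.IsAddrSpaceCast)
      return std::nullopt; // some other kind of user: give up
    if (Result && *Result != U.DestAddrSpace)
      return std::nullopt; // casts disagree on the destination
    Result = U.DestAddrSpace;
  }
  return Result;
}
```

For the promoted kernel above, the only use of `%to_be_promoted` is the cast to `addrspace(1)`, so this sketch would report address space 1.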
26 hours[Clang][Parser] Don't evaluate concept when its definition is invalid (#111179)Younan Zhang2-0/+18
Since #103867, a null concept declaration has represented the state in which the concept definition is still being parsed, and is used for self-reference checking. However, that PR missed the case where such a definition could be invalid, in which case we should not evaluate it. Fixes https://github.com/llvm/llvm-project/issues/109780
26 hours[lldb] Fix TestGlobalModuleCache.py for remote debugging (#111483)Igor Kudrin1-0/+1
`SBDebugger().Create()` returns a debugger with only the host platform in its platform list. If the test suite is running for a remote platform, it should be explicitly added and selected in the new debugger created within the test, otherwise, the test will fail because the host platform may not be able to launch the built binary.
26 hours[SandboxVec] Re-land "Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec. (#111223)" (#111772)Jorge Gorbe Moya12-180/+220
https://github.com/llvm/llvm-project/pull/111223 was reverted because of a build failure with `-DBUILD_SHARED_LIBS=on`. The Passes component depends on Vectorizer (because PassBuilder needs to be able to instantiate SandboxVectorizerPass). This resulted in CMake doing the following:
1. When it builds lib/libLLVMVectorize.so.20.0git, it adds lib/libLLVMSandboxIR.so.20.0git to the command line, because it's listed as a dependency (as expected).
2. When it's trying to build lib/libLLVMPasses.so.20.0git, it adds lib/libLLVMVectorize.so.20.0git to the command line, because it's listed as a dependency (also as expected), but not libLLVMSandboxIR.so.
When SandboxVectorizerPass has its ctors/dtors defined inline, this caused "undefined reference to vtable" linker errors. This change works around that by moving ctors/dtors out of line. Also fix a bazel build problem by adding the new `llvm/lib/Transforms/Vectorize/SandboxVectorizer/Passes/PassRegistry.def` as a textual header in the Vectorizer target.
27 hoursSelectionDAG/expandFMINNUM_FMAXNUM: skips vector if SETCC/VSELECT is not legal (#109570)YunQiang Su5-520/+258
If SETCC or VSELECT is not legal for the vector type, we should not expand it; instead we can split the vectors, so that some simple scalar instructions can be emitted instead of pairs of comparison+selection.
27 hours[ConstantFold] Fold `logb` and `logbf` when the input parameter is a constant value. (#111232)c8ef2-3/+121
This patch adds support for constant folding for the `logb` and `logbf` libc functions.
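As a reminder of what these functions compute, here is a small standalone sketch using the standard `<cmath>` functions. `exponentOf` is a hypothetical wrapper added only for illustration. Which inputs the patch actually folds (zero, infinities, NaN) is not stated here, so the sketch sticks to finite, non-zero values.

```cpp
#include <cassert>
#include <cmath>

// logb returns the unbiased binary exponent of its argument as a
// floating-point value, e.g. 8.0 = 1.0 * 2^3 gives 3.0.
double exponentOf(double X) { return std::logb(X); }

// Single-precision counterpart, matching the libc function logbf.
float exponentOf(float X) { return std::logbf(X); }
```

With a constant argument, a call such as `logb(8.0)` has a result that is known at compile time, which is what makes it a candidate for folding.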
28 hoursImprove type lookup using .debug_names parent chain (#108907)jeffreytan815-3/+196
## Summary

This PR improves `SymbolFileDWARF::FindTypes()` by utilizing the newly added parent chain `DW_IDX_parent` in `.debug_names`. The proposal was originally discussed in [this RFC](https://discourse.llvm.org/t/rfc-improve-dwarf-5-debug-names-type-lookup-parsing-speed/74151).

## Implementation

To leverage the parent chain for `SymbolFileDWARF::FindTypes()`, this PR adds a new API, `GetTypesWithQuery`, to the `DWARFIndex` base class. The API performs the same function as `GetTypes` with additional filtering using `TypeQuery`. Since this only introduces filtering, the callback mechanisms at all call sites remain unchanged. A default implementation is given in the `DWARFIndex` class, which parses debug info and performs the matching. In the `DebugNameDWARFIndex` override, the parent_contexts in the `TypeQuery` are cross-checked against the parent chain in `.debug_names` for much faster filtering before falling back to the base implementation for final filtering.

Unlike the `GetFullyQualifiedType` API, which fully consumes the `DW_IDX_parent` parent chain for exact matching, these new APIs perform partial subset matching for type/namespace queries. This is necessary to support queries involving anonymous or inline namespaces. For instance, a user might request `NS1::NS2::NS3::Foo`, while the index table's parent chain might contain `NS1::inline_NS2::NS3::Foo`, which would fail exact matching.

## Performance Results

In one of our internal targets using `.debug_names` + split dwarf, expanding a "this" pointer in the locals view in VSCode: 94s => 48s. (Not sure why I got 94s this time instead of 70s last week.)

---------

Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
28 hours[AMDGPU] Remove some lit check linesJeffrey Byrnes2-15/+1
Change-Id: I77e72d23d41095b8fcc47996d8004f9e264968de
28 hoursFix build failure for [CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933)Kyungwoo Lee1-1/+1
28 hours[NFC] [MTE] Improve readability of AArch64GlobalsTagging (#111580)Florian Mayer1-27/+16
`shouldTagGlobal` doesn't sound like it should modify anything, so don't do that. Remove unused code. Use SmallVector over std::vector
29 hours[CGData][ThinLTO] Global Outlining with Two-CodeGen Rounds (#90933)Kyungwoo Lee14-16/+689
This feature is enabled by `-codegen-data-thinlto-two-rounds`, which effectively runs the `-codegen-data-generate` and `-codegen-data-use` in two rounds to enable global outlining with ThinLTO.
1. The first round: Run both optimization + codegen with a scratch output. Before running codegen, we serialize the optimized bitcode modules to a temporary path.
2. From the scratch object files, we merge them into the codegen data.
3. The second round: Read the optimized bitcode modules and start the codegen only this time. Using the codegen data, the machine outliner effectively performs the global outlining.
Depends on #90934, #110461 and #110463. This is a patch for https://discourse.llvm.org/t/rfc-enhanced-machine-outliner-part-2-thinlto-nolto/78753.
29 hours[NFC][clang-tidy] Add type annotations to rename_check.py (#108443)Nicolas van Kempen1-14/+18
```
> python3 -m mypy --strict clang-tools-extra/clang-tidy/rename_check.py
Success: no issues found in 1 source file
```
29 hours[clang-tidy][performance-move-const-arg] Fix crash when argument type has no definition (#111472)Nicolas van Kempen3-2/+21
Fix #111450.
29 hoursReapply "[AMDGPU][GlobalISel] Fix load/store of pointer vectors, buffer.*.pN (#110714)" v2 (#111708)Krzysztof Drewniak12-275/+4015
This adds `-disable-gisel-legality-check` to some gfx6 and gfx7 test lines to prevent behavior mismatches between debug and release builds.
The first attempted reapply was #111059.
This reverts commit e075dcf7d270fd52dc837163ff24e8c872dfeb49.
29 hoursRevert "Reapply "[Clang][Sema] Refactor collection of multi-level template argument lists (#106585)" (#111173)" (#111766)Krystian Stasiowski20-1005/+702
This reverts commit 4da8ac34f76e707ab94380b94f616457cfd2cb83.
30 hours[clang][deps] Serialize JSON without creating intermediate objects (#111734)Jan Svoboda1-72/+87
The dependency scanner uses the `llvm::json` library for outputting the dependency information. Until now, it created an in-memory representation of the dependency graph using the `llvm::json::Object` hierarchy. This not only creates unnecessary copies of the data, but also forces lexicographical ordering of attributes in the output, both of which I'd like to avoid. This patch adopts the `llvm::json::OStream` API instead and reorders the attribute printing logic such that the existing lexicographical ordering is preserved (for now).
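The ordering difference between the two approaches can be illustrated without LLVM. The sketch below is an analogy under stated assumptions, not the `llvm::json` API: a `std::map` stands in for an in-memory object whose keys come out sorted, while the streaming variant writes attributes in the order the caller supplies them. All names are hypothetical, and string escaping is omitted for brevity.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Attr = std::pair<std::string, std::string>;

// Writes key/value pairs as a flat JSON object, in iteration order.
template <typename Range> std::string writeObject(const Range &Attrs) {
  std::ostringstream OS;
  OS << '{';
  bool First = true;
  for (const auto &KV : Attrs) {
    if (!First)
      OS << ',';
    First = false;
    OS << '"' << KV.first << "\":\"" << KV.second << '"';
  }
  OS << '}';
  return OS.str();
}

// In-memory approach: the data is copied into a sorted container first,
// so attributes are always emitted in lexicographical key order.
std::string viaObject(const std::vector<Attr> &Attrs) {
  std::map<std::string, std::string> Obj(Attrs.begin(), Attrs.end());
  return writeObject(Obj);
}

// Streaming approach: no intermediate copy, caller-chosen order.
std::string viaStream(const std::vector<Attr> &Attrs) {
  return writeObject(Attrs);
}
```

To keep output stable while switching to the streaming style, the caller has to emit attributes in sorted order itself, which mirrors the reordering described in the commit message.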
30 hoursRevert "[clang] Track function template instantiation from definition (#110387)" (#111764)Krystian Stasiowski12-169/+26
This reverts commit 4336f00f2156970cc0af2816331387a0a4039317.
30 hours[lldb] Add missing include to SBLanguages.h (#111763)Chelsea Cassanova1-0/+2
SBLanguages.h uses `uint16_t` but is missing the include for `<cstdint>`. Any file that includes SBLanguages.h without also including `<cstdint>` will fail to build, so this commit adds the include.
30 hours[WebAssembly] Don't fold non-nuw add/sub in FastISel (#111278)Heejin Ahn3-1/+119
We should not fold one of add/sub operands into a load/store's offset when `nuw` (no unsigned wrap) is not present, because the address calculation, which adds the offset with the operand, does not wrap.

This is handled correctly in the normal ISel:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyISelDAGToDAG.cpp#L328-L332
but not in FastISel.

This positivity check in FastISel is not sufficient to avoid this case fully:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L348-L352
because
1. Even if RHS is within signed int range, depending on the value of the LHS, the resulting value can exceed uint32 max.
2. When one of the operands is a label, `Address` can contain a `GlobalValue` and a `Reg` at the same time, so the `GlobalValue` incorrectly becomes an offset:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L53-L69
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/Target/WebAssembly/WebAssemblyFastISel.cpp#L409-L417

Both cases are in the newly added test.

We should handle `SUB` too because `SUB` is the same as `ADD` when RHS's sign changes. I checked why our current normal ISel only handles `ADD`, and the reason it's OK for the normal ISel to handle only `ADD` seems to be that DAGCombiner replaces `SUB` with `ADD` here:
https://github.com/llvm/llvm-project/blob/6de5305b3d7a4a19a29b35d481a8090e2a6d3a7e/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L3904-L3907

Fixes #111018.
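The wrapping issue in point 1 can be shown with plain integer arithmetic. This is a minimal sketch, not FastISel code: the function names are hypothetical, a 32-bit add models an `add` without `nuw`, and a 64-bit sum models the non-wrapping effective-address calculation of a memory instruction with a folded offset.

```cpp
#include <cassert>
#include <cstdint>

// Address produced by a wrapping 32-bit add that feeds the load/store.
uint32_t addrViaWrappingAdd(uint32_t Base, uint32_t C) {
  return Base + C; // computed modulo 2^32
}

// Effective address when C is folded into the instruction's offset:
// base and offset are summed without wrapping, modeled here in 64 bits.
uint64_t addrViaFoldedOffset(uint32_t Base, uint32_t C) {
  return static_cast<uint64_t>(Base) + C;
}

// Folding is only behavior-preserving when both computations agree,
// which is what `nuw` on the add guarantees.
bool foldPreservesAddress(uint32_t Base, uint32_t C) {
  return addrViaWrappingAdd(Base, C) == addrViaFoldedOffset(Base, C);
}
```

For example, with a base of `0xFFFFFFF0` and a constant of `0x20`, the wrapping add yields `0x10`, while the folded form addresses `0x100000010`, which is out of bounds for a 32-bit memory.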
30 hours[TTI] NFC: Port TLI.shouldSinkOperands to TTI (#110564)Jeffrey Byrnes29-871/+901
Porting to TTI provides direct access to the instruction cost model, which can enable instruction cost based sinking without introducing code duplication.
31 hours[Github] Switch vectorization PR label to vectorizers (#111633)Aiden Grossman1-1/+1
This changes the PR label to match the name of the subscriber team. Fixes #111485.
31 hours[mlir][openacc] Update verifier to catch missing device type attribute (#111586)Valentin Clement (バレンタイン クレメン)2-11/+21
Operands with device_type support need the corresponding attribute, but the verifier did not catch the case where it was missing. The custom parser usually constructs it, but creating the op from Python could lead to a segfault in the printer. This patch updates the verifier so we catch this early on.
31 hours[RISCV] Use MCStreamer::emitInstruction instead of calling AsmPrinter::EmitToStreamer. NFC (#111714)Craig Topper1-1/+1
This allows us to pass the STI we already have cached instead of AsmPrinter::EmitToStreamer looking it up from the MachineFunction again. My plan is to make EmitHwasanMemaccessSymbols use RISCVAsmPrinter::EmitToStreamer instead of calling MCStreamer::emitInstruction. To do that I need control of the MCSubtargetInfo.
32 hours[MLIR] Don't build MLIRExecutionEngineShared on Windows (#109524)Zentrik1-1/+1
This disables the build of `MLIRExecutionEngineShared` because it causes linkage issues on Windows for currently unknown reasons. Related issue: https://github.com/llvm/llvm-project/issues/106859.
32 hours[SLP]Initial support for non-power-of-2 (but whole reg) vectorization for storesAlexey Bataev2-39/+58
Allows non-power-of-2 vectorization for stores, but still requires that the vectorized number of elements forms full vector registers.
Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/111194
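The "whole register" condition can be read as a simple divisibility check. The helper below is a hypothetical sketch of that reading, assuming a fixed register width in bits. The real SLP vectorizer gets its register information from the target and may apply further conditions.

```cpp
#include <cassert>

// Hypothetical helper: true if NumElts elements of EltBits bits each
// occupy a whole number of RegBits-wide vector registers.
bool fillsWholeRegisters(unsigned NumElts, unsigned EltBits,
                         unsigned RegBits) {
  assert(EltBits != 0 && RegBits != 0 && "widths must be non-zero");
  unsigned TotalBits = NumElts * EltBits;
  return TotalBits != 0 && TotalBits % RegBits == 0;
}
```

Under this reading, six `i64` stores with 128-bit registers qualify (384 bits, three full registers) even though six is not a power of two, while three `i32` stores (96 bits) do not.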
32 hours[flang][OpenMP] Treat POINTER variables as valid variable list items (#111722)Krzysztof Parzyszek3-3/+21
Follow-up to 418920b3fbdefec5b56ee2b9db96884d0ada7329, which started diagnosing the legality of objects in OpenMP clauses (and caused some test failures).
32 hours[libc++abi] Rename abort_message to __abort_message (#111413)Petr Hosek11-25/+25
This is an internal API and the name should reflect that. This is a reland of #108887.
32 hours[flang] Link libflangPasses against correct librariesTarun Prabhu1-0/+5
libflangPasses.so was not linked against the correct libraries which caused a build failure with -DBUILD_SHARED_LIBS=On. Fixes #110425
32 hours[libc++][NFC] Remove obsolete --osx-roots parameter to run-buildbotLouis Dionne1-8/+0
That isn't used anymore since we now run backdeployment testing on the target system directly instead of using pre-packaged roots.
32 hours[libc++] Narrow the exports for common_type (#111681)Louis Dionne1-3/+3
Based on a comment in #99473, it seems like `export *` may be overkill.
33 hours[Coroutines] Move util headers to include/llvm (#111599)Tyler Nowicki16-83/+85
Plugin libraries that use coroutines can do so right now; however, to provide their own ABI they need to be able to use various headers, some of which are required (such as the ABI header). This change exposes the coro utils and required headers by moving them to include/llvm/Transforms/Coroutines. My experience with our out-of-tree plugin ABI has been that at least these headers are needed.

The headers moved are:
* ABI.h (ABI object)
* CoroInstr.h (helpers)
* CoroShape.h (Shape object)
* MaterializationUtils.h (helpers)
* SpillingUtils.h (helpers)
* SuspendCrossingInfo.h (analysis)

This has no code changes other than those required to move the headers, which are:
* include guard name changes
* include path changes
* minor clang-format induced changes
* removal of LLVM_LIBRARY_VISIBILITY
33 hours[lldb] Use SEND_ERROR instead of FATAL_ERROR in test/CMakeLists.txt (#111729)Jonas Devlieghere1-6/+6
Use SEND_ERROR (continue processing, but skip generation) instead of FATAL_ERROR (stop processing and generation). This means that developers get to see all errors at once, instead of seeing just the first error and having to reconfigure to discover the next one.
33 hours[CIR] Build out AST consumer patterns to reach the entry point into CIRGenNathan Lanza16-1/+440
Build out the necessary infrastructure for the main entry point into ClangIR generation -- CIRGenModule. A set of boilerplate classes exist to facilitate this -- CIRGenerator, CIRGenAction, EmitCIRAction and CIRGenConsumer. These all mirror the corresponding types from LLVM generation by Clang's CodeGen.

The main entry point to CIR generation is `CIRGenModule::buildTopLevelDecl`. It is currently just an empty function. We've added a test to ensure that the pipeline reaches this point and doesn't fail, but does nothing else. This will be removed in one of the subsequent patches that'll add basic `cir.func` emission.

This patch also re-adds `-emit-cir` to the driver. lib/Driver/Driver.cpp requires that a driver flag exists to facilitate the selection of the right actions for the driver to create. Without a driver flag you get the standard behaviors of `-S`, `-c`, etc. If we want to emit CIR IR and, eventually, bytecode we'll need a driver flag to force this. This is why `-emit-llvm` is a driver flag. Notably, `-emit-llvm-bc` as a cc1 flag doesn't ever do the right thing. Without a driver flag it is incorrectly ignored and an executable is emitted. With `-S` a file named `something.s` is emitted which actually contains bitcode.

Reviewers: AaronBallman, MaskRay, bcardosolopes
Reviewed By: bcardosolopes, AaronBallman
Pull Request: https://github.com/llvm/llvm-project/pull/91007
33 hours[clang][bytecode] Fix source range of uncalled base dtor (#111683)Timm Baeder2-2/+4
Make this emit the same source range as the current interpreter.
34 hoursRevert "[SandboxVectorizer] Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec." (#111727)Jorge Gorbe Moya11-204/+183
Reverts llvm/llvm-project#111223. It broke one of the build bots: LLVM Buildbot has detected a new failure on builder flang-aarch64-libcxx running on linaro-flang-aarch64-libcxx while building llvm at step 5 "build-unified-tree". Full details are available at: https://lab.llvm.org/buildbot/#/builders/89/builds/8127
33 hours[test] remove profile file at the start of profile/instrprof-write-file-atexit-explicitly.cWael Yehia1-0/+1
34 hoursAMDGPU: Fix incorrectly selecting fp8/bf8 conversion intrinsics (#107291)Matt Arsenault4-5/+117
Trying to codegen these on targets without the instructions should fail to select. Not sure if all the predicates are correct. We had a fake one disconnected to a feature which was always true. Fixes: SWDEV-482274
34 hours[SandboxVectorizer] Use sbvec-passes flag to create a pipeline of Region passes after BottomUpVec. (#111223)Jorge Gorbe Moya11-183/+204
The main change is that the main SandboxVectorizer pass no longer has a pipeline of function passes. Now it is a wrapper that creates sandbox IR from functions before calling BottomUpVec. BottomUpVec now builds its own RegionPassManager from the `sbvec-passes` flag, using a PassRegistry.def file. For now, these region passes are not run (BottomUpVec doesn't create Regions yet), and only a null pass for testing exists.

This commit also changes the ownership model for sandboxir::PassManager: instead of having a PassRegistry that owns passes, and PassManagers that contain non-owning pointers to the passes, now PassManager owns (via unique pointers) the passes it contains. PassRegistry is now deleted, and the logic to parse and create a pass pipeline is now in PassManager::setPassPipeline.
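The new ownership model can be sketched in a few lines of standalone C++. This is an illustration only: `Pass`, `NullPass` and `OwningPassManager` are hypothetical names and do not mirror the actual sandboxir classes or their interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pass interface for the sketch.
struct Pass {
  explicit Pass(std::string Name) : Name(std::move(Name)) {}
  virtual ~Pass() = default;
  std::string Name;
};

// A pass that does nothing, analogous to a null pass used for testing.
struct NullPass : Pass {
  NullPass() : Pass("null") {}
};

// The manager owns its passes through unique pointers, so they are
// destroyed together with the manager and no separate registry is needed.
class OwningPassManager {
  std::vector<std::unique_ptr<Pass>> Passes;

public:
  void addPass(std::unique_ptr<Pass> P) {
    assert(P && "expected a non-null pass");
    Passes.push_back(std::move(P));
  }
  std::size_t size() const { return Passes.size(); }
  const Pass &get(std::size_t I) const { return *Passes.at(I); }
};
```

A caller would hand over ownership with something like `PM.addPass(std::make_unique<NullPass>())`, after which the manager alone controls the lifetime of the pass.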
34 hours[SandboxVec][DAG] Drop RAR and fix dependency scanning loop (#111715)vporpo2-9/+6
34 hours[libc][math] Implement `issignaling` and `iscanonical` macro. (#111403)Shourya Goel9-141/+76
#109201
34 hours[AMDGPU] Fix expensive checkJeffrey Byrnes1-1/+1
Change-Id: I0b26d5db6d3da8936ab25ee2b1e9002840b9853e
34 hours[Clang][OpenMP] Do not use feature option during packaging (#111702)Saiyedul Islam2-9/+2
Clang-offload-packager allows packaging of images based on an arbitrary list of key-value pairs where only the triple key is mandatory. Using target features as a key during packaging is not correct, as clang does not allow packaging multiple images in one binary which differ only in a target feature. TargetID features (xnack and sramecc) are in any case handled using the arch key and not as target features.
34 hours[SimplifyCFG][NFC] Improve compile time for TryToSimplifyUncondBranchFromEmptyBlock optimization. (#110715)Amara Emerson1-7/+4
In some pathological cases this optimization can spend an unreasonable amount of time populating the set for predecessors of the successor block. This change sinks some of that initialization to the point where it's actually necessary so we can take advantage of the existing early-exits. rdar://137063034
34 hours[mlir][xegpu] Allow out-of-bounds writes (#110811)Adam Siemieniuk2-18/+22
Relaxes vector.transfer_write lowering to allow out-of-bound writes. This aligns lowering with the current hardware specification which does not update bytes in out-of-bound locations during block stores.
34 hours[AMDGPU] Optionally Use GCNRPTrackers during scheduling (#93090)Jeffrey Byrnes12-79/+1672
This adds the ability to use the GCNRPTrackers during scheduling. These trackers have several advantages over the generic trackers: 1. global live-thru trackers, 2. subregister based RP deltas, and 3. flexible vreg -> PressureSet mappings. This feature is off by default to ease the roll-out process. In particular, when using the optional trackers, the scheduler still maintains the generic trackers, which costs unnecessary compile time.
35 hours[Sema] Support negation/parens with __builtin_available (#111439)George Burgess IV2-12/+64
At present, `__builtin_available` is really restrictive with its use. Overall, this seems like a good thing, since the analyses behind it are not very expensive. That said, it's very straightforward to support these two cases:

```
if ((__builtin_available(foo, *))) {
  // ...
}
```

and

```
if (!__builtin_available(foo, *)) {
  // ...
} else {
  // ...
}
```

Seems nice to do so.
35 hours[gn build] Remove unix x86 stage2 toolchainArthur Eubanks1-1/+0
It's breaking the bots, e.g. http://45.33.8.238/linux/149792/step_3.txt
35 hours[ARM] Honour -mno-movt in stack protector handling (#109022)Ard Biesheuvel2-1/+36
When -mno-movt is passed to Clang, the ARM codegen correctly avoids movt/movw pairs to take the address of __stack_chk_guard in the stack protector code emitted into the function pro- and epilogues. However, the Thumb2 codegen fails to do so, and happily emits movw/movt pairs unless it is generating an ELF binary and the symbol might be in a different DSO. Let's incorporate a check for useMovt() in the logic here, so movt/movw are never emitted when -mno-movt is specified. Suggestions welcome for how/where to add a test case for this. Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
35 hours[gn build] Fix up win/x86 flags and add stage2_unix_x86 (#111595)Arthur Eubanks6-6/+23