Age  Commit message  Author  Files  Lines
2024-08-06[LLVM][PassBuilder] Extend the function signature of callback for optimizer ↵users/shiltian/extend-callback-signatureShilei Tian7-46/+64
pipeline extension point These callbacks can be invoked in multiple places when building an optimization pipeline, both at compile time and at link time. However, there is no indicator of which pipeline is currently being built. In this patch, an extra argument is added to indicate the (Thin)LTO stage so that the callback can check it if needed. There is no test expected from this, and the benefit of this change will be demonstrated in https://github.com/llvm/llvm-project/pull/66488.
2024-08-07[SLP] The order of store chains needs to consider the size of the values. ↵tcwzxx2-9/+7
(#101810) When store chains have the same value type ID and pointer type ID, they may mix different sizes of values, such as i8 and i64. This can lead to missed vectorization opportunities.
2024-08-06Reapply "[Attributor][AMDGPU] Enable AAIndirectCallInfo for AMDAttributor ↵Shilei Tian8-13/+95
(#100952)" This reverts commit 874cd100a076f3b98aaae09f90ef224682501538.
2024-08-06Revert "[gn build] Manually port 90ccf21"Nico Weber1-18/+5
This reverts commit 46307f1a84bf832f32938c8ad2dc0605441a5319. 90ccf21 was reverted in 030ee841a9c.
2024-08-06[gn] port e77ac42bccb8Nico Weber1-0/+7
2024-08-06Revert "[LinkerWrapper] Extend with usual pass options (#96704)" (#102226)Joseph Huber3-91/+0
This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb. Fixes: https://github.com/llvm/llvm-project/issues/100212
2024-08-06[Attributor] Fix an issue that an access is skipped by mistake (#101862)Shilei Tian2-14/+87
When we check if an access can be skipped, there is a case where an inter-procedural interference access exists after a dominant write. Currently we rely on `AAInterFnReachability` to tell whether the access is reachable. If it is not, we can safely skip the access. However, this relies on the assumption that the AA exists. It is possible that the AA doesn't exist. In this case, we can't safely assume the access can be skipped, because we have to assume the access is reachable. This can happen when `AAInterFnReachability` is not in the allowed AA list when creating the attributor, as in AMDGPUAttributor. Co-authored-by: Mark de Wever <koraq@xs4all.nl>
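The conservative fallback described above can be sketched as follows. This is a minimal illustration, not the Attributor code: `ReachabilityInfo` and `canSkipAccess` are hypothetical names, and a null pointer stands in for "the AA was not created".

```cpp
#include <cassert>

// Hypothetical stand-in for an abstract attribute such as
// AAInterFnReachability; not the real LLVM interface.
struct ReachabilityInfo {
  bool reachable;
  bool canReach() const { return reachable; }
};

// Returns true only when the analysis proves the access is unreachable.
// A missing analysis (nullptr) proves nothing, so the access is kept.
bool canSkipAccess(const ReachabilityInfo *info) {
  if (!info)
    return false; // no AA available: conservatively assume reachable
  return !info->canReach();
}
```

The point of the sketch is only the ordering of the checks: the null case is handled before any query, so a missing analysis can never be read as "unreachable".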
2024-08-06[BPF] Make llvm-objdump disasm default cpu v4 (#102166)yonghong-song7-14/+16
Currently, with the following example,

  $ cat t.c
  void foo(int a, _Atomic int *b) { *b &= a; }
  $ clang --target=bpf -O2 -c -mcpu=v3 t.c
  $ llvm-objdump -d t.o

  t.o: file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
    0: c3 12 00 00 51 00 00 00 <unknown>
    1: 95 00 00 00 00 00 00 00 exit

Basically, the default cpu for llvm-objdump is v1 and it won't be able to decode insn properly. If we add --mcpu=v3 to llvm-objdump command line, we will have

  $ llvm-objdump -d --mcpu=v3 t.o

  t.o: file format elf64-bpf

  Disassembly of section .text:

  0000000000000000 <foo>:
    0: c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
    1: 95 00 00 00 00 00 00 00 exit

The atomic_fetch_and insn can be decoded properly. Using latest cpu version --mcpu=v4 can also decode properly like the above --mcpu=v3.

To avoid the above '<unknown>' decoding with common 'llvm-objdump -d t.o', this patch marked the default cpu for llvm-objdump with the current highest cpu number v4 in ELFObjectFileBase::tryGetCPUName(). The cpu number in ELFObjectFileBase::tryGetCPUName() will be adjusted in the future if cpu number is increased e.g. v5 etc. Such an approach also aligns with gcc-bpf as discussed in [1].

Six bpf unit tests are affected with this change. I changed test output for three unit tests and added --mcpu=v1 for the other three unit tests, to demonstrate the default (cpu v4) behavior and explicit --mcpu=v1 behavior.

[1] https://lore.kernel.org/bpf/6f32c0a1-9de2-4145-92ea-be025362182f@linux.dev/T/#m0f7e63c390bc8f5a5523e7f2f0537becd4205200

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
2024-08-06Fix ASAN failure in TestSingleThreadStepTimeout.py (#102208)jeffreytan813-19/+25
This PR fixes the ASAN failure in https://github.com/llvm/llvm-project/pull/90930. The original PR made the assumption that parent `ThreadPlanStepOverRange`'s lifetime will always be longer than `ThreadPlanSingleThreadTimeout` leaf plan so it passes the `m_timeout_info` as reference to it. From the ASAN failure, it seems that this assumption may not be true (likely the thread stack is holding a strong reference to the leaf plan). This PR fixes this lifetime issue by using shared pointer instead of passing by reference. --------- Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
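The lifetime fix described above can be sketched roughly like this. It is a simplified illustration under assumptions: `TimeoutInfo`, `LeafPlan`, and `ParentPlan` are hypothetical stand-ins for the lldb thread plan classes, and only the ownership pattern mirrors the commit message.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the timeout state shared by both plans.
struct TimeoutInfo {
  int remaining_ms = 0;
};

// Leaf plan: co-owns the state instead of holding a reference to the
// parent's member, so the state stays valid if the parent dies first.
class LeafPlan {
public:
  explicit LeafPlan(std::shared_ptr<TimeoutInfo> info)
      : m_info(std::move(info)) {}
  int Remaining() const { return m_info->remaining_ms; }

private:
  std::shared_ptr<TimeoutInfo> m_info;
};

class ParentPlan {
public:
  ParentPlan() : m_timeout_info(std::make_shared<TimeoutInfo>()) {
    m_timeout_info->remaining_ms = 250;
  }
  std::shared_ptr<LeafPlan> MakeLeaf() const {
    return std::make_shared<LeafPlan>(m_timeout_info);
  }

private:
  std::shared_ptr<TimeoutInfo> m_timeout_info;
};

// Uses the leaf after the parent has been destroyed. With a plain
// reference this would be a use-after-free; with shared ownership it is safe.
int ReadAfterParentDestroyed() {
  std::shared_ptr<LeafPlan> leaf;
  {
    ParentPlan parent;
    leaf = parent.MakeLeaf();
  } // parent destroyed here; the leaf still co-owns TimeoutInfo
  return leaf->Remaining();
}
```

Shared ownership trades a small reference-counting cost for not having to reason about which plan outlives the other.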
2024-08-06Add __size_returning_new variant detection to TLI. (#101564)Snehasish Kumar4-5/+40
Add support to detect __size_returning_new variants defined in proposal P0901R5 to extend to operator new; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0901r5.html for details. This PR matches the declarations exported by tcmalloc in https://github.com/google/tcmalloc/blob/f2516691d01051defc558679f37720bba88d9862/tcmalloc/malloc_extension.h#L707-L711
2024-08-07[Clang] Fix crash when transforming a `DependentAddressSpaceType` (#102206)Sirraide3-4/+17
We were forgetting to pass the `TypeLocBuilder` along to `TransformType`, causing us to complain if we then tried to build a `DependentAddressSpaceTypeLoc` because the inner `TypeLoc` was missing from the TLB. Fixes #101685.
2024-08-06[BOLT] Turn non-empty CFI StateStack assert into a warning (#102216)Amir Ayupov1-1/+4
clang-15 can produce binaries with mismatched RememberState/RestoreState CFIs. This is benign for unwinding, so replace an assert with a warning.
2024-08-06[AMDGPU] Add parseStringOrIntWithPrefix helper in asm parser (#102213)Stanislav Mekhanoshin4-72/+108
When we have a modifier with a value (like dst_sel:DWORD for example) we only accept symbolic values. SP3 allows numeric constants as well. Add a helper function that accepts both. Besides improving compatibility, it is easier to use.
2024-08-06Spill/restore FP/BP around instructions in which they are clobbered (#81048)weiguozhi17-6/+783
This patch fixes https://github.com/llvm/llvm-project/issues/17204. If a base pointer is used in a function and it is clobbered by an instruction (typically an inline asm), the current register allocator can't handle this situation, so BP becomes garbage after those instructions. In theory, the same can happen to FP. We can spill and reload the FP/BP registers around those instructions. But normal spill/reload instructions also use FP/BP, so we can't spill them into normal spill slots; instead we spill them to the top of the stack by using the SP register.
2024-08-06[flang] Match the type of the element size in the box in getValueFromBox ↵Kelvin Li6-30/+46
(#100512) Currently, `%17 = fir.box_elesize %16 : (!fir.class<!fir.ptr<!fir.type<_QFTt{a:i32,b:i32}>>>) -> i32` is translated to
```
%4 = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %1, i32 0, i32 1
%5 = load i32, ptr %4, align 4
```
The type of the element size is `i64`. The load essentially truncates the value and yields an incorrect result in a big-endian environment. The problem occurs in the `storage_size` intrinsic on a polymorphic variable.
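Why a narrower load is wrong only on big-endian targets can be illustrated with explicit byte arrays. This is a sketch, not flang code: `storeBigEndian64` and `loadBigEndian` are hypothetical helpers that model the memory layout in plain C++, so the behavior is the same on any host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Lay out a 64-bit value in big-endian byte order
// (most significant byte at the lowest address).
std::array<std::uint8_t, 8> storeBigEndian64(std::uint64_t v) {
  std::array<std::uint8_t, 8> bytes{};
  for (int i = 0; i < 8; ++i)
    bytes[i] = static_cast<std::uint8_t>(v >> (8 * (7 - i)));
  return bytes;
}

// Read the first `width` bytes at the field's address as a big-endian
// integer, which is what a `width`-byte load does on such a target.
std::uint64_t loadBigEndian(const std::array<std::uint8_t, 8> &bytes,
                            int width) {
  std::uint64_t v = 0;
  for (int i = 0; i < width; ++i)
    v = (v << 8) | bytes[i];
  return v;
}
```

For an element size of 8 stored as `i64`, `loadBigEndian(field, 8)` reads 8, while the 4-byte load `loadBigEndian(field, 4)` reads the upper half and returns 0. On a little-endian layout the low half sits at the lowest address, which is why the mismatch goes unnoticed there.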
2024-08-06[lld][WebAssembly] Fix stub library deps causing LTO archive members to be ↵Sam Clegg3-8/+28
required post-LTO (#101894) Fixes: https://github.com/emscripten-core/emscripten/issues/16836
2024-08-06[nfc][ctx_prof] Rename `PGOContextualProfile` to `PGOCtxProfContext` (#102209)Mircea Trofin3-29/+28
2024-08-06[TableGen] Emit better error message for duplicate Subtarget features. (#102090)Rahul Joshi2-8/+23
- Keep track of last definition of a feature in a `DenseMap` and use it to report a better error message when a duplicate feature is found.
- Use StringMap instead of a std::map in `EmitStageAndOperandCycleData`
- Add a unit test to check if duplicate names are flagged.
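The duplicate-detection idea in the first item can be sketched as below. This is an illustration only: it uses `std::unordered_map` in place of LLVM's `DenseMap` to stay self-contained, and `FeatureDef`, `findDuplicateFeature`, and the message wording are hypothetical, not the TableGen emitter's code.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record of one feature definition and where it appeared.
struct FeatureDef {
  std::string name;
  int line;
};

// Remembers where each feature name was last defined so that a duplicate
// can be reported together with the location of the earlier definition.
std::optional<std::string>
findDuplicateFeature(const std::vector<FeatureDef> &defs) {
  std::unordered_map<std::string, int> lastDefinition;
  for (const FeatureDef &def : defs) {
    auto [it, inserted] = lastDefinition.try_emplace(def.name, def.line);
    if (!inserted)
      return "feature '" + def.name + "' on line " +
             std::to_string(def.line) + " was already defined on line " +
             std::to_string(it->second);
  }
  return std::nullopt; // no duplicates found
}
```

Storing the earlier location in the map is what makes the better message possible: a plain set of names could only say that a duplicate exists, not where the first definition lives.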
2024-08-06Revert "[mlir][linalg] Relax tensor.extract vectorization" (#102232)Han-Chung Wang2-71/+20
Reverts llvm/llvm-project#99299 because it breaks the lowering. To repro: `mlir-opt -transform-interpreter ~/repro.mlir`
```mlir
#map = affine_map<(d0, d1) -> (d0)>
#map1 = affine_map<(d0, d1) -> (d1)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d0 + d1)>
module {
  func.func @foo(%arg0: index, %arg1: tensor<2xf32>, %arg2: tensor<4xf32>, %arg3: tensor<1xf32>) -> tensor<4x1xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 1.000000e+00 : f32
    %cst_0 = arith.constant 0.000000e+00 : f32
    %0 = tensor.empty() : tensor<4x1xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : tensor<4xf32>, tensor<1xf32>) outs(%0 : tensor<4x1xf32>) {
    ^bb0(%in: f32, %in_1: f32, %out: f32):
      %2 = linalg.index 0 : index
      %3 = linalg.index 1 : index
      %4 = affine.apply #map3(%3, %arg0)
      %extracted = tensor.extract %arg1[%c0] : tensor<2xf32>
      %5 = arith.cmpi eq, %2, %c0 : index
      %6 = arith.cmpi ult, %2, %c0 : index
      %7 = arith.select %5, %cst, %in : f32
      %8 = arith.select %6, %cst_0, %7 : f32
      %9 = arith.cmpi eq, %4, %c0 : index
      %10 = arith.cmpi ult, %4, %c0 : index
      %11 = arith.select %9, %cst, %in_1 : f32
      %12 = arith.select %10, %cst_0, %11 : f32
      %13 = arith.mulf %8, %12 : f32
      %14 = arith.mulf %13, %extracted : f32
      %15 = arith.cmpi eq, %2, %4 : index
      %16 = arith.select %15, %cst, %cst_0 : f32
      %17 = arith.subf %16, %14 : f32
      linalg.yield %17 : f32
    } -> tensor<4x1xf32>
    return %1 : tensor<4x1xf32>
  }
}
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
    transform.structured.vectorize %0 : !transform.any_op
    transform.yield
  }
}
```
2024-08-06[flang][cuda] Defined allocator for unified data (#102189)Valentin Clement (バレンタイン クレメン)4-3/+20
CUDA unified variables were set to use the same allocator as managed variables. This patch adds a specific allocator for the unified variables. Currently it will call the managed allocator underneath but we want to have the flexibility to change that in the future.
2024-08-06[flang][cuda][NFC] Disambiguate namespace with cuf dialect (#102194)Valentin Clement (バレンタイン クレメン)6-25/+25
Rename namespace `Fortran::runtime::cuf` to `Fortran::runtime::cuda` to avoid ambiguity with the namespace `::cuf` that is defined in the CUF dialect.
2024-08-06[lldb][TypeSystemClang] Pass ClangASTMetadata around by value (#102161)Michael Buch7-46/+53
This patch changes the return type of `GetMetadata` from a `ClangASTMetadata*` to a `std::optional<ClangASTMetadata>`. Except for one call-site (`SetDeclIsForcefullyCompleted`), we never actually make use of the mutability of the returned metadata. And we never make use of the pointer-identity. By passing `ClangASTMetadata` by-value (the type is fairly small, size of 2 64-bit pointers) we'll avoid some questions surrounding the lifetimes/ownership/mutability of this metadata. For consistency, we also change the parameter to `SetMetadata` from `ClangASTMetadata&` to `ClangASTMetadata` (which is an NFC since we copy the data anyway). This came up during some changes we plan to make where we [create redeclaration chains for decls in the LLDB AST](https://github.com/llvm/llvm-project/pull/95100). We want to avoid having to dig out the canonical decl of the declaration chain for retrieving/setting the metadata. It should just be copied across all decls in the chain. This is easier to guarantee when everything is done by-value.
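The by-value interface described above might look roughly like the following. This is a sketch under assumptions: `Metadata` and `MetadataStore` are hypothetical simplifications of `ClangASTMetadata` and its owner, and only the `std::optional` return and the by-value setter parameter come from the commit message.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical, simplified metadata record (small enough to copy cheaply).
struct Metadata {
  std::uint64_t user_id = 0;
  bool forcefully_completed = false;
};

class MetadataStore {
public:
  // Taking the parameter by value makes the copy explicit at the call site.
  void SetMetadata(const void *decl, Metadata md) { m_map[decl] = md; }

  // Returns a copy, or std::nullopt when no metadata was recorded.
  // Callers never see a pointer into the map, so there are no lifetime or
  // aliasing questions to answer.
  std::optional<Metadata> GetMetadata(const void *decl) const {
    auto it = m_map.find(decl);
    if (it == m_map.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::unordered_map<const void *, Metadata> m_map;
};
```

One consequence worth noting: mutating the returned copy does not change the stored metadata, so a caller that needs to update it (like the single call site mentioned above) has to read, modify, and call `SetMetadata` again.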
2024-08-06[InstCombine] (NFC) Remove improper TODO for a - UMIN (#101076)Rose Silicon1-1/+0
It is already handled in a different method, especially as a - UMIN(a, b) cannot be handled by a select statement, unless it means something like: "(c < b) ? b - ((b > c) ? c : b) : 0;" but LLVM handles that case as well.
2024-08-06[ARM] [Windows] Error out on branch relocations that require a symbol offset ↵Martin Storsjö2-0/+62
(#101906) This adds the same kind of verification for ARM, as was added for AArch64 in 1e7f592a890aad860605cf5220530b3744e107ba. This allows catching issues at assembly time, instead of having the linker misinterpret the relocations (as the linker ignores the symbol offset). This verifies that the issue fixed by 8dd065d5bc81b0c8ab57f365bb169a5d92928f25 really is fixed, and points out explicitly if the same issue appears elsewhere. Note that the parameter Value in the adjustFixupValue function is offset by 4 from the value that is stored as immediate in the instructions, so we compare with 4, when we want to make sure that the written immediate will be zero.
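The comparison against 4 mentioned in the note above can be written out as a tiny sketch. The helper names and the constant `kValueBias` are hypothetical; the only fact taken from the commit message is that `Value` is offset by 4 from the immediate that ends up in the instruction.

```cpp
#include <cassert>
#include <cstdint>

// Assumption from the commit message: the Value seen in adjustFixupValue
// is 4 greater than the immediate written into the instruction.
constexpr std::int64_t kValueBias = 4;

// Immediate that would be encoded for a given fixup Value.
std::int64_t immediateFor(std::int64_t value) { return value - kValueBias; }

// A branch relocation carries no symbol offset exactly when the written
// immediate is zero, which is equivalent to Value == 4.
bool branchHasNoSymbolOffset(std::int64_t value) {
  return immediateFor(value) == 0;
}
```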
2024-08-06[Object][COFF] Use uintptr_t for getRvaPtr call in Arm64XRelocRef::validate.Jacek Caban1-1/+1
Fixes #97229.
2024-08-07AMDGPU: Fix using wrong alloca address space in test (#102108)Matt Arsenault4-30/+36
2024-08-06[SLP]Better sorting of phi instructions by comparing type sizes (#102188)Alexey Bataev2-6/+11
Currently the SLP vectorizer compares phi instructions by the type ID of the compared instructions, which may fail for integer types of different sizes. This patch adds a comparison by type size to fix this.
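The ordering idea can be sketched as a comparator that falls back to the bit width when the type IDs match. This is an illustration, not the SLP vectorizer's comparator: `ValueType`, `typeLess`, and the ascending direction are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// Hypothetical summary of a value's type: a kind ID plus its size in bits.
struct ValueType {
  int typeId;
  unsigned bits;
};

// Compare by type ID first and by size second, so that i8 and i64 (same
// integer type ID, different sizes) no longer compare as equivalent.
bool typeLess(const ValueType &a, const ValueType &b) {
  return std::tie(a.typeId, a.bits) < std::tie(b.typeId, b.bits);
}

// Sorting with this comparator groups same-sized values next to each other.
std::vector<ValueType> sortByTypeAndSize(std::vector<ValueType> values) {
  std::stable_sort(values.begin(), values.end(), typeLess);
  return values;
}
```

With only the type ID as the key, a sequence such as i64, i8, i64, i8 would stay interleaved; adding the size as a secondary key places the two i8 values and the two i64 values in adjacent runs.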
2024-08-06[Object][COFF][llvm-readobj] Add support for ARM64X dynamic relocations. ↵Jacek Caban7-4/+2443
(#97229)
2024-08-06[CodeGen] Make non-COMDAT relative vtable internal instead of private (#102056)Shoaib Meenai16-30/+33
When using the relative vtable ABI, if a vtable is not dso_local, it's given private linkage (if not COMDAT) or hidden visibility (if COMDAT) to make it dso_local (to place it in rodata instead of data.rel.ro), and an alias generated with the original linkage and visibility. This alias could later be removed from the symbol table, e.g. if using a version script, at which point we lose all symbol information about the vtable. Use internal linkage instead of private linkage to avoid this. While I'm here, clarify the comment about why COMDAT vtables can't use internal (or private) linkage, and associate it with the else block where hidden visibility is applied instead of internal linkage.
2024-08-06[SLP]Fix PR102187: do not insert extractelement before landingpad instruction.Alexey Bataev2-8/+62
Landingpad instruction must be the very first instruction after the phi nodes, so we need to insert extractelement/shuffles after this instruction. Fixes https://github.com/llvm/llvm-project/issues/102187
2024-08-06[Attributor] Improve debug string of `AAUnderlyingObjects` (#101861)Shilei Tian1-8/+18
2024-08-06[AMDGPU] Enable `AAAddressSpace` in `AMDGPUAttributor` (#101593)Shilei Tian6-113/+96
2024-08-06[libc] Fix index into argument vectorJoseph Huber1-1/+1
2024-08-06[M68k] Fix compilation pipeline checkMichael Liao1-1/+0
- After ExpandVP pass is merged into PreISelIntrinsicLowering
2024-08-06[mlir][vector] Fix link in docs (nfc)Andrzej Warzynski1-2/+2
2024-08-06[SandboxIR] Implement AllocaInst (#102027)vporpo7-1/+453
This patch implements sandboxir::AllocaInst which mirrors llvm::AllocaInst.
2024-08-06[flang][cuda] Allocate local descriptor in managed memory (#102060)Valentin Clement (バレンタイン クレメン)6-9/+186
This patch adds entry points to the runtime for allocating descriptors in managed memory. These entry points currently only call `CUFAllocManaged` and `CUFFreeManaged` but could become more complicated in the future. `cuf.alloc` and `cuf.free` operations related to local descriptors are converted into runtime calls.
2024-08-06[libc][math] Improve the error analysis and accuracy for pow function. (#102098)lntue2-27/+55
2024-08-06[lldb][debuginfod] Fix the DebugInfoD PR that caused issues when working ↵Kevin Frei13-21/+549
with stripped binaries (#99362) @walter-erquinigo found that the [PR with testing and a fix for DebugInfoD](https://github.com/llvm/llvm-project/pull/98344) caused an issue when working with stripped binaries. The issue is that when you're working with split-dwarf, there are *3* possible files: The stripped binary the user is debugging, the "only-keep-debug" *or* unstripped binary, plus the `.dwp` file. The debuginfod plugin should provide the unstripped/OKD binary. However, if the debuginfod plugin fails, the default symbol locator plugin will just return the stripped binary, which doesn't help. So, to address that, the SymbolVendorELF code checks to see if the SymbolLocator's ExecutableObjectFile request returned the same file, and bails if that's the case. You can see the specific diff as the second commit in the PR. I'm investigating adding a test: I can't quite get a simple repro, and I'm unwilling to make any additional changes to Makefile.rules to this diff, for Pavlovian reasons.
2024-08-06[libc] Fix GPU argument vector writing `nullptr` to stringJoseph Huber1-1/+1
Summary: The intention behind this code was to null terminate the `envp` string, but it accidentally went into the string data.
2024-08-06[mlir] Add --list-passes option to mlir-opt (#100420)Natan-GabrielTiutiuIntel4-0/+55
Currently, the only way to see the passes that were registered is by calling “mlir-opt --help”. However, for compilers with 500+ passes, the help message becomes too long and sometimes hard to understand. In this PR I add a new "--list-passes" option to mlir-opt, which can be used for printing only the registered passes, a feature that would be extremely useful.
2024-08-07[ConstantRange] Improve `shlWithNoWrap` (#101800)Yingwei Zheng3-14/+147
Closes https://github.com/dtcxzyw/llvm-tools/issues/22.
2024-08-06[libc++] Implements LWG3130. (#101889)Mark de Wever49-177/+941
This adds addressof at the required places in [input.output]. Some of the new tests failed since string used operator& internally. These have been fixed too. Note the new fstream tests perform output to a basic_string instead of a double. Using a double requires the num_get specialization num_get<CharT, istreambuf_iterator<CharT, char_traits_operator_hijacker<CharT>>>. This facet is not present in the locale database, so the conversion would fail due to a missing locale facet. Using basic_string avoids using the locale. As a drive-by, this fixes several bugs in the ofstream.cons tests. These tested ifstream instead of ofstream with an open mode. Implements: - LWG3130 [input.output] needs many addressof Closes #100246.
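Why `std::addressof` is needed can be shown with a type that overloads unary `operator&`. This is a minimal sketch: `Hijacker` is a hypothetical type written for the example, loosely in the spirit of the operator-hijacking test helpers, and is not the libc++ test class.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose overloaded unary & deliberately lies.
struct Hijacker {
  int value = 0;
  Hijacker *operator&() { return nullptr; }
};

// Plain & calls the overload and returns the bogus pointer.
Hijacker *viaOperator(Hijacker &h) { return &h; }

// std::addressof bypasses the overload and yields the real address.
Hijacker *viaAddressof(Hijacker &h) { return std::addressof(h); }
```

Generic library code cannot know whether a user type overloads `operator&`, which is why the standard library is expected to use `std::addressof` wherever it needs an object's actual address.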
2024-08-06[SandboxIR] Implement missing PHINode functions (#101734)Sterling-Augustine4-7/+64
replaceIncomingBlockWith and removeIncomingValueIf are both straightforward and done. I'll defer copyIncomingBlocks until a couple of other changes that also handle blocks go in.
2024-08-06[libc++][chrono][test] Fixes bogus loops. (#101890)Mark de Wever1-2/+2
Changes the loop range to match similar tests and avoids zero iterations. The original motivation to reduce the number of iterations was to allow the test to be executed during constant evaluation. Fixes: https://github.com/llvm/llvm-project/issues/100502
2024-08-06[NVPTX] Add Volta Atomic SequentiallyConsistent Load and Store Operations ↵gonzalobg8-1223/+1332
(#98551) This PR builds on #98022. It adds support for Volta's SequentiallyConsistent Load and Store operations at system scope.
2024-08-06[CodeGen] Fix PreISelLowering not reporting changes (#102184)Alexis Engelke1-1/+3
expandVectorPredication may change code, even if the intrinsic itself remains in the code. Report changes whenever such an intrinsic is encountered, because code could have been changed. Another follow-up fix for #101652 to fix expensive-checks-only failure.
2024-08-06[Clang][Doc] Fix an error in `OpenMPSupport.rst`Shilei Tian1-1/+1
2024-08-06[libc][math][c23] Add ffma{,l,f128} and fdiv{,l,f128} C23 math functions ↵aaryanshukla39-11/+701
#101089 (#101253)
- added all variations of ffma and fdiv
- will add all new headers into yaml in the next patch
- only fsub is left; after that, all basic operations for float are complete
--------- Co-authored-by: OverMighty <its.overmighty@gmail.com>
2024-08-06AMDGPU: Add some leaf intrinsics to isAlwaysUniform (#101925)Matt Arsenault2-0/+37
These would always be uniform anyway, but it shouldn't hurt to mark them as always uniform. This will help use TTI::isAlwaysUniform in place of proper uniformity analysis in trivial situations.