Age | Commit message (Collapse) | Author | Files | Lines |
|
pipeline extension point
These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.
In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in https://github.com/llvm/llvm-project/pull/66488.
|
|
(#101810)
When store chains have the same value type ID and pointer type ID, they
may mix different sizes of values, such as i8 and i64. This can lead to
missed vectorization opportunities.
|
|
(#100952)"
This reverts commit 874cd100a076f3b98aaae09f90ef224682501538.
|
|
This reverts commit 46307f1a84bf832f32938c8ad2dc0605441a5319.
90ccf21 was reverted in 030ee841a9c.
|
|
|
|
This reverts commit 90ccf2187332ff900d46a58a27cb0353577d37cb.
Fixes: https://github.com/llvm/llvm-project/issues/100212
|
|
When we check if an access can be skipped, there is a case that an
inter-procedural interference access exists after a dominant write.
Currently we
rely on `AAInterFnReachability` to tell if the access can be reachable.
If it is
not, we can safely skip the access. However, it is based on an
assumption that
the AA exists. It is possible that the AA doesn't exist. In this case,
we can't
safely assume the acess can be skipped because we have to assume the
access can
reach. This can happen when `AAInterFnReachability` is not in the
allowed AA
list when creating the attributor, such as AMDGPUAttributor.
Co-authored-by: Mark de Wever <koraq@xs4all.nl>
|
|
Currently, with the following example,
$ cat t.c
void foo(int a, _Atomic int *b)
{
*b &= a;
}
$ clang --target=bpf -O2 -c -mcpu=v3 t.c
$ llvm-objdump -d t.o
t.o: file format elf64-bpf
Disassembly of section .text:
0000000000000000 <foo>:
0: c3 12 00 00 51 00 00 00 <unknown>
1: 95 00 00 00 00 00 00 00 exit
Basically, the default cpu for llvm-objdump is v1 and it won't be able
to decode insn properly.
If we add --mcpu=v3 to llvm-objdump command line, we will have
$ llvm-objdump -d --mcpu=v3 t.o
t.o: file format elf64-bpf
Disassembly of section .text:
0000000000000000 <foo>:
0: c3 12 00 00 51 00 00 00 w1 = atomic_fetch_and((u32 *)(r2 + 0x0), w1)
1: 95 00 00 00 00 00 00 00 exit
The atomic_fetch_and insn can be decoded properly. Using latest cpu
version --mcpu=v4 can also decode properly like the above --mcpu=v3.
To avoid the above '<unknown>' decoding with common 'llvm-objdump -d
t.o', this patch marked the default cpu for llvm-objdump with the
current highest cpu number v4 in ELFObjectFileBase::tryGetCPUName(). The
cpu number in ELFObjectFileBase::tryGetCPUName() will be adjusted in the
future if cpu number is increased e.g. v5 etc. Such an approach also
aligns with gcc-bpf as discussed in [1].
Six bpf unit tests are affected with this change. I changed test output
for three unit tests and added --mcpu=v1 for the other three unit tests,
to demonstrate the default (cpu v4) behavior and explicit --mcpu=v1
behavior.
[1]
https://lore.kernel.org/bpf/6f32c0a1-9de2-4145-92ea-be025362182f@linux.dev/T/#m0f7e63c390bc8f5a5523e7f2f0537becd4205200
Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
|
|
This PR fixes the ASAN failure in
https://github.com/llvm/llvm-project/pull/90930.
The original PR made the assumption that parent
`ThreadPlanStepOverRange`'s lifetime will always be longer than
`ThreadPlanSingleThreadTimeout` leaf plan so it passes the
`m_timeout_info` as reference to it.
From the ASAN failure, it seems that this assumption may not be true
(likely the thread stack is holding a strong reference to the leaf
plan).
This PR fixes this lifetime issue by using shared pointer instead of
passing by reference.
---------
Co-authored-by: jeffreytan81 <jeffreytan@fb.com>
|
|
Add support to detect __size_returning_new variants defined inproposal
P0901R5 to extend to operator new, see
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p0901r5.html for
details.
This PR matches the declarations exported by tcmalloc in
https://github.com/google/tcmalloc/blob/f2516691d01051defc558679f37720bba88d9862/tcmalloc/malloc_extension.h#L707-L711
|
|
We were forgetting to pass the `TypeLocBuilder` along to
`TransformType`, causing us to complain if we then tried to build a
`DependentAddressSpaceTypeLoc` because the inner `TypeLoc` was
missing from the TLB.
Fixes #101685.
|
|
clang-15 can produce binaries with mismatched RememberState/RestoreState
CFIs. This is benign for unwinding, so replace an assert with a warning.
|
|
When we have a modifier with a value (like dst_sel:DWORD for example) we
only accept symbolic values. SP3 allows to use numberic constants as
well. Adding a helper function to allow both.
Besides the compatibility it is easier to use.
|
|
This patch fixes https://github.com/llvm/llvm-project/issues/17204.
If a base pointer is used in a function, and it is clobbered by an
instruction (typically an inline asm), current register allocator can't
handle this situation, so BP becomes garbage after those instructions.
It can also occur to FP in theory.
We can spill and reload FP/BP registers around those instructions. But
normal spill/reload instructions also use FP/BP, so we can't spill them
into normal spill slots, instead we spill them into the top of stack by
using SP register.
|
|
(#100512)
Currently, `%17 = fir.box_elesize %16 :
(!fir.class<!fir.ptr<!fir.type<_QFTt{a:i32,b:i32}>>>) -> i32`
is translated to
```
%4 = getelementptr { ptr, i64, i32, i8, i8, i8, i8, ptr, [1 x i64] }, ptr %1, i32 0, i32 1
%5 = load i32, ptr %4, align 4
```
The type of the element size is `i64`. The load essentially truncates
the value and yields incorrect result in the big endian environment. The
problem occurs in the `storage_size` intrinsic on a polymorphic
variable.
|
|
required post-LTO (#101894)
Fixes: https://github.com/emscripten-core/emscripten/issues/16836
|
|
|
|
- Keep track of last definition of a feature in a `DenseMap` and use
it to report a better error message when a duplicate feature is found.
- Use StringMap instead of a std::map in `EmitStageAndOperandCycleData`
- Add a unit test to check if duplicate names are flagged.
|
|
Reverts llvm/llvm-project#99299 because it breaks the lowering. To
repro: `mlir-opt -transform-interpreter ~/repro.mlir`
```mlir
#map = affine_map<(d0, d1) -> (d0)>
#map1 = affine_map<(d0, d1) -> (d1)>
#map2 = affine_map<(d0, d1) -> (d0, d1)>
#map3 = affine_map<(d0, d1) -> (d0 + d1)>
module {
func.func @foo(%arg0: index, %arg1: tensor<2xf32>, %arg2: tensor<4xf32>, %arg3: tensor<1xf32>) -> tensor<4x1xf32> {
%c0 = arith.constant 0 : index
%cst = arith.constant 1.000000e+00 : f32
%cst_0 = arith.constant 0.000000e+00 : f32
%0 = tensor.empty() : tensor<4x1xf32>
%1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel"]} ins(%arg2, %arg3 : tensor<4xf32>, tensor<1xf32>) outs(%0 : tensor<4x1xf32>) {
^bb0(%in: f32, %in_1: f32, %out: f32):
%2 = linalg.index 0 : index
%3 = linalg.index 1 : index
%4 = affine.apply #map3(%3, %arg0)
%extracted = tensor.extract %arg1[%c0] : tensor<2xf32>
%5 = arith.cmpi eq, %2, %c0 : index
%6 = arith.cmpi ult, %2, %c0 : index
%7 = arith.select %5, %cst, %in : f32
%8 = arith.select %6, %cst_0, %7 : f32
%9 = arith.cmpi eq, %4, %c0 : index
%10 = arith.cmpi ult, %4, %c0 : index
%11 = arith.select %9, %cst, %in_1 : f32
%12 = arith.select %10, %cst_0, %11 : f32
%13 = arith.mulf %8, %12 : f32
%14 = arith.mulf %13, %extracted : f32
%15 = arith.cmpi eq, %2, %4 : index
%16 = arith.select %15, %cst, %cst_0 : f32
%17 = arith.subf %16, %14 : f32
linalg.yield %17 : f32
} -> tensor<4x1xf32>
return %1 : tensor<4x1xf32>
}
}
module attributes {transform.with_named_sequence} {
transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!transform.any_op) -> !transform.any_op
transform.structured.vectorize %0 : !transform.any_op
transform.yield
}
}
```
|
|
CUDA unified variable where set to use the same allocator than managed
variable. This patch adds a specific allocator for the unified
variables. Currently it will call the managed allocator underneath but
we want to have the flexibility to change that in the future.
|
|
Rename namespace `Fortran::runtime::cuf` to `Fortran::runtime::cuda` to
avoid embiguity with the namespace `::cuf` that is defined in the CUF
dialect.
|
|
This patch changes the return type of `GetMetadata` from a
`ClangASTMetadata*` to a `std::optional<ClangASTMetadata>`. Except for
one call-site (`SetDeclIsForcefullyCompleted`), we never actually make
use of the mutability of the returned metadata. And we never make use of
the pointer-identity. By passing `ClangASTMetadata` by-value (the type
is fairly small, size of 2 64-bit pointers) we'll avoid some questions
surrounding the lifetimes/ownership/mutability of this metadata.
For consistency, we also change the parameter to `SetMetadata` from
`ClangASTMetadata&` to `ClangASTMetadata` (which is an NFC since we copy
the data anyway).
This came up during some changes we plan to make where we [create
redeclaration chains for decls in the LLDB
AST](https://github.com/llvm/llvm-project/pull/95100). We want to avoid
having to dig out the canonical decl of the declaration chain for
retrieving/setting the metadata. It should just be copied across all
decls in the chain. This is easier to guarantee when everything is done
by-value.
|
|
It is already handled in a different method, especially as a - UMIN(a,
b) cannot be handled by a select statement, unless it means something
like: "(c < b) ? b - ((b > c) ? c : b) : 0;" but LLVM handles that case
as well.
|
|
(#101906)
This adds the same kind of verification for ARM, as was added for
AArch64 in 1e7f592a890aad860605cf5220530b3744e107ba. This allows
catching issues at assembly time, instead of having the linker
misinterpret the relocations (as the linker ignores the symbol offset).
This verifies that the issue fixed by
8dd065d5bc81b0c8ab57f365bb169a5d92928f25 really is fixed, and points out
explicitly if the same issue appears elsewhere.
Note that the parameter Value in the adjustFixupValue function is offset
by 4 from the value that is stored as immediate in the instructions, so
we compare with 4, when we want to make sure that the written immediate
will be zero.
|
|
Fixes #97229.
|
|
|
|
Currently SLP vectorizer compares phi instructions by the type id of the
compared instructions, which may failed in case of different integer
types,
with the different sizes. Patch adds comparison by type sizes to fix
this.
|
|
(#97229)
|
|
When using the relative vtable ABI, if a vtable is not dso_local, it's
given private linkage (if not COMDAT) or hidden visibility (if COMDAT)
to make it dso_local (to place it in rodata instead of data.rel.ro), and
an alias generated with the original linkage and visibility. This alias
could later be removed from the symbol table, e.g. if using a version
script, at which point we lose all symbol information about the vtable.
Use internal linkage instead of private linkage to avoid this.
While I'm here, clarify the comment about why COMDAT vtables can't use
internal (or private) linkage, and associate it with the else block
where hidden visibility is applied instead of internal linkage.
|
|
Landingpad instruction must be the very first instruction after the phi
nodes, so need to inser extractelement/shuffles after this instruction.
Fixes https://github.com/llvm/llvm-project/issues/102187
|
|
|
|
|
|
|
|
- After ExpandVP pass is merged into PreISelIntrinsicLowering
|
|
|
|
This patch implements sandboxir::AllocaInst which mirrors
llvm::AllocaInst.
|
|
This patch adds entry point in the runtime to be able to allocate
descriptors in managed memory. These entry points currently only call
`CUFAllocManaged` and `CUFFreeManaged` but could be more complicated in
the future.
`cuf.alloc` and `cuf.free` related to local descriptors are converted
into runtime calls.
|
|
|
|
with stripped binaries (#99362)
@walter-erquinigo found the the [PR with testing and a fix for
DebugInfoD](https://github.com/llvm/llvm-project/pull/98344) caused an
issue when working with stripped binaries.
The issue is that when you're working with split-dwarf, there are *3*
possible files: The stripped binary the user is debugging, the
"only-keep-debug" *or* unstripped binary, plus the `.dwp` file. The
debuginfod plugin should provide the unstripped/OKD binary. However, if
the debuginfod plugin fails, the default symbol locator plugin will just
return the stripped binary, which doesn't help. So, to address that, the
SymbolVendorELF code checks to see if the SymbolLocator's
ExecutableObjectFile request returned the same file, and bails if that's
the case. You can see the specific diff as the second commit in the PR.
I'm investigating adding a test: I can't quite get a simple repro, and
I'm unwilling to make any additional changes to Makefile.rules to this
diff, for Pavlovian reasons.
|
|
Summary:
The intention behind this code was to null terminate the `envp` string,
but it accidentally went into the string data.
|
|
Currently, the only way to see the passes that were registered is by
calling “mlir-opt --help”. However, for compilers with 500+ passes, the
help message becomes too long and sometimes hard to understand. In this
PR I add a new "--list-passes" option to mlir-opt, which can be used for
printing only the registered passes, a feature that would be extremely
useful.
|
|
Closes https://github.com/dtcxzyw/llvm-tools/issues/22.
|
|
This adds addressof at the required places in [input.output]. Some of
the new tests failed since string used operator& internally. These have
been fixed too.
Note the new fstream tests perform output to a basic_string instead of a
double. Using a double requires num_get specialization
num_get<CharT, istreambuf_iterator<CharT,
char_traits_operator_hijacker<CharT>>
This facet is not present in the locale database so the conversion would
fail due to a missing locale facet. Using basic_string avoids using the
locale.
As a drive-by fixes several bugs in the ofstream.cons tests. These
tested ifstream instead of ofstream with an open mode.
Implements:
- LWG3130 [input.output] needs many addressof
Closes #100246.
|
|
replaceIncomingBlockWith and removeIncomingValueIf are both
straightforward and done.
I'll defer copyIncomingBlocks until a couple of other changes that also
handle blocks go in.
|
|
Changes the loop range to match similar tests and avoids zero
iterations. The original motivation to reduce the number of iterations
was to allow the test to be executed during constant evaluation.
Fixes: https://github.com/llvm/llvm-project/issues/100502
|
|
(#98551)
This PR Builds on #98022 .
It adds support for Volta's SequentiallyConsistent Load and Store
operations at system scope.
|
|
expandVectorPredication may change code, even if the intrinsic itself
remains in the code. Report changes whenever such an intrinsic is
encountered, because code could have been changed.
Another follow-up fix for #101652 to fix expensive-checks-only failure.
|
|
|
|
#101089 (#101253)
- added all variations of ffma and fdiv
- will add all new headers into yaml for next patch
- only fsub is left then all basic operations for float is complete
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
|
|
These would always be uniform anyway, but it shouldn't hurt to
mark them as always uniform. This will help use TTI::isAlwaysUniform
in place of proper uniformity analysis in trivial situations.
|