Age | Commit message | Author | Files | Lines |
|
Use the constrained buffer load opcodes when combining under-aligned
loads for XNACK-enabled subtargets.
|
|
CodeGen/AMDGPU/merge-sbuffer-load.mir.
|
|
The common baremetal libc implementations already provide crt0.o and GCC
automatically links it so this improves parity.
|
|
It's more common to use `none` rather than `unknown` for the OS
component in Arm baremetal targets.
|
|
instructions" (#101612)
Reverts llvm/llvm-project#101452
Several buildbots failed. Revert first.
|
|
Fixes #101138.
|
|
Fixes #101408.
|
|
instructions (#101452)
Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
|
|
Remove elementwise description for builtins that don't perform
elementwise operations.
|
|
Implement handling for `v8plus` feature bit to allow the user to switch
between V8 and V8+ mode with 32-bit code.
Currently this only sets the appropriate ELF machine type and flags;
codegen changes will be done in future patches.
This is done as a prerequisite for `-mv8plus` flag on clang (#98713).
|
|
Memcpy, and other memory intrinsics, typically try to use wider
load/store if the source and destination addresses are aligned. In
CodeGenPrepare, look for calls to memory intrinsics and, if the object
is on the stack, align it to 4-byte (32-bit) or 8-byte (64-bit)
boundaries if it is large enough that we expect memcpy to use wider
load/store instructions to copy it.
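
As a source-level analogue only: the patch itself operates on IR in CodeGenPrepare, and none of the code below comes from it. The sketch aligns a stack temporary by hand with `alignas`, which is roughly what the pass would arrange automatically on a 64-bit target. The function name and the 32-byte size are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies 32 bytes from src to dst through a stack
// temporary. alignas(8) requests by hand the kind of alignment the pass
// is described as applying to large enough stack objects.
bool copyThroughAlignedTemp(const unsigned char *src, unsigned char *dst) {
  alignas(8) unsigned char tmp[32];
  std::memcpy(tmp, src, sizeof(tmp));
  std::memcpy(dst, tmp, sizeof(tmp));
  // Report whether the temporary really sits on an 8-byte boundary.
  return reinterpret_cast<std::uintptr_t>(tmp) % 8 == 0;
}
```

With the alignment known, a memcpy expansion is free to use word-sized accesses on the temporary instead of byte-sized ones.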
Fixes #101295
|
|
NFC (#101540)
Loads/stores/reinterpret/vfncvt.f.f.w/vfwcvt.f.f.v/vmerge/vmv.v.v are
all expected to work for f16 vectors with Zvfhmin.
Remove the handcrafted Zvfhmin test that partially tested this.
Splits the vfwcvt.f.f.v and vfncvt.f.f.w tests into their own file so we
can have a separate RUN line from the float<->int conversions.
|
|
`getAddressSpace`
|
|
|
|
non-zero address space (#101589)
|
|
|
|
|
|
(#101281)
Using DenseMap to minimize the traversal time of callOps greatly
improves the efficiency of running this pass.
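
The message does not say which keys the map uses, so the following is only a rough illustration of the general idea. It groups hypothetical call records by callee name in a hash map once, so that later queries are a lookup instead of a rescan of every call. `std::unordered_map` stands in for `llvm::DenseMap`, and `CallRecord` and `indexCallsByCallee` are invented names.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record for a call site: which function it calls.
struct CallRecord {
  std::string callee;
  int id;
};

// Groups call ids by callee in one pass over the list, so each later
// lookup is a hash probe instead of a rescan of every call.
std::unordered_map<std::string, std::vector<int>>
indexCallsByCallee(const std::vector<CallRecord> &calls) {
  std::unordered_map<std::string, std::vector<int>> index;
  for (const CallRecord &call : calls)
    index[call.callee].push_back(call.id);
  return index;
}
```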
|
|
struct SuperEmpty { struct{ int a[0];} b;};
Such zero-sized structs cannot be ignored on i386 in C++ mode, because C++
fields are never empty. But in EmitVAArg the size is 0, so the
va_list is not advanced. Maybe we can just ignore this kind of argument,
as X86_64 does. Fixes #86385.
|
|
|
|
|
|
Currently a Module has a std::optional<UnwindTable> which is created
when the UnwindTable is requested from outside the Module. The idea is
to delay its creation until the Module has an ObjectFile initialized,
which will have been done by the time we're doing an unwind.
However, Module::GetUnwindTable wasn't doing any locking, so it was
possible for two threads to ask for the UnwindTable for the first time:
one would be created and returned while another thread would create its
own, destroying the first in the process of emplacing it. It was an
uncommon crash, but it was possible.
Grabbing the Module's mutex would be one way to address it, but when
loading ELF binaries, we start creating the SymbolTable on one thread
(ObjectFileELF) grabbing the Module's mutex, and then spin up worker
threads to parse the individual DWARF compilation units, which then try
to also get the UnwindTable and deadlock if they try to get the Module's
mutex.
This changes Module to have a concrete UnwindTable as an ivar, and when
it adds an ObjectFile or SymbolFileVendor, it will call the Update
method on it, which will re-evaluate which sections exist in the
ObjectFile/SymbolFile. UnwindTable used to have an Initialize method
which set all the sections, and an Update method which would set some of
them if they weren't set. I unified these with the Initialize method
taking a `force` option to re-initialize the section pointers even if
they had been done already before.
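
A minimal sketch of the shape described above, not LLDB's actual classes: the table is a plain member of the module, so it is never created or destroyed on demand, and re-initialization goes through a single method with a `force` flag. `ModuleSketch`, `UnwindTableSketch`, `SectionInfo`, and the table-local mutex are all assumptions made for illustration.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical summary of which unwind sections an object file provides.
struct SectionInfo {
  bool has_eh_frame;
  bool has_debug_frame;
};

class UnwindTableSketch {
public:
  // Records the available sections. Without force, a table that has
  // already been initialized is left as it is.
  void Initialize(const SectionInfo &info, bool force = false) {
    std::lock_guard<std::mutex> guard(m_mutex);
    if (m_initialized && !force)
      return;
    m_info = info;
    m_initialized = true;
  }

  bool HasEHFrame() {
    std::lock_guard<std::mutex> guard(m_mutex);
    return m_info.has_eh_frame;
  }

private:
  std::mutex m_mutex; // the table's own lock, not the module's
  bool m_initialized = false;
  SectionInfo m_info{false, false};
};

class ModuleSketch {
public:
  // Called when an object file (or symbol file) is attached: force a
  // re-evaluation of the sections.
  void SetObjectFile(const SectionInfo &info) {
    m_unwind_table.Initialize(info, /*force=*/true);
  }

  // Always returns the same object; nothing is created or destroyed here.
  UnwindTableSketch &GetUnwindTable() { return m_unwind_table; }

private:
  UnwindTableSketch m_unwind_table;
};
```

Because `GetUnwindTable` only hands out a reference to an existing member, two threads asking for it at the same time cannot invalidate each other's result, and neither needs the module's mutex.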
This addresses a rare crash report we've received, and also a
failure Adrian spotted on the -fsanitize=address CI bot last week. It's
still uncommon with ASAN, but it can happen with the standard testsuite.
rdar://128876433
|
|
Follow up to #100923
|
|
- After 'lowerConstantIntrinsics' is merged into pre-isel lowering
|
|
This patch implements sandboxir::UnaryInstruction class and updates
sandboxir::LoadInst and sandboxir::CastInst to inherit from it instead
of sandboxir::Instruction.
|
|
This tutorial gives an introduction to the `mlir-opt` tool, focusing on
how to run basic passes with and without options and how to run pass
pipelines from the CLI, and pointing out particularly useful flags.
---------
Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
|
|
- Added the dialect's prefix to operations' descriptions to follow the
same style inside the TableGen file.
- Minor changes in the 'emitc.yield' operation's description.
|
|
Also edited file header formatting on sin_fuzz and cos_fuzz
|
|
This patch adds support for verifying local type units in the .debug_names
section. It adds a test that checks whether the TU index is valid, and a
test that checks that an error is found inside the name entry for a type
unit. We don't need to test all other errors in the name entry because
these are essentially identical to compile unit entries; they just use a
different DWARF unit offset index.
|
|
issues (#93115)
Fix codegen of consteval functions returning an empty class, and related
issues
If a class is empty, don't store it to memory: the store might overwrite
useful data. Similarly, if a class has tail padding that might overlap
other fields, don't store the tail padding to memory.
The problem here turned out a bit more general than I initially thought:
basically all uses of EmitAggregateStore were broken. Call lowering had
a method that did mostly the right thing, though: CreateCoercedStore.
Adapt CreateCoercedStore so it always does the conservatively right
thing, and use it for both calls and ConstantExpr.
Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was
set incorrectly for empty classes in some cases.
Fixes #93040.
|
|
__debugbreak(), __builtin_verbose_trap() (#101549)"
This reverts commit 667598d84b16d1789ce90b231565e9e7bfdbe77d and fixes failed tests: llvm/test/CodeGen/X86/nomerge.ll and llvm/test/MC/AArch64/local-bounds-single-trap.ll.
|
|
(#101400)
The kernel names for OpenMP are manually mangled and not ideal when we
report something to the user. We demangle them now, providing the
function and line number of the target region, together with the actual
kernel name.
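
A hedged sketch of how the function and line could be recovered from a manually mangled kernel name. It assumes the layout `__omp_offloading_<device>_<file>_<function>_l<line>`; that layout, the struct, and the function name are assumptions for illustration and are not taken from the patch. The function part may itself still be a mangled C++ symbol, which this sketch leaves untouched.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Fields recovered from a kernel name (hypothetical struct).
struct KernelOrigin {
  std::string function; // possibly still a mangled C++ symbol
  unsigned line;
};

// Assumed layout: __omp_offloading_<device>_<file>_<function>_l<line>.
// Returns false if the name does not follow it.
bool parseKernelName(const std::string &name, KernelOrigin &out) {
  const std::string prefix = "__omp_offloading_";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return false;

  // Skip the two underscore-terminated id fields after the prefix.
  std::string::size_type pos = prefix.size();
  for (int field = 0; field < 2; ++field) {
    pos = name.find('_', pos);
    if (pos == std::string::npos)
      return false;
    ++pos;
  }

  // The line number follows the last "_l"; the function name must be
  // non-empty, so that marker has to come strictly after pos.
  std::string::size_type linePos = name.rfind("_l");
  if (linePos == std::string::npos || linePos <= pos)
    return false;
  const std::string digits = name.substr(linePos + 2);
  if (digits.empty() || digits.size() > 9)
    return false;
  for (char c : digits)
    if (!std::isdigit(static_cast<unsigned char>(c)))
      return false;

  out.function = name.substr(pos, linePos - pos);
  out.line = static_cast<unsigned>(std::stoul(digits));
  return true;
}
```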
|
|
|
|
|
|
__debugbreak(), __builtin_verbose_trap() (#101549)"
This reverts commit 5e84646982d1ec9bc94e48dde4b47f03c044a156, which
broke 'nomerge.ll' test on llvm bots.
|
|
To avoid breaking searchability of when a paper was implemented.
|
|
There were a few places where we didn't properly quote entries in the
CSV status pages, or where we followed inconsistent spacing. This causes
issues when trying to synchronize status pages with Github issues.
|
|
If the string is too long for a short string, we can simply check for
the long bit. If that's false we can do an early return. This improves
the code gen slightly.
|
|
~0.1% instruction count improvements
https://llvm-compile-time-tracker.com/compare.php?from=07d2709a17860a202d91781769a88837e4fb5f2a&to=d5cc47831ecd9f0a2b164b16da67f74b94e9aafc&stat=instructions:u
|
|
**Summary**:
When ASan checks for a potential ODR violation on a global it loops over
a linked list of all globals to find those with the matching value of an
indicator. With the default setting 'detect_odr_violation=1', ASan
doesn't report violations on same-size globals but it still has to
traverse the list. For larger binaries with a ton of shared libs and
globals (and a non-trivial volume of same-sized duplicates) this gets
extremely expensive.
This patch adds an indicator indexed (multi-)map of globals to speed up
the search.
> Note: asan used to use a map to store globals a while ago which was
replaced with a list when the codebase [moved off of
STL](https://github.com/llvm/llvm-project/commit/e4bada2c946e5399fc37bd67421de01c0047ad38).
Internally we see many examples where ODR checking takes *seconds* (even
double digits). With this patch it's practically free and
`__asan_register_globals` doesn't show up prominently in the perf
profile anymore.
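
A minimal sketch of the indicator-indexed lookup, under stated assumptions: `std::unordered_multimap` stands in for whatever container the runtime uses (the sanitizer runtime avoids the STL), and `GlobalSketch` and `GlobalsByIndicator` are invented names. It only models the rule stated above that same-size globals are not reported.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, trimmed-down description of a registered global.
struct GlobalSketch {
  std::uintptr_t odr_indicator;
  std::size_t size;
  const char *name;
};

class GlobalsByIndicator {
public:
  // Registers g and returns the earlier globals that share its indicator
  // but differ in size. Only globals with the same indicator are visited.
  // The caller must keep g alive while the index is in use.
  std::vector<const GlobalSketch *> Register(const GlobalSketch &g) {
    std::vector<const GlobalSketch *> conflicts;
    auto range = index_.equal_range(g.odr_indicator);
    for (auto it = range.first; it != range.second; ++it)
      if (it->second->size != g.size)
        conflicts.push_back(it->second);
    index_.emplace(g.odr_indicator, &g);
    return conflicts;
  }

private:
  std::unordered_multimap<std::uintptr_t, const GlobalSketch *> index_;
};
```

The work per registration is then proportional to the number of globals sharing one indicator, not to the total number of globals.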
There are several high-level questions:
1. I understand that the intent is that we hit the slow path rarely,
ideally once before the process dies with an error. But in practice we
hit the slow path a lot. It feels reasonable to keep the amount of work
bounded even in the worst case, even if it requires a bit of extra
memory. But if not, it'd be great to learn about the tradeoffs.
2. Poisoning based ODR checking remains on the slow path. Internally we
build everything with `-fsanitize-address-use-odr-indicator` so I'm not
sure if poisoning-based check would exhibit the same behavior (looking
at the code, the shape looks very similar, so it might?).
3. Globals with an ODR indicator of `-1` need to be skipped for the
purposes of ODR checking (cf.
https://github.com/llvm/llvm-project/commit/a257639a6935a2c63377784c5c9c3b73864a2582).
But they are still getting added to the list of globals and hence take
up space and slow down the iteration over the list of globals. It would
be a good saving if we could avoid adding them to the globals list.
4. Any reason to use a linked list instead of e.g. a vector to store
globals?
**Test Plan**:
* `cmake --build build --target check-asan` looks good
* Perf-wise things look good when linking against this version of
compiler-rt.
---------
Co-authored-by: Vitaly Buka <vitalybuka@google.com>
|
|
- fadd removed because I need to add for different input types
- finishing rest of basic operations
- noticed duplicates will remove
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
|
|
|
|
We have existing code which reasons that a step evenly dividing the
iteration space in a finite loop with a single exit implies
no-self-wrap. The sign of the step doesn't affect this.
---------
Co-authored-by: Nikita Popov <github@npopov.com>
|
|
|
|
__debugbreak(), __builtin_verbose_trap() (#101549)
1. It fixes the problem that llvm.trap() does not get the nomerge
attribute.
2. It sets the nomerge flag for the node if the instruction has the
nomerge attribute.
This is a copy of https://reviews.llvm.org/D146164. This only attempts
to fix `nomerge` for `__builtin_trap()`, `__debugbreak()`, and
`__builtin_verbose_trap()`; it does not address `nomerge` not working
for non-trap builtins.
Fixes #53011
|
|
(#73451)"
This reverts commit 8d151f804ff43aaed1edf810bb2a07607b8bba14, which
broke some build bots. I think that is caused by an invalid argument
order when checking __is_comparable in upper_bound.
|
|
#100690 introduces an allocator registry with the ability to store the
allocator index in the descriptor. This patch adds an attribute to
fir.embox and fircg.ext_embox to be able to set the allocator index
while populating the descriptor fields.
|
|
As with other loops, we need only look at a RecordDecl's FieldDecls.
Convert to using them. In the meantime, we can improve the generation of
the 'counted_by' FieldDecl's GEP by creating one GEP instead of a series
of GEPs.
|
|
initializer (#101447)
|
|
Renaming to `Disallowed`.
|