Age | Commit message | Author | Files | Lines
2024-08-02[AMDGPU][SILoadStoreOptimizer] Include constrained buffer load variantsusers/cdevadas/combine-constrained-buffer-loadsChristudasan Devadasan3-82/+613
Use the constrained buffer load opcodes while combining under-aligned load for XNACK enabled subtargets.
2024-08-02[AMDGPU] Auto-generate lit pattern for test ↵Christudasan Devadasan1-34/+231
CodeGen/AMDGPU/merge-sbuffer-load.mir.
2024-08-01[Driver] Include crt0.o in the baremetal link (#101258)Petr Hosek4-0/+13
The common baremetal libc implementations already provide crt0.o and GCC automatically links it so this improves parity.
2024-08-01[CMake][Fuchsia] Use standard spelling for Arm baremetal targets (#101302)Petr Hosek1-2/+2
It's more common to use `none` rather than `unknown` for the OS component in Arm baremetal targets.
2024-08-02Revert "[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new ↵Phoebe Wang49-1442/+43
instructions" (#101612) Reverts llvm/llvm-project#101452. Several buildbots failed, so revert first.
2024-08-01[clang-format] Fix a misannotation of PointerOrReference (#101291)Owen Pan2-14/+21
Fixes #101138.
2024-08-01[TableGen] Use std::move. NFCCraig Topper1-1/+1
Fixes #101408.
2024-08-02[X86][AVX10.2] Support AVX10.2 option and VMPSADBW/VADDP[D,H,S] new ↵Phoebe Wang49-43/+1442
instructions (#101452) Ref.: https://cdrdv2.intel.com/v1/dl/getContent/828965
2024-08-02[HLSL] cleanup builtin names elementwise usage (#101543)Farzon Lotfi7-90/+105
Remove elementwise description for builtins that don't perform elementwise operations.
2024-08-02[SPARC][IAS] Add v8plus feature bit (#101367)Koakuma9-16/+53
Implement handling for `v8plus` feature bit to allow the user to switch between V8 and V8+ mode with 32-bit code. Currently this only sets the appropriate ELF machine type and flags; codegen changes will be done in future patches. This is done as a prerequisite for `-mv8plus` flag on clang (#98713).
2024-08-02[LoongArch] Align stack objects passed to memory intrinsics (#101309)hev3-117/+39
Memcpy, and other memory intrinsics, typically try to use wider load/store if the source and destination addresses are aligned. In CodeGenPrepare, look for calls to memory intrinsics and, if the object is on the stack, align it to 4-byte (32-bit) or 8-byte (64-bit) boundaries if it is large enough that we expect memcpy to use wider load/store instructions to copy it. Fixes #101295
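The alignment decision described above can be sketched as a small helper. This is an illustration only, not the CodeGenPrepare code: `preferredStackAlign` and its `minSize` threshold parameter are hypothetical names, and the real threshold is chosen by the target.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not an LLVM API: pick the alignment for a stack object
// that is passed to a memory intrinsic such as memcpy.
std::size_t preferredStackAlign(std::size_t objectSize,
                                std::size_t currentAlign, bool is64Bit,
                                std::size_t minSize) {
  assert(currentAlign != 0 && (currentAlign & (currentAlign - 1)) == 0 &&
         "alignment must be a power of two");
  // Word alignment: 8 bytes on 64-bit targets, 4 bytes on 32-bit targets.
  const std::size_t wordAlign = is64Bit ? 8 : 4;
  // Small objects are left alone; wider loads/stores would not pay off.
  if (objectSize < minSize)
    return currentAlign;
  // Only ever raise the alignment, never lower it.
  return currentAlign < wordAlign ? wordAlign : currentAlign;
}
```

For example, a 64-byte byte-aligned buffer would be raised to 8-byte alignment on a 64-bit target, while a 4-byte object below the assumed threshold keeps its alignment.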
2024-08-01[RISCV] Use Zvhmin instead of Zvfh on RUN lines for some intrinsic tests. ↵Craig Topper544-3981/+3835
NFC (#101540) Loads/stores/reinterpret/vfncvt.f.f.w/vfwcvt.f.f.v/vmerge/vmv.v.v are all expected to work for f16 vectors with Zvfhmin. Remove the handcrafted Zvfhmin test that partially tested this. Splits the vfwcvt.f.f.v and vfncvt.f.f.w tests into their own file so we can have a separate RUN line from the float<->int conversions.
2024-08-01[Attributor] Use `getPointerAddressSpace` to replace a cast followed by a ↵Shilei Tian1-2/+1
`getAddressSpace`
2024-08-01Fix attr-nomerge.cpp with fixed tripleZequan Wu1-1/+1
2024-08-01[Attributor] Indicate optimistic fixed point if an instruction already has ↵Shilei Tian2-2/+7
non-zero address space (#101589)
2024-08-01[mlir][spirv] Add definitions and (de)serialization for FPRoundingMode (#101546)Andrea Faulds4-0/+42
2024-08-02[VPlan][NFC] Make VPValue pointer const. (#101334)Mel Chen2-4/+4
2024-08-02[mlir][bufferization] Improve performance of DropEquivalentBufferResultsPass ↵Longsheng Mou1-6/+10
(#101281) Using a DenseMap minimizes the traversal time of callOps, which greatly improves the efficiency of this pass.
2024-08-02[X86_32][C++] fix 0 sized struct case in vaarg. (#86388)Longsheng Mou3-1/+27
struct SuperEmpty { struct{ int a[0];} b;}; Such zero-sized structs cannot be ignored on i386 in C++ mode, because C++ fields are never considered empty. But in EmitVAArg the size is 0, so the va_list is not advanced. We can probably just ignore this kind of argument, as X86_64 does. Fixes #86385.
2024-08-01[test] Fix attr-nomerge.cpp after ae6dc64ec670891cb15049277e43133d4df7fb4bFangrui Song1-10/+10
2024-08-01[Bazel] Port f3bfc56327df821801caa4ae20995f67f8589a19Fangrui Song1-0/+1
2024-08-01[lldb] Change Module to have a concrete UnwindTable, update (#101130)Jason Molenda4-74/+41
Currently a Module has a std::optional<UnwindTable> which is created when the UnwindTable is requested from outside the Module. The idea is to delay its creation until the Module has an ObjectFile initialized, which will have been done by the time we're doing an unwind. However, Module::GetUnwindTable wasn't doing any locking, so it was possible for two threads to ask for the UnwindTable for the first time, one would be created and returned while another thread would create one, destroy the first in the process of emplacing it. It was an uncommon crash, but it was possible. Grabbing the Module's mutex would be one way to address it, but when loading ELF binaries, we start creating the SymbolTable on one thread (ObjectFileELF) grabbing the Module's mutex, and then spin up worker threads to parse the individual DWARF compilation units, which then try to also get the UnwindTable and deadlock if they try to get the Module's mutex. This changes Module to have a concrete UnwindTable as an ivar, and when it adds an ObjectFile or SymbolFileVendor, it will call the Update method on it, which will re-evaluate which sections exist in the ObjectFile/SymbolFile. UnwindTable used to have an Initialize method which set all the sections, and an Update method which would set some of them if they weren't set. I unified these with the Initialize method taking a `force` option to re-initialize the section pointers even if they had been done already before. This is addressing a rare crash report we've received, and also a failure Adrian spotted on the -fsanitize=address CI bot last week, it's still uncommon with ASAN but it can happen with the standard testsuite. rdar://128876433
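A minimal sketch of the shape of this change, under stated assumptions: `ObjectFileSketch`, `UnwindTableSketch`, `ModuleSketch`, and `kUnset` are hypothetical stand-ins, not the real lldb classes, and sections are reduced to names and indices. It shows the table as a plain member (so there is no lazily emplaced `std::optional` for two threads to race on) and a single `Initialize` with a `force` option.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the real lldb classes.
constexpr int kUnset = -1;

struct ObjectFileSketch {
  std::vector<std::string> sections;
};

class UnwindTableSketch {
public:
  // Record where the unwind sections live. Without `force`, an index that is
  // already set is kept; with `force`, every index is re-evaluated.
  void Initialize(const ObjectFileSketch &file, bool force = false) {
    if (force) {
      eh_frame_ = kUnset;
      debug_frame_ = kUnset;
    }
    for (std::size_t i = 0; i < file.sections.size(); ++i) {
      if (eh_frame_ == kUnset && file.sections[i] == ".eh_frame")
        eh_frame_ = static_cast<int>(i);
      if (debug_frame_ == kUnset && file.sections[i] == ".debug_frame")
        debug_frame_ = static_cast<int>(i);
    }
  }
  int ehFrameIndex() const { return eh_frame_; }
  int debugFrameIndex() const { return debug_frame_; }

private:
  int eh_frame_ = kUnset;
  int debug_frame_ = kUnset;
};

class ModuleSketch {
public:
  // The table is re-evaluated whenever a file is added, instead of being
  // created on first request.
  void AddObjectFile(const ObjectFileSketch *file) {
    assert(file != nullptr && "object file must be provided");
    unwind_table_.Initialize(*file, /*force=*/true);
  }
  // No lazy construction here: the member exists as long as the module does.
  UnwindTableSketch &GetUnwindTable() { return unwind_table_; }

private:
  UnwindTableSketch unwind_table_;
};
```

The sketch only illustrates the ownership and the `force` semantics; it does not model the locking or the worker threads described above.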
2024-08-01[asan] Avoid global ~DenseMap()Vitaly Buka1-4/+14
Follow up to #100923
2024-08-01[M68k] Fix compilation pipeline checkMichael Liao1-1/+0
- After 'lowerConstantIntrinsics' is merged into pre-isel lowering
2024-08-01[SandboxIR] Implement UnaryInstruction class (#101541)vporpo2-20/+67
This patch implements sandboxir::UnaryInstruction class and updates sandboxir::LoadInst and sandboxir::CastInst to inherit from it instead of sandboxir::Instruction.
2024-08-01Add a tutorial on mlir-opt (#96105)Jeremy Kun7-1/+416
This tutorial gives an introduction to the `mlir-opt` tool, focusing on how to run basic passes with and without options, run pass pipelines from the CLI, and point out particularly useful flags. --------- Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com> Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
2024-08-02[mlir][emitc] Fix EmitC dialect's operations' descriptions (#101523)Andrey Timonin1-40/+40
- Added the dialect's prefix to operations' descriptions to follow the same style inside the TableGen file. - Minor changes in the 'emitc.yield' operation's description.
2024-08-01[libc] created tan function fuzzer (#101570)RoseZhang034-2/+51
Also edited file header formatting on sin_fuzz and cos_fuzz
2024-08-01Add support for verifying local type units in .debug_names. (#101133)Greg Clayton4-15/+264
This patch adds support for verifying local type units in .debug_names section. It adds a test to test if the TU index is valid, and a test that tests that an error is found inside the name entry for a type unit. We don't need to test all other errors in the name entry because these are essentially identical to compile unit entries, they just use a different DWARF unit offset index.
2024-08-01Fix codegen of consteval functions returning an empty class, and related ↵Eli Friedman15-287/+320
issues (#93115) Fix codegen of consteval functions returning an empty class, and related issues If a class is empty, don't store it to memory: the store might overwrite useful data. Similarly, if a class has tail padding that might overlap other fields, don't store the tail padding to memory. The problem here turned out a bit more general than I initially thought: basically all uses of EmitAggregateStore were broken. Call lowering had a method that did mostly the right thing, though: CreateCoercedStore. Adapt CreateCoercedStore so it always does the conservatively right thing, and use it for both calls and ConstantExpr. Also, along the way, fix the "overlap" bit in AggValueSlot: the bit was set incorrectly for empty classes in some cases. Fixes #93040.
2024-08-01Reapply "[Clang] Fix nomerge attribute not working with __builtin_trap(), ↵Zequan Wu5-7/+111
__debugbreak(), __builtin_verbose_trap() (#101549)" This reverts commit 667598d84b16d1789ce90b231565e9e7bfdbe77d and fixes failed tests: llvm/test/CodeGen/X86/nomerge.ll and llvm/test/MC/AArch64/local-bounds-single-trap.ll.
2024-08-01[Offload][OpenMP] Prettify error messages by "demangling" the kernel name ↵Johannes Doerfert15-24/+145
(#101400) The kernel names for OpenMP are manually mangled and not ideal when we report something to the user. We demangle them now, providing the function and line number of the target region, together with the actual kernel name.
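As a rough illustration of what such "demangling" involves, the sketch below splits a kernel name into function and line. It assumes the name has the form `__omp_offloading_<hex id>_<hex id>_<function>_l<line>`; that layout, the `KernelInfo` struct, and `parseKernelName` are assumptions made for this example and are not taken from the patch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch.
struct KernelInfo {
  std::string function;
  unsigned line;
};

// Assumed layout: __omp_offloading_<hex id>_<hex id>_<function>_l<line>.
// Returns false and leaves `out` untouched if the name does not match.
bool parseKernelName(const std::string &name, KernelInfo &out) {
  const std::string prefix = "__omp_offloading_";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return false;
  // Skip the two id fields that follow the prefix.
  std::size_t pos = prefix.size();
  for (int field = 0; field < 2; ++field) {
    const std::size_t next = name.find('_', pos);
    if (next == std::string::npos || next == pos)
      return false;
    pos = next + 1;
  }
  // The line number is the trailing "_l<digits>" suffix.
  const std::size_t suffix = name.rfind("_l");
  if (suffix == std::string::npos || suffix <= pos ||
      suffix + 2 >= name.size())
    return false;
  unsigned line = 0;
  for (std::size_t i = suffix + 2; i < name.size(); ++i) {
    if (!std::isdigit(static_cast<unsigned char>(name[i])))
      return false;
    line = line * 10 + static_cast<unsigned>(name[i] - '0');
  }
  // Everything between the id fields and the suffix is the function name,
  // which may itself be a mangled C++ name containing underscores.
  out.function = name.substr(pos, suffix - pos);
  out.line = line;
  return true;
}
```

Searching for the suffix from the end keeps function names that contain `_l` intact.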
2024-08-01[libc][math][C23] removing daddl from arm32 (#101567)aaryanshukla1-1/+0
2024-08-01[libc] added cos function fuzzing test (#101556)RoseZhang033-4/+53
2024-08-01Revert "[Clang] Fix nomerge attribute not working with __builtin_trap(), ↵Haowei Wu4-104/+2
__debugbreak(), __builtin_verbose_trap() (#101549)" This reverts commit 5e84646982d1ec9bc94e48dde4b47f03c044a156, which broke 'nomerge.ll' test on llvm bots.
2024-08-01[libc++] Add status page consistency change to git-blame-ignore-revsLouis Dionne1-0/+3
To avoid breaking searchability of when a paper was implemented.
2024-08-01[libc++][NFC] Fix inconsistent quoting and spacing in our CSV filesLouis Dionne2-156/+156
There were a few places where we didn't properly quote entries in the CSV status pages, or where we followed inconsistent spacing. This causes issues when trying to synchronize status pages with GitHub issues.
2024-08-01[libc++] Improve code gen for string's operator== (#100926)Nikolas Klauser1-5/+11
If the string is too long for a short string, we can simply check for the long bit. If that's false we can do an early return. This improves the code gen slightly.
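The early return can be sketched with a toy small-string type. `SsoStringSketch`, `kShortCapacity`, and `equals` are hypothetical; the capacity value is an assumption and this is not libc++'s implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Assumed short-string capacity for this sketch only.
constexpr std::size_t kShortCapacity = 22;

// Toy string that remembers whether it is in "long" (heap) mode.
class SsoStringSketch {
public:
  explicit SsoStringSketch(std::string s)
      : is_long_(s.size() > kShortCapacity), data_(std::move(s)) {}
  bool isLong() const { return is_long_; }
  std::size_t size() const { return data_.size(); }
  const char *data() const { return data_.data(); }

private:
  bool is_long_;
  std::string data_;
};

bool equals(const SsoStringSketch &lhs, const char *rhs) {
  assert(rhs != nullptr && "rhs must be a valid C string");
  const std::size_t rhsLen = std::strlen(rhs);
  // If rhs cannot fit in a short string, only the long bit needs checking:
  // a short lhs can never be equal, so return before reading its size.
  if (rhsLen > kShortCapacity && !lhs.isLong())
    return false;
  if (rhsLen != lhs.size())
    return false;
  return std::memcmp(lhs.data(), rhs, rhsLen) == 0;
}
```

The benefit is largest when the length of `rhs` is known at compile time, since the first branch then reduces to a single bit test.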
2024-08-01Simplify hot-path size computations in BumpPtrAllocator. (#101467)Owen Anderson1-8/+10
~0.1% instruction count improvements https://llvm-compile-time-tracker.com/compare.php?from=07d2709a17860a202d91781769a88837e4fb5f2a&to=d5cc47831ecd9f0a2b164b16da67f74b94e9aafc&stat=instructions:u
2024-08-01[asan] Speed up ASan ODR indicator-based checking (#100923)Artem Pianykh2-12/+95
**Summary**: When ASan checks for a potential ODR violation on a global it loops over a linked list of all globals to find those with the matching value of an indicator. With the default setting 'detect_odr_violation=1', ASan doesn't report violations on same-size globals but it still has to traverse the list. For larger binaries with a ton of shared libs and globals (and a non-trivial volume of same-sized duplicates) this gets extremely expensive. This patch adds an indicator indexed (multi-)map of globals to speed up the search. > Note: asan used to use a map to store globals a while ago which was replaced with a list when the codebase [moved off of STL](https://github.com/llvm/llvm-project/commit/e4bada2c946e5399fc37bd67421de01c0047ad38). Internally we see many examples where ODR checking takes *seconds* (even double digits). With this patch it's practically free and `__asan_register_globals` doesn't show up prominently in the perf profile anymore. There are several high-level questions: 1. I understand that the intent is that we hit the slow path rarely, ideally once before the process dies with an error. But in practice we hit the slow path a lot. It feels reasonable to keep the amount of work bounded even in the worst case, even if it requires a bit of extra memory. But if not, it'd be great to learn about the tradeoffs. 2. Poisoning based ODR checking remains on the slow path. Internally we build everything with `-fsanitize-address-use-odr-indicator` so I'm not sure if poisoning-based check would exhibit the same behavior (looking at the code, the shape looks very similar, so it might?). 3. Globals with an ODR indicator of `-1` need to be skipped for the purposes of ODR checking (cf. https://github.com/llvm/llvm-project/commit/a257639a6935a2c63377784c5c9c3b73864a2582). But they are still getting added to the list of globals and hence take up space and slow down the iteration over the list of globals. 
It would be a good saving if we could avoid adding them to the globals list. 4. Any reason to use a linked list instead of e.g. a vector to store globals? **Test Plan**: * `cmake --build build --target check-asan` looks good * Perf-wise things look good when linking against this version of compiler-rt. --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>
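A minimal sketch of the indicator-indexed lookup, assuming a standard-library multimap in place of the runtime's own containers. `GlobalSketch`, `GlobalIndexSketch`, and `kSkipIndicator` are hypothetical names; leaving `-1` indicators out of the index is a choice made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical description of a registered global.
struct GlobalSketch {
  std::string name;
  std::size_t size;
  std::uintptr_t odr_indicator;
};

// Indicator value meaning "skip ODR checking for this global".
const std::uintptr_t kSkipIndicator = static_cast<std::uintptr_t>(-1);

class GlobalIndexSketch {
public:
  // Registers `g` and returns true if an already registered global shares its
  // indicator but has a different size. Same-size duplicates are not reported.
  bool registerGlobal(const GlobalSketch &g) {
    if (g.odr_indicator == kSkipIndicator)
      return false;
    bool mismatch = false;
    // Only globals with the same indicator are visited, not the whole list.
    auto range = by_indicator_.equal_range(g.odr_indicator);
    for (auto it = range.first; it != range.second; ++it)
      if (it->second.size != g.size)
        mismatch = true;
    by_indicator_.emplace(g.odr_indicator, g);
    return mismatch;
  }
  std::size_t indexedCount() const { return by_indicator_.size(); }

private:
  std::unordered_multimap<std::uintptr_t, GlobalSketch> by_indicator_;
};
```

Each registration then costs time proportional to the number of globals sharing one indicator rather than to the total number of globals, at the price of the extra memory for the map.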
2024-08-01[libc][math][c23] Add dadd{l,f128} and ddiv{l,f128} C23 math functions (#100456)aaryanshukla24-2/+282
- fadd removed because I need to add for different input types - finishing rest of basic operations - noticed duplicates will remove --------- Co-authored-by: OverMighty <its.overmighty@gmail.com>
2024-08-01[libc] Fix 'vasprintf' not working in non-fullbuild modeJoseph Huber2-13/+14
2024-08-01[SCEV] Prove no-self-wrap from negative power of two step (#101416)Philip Reames3-21/+28
We have existing code which reasons that a step evenly dividing the iteration space in a finite loop with a single exit implies no-self-wrap. The sign of the step doesn't affect this. --------- Co-authored-by: Nikita Popov <github@npopov.com>
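The underlying arithmetic can be checked on 8-bit values. This is only an illustration of why the sign is irrelevant, not the SCEV code, and `cycleLength` is a hypothetical helper: adding a step of 2^k or of -2^k modulo 256 lands exactly on the start value again after 256 / 2^k additions.

```cpp
#include <cassert>
#include <cstdint>

// Number of additions of `step` (modulo 256) until the value equals `start`
// again. Terminates for every nonzero step because addition modulo 256 is
// cyclic.
unsigned cycleLength(std::uint8_t start, std::uint8_t step) {
  assert(step != 0 && "a zero step never advances");
  std::uint8_t value = start;
  unsigned count = 0;
  do {
    value = static_cast<std::uint8_t>(value + step);
    ++count;
  } while (value != start);
  return count;
}
```

For a step of 4 and a step of -4 (252 as an unsigned 8-bit value) the cycle length is the same, 64, regardless of the start value.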
2024-08-01[flang][runtime] Added missing RT_API_ATTRS. (#101536)Slava Zakharin1-4/+4
2024-08-01[Clang] Fix nomerge attribute not working with __builtin_trap(), ↵Zequan Wu4-2/+104
__debugbreak(), __builtin_verbose_trap() (#101549) 1. It fixes the problem of llvm.trap() not getting the nomerge attribute. 2. It sets the nomerge flag for the node if the instruction has the nomerge attribute. This is a copy of https://reviews.llvm.org/D146164. This only attempts to fix `nomerge` for `__builtin_trap()`, `__debugbreak()`, and `__builtin_verbose_trap()`; it does not address `nomerge` not working for non-trap builtins. Fixes #53011
2024-08-01[libc++] Revert "Check correctly ref-qualified __is_callable in algorithms ↵Louis Dionne23-227/+84
(#73451)" This reverts commit 8d151f804ff43aaed1edf810bb2a07607b8bba14, which broke some build bots. I think that is caused by an invalid argument order when checking __is_comparable in upper_bound.
2024-08-01[flang] Add allocator_idx attribute on fir.embox and fircg.ext_embox (#101212)Valentin Clement (バレンタイン クレメン)8-14/+66
#100690 introduces allocator registry with the ability to store allocator index in the descriptor. This patch adds an attribute to fir.embox and fircg.ext_embox to be able to set the allocator index while populating the descriptor fields.
2024-08-01[Clang][NFC] Improve generation of GEP and RecordDecl loop (#101434)Bill Wendling4-29/+28
As with other loops, we need only look at a RecordDecl's FieldDecls. Convert to using them. In the meantime, we can improve the generation of the 'counted_by' FieldDecl's GEP by creating one GEP instead of a series of GEPs.
2024-08-01[clang] fix classification of a string literal expression used as ↵Matheus Izvekov3-7/+51
initializer (#101447)
2024-08-01[clang-format] Rename variable more sensitively (#100943)Nathan Sidwell1-2/+2
Renaming to `Disallowed`.