path: root/llvm
Age  Commit message  Author  Files  Lines
2024-04-04[clang][ubsan] Switch UBSAN optimization to ↵Vitaly Buka2-35/+107
`llvm.allow.{runtime,ubsan}.check()` (#84858) The intrinsic was introduced with #84850. The intrinsics improve performance by 3% compared to removing traps (on "test-suite/MultiSource/Benchmarks" with PGO+ThinLTO). The pass will be renamed with #84853. RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-04-04[TableGen] Fix a potential crash when operand doesn't appear in the ↵Shilei Tian1-2/+7
instruction pattern (#87663) We check whether an operand is in the instruction pattern and emit an error if it is not, but we then simply continue execution, including directly dereferencing a pointer-like object `InVal`, which is newly created when accessing the map. It contains a `nullptr`, so dereferencing it causes a crash. This is a very trivial fix.
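The failure mode described here is a general C++ pattern: `operator[]` on a map value-initializes a missing entry, so a pointer-like value comes back null. The following is a minimal sketch of that pattern using `std::map`; the `Operand` type and both helper functions are hypothetical stand-ins, not the actual TableGen data structures or the code in the patch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for the pointer-like value stored in the map.
struct Operand {
  int Index;
};

using OperandMap = std::map<std::string, std::shared_ptr<Operand>>;

// Unsafe lookup: operator[] inserts a null entry when the key is missing,
// so dereferencing the returned value would crash.
inline bool lookupCreatesNullEntry(OperandMap &Map, const std::string &Name) {
  std::shared_ptr<Operand> &Val = Map[Name];
  return Val == nullptr;
}

// Safer lookup: report failure and stop instead of continuing with a
// null value. The caller is expected to emit the error and bail out.
inline bool getOperandIndex(const OperandMap &Map, const std::string &Name,
                            int &IndexOut) {
  auto It = Map.find(Name);
  if (It == Map.end() || !It->second)
    return false;
  IndexOut = It->second->Index;
  return true;
}
```

Using `find` also avoids growing the map as a side effect of a failed lookup, which `operator[]` does silently.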
2024-04-04[RISCV][GISel] Make register bank selection for unary and binary arithmetic ↵Craig Topper1-17/+28
ops more generic. (#87593) This is inspired by AArch64's getSameKindOfOperandsMapping, but based on what RISC-V currently needs. This removes the special vector case for G_ADD/SUB and unifies integer and FP operations into the same handler. G_SEXTLOAD/ZEXTLOAD have been separated from integer since they should only be scalar integer and never vector.
2024-04-04[LV, VP]VP intrinsics support for the Loop Vectorizer + adding new ↵Alexey Bataev27-69/+2124
tail-folding mode using EVL. (#76172) This patch introduces generating VP intrinsics in the Loop Vectorizer. Currently the Loop Vectorizer supports vector predication in a very limited capacity via tail-folding and masked load/store/gather/scatter intrinsics. However, this does not let architectures with active vector length predication support take advantage of their capabilities. Architectures with general masked predication support also can only take advantage of predication on memory operations. By having a way for the Loop Vectorizer to generate Vector Predication intrinsics, which (will) provide a target-independent way to model predicated vector instructions, these architectures can make better use of their predication capabilities. Our first approach (implemented in this patch) builds on top of the existing tail-folding mechanism in the LV (just adds a new tail-folding mode using EVL), but instead of generating masked intrinsics for memory operations it generates VP intrinsics for load/store instructions. The patch adds a new VPlanTransforms to replace the wide header predicate compare with EVL and updates codegen for load/stores to use VP store/load with EVL. Another important part of this approach is how the Explicit Vector Length is computed. (VP intrinsics define this vector length parameter as Explicit Vector Length (EVL)). We use an experimental intrinsic `get_vector_length`, that can be lowered to architecture specific instruction(s) to compute EVL. The patch also adds a new recipe to emit instructions for computing EVL. Using VPlan in this way will eventually help build and compare VPlans corresponding to different strategies and alternatives. Differential Revision: https://reviews.llvm.org/D99750
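To illustrate the idea of tail folding with an explicit vector length, here is a scalar C++ model of such a loop. It is only a sketch under assumptions: `getVectorLength` is a hypothetical helper that returns `min(remaining, VF)`, whereas the real `get_vector_length` intrinsic is lowered by the target and may choose its result differently; the inner lane loop stands in for a group of VP load/add/store operations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the EVL query: how many lanes to process in this
// iteration. Assumption: the target simply returns min(Remaining, VF).
inline std::size_t getVectorLength(std::size_t Remaining, std::size_t VF) {
  return std::min(Remaining, VF);
}

// Computes C[i] = A[i] + B[i], strip-mined with an explicit vector length
// instead of a per-lane mask. Iterations (optional) receives the trip count.
inline std::vector<int> addWithEVL(const std::vector<int> &A,
                                   const std::vector<int> &B, std::size_t VF,
                                   std::size_t *Iterations = nullptr) {
  assert(A.size() == B.size() && VF > 0);
  std::vector<int> C(A.size());
  std::size_t Iters = 0;
  for (std::size_t I = 0; I < A.size();) {
    std::size_t EVL = getVectorLength(A.size() - I, VF);
    // Models one predicated vector operation on the first EVL lanes.
    for (std::size_t Lane = 0; Lane < EVL; ++Lane)
      C[I + Lane] = A[I + Lane] + B[I + Lane];
    I += EVL; // the induction variable advances by EVL, not by VF
    ++Iters;
  }
  if (Iterations)
    *Iterations = Iters;
  return C;
}
```

With 10 elements and `VF = 4`, the loop runs three times with lengths 4, 4, and 2, so no scalar remainder loop is needed.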
2024-04-04[RISCV][NFC] Add isTargetAndroid API in RISCVSubtarget (#87671)Paul Kirth1-0/+1
This is required to set target specific code generation options for Android, like using the TLS slot for the stack protector.
2024-04-04[UBSAN] Remove invalid assert added with #87709Vitaly Buka1-1/+0
2024-04-05[SPARC] Implement L and H inline asm argument modifiers (#87259)Koakuma4-0/+64
This adds support for using the L and H argument modifiers for twinword operands in inline asm code, such as in: ``` %1 = tail call i64 asm sideeffect "rd %pc, ${0:L} ; srlx ${0:L}, 32, ${0:H}", "={o4}"() ``` This is needed by the Linux kernel.
2024-04-04[UBSAN][HWASAN] Remove redundant flags (#87709)Vitaly Buka5-18/+17
Presence of `cutoff-hot` or `random-skip-rate` should be enough to trigger optimization.
2024-04-04[NFC][HWASAN][UBSAN] Remove cl:init from few opts (#87692)Vitaly Buka2-2/+2
They are supposed to be used with `getNumOccurrences`.
2024-04-04[HWASAN][UBSAN] Don't use default `profile-summary-cutoff-hot` (#87691)Vitaly Buka4-22/+9
Default cutoff is not useful here. The decision to enable or disable a sanitizer has a more significant performance impact than typical optimizations that rely on `profile-summary-cutoff-hot`.
2024-04-04[memprof] Introduce writeMemProf (NFC) (#87698)Kazu Hirata1-76/+142
This patch refactors the serialization of MemProf data to a switch statement style: switch (Version) { case Version0: return ...; case Version1: return ...; } just like IndexedMemProfRecord::serialize. A reasonable amount of code is shared and factored out to helper functions between writeMemProfV0 and writeMemProfV1 to the extent that it doesn't hamper readability.
2024-04-04Revert "[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region ↵Victor Campos3-189/+0
boundaries" (#87699) Reverts llvm/llvm-project#79173 The testcase fails in non-asserts builds.
2024-04-04[builtin][NFC] Remove ClangBuiltin<"__builtin_allow_ubsan_check"> (#87581)Vitaly Buka1-2/+1
We don't need a clang builtin for this one. It was copy-pasted from `__builtin_allow_runtime_check`. RFC: https://discourse.llvm.org/t/rfc-add-llvm-experimental-hot-intrinsic-or-llvm-hot/77641
2024-04-04[NFC][UBSAN] Similar to #87687 for UBSANVitaly Buka1-64/+64
2024-04-04[NFC][HWASAN] Cleanup opt opt test (#87687)Vitaly Buka1-12/+12
Main change is replacing DEFAULT with HOT99. I'll remove DEFAULT related functionality in the followup patches.
2024-04-04[NFC][HWASAN] Simplify `selectiveInstrumentationShouldSkip` (#87670)Vitaly Buka1-20/+16
2024-04-04[SLP]Fix PR87630: wrong result for externally used vector value.Alexey Bataev2-8/+14
Need to check that the externally used value can be represented with the BitWidth before applying it, otherwise need to keep wider type.
2024-04-04[SLP]Add a test with the incorrect casting for external user, NFC.Alexey Bataev1-0/+64
2024-04-04[AArch64] Fix heuristics for folding "lsl" into load/store ops. (#86894)Eli Friedman14-177/+119
The existing heuristics were assuming that every core behaves like an Apple A7, where any extend/shift costs an extra micro-op... but in reality, nothing else behaves like that. On some older Cortex designs, shifts by 1 or 4 cost extra, but all other shifts/extensions are free. On all other cores, as far as I can tell, all shifts/extensions for integer loads are free (i.e. the same cost as an unshifted load). To reflect this, this patch: - Enables aggressive folding of shifts into loads by default. - Removes the old AddrLSLFast feature, since it applies to everything except A7 (and even if you are explicitly targeting A7, we want to assume extensions are free because the code will almost always run on a newer core). - Adds a new feature AddrLSLSlow14 that applies specifically to the Cortex cores where shifts by 1 or 4 cost extra. I didn't add support for AddrLSLSlow14 on the GlobalISel side because it would require a bunch of refactoring to work correctly. Someone can pick this up as a followup.
2024-04-04[CostModel][X86] Add costkinds test coverage for masked ↵Simon Pilgrim5-16/+7255
load/store/gather/scatter Noticed while starting triage for #87640
2024-04-04[AArch64][PAC][MC][ELF] Support PAuth ABI compatibility tag (#85236)Daniil Kovalev6-12/+127
Depends on #87545 Emit `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` property in `.note.gnu.property` section depending on `aarch64-elf-pauthabi-platform` and `aarch64-elf-pauthabi-version` llvm module flags.
2024-04-04[TextAPI] Reorder addRPath parameters (#87601)Cyndy Ishida4-9/+9
It matches up with other _attribute_ adding member functions and helps simplify InterfaceFile assignment for InstallAPI.
2024-04-04[ValueTracking] Add more conditions in to `isTruePredicate`Noah Goldstein3-64/+77
There is one notable "regression". This patch replaces the bespoke `or disjoint` logic with a direct match. This means we fail some simplification during `instsimplify`. All the cases we fail in `instsimplify` we do handle in `instcombine` as we add `disjoint` flags. Other than that, just some basic cases. See proofs: https://alive2.llvm.org/ce/z/_-g7C8 Closes #86083
2024-04-04[ValueTracking] Add tests for deducing more conditions in `isTruePredicate`; NFCNoah Goldstein2-0/+466
2024-04-04[ValueTracking] Infer known bits from `(icmp eq (and/or x,y), C)`Noah Goldstein3-20/+25
In `(icmp eq (and x,y), C)` all 1s in `C` must also be set in both `x`/`y`. In `(icmp eq (or x,y), C)` all 0s in `C` must also be clear in both `x`/`y`. Closes #87143
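The two rules can be checked exhaustively on 8-bit values. The sketch below does not use LLVM's `KnownBits` class; `KnownBits8` and the helper functions are hypothetical names introduced only for illustration. When `(x & y) == C` holds, every 1 bit of `C` is a known-one bit of both operands; when `(x | y) == C` holds, every 0 bit of `C` is a known-zero bit of both operands.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit stand-in for a known-bits record.
struct KnownBits8 {
  std::uint8_t Zero = 0; // bits known to be 0
  std::uint8_t One = 0;  // bits known to be 1
};

// Given that (x & y) == C is true: each 1 in C is a 1 in x and in y.
inline KnownBits8 knownFromAndEq(std::uint8_t C) {
  KnownBits8 K;
  K.One = C;
  return K;
}

// Given that (x | y) == C is true: each 0 in C is a 0 in x and in y.
inline KnownBits8 knownFromOrEq(std::uint8_t C) {
  KnownBits8 K;
  K.Zero = static_cast<std::uint8_t>(~C);
  return K;
}

// Verifies both rules for every pair of 8-bit operands.
inline bool rulesHoldExhaustively() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y) {
      KnownBits8 A = knownFromAndEq(static_cast<std::uint8_t>(X & Y));
      KnownBits8 O = knownFromOrEq(static_cast<std::uint8_t>(X | Y));
      if ((X & A.One) != A.One || (Y & A.One) != A.One)
        return false;
      if ((X & O.Zero) != 0 || (Y & O.Zero) != 0)
        return false;
    }
  return true;
}
```

Note that the `and` case yields no known-zero bits and the `or` case yields no known-one bits: a 0 in `x & y` only says that at least one operand has a 0 there.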
2024-04-04[ValueTracking] Add tests for computing known bits from `(icmp eq (and/or ↵Noah Goldstein1-5/+105
x,y), C)`; NFC
2024-04-04[CMake] Install LLVMgold.so for LLVM_INSTALL_TOOLCHAIN_ONLY=on (#87567)Fangrui Song1-1/+1
LLVMgold.so can be used with GNU ar, gold, ld, and nm to process LLVM bitcode files. Install it in LLVM_INSTALL_TOOLCHAIN_ONLY=on builds like we install libLTO.so. Suggested by @emelife Fix #84271
2024-04-04[memprof] Make RecordWriterTrait a non-template class (#87604)Kazu Hirata2-7/+10
commit d89914f30bc7c180fe349a5aa0f03438ae6c20a4 Author: Kazu Hirata <kazu@google.com> Date: Wed Apr 3 21:48:38 2024 -0700 changed RecordWriterTrait to a template class with IndexedVersion as a template parameter. This patch changes the class back to a non-template one while retaining the ability to serialize multiple versions. The reason I changed RecordWriterTrait to a template class was because, even if RecordWriterTrait had IndexedVersion as a member variable, RecordWriterTrait::EmitKeyDataLength, being a static function, would not have access to the variable. Since OnDiskChainedHashTableGenerator calls EmitKeyDataLength as: const std::pair<offset_type, offset_type> &Len = InfoObj.EmitKeyDataLength(Out, I->Key, I->Data); we can make EmitKeyDataLength a member function, but we have one problem. InstrProfWriter::writeImpl calls: void insert(typename Info::key_type_ref Key, typename Info::data_type_ref Data) { Info InfoObj; insert(Key, Data, InfoObj); } which default-constructs RecordWriterTrait without a specific version number. This patch fixes the problem by adjusting InstrProfWriter::writeImpl to call the other form of insert instead: void insert(typename Info::key_type_ref Key, typename Info::data_type_ref Data, Info &InfoObj) To prevent an accidental invocation of the default constructor of RecordWriterTrait, this patch deletes the default constructor.
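The design described above can be sketched in isolation: the trait carries the version as a member, its length function is a non-static member so it can read that version, and the default constructor is deleted so callers must pass a constructed trait object. All names below (`FormatVersion`, `VersionedWriterTrait`, `TableGenerator`) and the byte sizes are hypothetical and only illustrate the pattern; they are not the MemProf or `OnDiskChainedHashTableGenerator` APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical version tag.
enum class FormatVersion { V0, V1 };

class VersionedWriterTrait {
  FormatVersion Version;

public:
  // Deleted so a trait can never be created without a version.
  VersionedWriterTrait() = delete;
  explicit VersionedWriterTrait(FormatVersion V) : Version(V) {}

  // Non-static, so it can consult the Version member.
  // The sizes 8 and 16 are made up for illustration.
  std::size_t emitDataLength(std::uint64_t /*Key*/) const {
    return Version == FormatVersion::V0 ? 8 : 16;
  }
};

// Hypothetical generator that takes the trait object from the caller
// instead of default-constructing one internally.
template <typename Info> class TableGenerator {
  std::size_t TotalBytes = 0;

public:
  void insert(std::uint64_t Key, Info &InfoObj) {
    TotalBytes += InfoObj.emitDataLength(Key);
  }
  std::size_t totalBytes() const { return TotalBytes; }
};
```

Because the default constructor is deleted, any code path that tries to create the trait without a version fails at compile time rather than silently serializing with the wrong format.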
2024-04-04[gn build] Port fd38366e4525Arthur Eubanks1-1/+0
2024-04-04[gn build] Port 8bb9443333e0Arthur Eubanks1-0/+1
2024-04-04[gn build] Port 3365d6217901Arthur Eubanks1-0/+1
2024-04-04[gn build] Manually port 6f2d8cc0Arthur Eubanks2-0/+2
2024-04-04[gn build] Manually port 1679b27Arthur Eubanks1-1/+1
2024-04-04Revert "[GlobalISel] Fix the infinite loop issue in ↵Gulfem Savrun Yeniceri2-37/+8
`commute_int_constant_to_rhs`" This reverts commit 1f01c580444ea2daef67f95ffc5fde2de5a37cec because combine-commute-int-const-lhs.mir test failed in multiple builders. https://lab.llvm.org/buildbot/#/builders/124/builds/10375 https://luci-milo.appspot.com/ui/p/fuchsia/builders/prod/clang-linux-x64/b8751607530180046481/overview
2024-04-04[llvm-objcopy] Add --compress-sectionsFangrui Song9-8/+278
--compress-sections is similar to --compress-debug-sections but applies to arbitrary sections. * `--compress-sections <section>=none`: decompress sections * `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd Like `--remove-section`, the pattern is by default a glob, but a regex when --regex is specified. For `--remove-section` like options, `!` prevents matches and is not dependent on ordering (see `ELF/wildcard-syntax.test`). Since `--compress-sections a=zlib --compress-sections a=none` naturally allows overriding, having an order-independent `!` would be confusing. Therefore, `!` is disallowed. Sections within a segment are effectively immutable. Report an error for an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable file can be compressed, but linkers usually reject them. Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674 Pull Request: https://github.com/llvm/llvm-project/pull/85036
2024-04-04[APInt] Remove multiplicativeInverse with explicit modulus (#87644)Jay Foad3-67/+4
All callers have been changed to use the new simpler overload with an implicit modulus of 2^BitWidth. The old form was never used or tested with non-power-of-two modulus anyway.
2024-04-04[CostModel][X86] Update AVX1 sext v4i1 -> v4i64 cost based off worst case ↵Simon Pilgrim3-4/+4
llvm-mca numbers We were using raw instruction count which overestimated the costs for #67803
2024-04-04[X86] Rename Zn3FPP# ports -> Zn3FP#. NFCSimon Pilgrim61-967/+967
Matches Zn4FP# (which is mostly a copy) and avoids an issue in llvm-exegesis which is terrible at choosing the right portname when they have aliases.
2024-04-04[APInt] Add a simpler overload of multiplicativeInverse (#87610)Jay Foad6-31/+27
The current APInt::multiplicativeInverse takes a modulus which can be any value, but all in-tree callers use a power of two. Moreover, most callers want to use two to the power of the width of an existing APInt, which is awkward because 2^N is not representable as an N-bit APInt. Add a new overload of multiplicativeInverse which implicitly uses 2^BitWidth as the modulus.
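For intuition, an inverse modulo 2^BitWidth can be computed with plain fixed-width unsigned arithmetic, since wraparound performs the reduction implicitly. The sketch below uses Newton iteration on `uint32_t`; the function name is hypothetical, and this is not necessarily how `APInt` implements its overload.

```cpp
#include <cassert>
#include <cstdint>

// Inverse of an odd 32-bit value modulo 2^32 (hypothetical helper).
// Even values have no inverse modulo a power of two.
inline std::uint32_t inverseMod2Pow32(std::uint32_t A) {
  assert((A & 1u) && "only odd values are invertible modulo 2^32");
  // For odd A, A * A == 1 (mod 8), so A is its own inverse to 3 bits.
  std::uint32_t X = A;
  // Each Newton step doubles the number of correct low bits:
  // 3 -> 6 -> 12 -> 24 -> 48, which covers all 32 bits.
  for (int I = 0; I < 4; ++I)
    X *= 2u - A * X;
  return X;
}
```

The modulus 2^32 never appears as a value in the code, which mirrors the point in the commit message that 2^N is not representable in N bits.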
2024-04-04[X86] Add or_is_add patterns for INC. (#87584)Craig Topper4-7/+15
Should fix the cases noted in #86857
2024-04-04[DAG] Preserve NUW when reassociating (#87621)Piotr Sobczak7-8294/+6077
Similarly to the generic case below, preserve the NUW flag when reassociating adds with constants.
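Why the flag can be kept is easy to verify by brute force. The sketch below assumes the transform is `(x + c1) + c2` to `x + (c1 + c2)` and that both original adds carry NUW; the exact conditions checked in the DAG combiner are not restated here, and the helper names are hypothetical. Under that assumption, neither `c1 + c2` nor the final add can wrap.

```cpp
#include <cassert>
#include <cstdint>

// True if A + B does not wrap in 8 bits, i.e. the property NUW asserts.
inline bool addNoUnsignedWrap(std::uint8_t A, std::uint8_t B) {
  return static_cast<unsigned>(A) + static_cast<unsigned>(B) <= 0xFFu;
}

// Checks every 8-bit (x, c1, c2): if both original adds are NUW, then the
// folded constant and the reassociated add are NUW and the result matches.
inline bool nuwSurvivesReassociation() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned C1 = 0; C1 < 256; ++C1)
      for (unsigned C2 = 0; C2 < 256; ++C2) {
        std::uint8_t X8 = static_cast<std::uint8_t>(X);
        std::uint8_t C18 = static_cast<std::uint8_t>(C1);
        std::uint8_t C28 = static_cast<std::uint8_t>(C2);
        if (!addNoUnsignedWrap(X8, C18))
          continue; // inner add is not NUW, rule does not apply
        std::uint8_t Inner = static_cast<std::uint8_t>(X + C1);
        if (!addNoUnsignedWrap(Inner, C28))
          continue; // outer add is not NUW, rule does not apply
        if (!addNoUnsignedWrap(C18, C28))
          return false; // constant folding would wrap
        std::uint8_t Folded = static_cast<std::uint8_t>(C1 + C2);
        if (!addNoUnsignedWrap(X8, Folded))
          return false; // reassociated add would wrap
        if (static_cast<std::uint8_t>(X + Folded) !=
            static_cast<std::uint8_t>(Inner + C2))
          return false; // results differ
      }
  return true;
}
```

The argument generalizes to any bit width: if `x + c1 + c2` fits without wrapping, then so do `c1 + c2` and `x + (c1 + c2)`.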
2024-04-04[X86] evex-to-vex-compress.mir - update test checks missed in #87636Simon Pilgrim1-16/+16
2024-04-04[X86] Add missing immediate qualifier to the (V)ROUND instructions (#87636)Simon Pilgrim5-103/+103
Makes it easier to algorithmically recreate the instruction name in various analysis scripts I'm working on
2024-04-04[X86] Haswell/Broadwell - fix (V)ROUND*ri sched behaviours to use 2*Port1Simon Pilgrim6-54/+49
We were only using the Port23 memory ports and were missing the 2*Port1 uops entirely. Confirmed by Agner + uops.info/uica
2024-04-04AMDGPULowerBufferFatPointers.cpp - fix Wunused-variable warning. NFC.Simon Pilgrim1-1/+1
2024-04-04AMDGPULowerBufferFatPointers.cpp - fix Wparentheses warning. NFC.Simon Pilgrim1-2/+2
2024-04-04[SEH] Ignore EH pad check for internal intrinsics (#79694)Phoebe Wang2-0/+53
Intrinsics like @llvm.seh.scope.begin and @llvm.seh.scope.end which do not throw do not need funclets in catchpads or cleanuppads. Fixes #69428 Co-authored-by: Robert Cox <robert.cox@intel.com>
2024-04-04[LLD][COFF] Use getMachineArchType in LinkerDriver::getArch. (#87499)Jacek Caban1-0/+4
Adds support for ARM64EC, which should use the same search paths as ARM64. It's similar to #87370 and #87495. The test is based on the existing x86 test. Generally ARM64EC libraries are shipped together with native ARM64 libraries (using ECSYMBOLS section mechanism). getMachineArchType uses Triple::thumb, while the existing implementation uses Triple::arm. It's ultimately passed to MSVCPaths.cpp functions, so modify them to accept both forms.
2024-04-04[ARM][Thumb2] Mark BTI-clearing instructions as scheduling region boundaries ↵Victor Campos3-0/+189
(#79173) Following https://github.com/llvm/llvm-project/pull/68313 this patch extends the idea to M-profile PACBTI. The Machine Scheduler can reorder instructions within a scheduling region depending on the scheduling policy set. If a BTI-clearing instruction happens to partake in one such region, it might be moved around, therefore ending up where it shouldn't. The solution is to mark all BTI-clearing instructions as scheduling region boundaries. This essentially means that they must not be part of any scheduling region, and as a consequence never get moved: - PAC - PACBTI - BTI - SG Note that PAC isn't BTI-clearing, but it's replaced by PACBTI late in the compilation pipeline. As far as I know, currently it isn't possible to organically obtain code that's susceptible to the bug: - Instructions that write to SP are region boundaries. PAC seems to always be followed by the pushing of r12 to the stack, so essentially PAC is always by itself in a scheduling region. - CALL_BTI is expanded into a machine instruction bundle. Bundles are unpacked only after the last machine scheduler run. Thus setjmp and BTI can be separated only if someone deliberately runs the scheduler once more. - The BTI insertion pass is run late in the pipeline, only after the last machine scheduling has run. So once again it can be reordered only if someone deliberately runs the scheduler again. Nevertheless, one can reasonably argue that we should prevent the bug in spite of the compiler not being able to produce the required conditions for it. If things change, the compiler will be robust against this issue. The tests written for this are contrived: bogus MIR instructions have been added adjacent to the BTI-clearing instructions in order to have them inside non-trivial scheduling regions.
2024-04-04[CostModel][X86] Update AVX1 sext v8i1 -> v8i32 cost based off worst case ↵Simon Pilgrim3-6/+6
llvm-mca numbers We were using raw instruction count which overestimated the costs for #67803