path: root/llvm
Age | Commit message | Author | Files | Lines
2024-05-06 | Try to use non-volatile registers for `preserve_none` parameters (#88333) | Brandt Bucher | 3 | -20/+61
This uses non-volatile registers for the first four (six on Windows) registers used for `preserve_none` argument passing. This allows these registers to stay "pinned", even if the body of the `preserve_none` function contains calls to other "normal" functions.

Example:

```c
void boring(void);
__attribute__((preserve_none)) void (continuation)(void *, void *, void *, void *);

__attribute__((preserve_none)) void entry(void *a, void *b, void *c, void *d)
{
    boring();
    __attribute__((musttail)) return continuation(a, b, c, d);
}
```

Before:

```asm
pushq   %rax
movq    %rcx, %rbx
movq    %rdx, %r14
movq    %rsi, %r15
movq    %rdi, %r12
callq   boring@PLT
movq    %r12, %rdi
movq    %r15, %rsi
movq    %r14, %rdx
movq    %rbx, %rcx
popq    %rax
jmp     continuation@PLT
```

After:

```asm
pushq   %rax
callq   boring@PLT
popq    %rax
jmp     continuation@PLT
```
2024-05-06 | [AArch64][GlobalISel] Common some shuffle mask functions. | David Green | 7 | -244/+117
This removes the GISel versions of isREVMask, isTRNMask, isUZPMask and isZipMask. They are combined with the existing versions from SDAG into AArch64PerfectShuffle.h.
2024-05-06 | [X86] Add slow div64 tuning flag to Nehalem target (#91129) | Simon Pilgrim | 2 | -1/+2
This appears to have been missed because later cpus don't inherit from Nehalem tuning much. Noticed while cleaning up for #90985
2024-05-06 | [SLP] Use last pointer instead of first for reversed strided stores. | Alexey Bataev | 2 | -2/+6
Need to use the last address of the vectorized stores for the strided stores, not the first one, to correctly store the data.
2024-05-07 | [RISCV] Move RISCVDeadRegisterDefinitions to post vector regalloc (#90636) | Luke Lau | 4 | -11/+22
Currently RISCVDeadRegisterDefinitions runs after vsetvli insertion, but in #70549 vsetvli insertion runs after vector regalloc, and as a result we no longer convert some `vsetvli a0, a0` instructions to `vsetvli x0, a0`. This patch moves it to after vector regalloc but before scalar regalloc, so we still get the benefits of reducing register pressure.
2024-05-06 | [AArch64][GlobalISel] Additional GISel test coverage for shuffles. NFC | David Green | 3 | -221/+549
2024-05-07 | [RISCV] Check dead flag on VL def op in RISCVCoalesceVSETVLI. NFC (#91168) | Luke Lau | 1 | -3/+1
Because LiveVariables has been run, we no longer need to look up the users in MachineRegisterInfo and can instead just check for the dead flag.
2024-05-06 | [SLP] Fix PR91025: correctly handle smin/smax of signed operands. | Alexey Bataev | 2 | -13/+17
Need to check that the signed operand has an extra sign bit, to be sure that we do not lose signedness when trying to minimize the bitwidth for smin/smax intrinsics.
2024-05-06 | [SLP][NFC] Add a test with incorrect smin analysis for minimal bitwidth, NFC. | Alexey Bataev | 1 | -0/+40
2024-05-06 | [AggressiveInstCombine] Fix strncmp inlining (#91204) | Franklin Zhang | 2 | -1/+40
Fix an issue where `char` constants were converted to `uint64_t` incorrectly when doing the inlining.
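The class of pitfall involved can be sketched at the source level (my own illustration of signed-`char` widening; the exact failure mode fixed in #91204 may differ):

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  char c = '\xff'; // a high-bit-set byte, as can appear in a string constant

  // Where char is signed, widening it directly sign-extends the byte...
  uint64_t widened_signed = static_cast<uint64_t>(c); // 0xffffffffffffffff
  // ...while going through unsigned char keeps only the byte value.
  uint64_t widened_unsigned =
      static_cast<uint64_t>(static_cast<unsigned char>(c)); // 0xff

  std::printf("%llx\n%llx\n", (unsigned long long)widened_signed,
              (unsigned long long)widened_unsigned);
  return 0;
}
```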
2024-05-06 | Revert "[AIX][CMake] Use top-level tools in llvm_ExternalProject_Add" (#91019) | David Tenty | 1 | -3/+1
This reverts commit 11066449d49e20f18f46757df07455c6abcedcf1. As noted in the original patch, this was designed to be reverted once https://reviews.llvm.org/D142479 and https://reviews.llvm.org/D142660 landed, which has long since happened.
2024-05-06 | Revert "Reapply "Use an abbrev to reduce size of VALUE_GUID records in ThinLTO summaries" (#90610)" (#91194) | Jan Voung | 17 | -131/+99
Reverts llvm/llvm-project#90692, which is breaking PPC buildbots. The bots are not meant to test LLD, but are running a test that uses an old version of LLD without the change (so it is incompatible). Revert until a fix is found.
2024-05-06 | [LAA] Add tests showing extra unnecessary runtime checks. | Florian Hahn | 1 | -0/+143
Pre-commit tests for an upcoming patch.
2024-05-06 | [LAA] Update check line in test to fully match message. | Florian Hahn | 1 | -1/+1
2024-05-06 | [X86] Fix -Wunused-function in X86ISelLowering.cpp (NFC) | Jie Fu | 1 | -1/+1
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:3582:13: error: unused function 'isBlendOrUndef' [-Werror,-Wunused-function]
static bool isBlendOrUndef(ArrayRef<int> Mask) {
            ^
1 error generated.
2024-05-06 | [X86] Fix -Wsign-compare in X86ISelLowering.cpp (NFC) | Jie Fu | 1 | -1/+1
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:40081:21: error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
for (int I = 0; I != NumElts; ++I) {
                ~ ^  ~~~~~~~
1 error generated.
2024-05-06 | [X86] Fold BLEND(PERMUTE(X),PERMUTE(Y)) -> PERMUTE(BLEND(X,Y)) (#90219) | Simon Pilgrim | 18 | -14152/+12705
If we don't demand the same element from both single-source shuffles (permutes), then attempt to blend the sources together first and then perform a merged permute. For vXi16 blends we have to be careful, as these are much more likely to involve byte/word vector shuffles that will result in the creation of additional shuffle instructions. This fold might be worth it for VSELECT with constant masks on AVX512 targets, but I haven't investigated that yet; I've tried to write combineBlendOfPermutes so as to be prepared for it. The PR34592 -O0 regression is an unfortunate failure to clean up with a later pass that calls SimplifyDemandedElts like the -O3 pipeline does; I'm not sure how worried we should be about it.
2024-05-06 | [SystemZ] Simplify f128 atomic load/store (#90977) | Ulrich Weigand | 5 | -168/+174
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128 to allow for simplified use in atomic load/store. Update logic to split 128-bit loads and stores in DAGCombine to also handle the f128 case where appropriate. This fixes the regressions introduced by recent atomic load/store patches.
2024-05-06 | [DAG] Fold bitreverse(shl/srl(bitreverse(x),y)) -> srl/shl(x,y) (#89897) | Simon Pilgrim | 4 | -397/+57
Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA).
Alive2: https://alive2.llvm.org/ce/z/fSH-rf
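The same identity can be checked at the source level with Clang's `__builtin_bitreverse32` (a scalar sketch of the fold, not the DAG combine code itself):

```cpp
#include <cassert>
#include <cstdint>

// Reversing the bits, shifting left, then reversing again moves the same bits
// as a logical right shift, so the two expressions below always agree.
static uint32_t via_bitreverse(uint32_t x, unsigned y) {
  return __builtin_bitreverse32(__builtin_bitreverse32(x) << y);
}

int main() {
  for (unsigned y = 0; y < 32; ++y)
    assert(via_bitreverse(0xDEADBEEFu, y) == (0xDEADBEEFu >> y));
  return 0;
}
```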
2024-05-06 | [LoongArch] Rename some OptWInstrs functions. NFC | WANG Rui | 2 | -28/+32
2024-05-06 | [LoongArch] Mark data type i32 as sign-extended. NFC | WANG Rui | 1 | -9/+9
2024-05-06 | [LoongArch] Optimize *W Instructions at MI level (#90463) | hev | 21 | -627/+1797
Referring to RISC-V, this adds an MI-level pass to optimize *W instructions for LoongArch. First it removes unneeded sext (addi.w rd, rs, 0) instructions, either because the sign-extended bits aren't consumed or because the input was already sign-extended by an earlier instruction. Then:
1. Unless explicitly disabled, or unless the target prefers instructions with the W suffix, it removes the -w suffix from opw instructions whenever all users depend only on the lower word of the result of the instruction. The cases handled are:
   * addi.w, because it helps reduce test differences between LA32 and LA64 without being a pessimization.
2. Or, if explicitly enabled or the target prefers instructions with the W suffix, it adds the W suffix to the instruction whenever all users depend only on the lower word of the result of the instruction. The cases handled are:
   * add.d/addi.d/sub.d/mul.d.
   * slli.d with imm < 32.
   * ld.d/ld.wu.
2024-05-06 | [AMDGPU] don't mark control-flow intrinsics as convergent (#90026) | Sameer Sahasrabuddhe | 17 | -232/+244
This is really a workaround to allow control flow lowering in the presence of convergence control tokens. Control-flow intrinsics in LLVM IR are convergent because they indirectly represent the wave CFG, i.e., sets of threads that are "converged" or "execute in lock-step". But they exist during a small window in the lowering process, inserted after the structurizer and then translated to equivalent MIR pseudos. So rather than create convergence tokens for these builtins, we simply mark them as not convergent. The corresponding MIR pseudos are marked as having side effects, which is sufficient to prevent optimizations without having to mark them as convergent.
2024-05-06 | [InstCombine] Fix miscompilation caused by #90436 (#91133) | Yingwei Zheng | 2 | -0/+75
Proof: https://alive2.llvm.org/ce/z/iRnJ4i Fixes https://github.com/llvm/llvm-project/issues/91127.
2024-05-06 | Reapply "SystemZ: Fold copy of vector immediate to gr128" (#91099) | Matt Arsenault | 3 | -0/+264
This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b. Modify the instruction in place to transform it into a REG_SEQUENCE, which is what other implementations of foldImmediate do. Also start erasing the def instruction if there are no other uses. Fixes #91110.
2024-05-06 | [AMDGPU] Fix typo in function name | Jay Foad | 2 | -4/+4
2024-05-06 | SystemZ: Remove unnecessary REQUIRES asserts from tests | Matt Arsenault | 3 | -16/+13
2024-05-06 | SystemZ: Remove redundant REQUIRES systemz from test | Matt Arsenault | 1 | -1/+0
2024-05-06 | Revert "Remove redundant move in return statement" (#91169) | Mehdi Amini | 2 | -5/+5
Reverts llvm/llvm-project#90546. This broke some bots; it seems some toolchains don't perform the implicit move here.
2024-05-06 | Reapply "AMDGPU: Implement llvm.set.rounding (#88587)" series (#91113) | Matt Arsenault | 8 | -0/+1890
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots" This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f. MSVC does not like constexpr on the definition after an extern declaration of a global.
2024-05-06 | Remove redundant move in return statement (#90546) | xiaoleis-nv | 2 | -5/+5
This pull request removes an unnecessary move in the return statement to suppress compilation warnings.
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
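For context, a minimal sketch of the general pattern (the functions below are hypothetical, not the code touched by #90546 or #91169): newer compilers warn that `std::move` on a returned local is redundant because the local is already treated as an rvalue, while some older toolchains reportedly do not perform that implicit move, which is what motivated the revert above.

```cpp
#include <memory>
#include <utility>

std::unique_ptr<int> make_explicit() {
  auto p = std::make_unique<int>(42);
  return std::move(p); // newer compilers flag this std::move as redundant
}

std::unique_ptr<int> make_implicit() {
  auto p = std::make_unique<int>(42);
  return p; // the local is implicitly treated as an rvalue in the return
}
```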
2024-05-06 | [RISCV] Use virtual registers for AVL instrs in coalesce-vsetvli.mir. NFC | Luke Lau | 1 | -7/+11
All GPR registers will still be virtual at this stage, so update the test to reflect that.
2024-05-06 | [RISCV] Teach .option arch to support experimental extensions. (#89727) | Yeting Kuo | 2 | -12/+24
Previously `.option arch` denied extensions that do not belong to the RISC-V features. But experimental extensions carry an experimental- prefix, so `.option arch` could not serve experimental extensions. This patch uses the features of extensions to identify extension existence.
2024-05-06 | [RISCV] Add RISCVCoalesceVSETVLI tests for removing dead AVLs. NFC | Luke Lau | 1 | -0/+62
2024-05-05 | [ADT] Reimplement operator==(StringRef, StringRef) (NFC) (#91139) | Kazu Hirata | 1 | -1/+5
I'm planning to deprecate and eventually remove StringRef::equals in favor of operator==. This patch reimplements operator== without using StringRef::equals. I'm not sure if there is a good way to make StringRef::compareMemory available to operator==, which is not a member function. "friend" works to some extent but breaks corner cases, which is why I've chosen to "inline" compareMemory.
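A minimal sketch of what a non-member operator== of this shape can look like (my own illustration with a simplified stand-in type, not the code that actually landed):

```cpp
#include <cstddef>
#include <cstring>

// Simplified stand-in for llvm::StringRef, so the sketch is self-contained.
struct StringRef {
  const char *Data = nullptr;
  std::size_t Length = 0;
  const char *data() const { return Data; }
  std::size_t size() const { return Length; }
};

// Compare lengths first; memcmp is only reached for equal non-zero sizes and
// plays the role of the "inlined" compareMemory mentioned above.
inline bool operator==(StringRef LHS, StringRef RHS) {
  if (LHS.size() != RHS.size())
    return false;
  return LHS.size() == 0 || std::memcmp(LHS.data(), RHS.data(), LHS.size()) == 0;
}
```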
2024-05-06 | [X86][FP16] Do not create VBROADCAST_LOAD for f16 without AVX2 (#91125) | Phoebe Wang | 2 | -1/+40
AVX doesn't provide a 16-bit BROADCAST instruction. Fixes #91005
2024-05-05 | [clang backend] In AArch64's DataLayout, specify a minimum function alignment of 4. (#90702) | Doug Wyatt | 3 | -7/+18
This addresses an issue where the explicit alignment of 2 (for C++ ABI reasons) was being propagated to the back end and causing under-aligned functions (in special sections). This is an alternate approach suggested by @efriedma-quic in PR #90415. Fixes #90358
2024-05-06 | [AArch64][SelectionDAG] Lower multiplication by a constant to shl+sub+shl+sub (#90199) | Allen | 2 | -2/+103
Change the cost model to lower a = b * C, where C = 1 - (1 - 2^m) * 2^n, to:
    sub w8, w0, w0, lsl #m
    sub w0, w0, w8, lsl #n
Fixes https://github.com/llvm/llvm-project/issues/89430
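One concrete instance of this pattern (my worked example, taking m = 3 and n = 2, so C = 1 - (1 - 2^3) * 2^2 = 29):

```cpp
#include <cstdint>

// b * 29 without a multiply, mirroring the two subtract-with-shift steps above:
//   t = b - (b << 3)   // t = b * (1 - 2^3) = -7 * b   (modulo 2^32)
//   r = b - (t << 2)   // r = b + 28 * b    = 29 * b
uint32_t mul29(uint32_t b) {
  uint32_t t = b - (b << 3);
  return b - (t << 2);
}
```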
2024-05-05 | X86FixupBWInsts: Remove redundant code. NFC | Fangrui Song | 1 | -3/+1
2024-05-05 | [Target] Use StringRef::operator== instead of StringRef::equals (NFC) (#91072) (#91138) | Kazu Hirata | 16 | -35/+36
I'm planning to remove StringRef::equals in favor of StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of 38 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".
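A small before/after illustration of the mechanical change (the function below is hypothetical, not one of the call sites touched by the patch):

```cpp
#include "llvm/ADT/StringRef.h"

static bool isGenericCPU(llvm::StringRef CPU) {
  // Before: return CPU.equals("generic");
  return CPU == "generic";
}
```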
2024-05-05 | [LAA] Directly pass DepChecker to getSource/getDestination (NFC). | Florian Hahn | 4 | -13/+16
Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker, directly pass MemoryDepChecker. This simplifies the code and also allows new uses in places where no LAI is available.
2024-05-05 | [X86] 2008-08-31-EH_RETURN32.ll - regenerate with update_llc_test_checks.py | Simon Pilgrim | 1 | -19/+41
2024-05-05 | [X86] bypass-slow-division-64.ll - add udiv+urem test coverage | Simon Pilgrim | 1 | -21/+198
2024-05-05 | Revert "[InlineCost] Correct the default branch cost for the switch statement (#85160)" | DianQK | 4 | -152/+49
This reverts commit 882814edd33cab853859f07b1dd4c4fa1393e0ea.
2024-05-05 | [InlineCost] Correct the default branch cost for the switch statement (#85160) | Quentin Dian | 4 | -49/+152
Fixes #81723. The earliest commit of the related code is https://github.com/llvm/llvm-project/commit/919f9e8d65ada6552b8b8a5ec12ea49db91c922a. I tried to understand the following code with https://github.com/llvm/llvm-project/pull/77856#issuecomment-1993499085:
https://github.com/llvm/llvm-project/blob/5932fcc47855fdd209784f38820422d2369b84b2/llvm/lib/Analysis/InlineCost.cpp#L709-L720
I think only scenarios where there is a default branch were considered.
2024-05-05 | [X86] bypass-slow-division-64.ll - add optsize/minsize tests | Simon Pilgrim | 1 | -4/+78
Make sure we're not expanding div32-div64 codegen when we're focussed on codesize
2024-05-05 | [X86] bypass-slow-division-64.ll - extend cpu test coverage | Simon Pilgrim | 1 | -56/+102
Ensure we test with/without the idivq-to-divl attribute, and test the x86-64-v* cpu levels and some common Intel/AMD cpus
2024-05-05 | [AArch64][SelectionDAG] Mask for SUBS with multiple users cannot be elided (#90911) | Weihang Fan | 2 | -1/+24
In DAGCombiner, the `performCONDCombine` function attempts to remove AND instructions in front of SUBS (cmp) instructions for which the AND is transparent. The rules for that are correct, but it fails to take into account the case where the SUBS instruction has multiple users with different condition codes for comparison and simply removes the AND for all of them. This causes a miscompilation in the attached test case.
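A hedged sketch of the general shape involved (illustrative only, not the attached reproducer): one masked value feeds two comparisons that want different condition codes, so whether the AND can be elided has to be decided per user of the flags.

```cpp
#include <cstdint>

// The same masked value feeds two comparisons that need different condition
// codes (equality and unsigned greater-than).
int classify(uint64_t x) {
  uint64_t low = x & 0xFFFFFFFFu;
  if (low == 0)
    return 0;
  if (low > 100)
    return 2;
  return 1;
}
```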
2024-05-05 | [X86][EVEX512] Add `HasEVEX512` when `NoVLX` used for 512-bit patterns (#91106) | Phoebe Wang | 3 | -22/+43
With KNL/KNC being deprecated, we don't need to care about such no VLX cases anymore. We may remove such patterns in the future. Fixes #90844
2024-05-05 | [VectorCombine] shuffleToIdentity - guard against call instructions. | David Green | 2 | -2/+91
The shuffleToIdentity fold needs to be a bit more careful about the difference between call instructions and intrinsics. The second can be handled, but the first should result in bailing out. This patch also adds some extra intrinsic tests from #91000. Fixes #91078