|
This uses non-volatile registers for the first four (six on Windows)
registers used for `preserve_none` argument passing. This allows these
registers to stay "pinned", even if the body of the `preserve_none`
function contains calls to other "normal" functions.
Example:
```c
void boring(void);
__attribute__((preserve_none)) void continuation(void *, void *, void *, void *);
__attribute__((preserve_none)) void entry(void *a, void *b, void *c, void *d)
{
    boring();
    __attribute__((musttail)) return continuation(a, b, c, d);
}
```
Before:
```asm
pushq %rax
movq %rcx, %rbx
movq %rdx, %r14
movq %rsi, %r15
movq %rdi, %r12
callq boring@PLT
movq %r12, %rdi
movq %r15, %rsi
movq %r14, %rdx
movq %rbx, %rcx
popq %rax
jmp continuation@PLT
```
After:
```asm
pushq %rax
callq boring@PLT
popq %rax
jmp continuation@PLT
```
|
|
This removes the GISel versions of isREVMask, isTRNMask, isUZPMask and
isZipMask. They are combined with the existing versions from SDAG into
AArch64PerfectShuffle.h.
|
|
This appears to have been missed because later CPUs don't inherit much from Nehalem tuning.
Noticed while cleaning up for #90985
|
|
Need to use the last address of the vectorized stores for the strided
stores, not the first one, to correctly store the data.
|
|
Currently RISCVDeadRegisterDefinitions runs after vsetvli insertion, but
in #70549 vsetvli insertion runs after vector regalloc, and as a result
we no longer convert some `vsetvli a0, a0` instructions to `vsetvli x0, a0`.
This patch moves it to after vector regalloc but before scalar regalloc,
so we still get the benefits of reducing register pressure.
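(With a dead destination, `vsetvli x0, a0, ...` still sets VL from the AVL
in a0 but no longer ties up a GPR for a result that nothing reads.)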
|
|
|
|
Because LiveVariables has been run, we no longer need to look up the
users in MachineRegisterInfo and can instead just check for the dead
flag.
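(For illustration: a query like `MRI.use_nodbg_empty(Reg)` on the defined
register can become an `isDead()` check on the def operand itself.)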
|
|
Need to check that the signed operand has an extra sign bit to be sure
that we do not lose signedness when trying to minimize the bitwidth for
smin/smax intrinsics.
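For example, 200 fits in 8 bits unsigned, but as an i8 it reads as -56, so
smax(200, 100) truncated to 8 bits would wrongly pick 100; the extra sign
bit rules such cases out.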
|
|
|
|
Fix an issue where `char` constants were converted to `uint64_t`
incorrectly during inlining.
|
|
This reverts commit 11066449d49e20f18f46757df07455c6abcedcf1.
As noted in the original patch, this was designed to be reverted once
https://reviews.llvm.org/D142479 and https://reviews.llvm.org/D142660
landed, which has long since happened.
|
|
ThinLTO summaries" (#90610)" (#91194)
Reverts llvm/llvm-project#90692
This broke the PPC buildbots. The bots are not meant to test LLD, but
they run a test that uses an old version of LLD without the change
(and so is incompatible). Reverting until a fix is found.
|
|
Pre-commit tests for an upcoming patch.
|
|
|
|
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:3582:13:
error: unused function 'isBlendOrUndef' [-Werror,-Wunused-function]
static bool isBlendOrUndef(ArrayRef<int> Mask) {
^
1 error generated.
|
|
llvm-project/llvm/lib/Target/X86/X86ISelLowering.cpp:40081:21:
error: comparison of integers of different signs: 'int' and 'unsigned int' [-Werror,-Wsign-compare]
for (int I = 0; I != NumElts; ++I) {
~ ^ ~~~~~~~
1 error generated.
|
|
If we don't demand the same element from both single-source shuffles (permutes), then attempt to blend the sources together first and then perform a merged permute.
For vXi16 blends we have to be careful, as these are much more likely to involve byte/word vector shuffles that will result in the creation of additional shuffle instructions.
This fold might be worth it for VSELECT with constant masks on AVX512 targets, but I haven't investigated that yet; I've tried to write combineBlendOfPermutes so that it's prepared for this.
The PR34592 -O0 regression is an unfortunate failure to clean up with a later pass that calls SimplifyDemandedElts the way -O3 does - I'm not sure how worried we should be tbh.
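As a hypothetical illustration: to produce <A[2], B[3], A[0], B[1]> we currently permute A and B separately and blend the two results; since the demanded elements occupy disjoint lanes (0 and 2 of A, 1 and 3 of B), we can instead blend first to get <A[0], B[1], A[2], B[3]> and then apply the single permute <2, 3, 0, 1>.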
|
|
Change definition of expandBitCastI128ToF128 and expandBitCastF128ToI128
to allow for simplified use in atomic load/store.
Update logic to split 128-bit loads and stores in DAGCombine to also
handle the f128 case where appropriate. This fixes the regressions
introduced by recent atomic load/store patches.
|
|
Noticed while investigating GFNI per-element vector shifts (we can form SHL but not SRL/SRA)
Alive2: https://alive2.llvm.org/ce/z/fSH-rf
|
|
|
|
|
|
Following RISC-V, this adds an MI-level pass to optimize *W instructions
for LoongArch.
First it removes unneeded sext (`addi.w rd, rs, 0`) instructions, either
because the sign-extended bits aren't consumed or because the input was
already sign-extended by an earlier instruction.
Then:
1. Unless explicitly disabled or the target prefers instructions with the
W suffix, it removes the W suffix from an opw instruction whenever all
users depend only on the lower word of the instruction's result. The
cases handled are:
* addi.w, because it helps reduce test differences between LA32 and LA64
without being a pessimization.
2. Otherwise, if explicitly enabled or the target prefers instructions
with the W suffix, it adds the W suffix to an instruction whenever all
users depend only on the lower word of the instruction's result. The
cases handled are:
* add.d/addi.d/sub.d/mul.d.
* slli.d with imm < 32.
* ld.d/ld.wu.
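For example, on LA64 `add.w` already sign-extends its 32-bit result, so a
following `addi.w $a0, $a0, 0` is redundant and can be removed; conversely,
an `add.d` whose users all read only the lower 32 bits can safely become
`add.w`.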
|
|
This is really a workaround to allow control flow lowering in the
presence of convergence control tokens. Control-flow intrinsics in LLVM
IR are convergent because they indirectly represent the wave CFG, i.e.,
sets of threads that are "converged" or "execute in lock-step". But they
exist during a small window in the lowering process, inserted after the
structurizer and then translated to equivalent MIR pseudos. So rather
than create convergence tokens for these builtins, we simply mark them
as not convergent.
The corresponding MIR pseudos are marked as having side effects, which
is sufficient to prevent optimizations without having to mark them as
convergent.
|
|
Proof: https://alive2.llvm.org/ce/z/iRnJ4i
Fixes https://github.com/llvm/llvm-project/issues/91127.
|
|
This reverts commit a415b4dfcc02e3e82b8c8a7836f7c04b9d65dc9b.
Modify the instruction in place to transform it into a REG_SEQUENCE,
which is what other implementations of foldImmediate do. Also start
erasing the def instruction if there are no other uses.
Fixes #91110.
|
|
|
|
|
|
|
|
Reverts llvm/llvm-project#90546
This broke some bots; it seems some toolchains don't perform the
implicit move here.
|
|
Revert "Revert 4 last AMDGPU commits to unbreak Windows bots"
This reverts commit 0d493ed2c6e664849a979b357a606dcd8273b03f.
MSVC does not like constexpr on the definition after an extern
declaration of a global.
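A hedged illustration of the pattern being described (identifiers made up):
```cpp
// In a header: a plain extern declaration of a global.
extern const int NumRegs;

// Later, at the definition site: adding constexpr here is accepted
// by Clang and GCC, but MSVC rejects the constexpr definition after
// the non-constexpr extern declaration.
constexpr int NumRegs = 32;
```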
|
|
This pull request removes an unnecessary move in the return statement to
suppress compilation warnings.
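A minimal sketch of the kind of pattern involved (not the actual code from this PR):
```cpp
#include <memory>
#include <utility>

std::unique_ptr<int> make() {
  auto P = std::make_unique<int>(42);
  // return std::move(P);  // warns: returning a local already moves
  return P;                // implicit move (or elision), no warning
}
```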
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
|
|
All GPR registers will still be virtual at this stage, so update the test
to reflect that.
|
|
Previously `.option arch` rejected extensions whose names do not match
RISC-V feature names. But experimental features carry an `experimental-`
prefix, so `.option arch` could not handle experimental extensions.
This patch uses the extensions' features to identify whether an extension
exists.
|
|
|
|
I'm planning to deprecate and eventually remove StringRef::equals in
favor of operator==. This patch reimplements operator== without using
StringRef::equals.
I'm not sure if there is a good way to make StringRef::compareMemory
available to operator==, which is not a member function. "friend"
works to some extent but breaks corner cases, which is why I've chosen
to "inline" compareMemory.
|
|
AVX doesn't provide a 16-bit BROADCAST instruction.
Fixes #91005
|
|
alignment of 4. (#90702)
This addresses an issue where the explicit alignment of 2 (for C++ ABI
reasons) was being propagated to the back end and causing under-aligned
functions (in special sections).
This is an alternate approach suggested by @efriedma-quic in PR #90415.
Fixes #90358
|
|
shl+sub+shl+sub (#90199)
Change the costmodel to lower a = b * C where C = 1 - (1 - 2^m) * 2^n to
sub w8, w0, w0, lsl #m
sub w0, w0, w8, lsl #n
Fix https://github.com/llvm/llvm-project/issues/89430
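For example, with m = 2 and n = 3 this covers C = 1 - (1 - 4) * 8 = 25:
the first sub computes w8 = b - (b << 2) = -3 * b, and the second computes
b - (w8 << 3) = b + 24 * b = 25 * b.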
|
|
|
|
(#91072) (#91138)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
38 under llvm/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
|
|
Instead of passing LoopAccessInfo only to fetch the MemoryDepChecker,
directly pass MemoryDepChecker. This simplifies the code and also allows
new uses in places where no LAI is available.
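(Concretely, a caller that previously took a `LoopAccessInfo` only to call
`getDepChecker()` on it can now take the `MemoryDepChecker` directly.)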
|
|
|
|
|
|
statement (#85160)"
This reverts commit 882814edd33cab853859f07b1dd4c4fa1393e0ea.
|
|
Fixes #81723.
The earliest commit of the related code is:
https://github.com/llvm/llvm-project/commit/919f9e8d65ada6552b8b8a5ec12ea49db91c922a.
I tried to understand the following code with the help of
https://github.com/llvm/llvm-project/pull/77856#issuecomment-1993499085:
https://github.com/llvm/llvm-project/blob/5932fcc47855fdd209784f38820422d2369b84b2/llvm/lib/Analysis/InlineCost.cpp#L709-L720
I think only scenarios where there is a default branch were considered.
|
|
Make sure we're not expanding div32-div64 codegen when we're focussed on codesize
|
|
Ensure we test with/without the idivq-to-divl attribute, and test the x86-64-v* CPU levels and some common Intel/AMD CPUs.
|
|
(#90911)
In DAGCombiner, the `performCONDCombine` function attempts to remove AND
instructions in front of SUBS (cmp) instructions for which the AND is
transparent. The rules for that are correct, but it fails to take into
account the case where the SUBS instruction has multiple users with
different condition codes for comparison and simply removes the AND for
all of them. This causes a miscompilation in the attached test case.
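A hedged sketch of the shape of the problem (not the attached test case): one flag-setting compare whose result feeds two users with different condition codes, so any AND feeding the compare must be validated against every user:
```cpp
// Both branches can share a single SUBS/CMP of t against 16.
// Removing the AND may be valid for one condition code (e.g. EQ)
// without being valid for the other, so each user must be checked.
int classify(unsigned x) {
  unsigned t = x & 0xfff;
  if (t == 16)   // EQ user of the compare
    return 0;
  if (t < 16)    // unsigned-less-than user of the same compare
    return 1;
  return 2;
}
```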
|
|
With KNL/KNC being deprecated, we don't need to care about such no-VLX
cases anymore. We may remove such patterns in the future.
Fixes #90844
|
|
The shuffleToIdentity fold needs to be a bit more careful about the
difference between call instructions and intrinsics: the latter have known
semantics the fold can reason about and can be handled, but the former
should result in bailing out. This patch also adds some extra intrinsic
tests from #91000.
Fixes #91078
|