|
When building with the latest MSVC on Windows, this fixes some compile-time
warnings from last week's integration in
https://github.com/llvm/llvm-project/pull/157885:
```
[321/5941] Building CXX object lib\Support\LSP\CMakeFiles\LLVMSupportLSP.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\lib\Support\LSP\Transport.cpp(123): warning C4930: 'std::lock_guard<std::mutex> responseHandlersLock(llvm::lsp::MessageHandler::ResponseHandlerTy)': prototyped function not called (was a variable definition intended?)
[384/5941] Building CXX object unittests\Support\LSP\CMakeFiles\LLVMSupportLSPTests.dir\Transport.cpp.obj
C:\git\llvm-project\llvm\unittests\Support\LSP\Transport.cpp(190): warning C4804: '+=': unsafe use of type 'bool' in operation
```
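For reference, a minimal sketch of both warning patterns and their usual fixes (hypothetical names, not the actual LLVM code):
```cpp
#include <mutex>

std::mutex M;

void sketch(bool Flag) {
  // C4930 ("prototyped function not called"): passing a type declares a
  // function instead of constructing a lock (the most vexing parse).
  //   std::lock_guard<std::mutex> Lock(std::mutex);
  // Fixed: construct the guard from an actual mutex object.
  std::lock_guard<std::mutex> Lock(M);

  // C4804 ("unsafe use of type 'bool'"): arithmetic directly on a bool.
  int Count = 0;
  //   Count += Flag;
  // Fixed: make the integral conversion explicit.
  Count += static_cast<int>(Flag);
  (void)Count;
}
```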
|
|
Make the actual use context less ugly.
|
|
If a COPY uses Reg but only in an implicit operand, then the new
implementation ignores it, but the old implementation would have treated
it as a copy of Reg. This case probably never occurs in practice. Other
than that, this patch is NFC.
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
Relative lookup tables result in "LLVM ERROR: Circular dependency found
in global variable set", so disable them for this target.
|
|
This commit custom legalizes `ConstantFP` using a code sequence
rather than simply loading the fp values from the constant pool.
A new option (`-loongarch-materialize-float-imm=<enum>`) is
added to set the maximum number of instructions (including the
code sequence that generates the value and the move of the value
to an FPR) allowed when materializing floating-point immediates.
The default value of the option is `3` on both LA32 and LA64,
which means:
- For `f32` on both LA32 and LA64: `2 insts + movgr2fr.w`
  (covers all `f32` values);
- For `f64` on LA64: `2 insts + movgr2fr.d`;
- For `f64` on LA32: `1 inst + movgr2fr.w + movgr2frh.w`
  (same inst latency as using the constant pool).
The option can be set in the range `0,2-6`. (6 behaves the same
as 5 on LA64.)
|
|
computeKnownBits/ComputeNumSignBits (#159692)
Resolves #158332
|
|
X86ISD::VPMADD52L/H handling - again (#159230)
Fixes #155386
|
|
Added an extra check in AArch64RegisterBankInfo.cpp to mark
llvm.aarch64.sisd.fcvtxn as having floating-point operands.
|
|
|
|
Fixes [157370](https://github.com/llvm/llvm-project/issues/157370)
UREM General proof: https://alive2.llvm.org/ce/z/b_GQJX
SREM General proof: https://alive2.llvm.org/ce/z/Whkaxh
I have added the tests as rv32i and rv64i because those are the only configurations where I could verify that it works.
|
|
|
|
(#154918)
Note: This is only worse for `v8i32/v8f32/v4i64/v4f64` types when the high
part has only one non-undef element. Skip splitting to avoid this.
|
|
As far as I can tell, this load pattern will never match anything, as it
could only trigger for an i32 MemVT extended to an i32.
|
|
inserting per element (#154533)
|
|
vec-regs (#159568)
Fixes #159529
|
|
https://sourceware.org/bugzilla/show_bug.cgi?id=21120
|
|
(#159765)
Fixes #159571
|
|
RISCVMatInt. NFC (#159864)
I think this better reflects the intent of the modification. In all these
places we know bit 31 is 1, so we are sign extending.
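As a hedged illustration of the idiom (not the exact diff): when bit 31 is known to be set, OR-ing in the upper 32 bits and sign-extending from bit 31 produce the same value, and a sign extension states the intent directly:
```cpp
#include <cassert>
#include <cstdint>

// Mirrors the effect of llvm::SignExtend64<32> from
// llvm/Support/MathExtras.h for this case.
int64_t signExtendFromBit31(uint64_t Val) {
  assert((Val & 0x80000000u) && "bit 31 expected to be set");
  // Equivalent to Val | 0xffffffff00000000ULL when bit 31 is 1, but
  // spells out that this is a sign extension.
  return static_cast<int32_t>(static_cast<uint32_t>(Val));
}
```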
|
|
NFC (#159965)
|
|
|
|
Without this patch, we compute a type trait in a roundabout manner:
- Compute a boolean value in the primary template.
- Pass the value to std::enable_if_t.
- Return std::true_type (or std::false_type on the fallback path).
- Compare the return type to std::true_type.
That is, when the expression for the first boolean value above is well
formed, we already have the answer we are looking for.
This patch bypasses the entire sequence by having the primary template
return std::bool_constant and adjusting RESULT to extract the ::value
of the boolean type.
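A minimal sketch of the shape described above (hypothetical trait and member names, not the actual LLVM code):
```cpp
#include <type_traits>
#include <utility>

// Before: compute a bool, feed it to std::enable_if_t, return
// std::true_type, then compare the deduced return type to std::true_type.
template <typename T, typename = std::enable_if_t<std::is_same_v<
                          decltype(std::declval<T &>().getHash()), unsigned>>>
std::true_type hasCachedHashImpl(int);
template <typename> std::false_type hasCachedHashImpl(...);

// After: when the boolean expression is well formed we already have the
// answer, so return std::bool_constant and read ::value off the result.
template <typename T>
std::bool_constant<
    std::is_same_v<decltype(std::declval<T &>().getHash()), unsigned>>
hasCachedHashImpl2(int);
template <typename> std::false_type hasCachedHashImpl2(...);

template <typename T>
constexpr bool HasCachedHash = decltype(hasCachedHashImpl2<T>(0))::value;
```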
|
|
I find it very confusing that we have two different kinds of
"immediates":
- MCOperands in the backend that are `isImm()` which can only be numbers
- RISCVOperands in the parser that are `isImm()` which can contain
expressions
This change aims to make it clearer that in the AsmParser, we are
dealing with expressions, rather than just numbers.
Unfortunately, `isImm` comes from the `MCParsedAsmOperand`, which is
needed for Microsoft Inline Asm, so we cannot fully get rid of it.
|
|
Fixes #159912
|
|
This ensures each scalarized member has an accurate cost, matching the
cost it would have if it had not been considered for an interleave
group.
|
|
When building an assert-enabled target, silence the following:
```
C:\git\llvm-project\llvm\include\llvm/Analysis/DependenceAnalysis.h(290): warning C4018: '<=': signed/unsigned mismatch
```
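The usual shape of such a fix, as a hypothetical sketch:
```cpp
#include <cstddef>

bool levelWithinDepth(int Level, std::size_t Depth) {
  // warning C4018: '<=': signed/unsigned mismatch
  //   return Level <= Depth;
  // Fixed: compare in a single signedness, guarding the negative case.
  return Level >= 0 && static_cast<std::size_t>(Level) <= Depth;
}
```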
|
|
For UDiv/SDiv with invariant divisors, the created selects will be
hoisted out. Don't compute their cost for each iteration, to match the
more accurate VPlan-based cost modeling.
Fixes https://github.com/llvm/llvm-project/issues/159402.
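A hypothetical source-level sketch of the shape in question:
```cpp
// 'D' does not change inside the loop; per the commit above, the
// selects created when such a divide is predicated are hoisted out of
// the loop, so their cost should not be charged per iteration.
int sumOfQuotients(const int *A, int N, int D) {
  int Sum = 0;
  for (int I = 0; I < N; ++I)
    Sum += A[I] / D; // SDiv with a loop-invariant divisor
  return Sum;
}
```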
|
|
|
|
The code is taken from `SelectionDAG::computeKnownBits`.
This ticks off ABS from #150515
|
|
Loads of addresses are scalarized and have their costs computed without
scalarization overhead. Consistently apply this logic also to
non-uniform loads that are already scalarized, to ensure their costs are
consistent with other scalarized loads that are used as addresses.
|
|
We were hitting an assert discovered in https://github.com/llvm/llvm-project/pull/157768#issuecomment-3315359832
|
|
The kernel's libbpf does not need the .comment section. If it is not
filtered out in LLVM, the section will be filtered out in libbpf, so let
us filter it out as early as possible, which is in LLVM.
The following is an example.
$ cat t.c
int test() { return 5; }
Without this change:
$ llvm-readelf -S t.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000110 000047 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000010 00 AX 0 0 8
[ 3] .comment PROGBITS 0000000000000000 000050 000072 01 MS 0 0 1
[ 4] .note.GNU-stack PROGBITS 0000000000000000 0000c2 000000 00 0 0 1
[ 5] .llvm_addrsig LLVM_ADDRSIG 0000000000000000 000110 000000 00 E 6 0 1
[ 6] .symtab SYMTAB 0000000000000000 0000c8 000048 18 1 2 8
With this change:
$ llvm-readelf -S t.o
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 0] NULL 0000000000000000 000000 000000 00 0 0 0
[ 1] .strtab STRTAB 0000000000000000 000098 00003e 00 0 0 1
[ 2] .text PROGBITS 0000000000000000 000040 000010 00 AX 0 0 8
[ 3] .note.GNU-stack PROGBITS 0000000000000000 000050 000000 00 0 0 1
[ 4] .llvm_addrsig LLVM_ADDRSIG 0000000000000000 000098 000000 00 E 5 0 1
[ 5] .symtab SYMTAB 0000000000000000 000050 000048 18 1 2 8
|
|
|
|
Follow-up to https://reviews.llvm.org/D46527
|
|
This patch simplifies dispatchRecalculateHash and dispatchResetHash
with "constexpr if".
This patch does not inline dispatchRecalculateHash and
dispatchResetHash into their respective call sites. Using "constexpr
if" in a non-template context like MDNode::uniquify would still
require the discarded branch to be syntactically valid, causing a
compilation error for node types that do not have
recalculateHash/setHash. Using template functions ensures that the
"constexpr if" is evaluated in a proper template context, allowing the
compiler to fully discard the inactive branch.
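A minimal sketch of the distinction (hypothetical node types, not the actual MDNode code):
```cpp
#include <type_traits>
#include <utility>

struct WithHash { void recalculateHash() {} };
struct WithoutHash {};

// Hypothetical detection trait for the sketch.
template <typename, typename = void>
struct HasRecalculateHash : std::false_type {};
template <typename T>
struct HasRecalculateHash<
    T, std::void_t<decltype(std::declval<T &>().recalculateHash())>>
    : std::true_type {};

// Inside a template, the discarded branch of "constexpr if" is never
// instantiated, so the call only has to be valid for types that take it.
template <typename NodeT> void dispatchRecalculateHash(NodeT &N) {
  if constexpr (HasRecalculateHash<NodeT>::value)
    N.recalculateHash();
}

// In a non-template context the discarded branch must still be valid, so
// writing the "constexpr if" directly against a concrete type without
// recalculateHash() would not compile; hence the template helper.
void example(WithHash &A, WithoutHash &B) {
  dispatchRecalculateHash(A); // calls recalculateHash()
  dispatchRecalculateHash(B); // fine; the branch is discarded
}
```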
|
|
This patch modernizes HasCachedHash.
- "struct SFINAE" is replaced with identically defined SameType.
- The return types Yes and No are replaced with std::true_type and
std::false_type.
My previous attempt (#159510) to clean up HasCachedHash failed on
clang++-18, but this version works with clang++-18.
|
|
There are a couple of places during function cloning where we may create
new callsite clone nodes. One of those places was correctly propagating
the assignment to which function clone it should call, and one was not.
Refactor this handling into a helper and use it in both places so the
newly created callsite clones actually call the assigned callee function
clones.
|
|
comparisons with `ConstantFPRange` (#159315)
Follow-up to #158097
Similar to `simplifyAndOrOfICmpsWithConstants`, we can do so for
floating point comparisons.
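As a small source-level illustration of the idea (hypothetical, C++ rather than IR): ordered comparisons against constants describe ranges, and when the ranges are disjoint the conjunction can fold to a constant:
```cpp
// X < 1.0 and X > 2.0 describe disjoint ranges, so the conjunction can
// fold to 'false'; for NaN both ordered compares are already false, so
// the fold remains correct.
bool inDisjointRanges(float X) { return X < 1.0f && X > 2.0f; } // -> false
```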
|
|
Alive2: https://alive2.llvm.org/ce/z/8rX5Rk
Closes https://github.com/llvm/llvm-project/issues/118106.
|
|
|
|
Fixes regression after e5bbaa9c8fb6e06dbcbd39404039cc5d31df4410.
e5500 accidentally still had the 64bit feature applied instead of
64bit-support.
|
|
(#159839)
LiveRangeEdit's rematerialization checking logic is used in two quite
different ways. For SplitKit and InlineSpiller, we're analyzing all defs
associated with a live interval, doing that analysis up front, and then
using the result a bit later. In the RegisterCoalescer, we're analyzing
exactly one ValNo at a time and using the legality result immediately.
LRE had a checkRematerializable which existed basically to adapt the
latter into the former usage model.
Instead, this change bypasses the ScannedRemat and Remattable
structures and directly queries the underlying routines. This is easier
to read, and makes it clearer which uses actually need the deferred
analysis. (A following change may try to unwind that too, but it's not
strictly NFC.)
|
|
Summary:
This patch has broken the `libc` build bot. I could work around that but
the changes seem unnecessary.
This reverts commit 9ba844eb3a21d461c3adc7add7691a076c6992fc.
|
|
Different instructions are used for the 32-bit and 64-bit cases
anyway, so directly use the concrete register class in the
instruction.
|
|
STI exists in the base class, use it instead.
Fixes #159862.
|
|
Fixes #148052.
The last PR did not account for the scenario where more than one
instruction used the `catchpad` label.
In that case I deleted uses which had already been chosen to be
iterated over by the early-increment iterator. This issue was not
visible in a normal release build on x86, but luckily the address
sanitizer build later caught it on the buildbot.
Here is the diff from the last version of this PR: #158435
```diff
diff --git a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
index 91e245e5e8f5..1dd8cb4ee584 100644
--- a/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
+++ b/llvm/lib/Transforms/Utils/BasicBlockUtils.cpp
@@ -106,7 +106,8 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
// first block, the we would have possible cleanupret and catchret
// instructions with poison arguments, which wouldn't be valid.
if (isa<FuncletPadInst>(I)) {
- for (User *User : make_early_inc_range(I.users())) {
+ SmallPtrSet<BasicBlock *, 4> UniqueEHRetBlocksToDelete;
+ for (User *User : I.users()) {
Instruction *ReturnInstr = dyn_cast<Instruction>(User);
// If we have a cleanupret or catchret block, replace it with just an
// unreachable. The other alternative, that may use a catchpad is a
@@ -114,33 +115,12 @@ void llvm::detachDeadBlocks(ArrayRef<BasicBlock *> BBs,
if (isa<CatchReturnInst>(ReturnInstr) ||
isa<CleanupReturnInst>(ReturnInstr)) {
BasicBlock *ReturnInstrBB = ReturnInstr->getParent();
- // This catchret or catchpad basic block is detached now. Let the
- // successors know it.
- // This basic block also may have some predecessors too. For
- // example the following LLVM-IR is valid:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [cleanupret_block]
- //
- // The IR after the cleanup will look like this:
- //
- // [cleanuppad_block]
- // |
- // [regular_block]
- // |
- // [unreachable]
- //
- // So regular_block will lead to an unreachable block, which is also
- // valid. There is no need to replace regular_block with unreachable
- // in this context now.
- // On the other hand, the cleanupret/catchret block's successors
- // need to know about the deletion of their predecessors.
- emptyAndDetachBlock(ReturnInstrBB, Updates, KeepOneInputPHIs);
+ UniqueEHRetBlocksToDelete.insert(ReturnInstrBB);
}
}
+ for (BasicBlock *EHRetBB :
+ make_early_inc_range(UniqueEHRetBlocksToDelete))
+ emptyAndDetachBlock(EHRetBB, Updates, KeepOneInputPHIs);
}
}
```
|
|
Currently MCA takes instruction properties from the scheduling model.
However, some instructions may execute differently depending on external
factors - for example, the latency of memory instructions may vary
depending on whether the load comes from the L1 cache, L2, or DRAM.
While MCA as a static analysis tool cannot model such differences
(and currently makes some static decision, e.g. all memory ops are
treated as L1 accesses), it makes sense to allow manual modification of
instruction properties to model different behavior (e.g. the sensitivity
of code performance to cache misses in a particular load instruction).
This patch addresses this need.
The library modification is intentionally generic - arbitrary
modifications to InstrDesc are allowed. The tool support is currently
limited to changing instruction latencies (a single number that applies
to all output arguments and MaxLatency) via comments in the input
assembler code; the format is like this:
    add (%eax), eax // LLVM-MCA-LATENCY:100
Users of the MCA library can already make additional customizations; the
command line tool can be extended in the future.
Note that InstructionView currently shows per-instruction information
according to the scheduling model and is not affected by this change.
See https://github.com/llvm/llvm-project/issues/133429 for additional
clarifications (including an explanation of why the existing
customization mechanisms do not provide the required functionality).
---------
Co-authored-by: Min-Yih Hsu <min@myhsu.dev>
|
|
The split in this code path was left over from when we had to support
the old PM and the new PM at the same time. Now that the legacy pass has
been dropped, this simplifies the code a little bit and swaps pointers
for references in a couple places.
Reviewers: aeubanks, efriedma-quic, wlei-llvm
Reviewed By: aeubanks
Pull Request: https://github.com/llvm/llvm-project/pull/159858
|
|
after LUI now. NFC (#159829)
The simm32 base case only uses lui+addiw when necessary after
3d2650bdeb8409563d917d8eef70b906323524ef
The worst case 8 instruction sequence doesn't leave a full 32 bits for
the LUI+ADDI(W) after the 3 12-bit ADDI and SLLI pairs are created. So
we will never generate LUI+ADDIW in the worst case sequence.
|
|
This patch introduces a new optimization in SROA that handles the
pattern where multiple non-overlapping vector `store`s completely fill
an `alloca`.
The current approach to handle this pattern introduces many `.vecexpand`
and `.vecblend` instructions, which can dramatically slow down
compilation when dealing with large `alloca`s built from many small
vector `store`s. For example, consider an `alloca` of type `<128 x
float>` filled by 64 `store`s of `<2 x float>` each. The current
implementation requires:
- 64 `shufflevector`s (`.vecexpand`)
- 64 `select`s (`.vecblend`)
- All operations use masks of size 128
- These operations form a long dependency chain
This kind of IR is both difficult to optimize and slow to compile,
particularly impacting the `InstCombine` pass.
This patch introduces a tree-structured merge approach that
significantly reduces the number of operations and improves compilation
performance.
Key features:
- Detects when vector `store`s completely fill an `alloca` without gaps
- Ensures no loads occur in the middle of the store sequence
- Uses a tree-based approach with `shufflevector`s to merge stored
values
- Reduces the number of intermediate operations compared to linear
merging
- Eliminates the long dependency chains that hurt optimization
Example transformation:
```
// Before: (stores do not have to be in order)
%alloca = alloca <8 x float>
store <2 x float> %val0, ptr %alloca ; offset 0-1
store <2 x float> %val2, ptr %alloca+16 ; offset 4-5
store <2 x float> %val1, ptr %alloca+8 ; offset 2-3
store <2 x float> %val3, ptr %alloca+24 ; offset 6-7
%result = load <8 x float>, ptr %alloca
// After (tree-structured merge):
%shuffle0 = shufflevector %val0, %val1, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%shuffle1 = shufflevector %val2, %val3, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
%result = shufflevector %shuffle0, %shuffle1, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
```
Benefits:
- Logarithmic depth (O(log n)) instead of linear dependency chains
- Fewer total operations for large vectors
- Better optimization opportunities for subsequent passes
- Significant compilation time improvements for large vector patterns
For some large cases, the compile time can be reduced from about 60s to
less than 3s.
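A sketch of the tree-reduction shape (generic stand-ins; in SROA the merge step would emit a `shufflevector` concatenating two stored pieces):
```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Piece = std::vector<float>; // stand-in for a stored vector value

// Stand-in for the shufflevector that concatenates two pieces.
static Piece mergePair(const Piece &A, const Piece &B) {
  Piece R(A);
  R.insert(R.end(), B.begin(), B.end());
  return R;
}

// Merge N non-overlapping pieces in O(log N) rounds instead of a linear
// chain of N-1 dependent expand/blend operations.
static Piece treeMerge(std::vector<Piece> Pieces) {
  while (Pieces.size() > 1) {
    std::vector<Piece> Next;
    for (std::size_t I = 0; I + 1 < Pieces.size(); I += 2)
      Next.push_back(mergePair(Pieces[I], Pieces[I + 1]));
    if (Pieces.size() % 2) // an odd piece carries over to the next round
      Next.push_back(std::move(Pieces.back()));
    Pieces = std::move(Next);
  }
  return Pieces.front();
}
```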
---------
Co-authored-by: chengjunp <chengjunp@nividia.com>
|
|
When there is a dependency between two memory instructions in separate loops that have the same iteration space and depth, SIV will be able to test them and compute the direction and the distance of the dependency.
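As a hypothetical source-level example: both loops below have the same iteration space and depth; the first stores `A[I + 2]` and the second loads `A[I]`, so an SIV test can establish the dependence with distance 2 and a known direction:
```cpp
// Same trip count and nesting depth in both loops; the second loop
// reads what the first loop wrote two iterations earlier (distance 2).
void copyThroughA(float *A, const float *B, float *C, int N) {
  for (int I = 0; I < N; ++I)
    A[I + 2] = B[I];
  for (int I = 0; I < N; ++I)
    C[I] = A[I];
}
```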
|