| Age | Commit message (Collapse) | Author | Files | Lines |
|
patterns over v4f32 and v2f64 (#164486)
Related to https://github.com/llvm/llvm-project/issues/55932
|
|
Previously, cross-lane operations were disallowed here, but they are
only problematic if the `select` condition is a vector, as the input of
the operation is not simply one of the arms of the phi/select.
|
|
For the sake of consistency.
|
|
|
|
This patch introduces SDNodeFlags::InBounds, to show that an ISD::PTRADD SDNode
implements an inbounds getelementptr operation (i.e., the pointer operand is in
bounds wrt. an allocated object it is based on, and the arithmetic does not
change that). The flag is set in the DAG construction when lowering inbounds
GEPs.
Inbounds information is useful in the ISel when selecting memory instructions
that perform address computations whose intermediate steps must be in the same
memory region as the final result. Follow-up patches to propagate the flag in
DAGCombines and to use it when lowering AMDGPU's flat memory instructions,
where the immediate offset must not affect the memory aperture of the address
(similar to this GISel patch: #153001), are planned.
This mirrors #150900, which has introduced a similar flag in GlobalISel.
This patch supersedes #131862, which previously attempted to introduce an
SDNodeFlags::InBounds flag. The difference between this PR and #131862 is that
there is now an ISD::PTRADD opcode (PR #140017) and the InBounds flag is only
defined to apply to ISD::PTRADD DAG nodes. It is therefore unambiguous that
in-bounds-ness refers to a memory object into which the left operand of the
PTRADD node points (in contrast to #131862, where InBounds would have applied
to commutative ISD::ADD nodes, so that the semantics would be more difficult to
reason about).
For SWDEV-516125.
|
|
`load atomic <1 x T>` is not valid. This change legalizes
vector types of atomic load via scalarization in SelectionDAG
so that it can, for example, translate from `v1i32` to `i32`.
|
|
Vector types on atomics are assumed to be invalid by the verifier. However,
this type can be valid if it is lowered by codegen.
|
|
Fixes #142146
Do nullptr check when pass accept `const TargetMachine &` in
constructor, but it is still not exhaustive.
|
|
Finds longest (almost) plain substring in the pattern.
Implementation is conservative to avoid false positives.
The result is not used to optimize
`GlobPattern::match()` so it's calculated on
request.
For
* https://github.com/llvm/llvm-project/pull/164545
---------
Co-authored-by: Luke Lau <luke@igalia.com>
|
|
When rematting we create a new virtual register with the original def's
register class. However the use may have a different register class if
the interval is split, which means we end up with an invalid register
class.
This fixes #164181 by constraining the newly created register to the
use's register class.
The test case is reduced as far as it goes. Because this test requires
us to reach a certain amount of register pressure in certain conditions
I'm not sure if there's an easy way to handwrite this scenario.
|
|
Add codegen support for fpround and fpextend for zvfbfa.
resolve https://github.com/llvm/llvm-project/issues/164324
|
|
Not all the paths appear to be implemented for GlobalISel
|
|
|
|
improve code quality (#164219)
|
|
Replace two StringRefs with One StringRef + 2 x size_t.
Prepare for:
* https://github.com/llvm/llvm-project/pull/164512
|
|
We were checking for cbuffers where the global was removed, but if the
buffer is completely unused the whole thing can be null.
---------
Co-authored-by: Helena Kotas <hekotas@microsoft.com>
|
|
Add MIR2Vec support to the llvm-ir2vec tool, enabling embedding generation for Machine IR alongside the existing LLVM IR functionality.
(This is an initial integration; Other entity/triplet gen for vocab generation would follow as separate patches)
|
|
Make call graph section to have a dedicated type instead of the generic
progbits type.
|
|
optimizations. (#149706)"
This reverts commit 8d29d09309654541fb2861524276ada6a3ebf84c.
There have been reports of mis-compiles
in https://github.com/llvm/llvm-project/pull/149706.
Revert while I investigate.
|
|
This patch provides an approximation of the memory locations touched by
`llvm.matrix.column.major.load` and `llvm.matrix.column.major.store`,
enabling dead store elimination and GVN to remove redundant loads and
dead stores.
PR: https://github.com/llvm/llvm-project/pull/163368
|
|
Add a flag to the GlobalValueSummaryInfo indicating whether the
associated SummaryList (all summaries with the same GUID) contains any
summaries with local linkage. This flag is set when building the index,
so it is associated with the original linkage type before
internalization and promotion. Consumers should check the
withInternalizeAndPromote() flag on the index before using it.
In most cases we expect a 1-1 mapping between a GUID and a summary with
local linkage, because for locals the GUID is computed from the hash of
"modulepath;name". However, there can be multiple locals with the same
GUID if translation units are not compiled with enough path. And in rare
but theoretically possible cases, there can be hash collisions on the
underlying MD5 computation. So to be safe when looking for local
summaries, analyses currently look through all summaries in the list.
These lists can be extremely long in the case of large binaries with
template function defs in widely used headers (i.e. linkonce_odr).
A follow on change will use this flag to reduce ThinLTO analysis time in
WPD by 5-6% for a large target (details in PR164046 which will be
reworked to use this flag).
Note that in the past we have tried to keep bits related to the GUID in
the ValueInfo (which has a pointer to the associated
GlobalValueSummaryInfo), via its PointerIntPair. However, we are out of
bits there. This change does add a byte to every GlobalValueSummaryInfo
instance, which I measured as a little under 0.90% overhead in a large
target. However, it enables adding 7 bits of other per-GUID flags in the
future without adding more overhead. Note that it was lower overhead to
add this to the GlobalValueSummaryInfo than the ValueInfo, which tends
to be copied into other maps.
|
|
Handling opcodes in embedding computation.
- Revamped MIR Vocabulary with four sections - `Opcodes`, `Common Operands`, `Physical Registers`, and `Virtual Registers`
- Operands broadly fall into 3 categories -- the generic MO types that are common across architectures, physical and virtual register classes. We handle these categories separately in MIR2Vec. (Though we have same classes for both physical and virtual registers, their embeddings vary).
|
|
DXILResource was falling over trying to name a resource type that
contained an array, such as `StructuredBuffer<float[3][2]>`. Handle this
by walking through array types to gather the dimensions.
|
|
The comment here pointed out that RAUW would fall over given a
constantexpr, but then proceeded to just do what RAUW does by hand,
which falls over in the same way. Instead, convert constantexprs
involving cbuffer globals to instructions before processing them.
The test update just modifies the existing cbuffer test, since it
implied it was trying to test this exact case anyways.
|
|
|
|
Add a new getGEPExpr variant which is independent of GEPOperator*.
To be used to construct SCEVs for VPlan recipes in
https://github.com/llvm/llvm-project/pull/161276.
PR: https://github.com/llvm/llvm-project/pull/164487
|
|
This PR is part of the LLVM IR LSP server project
([RFC](https://discourse.llvm.org/t/rfc-ir-visualization-with-vs-code-extension-using-an-lsp-server/87773))
To be able to make a LSP server, it's crucial to have location
information about the LLVM objects (Functions, BasicBlocks and
Instructions).
This PR adds:
* Position tracking to the Lexer
* A new AsmParserContext class, to hold the new position info
* Tests to check if the location is correct
The AsmParserContext can be passed as an optional parameter into the
parser. Which populates it and it can be then used by other tools, such
as the LSP server.
The AsmParserContext idea was borrowed from MLIR. As we didn't want to
store data no one else uses inside the objects themselves. But the
implementation is different, this class holds several maps of Functions,
BasicBlocks and Instructions, to map them to their location.
And some utility methods were added to get the positions of the
processed tokens.
|
|
Add an index-wide flag indicating whether index-based internalization
and promotion have completed. This will be used in a follow on change.
|
|
(#163174)
We currently raise a warning in `print_btf.py` when any of these types
have a name. Linux kernel doesn't allow names in these types either.[0]
However, there is nothing stopping frontends from giving names to these
types. To make sure that they are always anonymous, explicitly skip the
name emission.
[0]
https://elixir.bootlin.com/linux/v6.17.1/source/kernel/bpf/btf.c#L2586
|
|
When an VF is specified via a loop hint, it will be clamped to a safe
VF or ignored if it is found to be unsafe. This is not the case for
user-specified interleave counts, which can lead to loops such as
the following with a memory dependence being vectorised with
interleaving:
```
#pragma clang loop interleave_count(4)
for (int i = 4; i < LEN; i++)
b[i] = b[i - 4] + a[i];
```
According to [1], loop hints are ignored if they are not safe to apply.
This patch adds a check to prevent vectorisation with interleaving if
isSafeForAnyVectorWidth() returns false. This is already checked in
selectInterleaveCount().
[1]
https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
|
|
This patch fixes a missed optimization issue: predicate infos might be
lost in phi-use scenarios.
Due to the existence of and-chains, a phi-use might be associated with
multiple LN_Last predicate infos.
E.g.,
```cpp
// TWO LN_Last Predicate Info defs:
// 1. a >= 1
// 2. a < 2
if ( a < 1 || a >= 2) {
a = 1;
}
// PHI use of `a`
use(a)
```
However, previously, `popStackUntilDFSScope` reserved only ONE LN_Last
def for a phi use (i.e., reserve only one of `a >= 1` / `a < 2`),
although there might be multiple LN_Last defs for the same phi use.
This patch reserves the adjacent LN_Last defs if they are designated for
the same phi use.
|
|
Note that "override" makes "virtual" redundant.
Identified with modernize-use-override.
|
|
Reland PR https://github.com/llvm/llvm-project/pull/162352. Fix by
excluding SI_PC_ADD_REL_OFFSET from instructions that set SCC = DST!=0.
Passes check-libc-amdgcn-amd-amdhsa now.
Distribution of instructions that allowed a redundant S_CMP to be
deleted in check-libc-amdgcn-amd-amdhsa test:
```
S_AND_B32 485
S_AND_B64 47
S_ANDN2_B32 42
S_ANDN2_B64 277492
S_CSELECT_B64 17631
S_LSHL_B32 6
S_OR_B64 11
```
---------
Signed-off-by: John Lu <John.Lu@amd.com>
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
|
|
A bit of debug info maintenaince for #147540.
|
|
8d29d09309 exposed a crash due to incorrectly trying to handle masked
interleave recipes. For now, the current code does not support masked
interleave recipes. Bail out for them.
|
|
An assertion failed when Polly was registering for the pass manager
which assumed that there would be only Polly passes. Since this does not
need to be the case, re-apply with the assert removed.
Includes a non-Polly change to trigger the premerge CI to trigger
check-llvm which failed for 0b9a7b80c0674c5c6f746139912111bea7eae63b,
but pre-merge did not catch.
|
|
This PR fixes a handful of latency and uop changes between Znver3 and
Znver4 that were otherwise copied from Znver3.
Latency and uop values listed that matched Zen3 on uops.info were
updated to those for Zen4.
Includes: BSF/BSR, DIV, TZCNT, CLMUL, PCMPISTRM, VALIGN, VPERM
|
|
|
|
LD1Rv8b only supports a base register but the DAG is matched using
am_indexed8 with the offset it finds silently dropped.
I've also fixed a couple of immediate operands types inconsistencies
that don't manifest as bugs because their incorrect scaling is overriden
by the complex pattern and MachineInstr that are correct and thus
there's nothing to test.
|
|
This patch implements the speculative devirtualization feature in the
LLVM backend.
It handles the case of single implementation devirtualization where
there is a single possible callee of a virtual function.
- Add cl::opt 'devirtualize-speculatively' to enable it.
- Flag is disabled by default.
- It works regardless of the visibility of the object.
- Not enabled for LTO for now.
|
|
|
|
As far as I understand, constant pool access does not access stack and
accesses read-only memory.
This patch considers constant pool access as non-stack access allowing
shrink wrapping to happen in the concerned test.
We should be seeing perf improvement with povray benchmark from
SPEC17(around 12% with -flto -Ofast) after this patch.
An NFC PR #162476 already exists to upload the test before the patch but
approval has got delayed. So, as @davemgreen suggested in that PR, I
have uploaded the test and patch in this single PR to show how test
looks like.
|
|
Generate a usage error instead of asserting.
|
|
fminimum/fmaximum over v4f32 and v2f64 (#162948)
Related to #55932
|
|
`ISD::VSelect` (#164069)
Fixes #150019
|
|
Refactor the AllocToken pass to accept the mode via pass options rather
than LLVM cl::opt. This is both cleaner, but also required to make the
mode frontend-driven and avoid potential inconsistencies.
|
|
When switch from fast isel to dag isel the input value is from llvm IR
instruction.
If the instruction is call we should get the calling convention of the
callee and
pass it to RegsForValue::getCopyFromRegs, so that it can deduce the
right RegisterVT
of the returned value of the callee.
---------
Co-authored-by: Yuanke Luo <ykluo@birentech.com>
|
|
Reverts llvm/llvm-project#163453
Causes crashes, see
https://github.com/llvm/llvm-project/pull/163453#issuecomment-3429922732
|
|
Refactor the stateless (hash-based) token calculation logic out of the
`AllocToken` pass and into `llvm/Support/AllocToken.h`.
This helps with making the token calculation logic available to other
parts of the codebase, which will be necessary for frontend
implementation of `__builtin_infer_alloc_token` to perform constexpr
evaluation.
The `AllocTokenMode` enum and a new `AllocTokenMetadata` struct are
moved into a shared header. The `getAllocTokenHash()` function now
provides the source of truth for calculating token IDs for `TypeHash`
and `TypeHashPointerSplit` modes.
|
|
When a loop is converted into a low-overhead loop using tail predication
via FPSCR.LTPSIZE, the MQPRCopy pseudo-instruction is expanded into
either two VMOVD or a single MVE_VORR, depending on whether the values
written to the lanes with a 'false' predicate matter. (MVE_VORR uses the
ambient LTPSIZE predicate, so it won't write those lanes at all; the
double VMOVD is slower but gets them right.)
This check was done based on whether the output of the MQPRCopy is live
coming out of the loop. But it missed a case where the live-out value is
not _itself_ an MQPRCopy, but is a predicated operation taking its false
lanes from an MQPRCopy.
Fixes #162644, and adds a new MIR test case derived from the reproducer
in that bug.
|