Age | Commit message | Author | Files | Lines |
|
|
|
(#161739)
The switch becomes a conditional branch, one edge going to what was the default target of the switch and the other to a BB that performs a lookup in a table. The branch weights are accurately determinable from those of the switch.
Issue #147390
|
|
This is a singleton register class, which is a bad idea,
and it is not actually used.
|
|
SubtargetFeature (#161888)
Introduce a new SchedPredicate, `FeatureSchedPredicate`, that holds true
when a certain SubtargetFeature is enabled. This could be useful when we
want to configure a scheduling model with subtarget features.
I add this as a separate SchedPredicate rather than piggy-backing on
the existing `SchedPredicate<[{....}]>` because, first and foremost,
`SchedPredicate` is expected to operate only on MachineInstr, so it
does _not_ appear in `MCGenSubtargetInfo::resolveVariantSchedClass`
but only shows up in `TargetGenSubtargetInfo::resolveSchedClass`. Yet
I think `FeatureSchedPredicate` will be useful for both MCInst and
MachineInstr.
There is also another subtle difference between
`resolveVariantSchedClass` and `resolveSchedClass` regarding how we
access the MCSubtargetInfo instance, if we really want to express
`FeatureSchedPredicate` using `SchedPredicate<[{.....}]>`.
So I thought it would be easier to add a new SchedPredicate dedicated
to SubtargetFeature.
|
|
-> switch` (#161549)
We cannot calculate the weights of the switch precisely, but we do know the probability of the default branch. We then split the remaining probability equally over the rest of the cases. If we did nothing, the static estimation could be considerably poorer.
Issue #147390
|
|
These are singleton register classes, which are not a good idea,
and they are also unused.
|
|
resolveVariantSchedClassImpl (#161886)
`Target_MC::resolveVariantSchedClassImpl` is the implementation function
for `TargetGenMCSubtargetInfo::resolveVariantSchedClass`. Despite being
called only by `resolveVariantSchedClass`,
`resolveVariantSchedClassImpl` is still a standalone function that
cannot access an MCSubtargetInfo through `this` (i.e.
`TargetGenMCSubtargetInfo`). And having access to an `MCSubtargetInfo`
could be useful for some (future) SchedPredicate.
This patch modifies TableGen to generate `resolveVariantSchedClassImpl`
with an additional `MCSubtargetInfo` argument passed in. Note that this
does not change any public interface in either `TargetGenMCSubtargetInfo`
or `MCSubtargetInfo`, as `resolveVariantSchedClassImpl` is essentially
an internal function.
|
|
In some cases, due to phase-ordering issues with re-cloning during
function assignment, we may end up with duplicate clones in the
summaries (clones calling the same set of callee clones and/or carrying
the same allocation hints).
Ideally we would fix this in the thin link, but for now, detect and
suppress these in the LTO backend. In order to satisfy possibly
cross-module references, make each duplicate an alias to the first
identical copy, which gets materialized.
This reduces ThinLTO backend compile times.
|
|
With true16 mode, v_mov_b16_t16 is added as a new foldable copy
instruction, but its source operand is at a different index.
Use the correct source operand index for v_mov_b16_t16.
|
|
If a load is scalarized because it is used by a load/store address, the
legacy cost model does not pass ScalarEvolution to getAddressComputationCost.
Match the behavior in VPReplicateRecipe::computeCost.
|
|
`OpBitCast` instruction. (#161891)
Generate `OpBitCast` instruction for pointer cast operation if the
element type is different.
The HLSL for the unit test is
```hlsl
StructuredBuffer<uint2> In : register(t0);
RWStructuredBuffer<double2> Out : register(u2);
[numthreads(1,1,1)]
void main() {
Out[0] = asdouble(In[0], In[1]);
}
```
Resolves https://github.com/llvm/llvm-project/issues/153513
|
|
Replacing uses of the return value with the argument is already handled
in other passes; additionally, it causes issues with memory value
numbering when the call is a memory-defining intrinsic.
Fixes #159918
|
|
Fix Issue #160611
|
|
with conditionals" (#161885) (#161890)
This reverts commit 572b579632fb79ea6eb562a537c9ff1280b3d4f5.
This is a reland of #159666 but with a fix moving the `extern`
declaration of the flag under the LLVM namespace, which is needed to fix
a linker error caused by #161240.
|
|
This changes the intrinsic definitions for shifts to use IntArg, which
in turn changes how the shifts are represented in SDAG to use
TargetConstant (and fixes up a number of ISel lowering places too). The
vecshift immediates are changed from ImmLeaf to TImmLeaf to keep them
matching the TargetConstant. On the GISel side the constant shift
amounts are then represented as immediate operands, not separate constants.
The end result is that this allows a few more patterns to match in GISel.
|
|
Resolves instruction selection failure for v64f16 and v32f32 vector
types.
Patch by: Fateme Hosseini
---------
Co-authored-by: Kaushik Kulkarni <quic_kauskulk@quicinc.com>
|
|
|
|
with conditionals" (#161885)
Reverts llvm/llvm-project#159666
Many bots are broken right now.
|
|
conditionals (#159666)
If `select` simplification produces the transform:
```
(select A && B, T, F) -> (select A, T, F)
```
or
```
(select A || B, T, F) -> (select A, T, F)
```
it stands to reason that if the branches are the same, then the branch
weights remain the same since the net effect is a simplification of the
conditional.
There are also cases where InstCombine negates the conditional (and
therefore reverses the branches); this PR asserts that the branch
weights are reversed in this case.
Tracking issue: #147390
|
|
Improves codegen diff in an upcoming patch
|
|
|
|
functions. (#161319)
This PR stops the attributor pass from inferring `amdgpu-no-flat-scratch-init`
for functions marked with a `sanitize_*` attribute.
|
|
(#161846)
Allow getV4X86ShuffleImm8ForMask to create a pure splat mask, helping to reduce demanded elts.
|
|
Tolerate setting negative values in TableGen, and store them as a
saturated uint8_t value. This will allow naive uses of the copy cost
to directly add it as a cost without considering the degenerate negative
case. The degenerate negative cases are only used in InstrEmitter / DAG
scheduling, so leave the special-case processing there. There are also
FIXMEs about this system already there.
This is the expedient fix for an out-of-tree target regression
after #160084. Currently targets can set a negative copy cost to mark
copies as "impossible". However, essentially all the in-tree uses only
use this for non-allocatable condition registers. We should probably
replace the InstrEmitter/DAG scheduler uses with a more direct check
for a copyable register, but that entails test changes.
|
|
|
|
|
|
This should always be on.
Fixes SWDEV-555931.
|
|
Replace mul and mul_u ops with a neg operation if their second operand
is a splat of the value -1.
Also apply the optimization to mul_u ops whose first operand is a
splat of -1, thanks to their commutativity.
|
|
The transformation pattern is identical to the uint_to_fp
conversion from v32i1 to v32f32.
|
|
Now that #161007 will attempt to fold this back to ADD(x,x) in
X86FixupInstTunings, we can more aggressively create X86ISD::VSHLI nodes
to avoid missed optimisations due to oneuse limits, avoids unnecessary
freezes and allows AVX512 to fold to mi memory folding variants.
I've currently limited SSE targets to cases where ADD is the only user
of x to prevent extra moves - AVX shift patterns benefit from breaking
the ADD+ADD+ADD chains into shifts, but it's not so beneficial on SSE
with the extra moves.
|
|
When using information from dereferenceable assumptions, we need to make
sure that the memory is not freed between the assume and the specified
context instruction. Instead of just checking canBeFreed, check whether
there are any calls that may free memory between the assume and the
context instruction.
This patch introduces a willNotFreeBetween helper to check for calls
that may free memory between an assume and a context instruction, to
also be used in https://github.com/llvm/llvm-project/pull/161255.
PR: https://github.com/llvm/llvm-project/pull/161725
|
|
The Ada front end can emit somewhat complicated DWARF expressions for
the offset of a field. While working in this area I found that I needed
DW_OP_rot (to implement a branch-free computation -- it looked more
difficult to add support for branching); and DW_OP_neg and DW_OP_abs
(just basic functionality).
|
|
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into
delay slots, as they don't correspond to real machine instructions.
This should fix crashes when compiling with, for example, `clang -Og`.
|
|
We do not need to reconstrain physical registers. Enables an
additional fold for constant physregs.
|
|
This enables `aarch64-split-sve-objects` by default. Note: This option
only has an effect when used in conjunction with hazard padding
(`aarch64-stack-hazard-size` != 0).
See https://github.com/llvm/llvm-project/pull/142392 for more details.
|
|
This patch teaches GVN how to eliminate redundant masked loads and
forward values from previous masked stores or loads via a select. This
is possible when the same mask is used for masked stores/loads that
access the same memory location.
|
|
The original PR broke in a rebase
(https://github.com/llvm/llvm-project/pull/160247); continuing here.
This patch adds support for the G_[U|S][MIN|MAX] opcodes in the X86 target.
This PR addresses the following review comments:
1. Widening to the next power of 2:
https://github.com/llvm/llvm-project/pull/160247#discussion_r2371655478
2. Clamping scalars:
https://github.com/llvm/llvm-project/pull/160247#discussion_r2374748440
|
|
X86 GISel now supports all the opcodes necessary to expand/lower the
isfpclass intrinsic, enabling tests added prior to the fpclass patch.
This patch enables runs for the isel-fpclass.ll tests.
|
|
This cost-model takes into account any type-legalisation that would
happen on vectors such as splitting and promotion. This results in wider
VFs being chosen for loops that can use partial reductions.
The cost-model now also assumes that when SVE is available, the SVE dot
instructions for i16 -> i64 dot products can be used for fixed-length
vectors. In practice this means that loops with non-scalable VFs are
vectorized using partial reductions where they wouldn't before, e.g.
```
int64_t foo2(int8_t *src1, int8_t *src2, int N) {
  int64_t sum = 0;
  for (int i = 0; i < N; ++i)
    sum += (int64_t)src1[i] * (int64_t)src2[i];
  return sum;
}
```
These changes also fix an issue where previously a partial reduction
would be used for mixed sign/zero-extends (USDOT), even when +i8mm was
not available.
|
|
Add a Section Header check to getBuildID; fix a crash with an invalid
Program Header.
Fixes: #126418
---------
Signed-off-by: Ruoyu Qiu <cabbaken@outlook.com>
Signed-off-by: Ruoyu Qiu <qiuruoyu@xiaomi.com>
Co-authored-by: Ruoyu Qiu <qiuruoyu@xiaomi.com>
Co-authored-by: James Henderson <James.Henderson@sony.com>
|
|
Hardware inserts an implicit `S_WAIT_XCNT 0` between
alternate SMEM and VMEM instructions, so there are
never outstanding address translations for both SMEM
and VMEM at the same time.
|
|
(#161799)
…cation.
Replaces a call to ObjectFile::makeTriple (still used for ELF and COFF)
with a call to MachOObjectFile::getArchTriple. The latter knows how to
build correct triples for different MachO CPU subtypes, e.g. arm64 vs
arm64e, which is important for selecting the right slice from universal
archives.
|
|
Needed for a future patch.
|
|
biggest legal type (#158070)
For ARM, we want to do this up to 32-bits. Otherwise the code ends up
bigger and bloated.
|
|
combineBitcastvxi1 is sometimes called pre-legalization, so don't
introduce X86ISD::MOVMSK nodes when the vector types aren't legal.
Fixes #161693
|
|
Adds the name and triple of the graph to LinkGraph::dump output before
the rest of the graph content. Calls from JITLinkGeneric.cpp to dump the
graph are updated to avoid redundantly naming the graph.
|
|
Found this problem when investigating #91207
|
|
Most of the fp16 cases still do not work properly. See #161088.
|
|
Pre-commits extra test coverage for loops with multiple F(Max|Min)Num
reductions w/o fast-math-flags for follow-up PR.
|
|
Make the test for when additional variables can be added to the struct
allocated at address zero more stringent. Previously, variables could be
added to it (for faster access) even when that increased the lds
requested by a kernel. This corrects that oversight.
The test case diff shows the change from all variables being allocated
into the module lds to only some being, in particular the introduction
of uses of the offset table and the fact that some kernels now use less
lds than before.
Alternative to PR 160181
|