|
Similar to f4370fb801aa, the fp16 tests do not work yet.
|
|
|
|
This is a singleton register class, which is a bad idea,
and it is not actually used.
|
|
These are singleton register classes, which are not a good idea
and are also unused.
|
|
In true16 mode, v_mov_b16_t16 is added as a new foldable copy
instruction, but its src operand is at a different index.
Use the correct src index for v_mov_b16_t16.
|
|
`OpBitCast` instruction. (#161891)
Generate an `OpBitCast` instruction for a pointer cast operation if the
element types differ.
The HLSL for the unit test is
```hlsl
StructuredBuffer<uint2> In : register(t0);
RWStructuredBuffer<double2> Out : register(u2);
[numthreads(1,1,1)]
void main() {
Out[0] = asdouble(In[0], In[1]);
}
```
Resolves https://github.com/llvm/llvm-project/issues/153513
|
|
Fix Issue #160611
|
|
This changes the intrinsic definitions for shifts to use IntArg, which
in turn changes how the shifts are represented in SDAG to use
TargetConstant (and fixes up a number of ISel lowering places too). The
vecshift immediates are changed from ImmLeaf to TImmLeaf to keep them
matching the TargetConstant. On the GISel side the constant shift
amounts are then represented as immediate operands, not separate constants.
The end result is that this allows a few more patterns to match in GISel.
|
|
Resolves instruction selection failure for v64f16 and v32f32 vector
types.
Patch by: Fateme Hosseini
---------
Co-authored-by: Kaushik Kulkarni <quic_kauskulk@quicinc.com>
|
|
Improves codegen diff in an upcoming patch
|
|
functions. (#161319)
This PR stops the attributor pass from inferring
`amdgpu-no-flat-scratch-init` for functions marked with a `sanitize_*`
attribute.
|
|
(#161846)
Allow getV4X86ShuffleImm8ForMask to create a pure splat mask, helping to reduce demanded elts.
|
|
|
|
|
|
This should always be on.
Fixes SWDEV-555931.
|
|
Replace mul and mul_u ops with a neg operation if their second operand
is a splat value of -1.
Also apply the optimization to mul_u ops whose first operand is a splat
value of -1, since they are commutative.
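The underlying identity, as a minimal LLVM IR sketch (the ops in this patch are target-specific mul/mul_u nodes, not plain IR mul; function names here are illustrative):
```llvm
; x * splat(-1) computes the same value as 0 - x, so the multiply can
; become a negate.
define <4 x i32> @mul_by_splat_minus1(<4 x i32> %x) {
  %r = mul <4 x i32> %x, <i32 -1, i32 -1, i32 -1, i32 -1>
  ret <4 x i32> %r
}

; ...is equivalent to:
define <4 x i32> @neg(<4 x i32> %x) {
  %r = sub <4 x i32> zeroinitializer, %x
  ret <4 x i32> %r
}
```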
|
|
The transformation pattern is identical to the uint_to_fp
conversion from v32i1 to v32f32.
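A hedged sketch of the IR-level shape involved (assuming the signed variant, mirroring the existing unsigned one):
```llvm
; Each i1 lane becomes a float: 0.0 or 1.0 for uitofp, 0.0 or -1.0 for
; sitofp.
define <32 x float> @cvt_mask(<32 x i1> %m) {
  %f = sitofp <32 x i1> %m to <32 x float>
  ret <32 x float> %f
}
```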
|
|
Now that #161007 will attempt to fold this back to ADD(x,x) in
X86FixupInstTunings, we can more aggressively create X86ISD::VSHLI nodes
to avoid missed optimisations due to oneuse limits; this avoids
unnecessary freezes and allows AVX512 to fold to the mi memory-folding
variants.
I've currently limited SSE targets to cases where ADD is the only user
of x to prevent extra moves - AVX shift patterns benefit from breaking
the ADD+ADD+ADD chains into shifts, but it's not so beneficial on SSE
with the extra moves.
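The identity being exploited, as a minimal IR sketch (X86ISD::VSHLI is the backend's vector shift-left-by-immediate node):
```llvm
; add %x, %x computes the same value as shifting %x left by one.
define <8 x i32> @add_self(<8 x i32> %x) {
  %a = add <8 x i32> %x, %x
  ret <8 x i32> %a
}

define <8 x i32> @shl_by_one(<8 x i32> %x) {
  %s = shl <8 x i32> %x, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
  ret <8 x i32> %s
}
```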
|
|
Do not move meta instructions like `FAKE_USE`/`@llvm.fake.use` into
delay slots, as they don't correspond to real machine instructions.
This should fix crashes when compiling with, for example, `clang -Og`.
|
|
We do not need to reconstrain physical registers. Enables an
additional fold for constant physregs.
|
|
This enables `aarch64-split-sve-objects` by default. Note: This option
only has an effect when used in conjunction with hazard padding
(`aarch64-stack-hazard-size` != 0).
See https://github.com/llvm/llvm-project/pull/142392 for more details.
|
|
The original PR, https://github.com/llvm/llvm-project/pull/160247,
broke in a rebase; continuing here.
This patch adds support for the G_[U|S][MIN|MAX] opcodes in the X86
target. It addresses the review comments:
1. Widening to the next power of 2:
https://github.com/llvm/llvm-project/pull/160247#discussion_r2371655478
2. Clamping scalars:
https://github.com/llvm/llvm-project/pull/160247#discussion_r2374748440
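A minimal IR sketch of inputs that exercise these opcodes (the standard min/max intrinsics translate to the generic min/max opcodes in the IRTranslator; function name illustrative):
```llvm
declare i32 @llvm.smin.i32(i32, i32)
declare i32 @llvm.umax.i32(i32, i32)

; Both calls become G_SMIN / G_UMAX in generic MIR.
define i32 @minmax(i32 %a, i32 %b) {
  %lo = call i32 @llvm.smin.i32(i32 %a, i32 %b)
  %r  = call i32 @llvm.umax.i32(i32 %lo, i32 %b)
  ret i32 %r
}
```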
|
|
X86 GISel supports all the opcodes necessary to expand/lower the
isfpclass intrinsic, so the tests can be enabled ahead of the fpclass
patch. This patch enables the runs for the isel-fpclass.ll tests.
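For reference, the IR shape these tests exercise; the i32 immediate is the class bitmask (3 = signaling NaN | quiet NaN):
```llvm
declare i1 @llvm.is.fpclass.f32(float, i32 immarg)

; Tests whether %x is a (signaling or quiet) NaN.
define i1 @is_nan(float %x) {
  %r = call i1 @llvm.is.fpclass.f32(float %x, i32 3)
  ret i1 %r
}
```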
|
|
Hardware inserts an implicit `S_WAIT_XCNT 0` between
alternating SMEM and VMEM instructions, so there are
never outstanding address translations for both SMEM
and VMEM at the same time.
|
|
Needed for future patch.
|
|
biggest legal type (#158070)
For ARM, we want to do this up to 32 bits. Otherwise the code ends up
bigger and bloated.
|
|
combineBitcastvxi1 is sometimes called pre-legalization, so don't
introduce X86ISD::MOVMSK nodes when vector types aren't legal.
Fixes #161693
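A hedged sketch of the kind of IR combineBitcastvxi1 handles (not the exact reproducer from #161693):
```llvm
; The i1-vector-to-integer bitcast is what X86 can lower to a
; MOVMSK-style sign-mask extraction once the vector type is legal.
define i8 @mask_bits(<8 x i32> %a, <8 x i32> %b) {
  %c = icmp slt <8 x i32> %a, %b
  %m = bitcast <8 x i1> %c to i8
  ret i8 %m
}
```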
|
|
Found this problem when investigating #91207
|
|
Most of the fp16 cases still do not work properly. See #161088.
|
|
Make the test for when additional variables can be added to the struct
allocated at address zero more stringent. Previously, variables could be
added to it (for faster access) even when that increased the LDS
requested by a kernel. This corrects that oversight.
The test case diff shows the change from all variables being allocated
into the module LDS to only some of them, in particular the introduction
of uses of the offset table, and that some kernels now use less LDS than
before.
Alternative to PR 160181
|
|
#153478 made v2i32 legal on newer GPUs, but we cannot lower all
operations yet. Expand the `trunc/ext` operations until we implement
efficient lowering.
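Illustrative IR for the affected shapes (a sketch; the precise set of expanded nodes is in the patch):
```llvm
; trunc and ext touching v2i32 are expanded until efficient lowering
; exists.
define <2 x i16> @trunc_v2i32(<2 x i32> %x) {
  %t = trunc <2 x i32> %x to <2 x i16>
  ret <2 x i16> %t
}

define <2 x i64> @zext_v2i32(<2 x i32> %x) {
  %e = zext <2 x i32> %x to <2 x i64>
  ret <2 x i64> %e
}
```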
|
|
For a while we have supported the `-aarch64-stack-hazard-size=<size>`
option, which adds "hazard padding" between GPRs and FPR/ZPRs. However,
there is currently a hole in this mitigation as PPR and FPR/ZPR accesses
to the same area also cause streaming memory hazards (this is noted by
`-pass-remarks-analysis=sme -aarch64-stack-hazard-remark-size=<val>`),
and the current stack layout places PPRs and ZPRs within the same area,
which looks like:
```
------------------------------------ Higher address
| callee-saved gpr registers |
|---------------------------------- |
| lr,fp (a.k.a. "frame record") |
|-----------------------------------| <- fp(=x29)
| <hazard padding> |
|-----------------------------------|
| callee-saved fp/simd/SVE regs |
|-----------------------------------|
| SVE stack objects |
|-----------------------------------|
| local variables of fixed size |
| <FPR> |
| <hazard padding> |
| <GPR> |
------------------------------------| <- sp
| Lower address
```
With this patch the stack (and hazard padding) is rearranged so that
hazard padding is placed between the PPRs and ZPRs rather than within
the (fixed-size) callee-save region, which looks something like this:
```
------------------------------------ Higher address
| callee-saved gpr registers |
|---------------------------------- |
| lr,fp (a.k.a. "frame record") |
|-----------------------------------| <- fp(=x29)
| callee-saved PPRs |
| PPR stack objects | (These are SVE predicates)
|-----------------------------------|
| <hazard padding> |
|-----------------------------------|
| callee-saved ZPR regs | (These are SVE vectors)
| ZPR stack objects | Note: FPRs are promoted to ZPRs
|-----------------------------------|
| local variables of fixed size |
| <FPR> |
| <hazard padding> |
| <GPR> |
------------------------------------| <- sp
| Lower address
```
This layout is only enabled if:
* SplitSVEObjects are enabled (`-aarch64-split-sve-objects`)
- (This may be enabled by default in a later patch)
* Streaming memory hazards are present
- (`-aarch64-stack-hazard-size=<val>` != 0)
* PPRs and FPRs/ZPRs are on the stack
* There's no stack realignment or variable-sized objects
- This is left as a TODO for now
Additionally, any FPR callee-saves that are present will be promoted to
ZPRs. This is to prevent stack hazards between FPRs and GPRs in the
fixed-size callee-save area (which would otherwise require more hazard
padding, or moving the FPR callee-saves).
This layout should resolve the hole in the hazard padding mitigation,
and is not intended to change codegen for non-SME code.
|
|
Essentially what happened is the following series of events:
1) We rematerialized the vmv.v.x into the loop.
2) As this was the last use of the instruction, we deleted the
instruction, and removed it from the original live range.
3) We split the live range for the remat.
4) We tried to rematerialize the uses of that split interval, and
crashed because the assert about the def being available in
the original live interval does not hold.
|
|
Fix s_quadmask* instruction description so that it defines SCC.
---------
Signed-off-by: John Lu <John.Lu@amd.com>
|
|
|
|
Check for a valid offset for the unaligned vector store V6_vS32Ub_npred_ai.
isValidOffset() is updated to evaluate the offset of this instruction.
Fixes #160647
|
|
Previously this took hints from a subregister extract of a physreg,
like %vreg.sub = COPY $physreg
This now also handles the rarer case:
$physreg_sub = COPY %vreg
Also make an accidental bug that existed here explicit; this was
only using the superregister as a hint if it was already
in the copy, and not when using the existing assignment. There are
a handful of regressions in that case, so leave that extension
for a future change.
|
|
This splits out "ScalablePredicateVector" from the "ScalableVector"
StackID. This is primarily to allow easy differentiation between vectors
and predicates (without inspecting instructions).
This new stack ID is not used in many places yet, but will be used in a
later patch to mark stack slots that are known to contain predicates.
Co-authored-by: Kerry McLaughlin <kerry.mclaughlin@arm.com>
|
|
|
|
(#160424)
For subregister copies, do a subregister liveness check instead of
checking the main range. This doesn't do much yet; the split analysis
still does not track live ranges.
|
|
ops. (#160515)
When unable to widen a vector load/store, we can replace the operation
with a masked variant. Support for extending loads largely came for
free, hence its inclusion, but truncating stores require more work.
Fixes https://github.com/llvm/llvm-project/issues/159995
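A hedged sketch of the idea (lane counts illustrative): a load of a width that cannot be widened legally can instead be expressed as a wider masked load with the extra lanes disabled.
```llvm
declare <8 x i16> @llvm.masked.load.v8i16.p0(ptr, i32 immarg, <8 x i1>, <8 x i16>)

; A 6-lane load done as an 8-lane masked load; the last two lanes stay
; off, so no memory past the original access is touched.
define <8 x i16> @load_six_lanes(ptr %p) {
  %v = call <8 x i16> @llvm.masked.load.v8i16.p0(ptr %p, i32 2,
           <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>,
           <8 x i16> poison)
  ret <8 x i16> %v
}
```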
|
|
(#161299)
Prevent adding duplicate instructions for implicit bindings when they
are from the same resource. The fix is to store and check if the binding
number is already assigned for each `OrderId`.
Resolves https://github.com/llvm/llvm-project/issues/160716
|
|
This commit adds the `G_FMODF` opcode to GMIR and enables its
translation, legalization and instruction selection in AArch64.
|
|
(#161384)
When the input to ptest_first is a vector concat and the mask is all active,
performPTestFirstCombine returns a ptest_first using the first operand
of the concat, looking through any reinterpret casts.
This allows optimizePTestInstr to later remove the ptest when the first
operand is a flag setting instruction such as whilelo.
|
|
Previously, the `Chain` was dropped, meaning LUTI4 nodes that only
differed in the chain operand would be incorrectly CSE'd.
Fixes: #161420
|
|
Also removes the command line option to control this feature.
There seem to be mainly two kinds of test changes:
- Some operands of addition instructions are swapped; that is to be expected
since PTRADD is not commutative.
- Improvements in code generation, probably because the legacy lowering enabled
some transformations that were sometimes harmful.
For SWDEV-516125.
|
|
If X is known to never be undef/poison, skip the freeze and return
ComputeNumSignBits(X).
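A minimal sketch: with a noundef source, the freeze cannot change the value, so sign-bit analysis can look through it.
```llvm
; %s has 25 sign bits. Since %x is noundef (and ashr by a constant < 32
; cannot create poison), the freeze is a no-op for ComputeNumSignBits.
define i32 @signbits(i32 noundef %x) {
  %s = ashr i32 %x, 24
  %f = freeze i32 %s
  ret i32 %f
}
```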
|
|
Previously if we had a subregister extract reading from a
full copy, the no-subregister incoming copy would overwrite
the DefSubReg index of the folding context.
There's one ugly rvv regression, but it's a downstream
issue of this; an unnecessary same class reg-to-reg full copy
was avoided.
|
|
This matches what we do for a regular i8 extload due to the lack of c.lb
in Zcb.
This only affects GlobalISel because SelectionDAG won't create an
anyext i8 atomic_load today.
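A rough sketch of the neighbourhood (the patch concerns the anyext case, which this zero-extending form only approximates):
```llvm
; An atomic i8 load whose result is extended to i32.
define i32 @atomic_load_byte(ptr %p) {
  %v = load atomic i8, ptr %p monotonic, align 1
  %e = zext i8 %v to i32
  ret i32 %e
}
```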
|
|
the fixme (#161531)
Move the LowerBufferFatPointers pass to after the CodeGenPrepare and
LoadStoreVectorizer passes, and remove the FIXME about that.
|