Age | Commit message | Author | Files | Lines |
|
New instructions added:
* xxmulmul
* xxmulmulhiadd
* xxmulmulloadd
* xxssumudm
* xxssumudmc
* xxssumudmcext
* xsaddadduqm
* xsaddaddsuqm
* xsaddsubuqm
* xsaddsubsuqm
* xsmerge2t1uqm
* xsmerge2t2uqm
* xsmerge2t3uqm
* xsmerge3t1uqm
* xsrebase2t1uqm
* xsrebase2t2uqm
* xsrebase2t3uqm
* xsrebase2t4uqm
* xsrebase3t1uqm
* xsrebase3t2uqm
* xsrebase3t3uqm
|
|
Consolidate predicate definitions into top level entry point for PowerPC
target `PPC.td` and
remove duplicate definitions for 32/64 bit sub-target checks.
|
|
|
|
Apply suggestion as per review comment in
https://github.com/llvm/llvm-project/pull/151004/files#r2240893226
|
|
New instructions added:
* xvadduwm - VSX Vector Add Unsigned Word Modulo
* xvadduhm - VSX Vector Add Unsigned Halfword Modulo
* xvsubuwm - VSX Vector Subtract Unsigned Word Modulo
* xvsubuhm - VSX Vector Subtract Unsigned Halfword Modulo
* xvmuluwm - VSX Vector Multiply Unsigned Word Modulo
* xvmuluhm - VSX Vector Multiply Unsigned Halfword Modulo
* xvmulhsw - VSX Vector Multiply High Signed Word
* xvmulhsh - VSX Vector Multiply High Signed Halfword
* xvmulhuw - VSX Vector Multiply High Unsigned Word
* xvmulhuh - VSX Vector Multiply High Unsigned Halfword
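The names above combine an element width (word or halfword), a "Modulo" suffix, and a "Multiply High" variant. The following is a minimal Python sketch of what those terms conventionally mean for lane-wise arithmetic. The semantics are inferred from the instruction names only, not from the ISA text or the patch, and every function name here is hypothetical.

```python
# Hypothetical lane-wise model. Lanes are lists of raw unsigned bit patterns.
# Semantics are assumed from the mnemonic names, not taken from the ISA.

def add_modulo(a, b, bits):
    """Lane-wise add that wraps at 2**bits (the "Modulo" suffix)."""
    mask = (1 << bits) - 1
    return [(x + y) & mask for x, y in zip(a, b)]

def sub_modulo(a, b, bits):
    """Lane-wise subtract that wraps at 2**bits."""
    mask = (1 << bits) - 1
    return [(x - y) & mask for x, y in zip(a, b)]

def mul_modulo(a, b, bits):
    """Lane-wise multiply keeping only the low `bits` bits of each product."""
    mask = (1 << bits) - 1
    return [(x * y) & mask for x, y in zip(a, b)]

def mul_high_unsigned(a, b, bits):
    """Lane-wise multiply keeping the high `bits` bits of the unsigned product."""
    mask = (1 << bits) - 1
    return [((x * y) >> bits) & mask for x, y in zip(a, b)]

def mul_high_signed(a, b, bits):
    """Lane-wise multiply keeping the high `bits` bits of the signed product."""
    mask = (1 << bits) - 1

    def to_signed(v):
        # Reinterpret an unsigned bit pattern as two's complement.
        return v - (1 << bits) if v >> (bits - 1) else v

    return [((to_signed(x) * to_signed(y)) >> bits) & mask for x, y in zip(a, b)]
```

With `bits=32` the helpers model the word forms and with `bits=16` the halfword forms, under the assumptions stated above.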
|
|
.. to isReMaterializableImpl. The "Really" naming has always been
awkward, and we're working towards removing the "Trivial" part now,
so go ahead and remove both pieces in a single rename.
Note that this doesn't change any aspect of the current
implementation; we still "mostly" only return instructions which
are trivial (meaning no virtual register uses), but some targets
do lie about that today.
|
|
(#159778)
Extract error reporting code emitted by CodeEmitterGen into
MCCodeEmitter static member functions.
Additionally, remove unused ErrorHandling.h header from several files.
|
|
|
|
Implement AES Acceleration Instructions:
* xxaesencp
* xxaesdecp
* xxaesgenlkp
* xxgfmul128
|
|
This patch adds the MIR parsing and serialization support for save and
restore points with subsets of callee saved registers. That is, it
syntactically allows a function to contain two or more distinct
sub-regions in which distinct subsets of registers are spilled/filled as
callee save. This is useful if e.g. one of the CSRs isn't modified in
one of the sub-regions, but is in the other(s).
Support for actually using this capability in code generation is still
forthcoming. This patch is the next logical step for multiple
save/restore points support.
All points are now stored in a DenseMap from MBB to vector of
CalleeSavedInfo.
Shrink-Wrap points split Part 4.
RFC:
https://discourse.llvm.org/t/shrink-wrap-save-restore-points-splitting/83581
Part 1: https://github.com/llvm/llvm-project/pull/117862 (landed)
Part 2: https://github.com/llvm/llvm-project/pull/119355 (landed)
Part 3: https://github.com/llvm/llvm-project/pull/119357 (landed)
Part 5: https://github.com/llvm/llvm-project/pull/119359 (likely to be
further split)
|
|
This code was already creating HandleSDNodes to handle the case where a
node gets replaced with an equivalent node. However, the code before the
handles are created also performs RAUW operations, which can end up
CSEing and deleting nodes.
Fix this issue by moving the handle creation earlier.
Fixes https://github.com/llvm/llvm-project/issues/160040.
|
|
XOR(B,C)) and ternary(A,X, OR(B,C)) (#157909)
Adds support for ternary equivalent operations of the form
- `ternary(A, X, xor(B,C))` where `X=[and(B,C)| nor(B,C)| or(B,C)| B |
C]`.
- `ternary(A, X, or(B,C))` where `X = [and(B,C)| eqv(B,C)| not(B)|
not(C)| nand(B,C)| B | C]`.
The following are the patterns involved and the imm values:
```
ternary(A, and(B,C), xor(B,C)) 97
ternary(A, B, xor(B,C)) 99
ternary(A, C, xor(B,C)) 101
ternary(A, or(B,C), xor(B,C)) 103
ternary(A, nor(B,C), xor(B,C)) 104
ternary(A, and(B,C), or(B,C)) 113
ternary(A, B, or(B,C)) 115
ternary(A, C, or(B,C)) 117
ternary(A, eqv(B,C), or(B,C)) 121
ternary(A, not(C), or(B,C)) 122
ternary(A, not(B), or(B,C)) 124
ternary(A, nand(B,C), or(B,C)) 126
```
eg. `xxeval XT, XA, XB, XC, 97`
performs the ternary operation: `XA ? and(XB, XC) : xor(XB, XC)` and
places the result in `XT`.
This is the continuation of:
- [[PowerPC] Exploit xxeval instruction for ternary patterns -
ternary(A, X,
and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top)
- [[PowerPC] Exploit xxeval instruction for operations of the form
ternary(A,X,B) and
ternary(A,X,C).](https://github.com/llvm/llvm-project/pull/152956#top)
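The immediates listed above can be reproduced from a truth table. The sketch below is a Python illustration, not code from the patch. It assumes that the truth-table entry for inputs `(A, B, C)` lands at bit `4*A + 2*B + C` of the 8-bit immediate, counted from the most-significant bit; that assumption matches every row of the table in this commit message. The helper names are hypothetical.

```python
from itertools import product

def xxeval_imm(f):
    """Build an 8-bit immediate from a three-input boolean function f(a, b, c).

    Assumption: entry (a, b, c) maps to bit 4*a + 2*b + c, numbered from the
    most-significant bit of the immediate.
    """
    imm = 0
    for a, b, c in product((0, 1), repeat=3):
        if f(a, b, c) & 1:
            imm |= 1 << (7 - (4 * a + 2 * b + c))
    return imm

def ternary(x, y):
    """Return the function A ? x(B, C) : y(B, C)."""
    return lambda a, b, c: x(b, c) if a else y(b, c)

# Two-input building blocks operating on single bits.
AND = lambda b, c: b & c
OR = lambda b, c: b | c
XOR = lambda b, c: b ^ c
NOR = lambda b, c: 1 - (b | c)
EQV = lambda b, c: 1 - (b ^ c)
NAND = lambda b, c: 1 - (b & c)
B = lambda b, c: b
C = lambda b, c: c
NOT_B = lambda b, c: 1 - b
NOT_C = lambda b, c: 1 - c
```

For example, `xxeval_imm(ternary(AND, XOR))` evaluates to 97 and `xxeval_imm(ternary(NAND, OR))` to 126, matching the first and last rows of the table.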
---------
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
|
|
Fixes regression after e5bbaa9c8fb6e06dbcbd39404039cc5d31df4410.
e5500 accidentally still had the 64bit feature applied instead of
64bit-support.
|
|
Reverts llvm/llvm-project#159782
The PR breaks multiple build bots and CI as well.
|
|
Clean up unused PPC target feature FeatureBPERMD.
|
|
|
|
The result type of the vector extend intrinsics generated by the
BUILD_VECTOR lowering code should match how they are actually defined.
Currently the result type is defaulting to the operand type there. This
can conflict with calls to the same intrinsic from other paths.
|
|
AIX has "millicode" routines, which are functions loaded at boot time
into fixed addresses in kernel memory. This allows them to be customized
for the processor. The __strlen routine is a millicode implementation;
we use millicode for the strlen function instead of a library call to
improve performance.
|
|
(#159331)
The current implementation assumes ConstantInt return values are scalar,
which is not true when use-constant-int-for-fixed-length-splat is
enabled.
|
|
Left-justified (#148873)
|
|
This was being used for 2 different purposes.
The TargetMachine constructor prepends +64bit based on isPPC64
triples as a mode switch. The same feature name was also explicitly
added to different processors, making it impossible to perform a pure
feature check for whether 64-bit mode is enabled or not. i.e.,
checkFeatures("+64bit") would be true even for ppc32 triples.
The comment in tablegen suggests it's relevant to track which processors
support 64-bit mode independently of whether that's the active compile
target, so replace that with a new feature.
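The ambiguity described above can be illustrated with a small Python model. This is a hypothetical sketch, not how LLVM's subtarget feature machinery is implemented: the helper names and CPU feature lists are invented for illustration, and the `64bit-support` name is taken from the e5500 regression fix mentioned elsewhere in this log.

```python
def features_for(triple_is_64bit, cpu_implied):
    """Hypothetical model of the final feature list for a (triple, CPU) pair."""
    features = list(cpu_implied)
    if triple_is_64bit:
        # Mode switch prepended for ppc64 triples.
        features.insert(0, "64bit")
    return features

def has_feature(features, name):
    """Pure feature check: is `name` present in the list?"""
    return name in features

# Before: the processor definition itself carried "64bit" (illustrative list).
OLD_CPU = ["64bit", "altivec"]
# After: the processor advertises capability under a separate name.
NEW_CPU = ["64bit-support", "altivec"]

# A ppc32 triple with a 64-bit-capable CPU.
old_ppc32 = features_for(False, OLD_CPU)
new_ppc32 = features_for(False, NEW_CPU)
```

In this model `has_feature(old_ppc32, "64bit")` is true even though the triple is 32-bit, while `has_feature(new_ppc32, "64bit")` is false and the capability remains visible as `64bit-support`.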
|
|
The way this was previously structured does not allow
access to the predicates inside of PPCRegisterInfo
|
|
Several code cleanup changes in code to emit decoder tables:
- Start comments on each line at a fixed column for readability.
- Combine repeated code to decode and emit ULEB128 into a single
function.
- Add helper `getDecoderOpName` to print decoder op.
- Print Filter/CheckField/predicate index values with those opcodes.
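For readers unfamiliar with the encoding mentioned in the list above, here is a minimal Python sketch of ULEB128 encoding and decoding. It illustrates the format only; it is not the helper added by this change, and the function names are hypothetical.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as ULEB128 bytes (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_uleb128(data, offset=0):
    """Decode one ULEB128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

Returning the next offset alongside the value lets a caller walk a table of consecutive variable-length fields with one helper, which is the kind of repetition the cleanup folds into a single function.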
|
|
This is a low level utility to parse the MCInstrInfo and should
not depend on the state of the function.
|
|
getPointerRegClass is a layering violation. Its primary purpose
is to determine how to interpret an MCInstrDesc's operands RegClass
fields. This should be context free, and only depend on the subtarget.
The model of this is also wrong, since this should be an
instruction / operand specific property, not a global pointer class.
Remove the function argument to help stage removal of this hook
and avoid introducing any new obstacles to replacing it.
The remaining uses of the function were to get the subtarget, which
TargetRegisterInfo already belongs to. A few targets needed new
subtarget derived properties copied there.
|
|
Clang and other frontends generally need the LLVM data layout string in
order to generate LLVM IR modules for LLVM. MLIR clients often need it
as well, since MLIR users often lower to LLVM IR.
Before this change, the LLVM datalayout string was computed in the
LLVM${TGT}CodeGen library in the relevant TargetMachine subclass.
However, none of the logic for computing the data layout string requires
any details of code generation. Clients who want to avoid duplicating
this information were forced to link in LLVMCodeGen and all registered
targets, leading to bloated binaries. This happened in PR #145899,
which measurably increased binary size for some of our users.
By moving this information to the TargetParser library, we
can delete the duplicate datalayout strings in Clang, and retain the
ability to generate IR for unregistered targets.
This is intended to be a very mechanical LLVM-only change, but there is
an immediately obvious follow-up to clang, which will be prepared
separately.
The vast majority of data layouts are computable with two inputs: the
triple and the "ABI name". There is only one exception, NVPTX, which has
a cl::opt to enable short device pointers. I invented a "shortptr" ABI
name to pass this option through the target independent interface.
Everything else fits. Mips is a bit awkward because it uses a special
MipsABIInfo abstraction, which includes members with codegen-like
concepts like ABI physical registers that can't live in TargetParser. I
think the string logic of looking for "n32" "n64" etc is reasonable to
duplicate. We have plenty of other minor duplication to preserve
layering.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
This patch enables `-fpatchable-function-entry` on PPC64 little-endian
Linux. It is mutually exclusive with existing XRay instrumentation on
this target.
|
|
|
|
This already exists in the base class.
|
|
This will make it possible for tablegen to make subtarget
dependent decisions without adding new arguments to every
target.
---------
Co-authored-by: Sergei Barannikov <barannikov88@gmail.com>
|
|
|
|
|
|
The '-' punctuator was deprecated via:
https://github.com/llvm/llvm-project/commit/196e6f9f18933ed33eee39a1c9350ccce6b18e2c
|
|
The operand is not encoded, decoded, or printed and would break MCInst
verification if we had one.
Extracted from #156358, where the extra operand causes DecoderEmitter
to emit an error about an operand with a missing encoding.
|
|
|
|
Combine same predicate sections into one and move some mma instructions
into the proper section.
|
|
getSExtValue already returns int64_t.
|
|
Implement the set of vector uncompress instructions:
* vucmprhh
* vucmprlh
* vucmprhn
* vucmprln
* vucmprhb
* vucmprlb
|
|
Implement the set of vector unpack instructions:
* vupkhsntob
* vupklsntob
* vupkint4tobf16
* vupkint8tobf16
* vupkint4tofp32
* vupkint8tofp32
|
|
res_bit_shift) (#154388)
This change implements a patfrag based pattern matching ~dag combiner~
that combines consecutive `VSRO (Vector Shift Right Octet)` and `VSR
(Vector Shift Right)` instructions into a single `VSRQ (Vector Shift
Right Quadword)` instruction on Power10+ processors.
Vector right shift operations like `vec_srl(vec_sro(input, byte_shift),
bit_shift)` generate two separate instructions `(VSRO + VSR)` when they
could be optimised into a single `VSRQ` instruction that performs the
equivalent operation.
```
vsr(vsro (input, vsro_byte_shift), vsr_bit_shift) to vsrq(input, vsrq_bit_shift)
where vsrq_bit_shift = (vsro_byte_shift * 8) + vsr_bit_shift
```
Note:
```
vsro : Vector Shift Right by Octet VX-form
- vsro VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bytes specified in bits 121:124 of VSR[VRB+32].
- Bytes shifted out of byte 15 are lost.
- Zeros are supplied to the vacated bytes on the left.
- The result is placed into VSR[VRT+32].
vsr : Vector Shift Right VX-form
- vsr VRT, VRA, VRB
- The contents of VSR[VRA+32] are shifted right by the number of bits specified in bits 125:127 of VSR[VRB+32]. 3 bits.
- Bits shifted out of bit 127 are lost.
- Zeros are supplied to the vacated bits on the left.
- The result is placed into VSR[VRT+32], except if, for any byte element in VSR[VRB+32], the low-order 3 bits are not equal to the shift amount, then VSR[VRT+32] is undefined.
vsrq : Vector Shift Right Quadword VX-form
- vsrq VRT,VRA,VRB
- Let src1 be the contents of VSR[VRA+32]. Let src2 be the contents of VSR[VRB+32].
- src1 is shifted right by the number of bits specified in the low-order 7 bits of src2.
- Bits shifted out of the least-significant bit are lost.
- Zeros are supplied to the vacated bits on the left.
- The result is placed into VSR[VRT+32].
```
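The equivalence behind the combine can be checked with a small Python model. This is a sketch under simplifying assumptions: registers are modelled as plain 128-bit integers, shift amounts are passed directly rather than read from bit fields of a second register, and the requirement that every byte of the `vsr` shift operand agree is ignored. The function names are hypothetical.

```python
MASK128 = (1 << 128) - 1

def vsro(value, byte_shift):
    """Shift a 128-bit value right by whole bytes (shift amount 0..15)."""
    return (value & MASK128) >> (8 * (byte_shift & 0xF))

def vsr(value, bit_shift):
    """Shift a 128-bit value right by 0..7 bits."""
    return (value & MASK128) >> (bit_shift & 0x7)

def vsrq(value, bit_shift):
    """Shift a 128-bit value right by 0..127 bits."""
    return (value & MASK128) >> (bit_shift & 0x7F)

def combined_shift(vsro_byte_shift, vsr_bit_shift):
    """Shift amount for the single vsrq that replaces vsro followed by vsr."""
    return vsro_byte_shift * 8 + vsr_bit_shift

def combine_holds(value):
    """Check vsr(vsro(x, b), s) == vsrq(x, 8*b + s) for every legal (b, s)."""
    return all(
        vsr(vsro(value, b), s) == vsrq(value, combined_shift(b, s))
        for b in range(16)
        for s in range(8)
    )
```

The largest combined amount is `15 * 8 + 7 = 127`, which fits in the 7-bit field that `vsrq` reads, so no shift amount is lost by the rewrite.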
---------
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
|
|
ternary(A,X,B) and ternary(A,X,C). (#152956)
Adds support for ternary equivalent operations of the form `ternary(A,
X, B)` and `ternary(A, X, C)` where `X=[and(B,C)| nor(B,C)| eqv(B,C)|
nand(B,C)]`.
The following are the patterns involved and the imm values:
| **Operation** | **Immediate Value** |
|----------------------------|---------------------|
| ternary(A, and(B,C), B) | 49 |
| ternary(A, nor(B,C), B) | 56 |
| ternary(A, eqv(B,C), B) | 57 |
| ternary(A, nand(B,C), B) | 62 |
| | |
| ternary(A, and(B,C), C) | 81 |
| ternary(A, nor(B,C), C) | 88 |
| ternary(A, eqv(B,C), C) | 89 |
| ternary(A, nand(B,C), C) | 94 |
eg. `xxeval XT, XA, XB, XC, 49`
- performs `XA ? and(XB, XC) : XB` and places the result in `XT`.
This is the continuation of [[PowerPC] Exploit xxeval instruction for
ternary patterns - ternary(A, X,
and(B,C))](https://github.com/llvm/llvm-project/pull/141733#top).
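To see what one of these immediates does to actual data, the Python sketch below applies an 8-bit immediate bit by bit to three integers. It is an illustrative software model, not the instruction definition: it assumes the truth-table entry for `(A, B, C)` sits at bit `4*A + 2*B + C` of the immediate counted from the most-significant bit, which is consistent with the table above, and the function name `xxeval_model` is hypothetical.

```python
MASK128 = (1 << 128) - 1

def xxeval_model(xa, xb, xc, imm, width=128):
    """Apply the boolean function selected by `imm` to each bit position.

    Assumption: entry (a, b, c) of the truth table is bit 4*a + 2*b + c of
    `imm`, numbered from the most-significant bit.
    """
    result = 0
    for i in range(width):
        a = (xa >> i) & 1
        b = (xb >> i) & 1
        c = (xc >> i) & 1
        index = 4 * a + 2 * b + c
        bit = (imm >> (7 - index)) & 1
        result |= bit << i
    return result

def select(xa, on_true, on_false):
    """Reference bitwise select: on_true where xa has a 1 bit, else on_false."""
    return ((xa & on_true) | (~xa & on_false)) & MASK128

# Arbitrary sample operands for illustration.
XA = 0xFFFF0000_FFFF0000_00FF00FF_0F0F0F0F
XB = 0x12345678_9ABCDEF0_0FEDCBA9_87654321
XC = 0xDEADBEEF_CAFEF00D_01234567_89ABCDEF
```

Under this model, immediate 49 selects `and(XB, XC)` where `XA` has a 1 bit and `XB` elsewhere, so `xxeval_model(XA, XB, XC, 49)` equals `select(XA, XB & XC, XB)`.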
---------
Co-authored-by: Tony Varghese <tony.varghese@ibm.com>
|
|
If a custom operand has MIOperandInfo with >= 2 sub-operands, it is
required that either the operand or its sub-operands have a decoder
method (depending on usage). Require this for single sub-operand
operands as well, since there is no good reason not to.
There are no changes in the generated files.
|
|
(#136411)
Currently, complex operands of an instruction are flattened in the resulting DAG of `InstAlias`.
This change makes it required to specify complex operands in `InstAlias` as sub-DAGs:
```
InstAlias<"foo $rd, $rs1, $rs2", (Inst RC:$rd, (ComplexOp RC:$rs1, GR0, 42), SimpleOp:$rs2)>;
```
instead of
```
InstAlias<"foo $rd, $rs1, $rs2", (Inst RC:$rd, RC:$rs1, GR0, 42, SimpleOp:$rs2)>;
```
The advantages of the new syntax are improved readability and more robust type checking, although it is a bit more verbose.
|
|
I believe it became no-op with the removal of the "positionally encoded
operands" functionality (b87dc356 is the last commit in the series).
There are no changes in the generated files.
|
|
This patch updates PPCInstrInfo::copyPhysReg to support DMR and WACC
register classes and extends the PPCVSXCopy pass to handle specific WACC
copy patterns.
|
|
This pseudo-instruction emits a local `bl` writing LR, so that must be
saved and restored for the function to return to the right place. If
not, we'll return to the inline `.long` that the `bl` stepped over.
This fixes the `SIGILL` seen in rayon-rs/rayon#1268.
|
|
Add support for the PPC Dense Math builtins mma_build_dmr and
mma_disassemble_dmr.
|
|
Add entries for __stack_chk_guard, __ssp_canary_word, __security_cookie,
and __guard_local. As far as I can tell these are all just different
names for the same shaped functionality on different systems.
These aren't really functions, but special global variable names. They
should probably be treated the same way; all the same contexts that
need to know about emittable function names also need to know about
this. This avoids a special case check in IRSymtab.
This isn't a complete change; there's a lot more cleanup which
should be done. The stack protector configuration system is a
complete mess. There are multiple overlapping controls, used in
3 different places. Some of the target control implementations overlap
with conditions used in the emission points, and some use correlated
but not identical conditions in different contexts.
i.e. useLoadStackGuardNode, getIRStackGuard, getSSPStackGuardCheck and
insertSSPDeclarations are all used in inconsistent ways so I don't know
if I've tracked the intention of the system correctly.
The PowerPC test change is a bug fix on Linux. Previously the manual
conditions were based around !isOSOpenBSD, which is not the condition
where __stack_chk_guard is used. Now getSDagStackGuard returns the
proper global reference, resulting in LOAD_STACK_GUARD getting a
MachineMemOperand which allows scheduling.
|
|
We just replaced SmallSet<T *, N> with SmallPtrSet<T *, N>, bypassing
the redirection found in SmallSet.h. With that, we no longer need to
include SmallSet.h in many files.
|
|
(#154802)
Extract fixed functions generated by decoder emitter into a new
MCDecoder.h header.
|