aboutsummaryrefslogtreecommitdiff
path: root/target/arm/helper-mve.h
AgeCommit message (Collapse)AuthorFilesLines
2023-05-12target/arm: Move helper-{a64,mve,sme,sve}.h to tcg/Richard Henderson1-890/+0
While we cannot move the main "helper.h" out of target/arm/, due to usage by generic code, we can move the sub-includes. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-id: 20230504110412.1892411-3-richard.henderson@linaro.org Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01target/arm: Implement MVE VRINT insnsPeter Maydell1-0/+6
Implement the MVE VRINT insns, which round floating point inputs to integer values, leaving them in floating point format. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE VCVT between single and half precisionPeter Maydell1-0/+5
Implement the MVE VCVT instruction which converts between single and half precision floating point. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE VCVT with specified rounding modePeter Maydell1-0/+5
Implement the MVE VCVT which converts from floating-point to integer using a rounding mode specified by the instruction. We implement this similarly to the Neon equivalents, by passing the required rounding mode as an extra integer parameter to the helper functions. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE VCVT between floating and fixed pointPeter Maydell1-0/+9
Implement the MVE VCVT insns which convert between floating and fixed point. As with the Neon equivalents, these use essentially the same constant encoding as right-shift-by-immediate. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE fp scalar comparisonsPeter Maydell1-0/+18
Implement the MVE fp scalar comparisons VCMP and VPT. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE fp vector comparisonsPeter Maydell1-0/+18
Implement the MVE fp vector comparisons VCMP and VPT. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE FP max/min across vectorPeter Maydell1-0/+12
Implement the MVE VMAXNMV, VMINNMV, VMAXNMAV, VMINNMAV insns. These calculate the maximum or minimum of floating point elements across a vector, starting with a value in a general purpose register and returning the result there. The pseudocode silences a possible SNaN in the accumulating result on every iteration (by calling FPConvertNaN), but we do it only on the input ra, because if none of the inputs to float*_maxnum or float*_minnum are SNaNs then the result can't be an SNaN. Note that we can't use the float*_maxnuma() etc functions we defined earlier for VMAXNMA and VMINNMA, because we mustn't take the absolute value of the starting general-purpose register value, which could be negative. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE fp-with-scalar VFMA, VFMASPeter Maydell1-0/+6
Implement the MVE fp-with-scalar VFMA and VFMAS insns. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE scalar fp insnsPeter Maydell1-0/+9
Implement the MVE scalar floating point insns VADD, VSUB and VMUL. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-09-01target/arm: Implement MVE VMAXNMA and VMINNMAPeter Maydell1-0/+6
Implement the MVE VMAXNMA and VMINNMA insns; these are 2-operand, but the destination register must be the same as one of the source registers. We defer the decode of the size in bit 28 to the individual insn patterns rather than doing it in the format, because otherwise we would have a single insn pattern that overlapped with two groups (eg VMAXNMA with the VMULH_S and VMULH_U groups). Having two insn patterns per insn seems clearer than a complex multilevel nesting of overlapping and non-overlapping groups. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01target/arm: Implement MVE VCMUL and VCMLAPeter Maydell1-0/+18
Implement the MVE VCMUL and VCMLA insns. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01target/arm: Implement MVE VFMA and VFMSPeter Maydell1-0/+6
Implement the MVE VFMA and VFMS insns. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01target/arm: Implement MVE VCADDPeter Maydell1-0/+6
Implement the MVE VCADD insn. Note that here the size bit is the opposite sense to the other 2-operand fp insns. We don't check for the sz == 1 && Qd == Qm UNPREDICTABLE case, because that would mean we can't use the DO_2OP_FP macro in translate-mve.c. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01target/arm: Implement MVE VSUB, VMUL, VABD, VMAXNM, VMINNMPeter Maydell1-0/+15
Implement more simple 2-operand floating point MVE insns. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-09-01target/arm: Implement MVE VADD (floating-point)Peter Maydell1-0/+3
Implement the MVE VADD (floating-point) insn. Handling of this is similar to the 2-operand integer insns, except that we must take care to only update the floating point exception status if the least significant bit of the predicate mask for each element is active. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-08-25target/arm: Implement MVE interleaving loads/storesPeter Maydell1-0/+48
Implement the MVE interleaving load/store functions VLD2, VLD4, VST2 and VST4. VLD2 loads 16 bytes of data from memory and writes to 2 consecutive Qregs; VLD4 loads 16 bytes of data from memory and writes to 4 consecutive Qregs. The 'pattern' field in the encoding determines the offset into memory which is accessed and also which elements in the Qregs are written to. (The intention is that a sequence of four consecutive VLD4 with different pattern values performs a complete de-interleaving load of 64 bytes into all elements of the 4 Qregs.) VST2 and VST4 do the same, but for stores. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE scatter-gather immediate formsPeter Maydell1-0/+5
Implement the MVE VLDR/VSTR insns which do scatter-gather using base addresses from Qm plus or minus an immediate offset (possibly with writeback). Note that writeback is not predicated but it does have to honour ECI state, so we have to add an eci_mask check to the VSTR_SG macros (the VLDR_SG macros already needed this to be able to distinguish "skip beat" from "set predicated element to 0"). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE scatter-gather insnsPeter Maydell1-0/+32
Implement the MVE gather-loads and scatter-stores which form the address by adding a base value from a scalar register to an offset in each element of a vector. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VCTPPeter Maydell1-0/+2
Implement the MVE VCTP insn, which sets the VPR.P0 predicate bits so as to predicate any element at index Rn or greater is predicated. As with VPNOT, this insn itself is predicable and subject to beatwise execution. The calculation of the mask is the same as is used to determine ltpmask in mve_element_mask(), but we precalculate masklen in generated code to avoid having to have 4 helpers specialized by size. We put the decode line in with the low-overhead-loop insns in t32.decode because it's logically part of that collection of insn patterns, even though it is an MVE only insn. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VPNOTPeter Maydell1-0/+1
Implement the MVE VPNOT insn, which inverts the bits in VPR.P0 (subject to both predication and to beatwise execution). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VMAXA, VMINAPeter Maydell1-0/+8
Implement the MVE VMAXA and VMINA insns, which take the absolute value of the signed elements in the input vector and then accumulate the unsigned max or min into the destination vector. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VQABS, VQNEGPeter Maydell1-0/+8
Implement the MVE 1-operand saturating operations VQABS and VQNEG. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE saturating doubling multiply accumulatesPeter Maydell1-0/+16
Implement the MVE saturating doubling multiply accumulate insns VQDMLAH, VQRDMLAH, VQDMLASH and VQRDMLASH. These perform a multiply, double, add the accumulator shifted by the element size, possibly round, saturate to twice the element size, then take the high half of the result. The *MLAH insns do vector * scalar + vector, and the *MLASH insns do vector * vector + scalar. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VMLAPeter Maydell1-0/+4
Implement the MVE VMLA insn, which multiplies a vector by a scalar and accumulates into another vector. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VMLADAV and VMLSLDAVPeter Maydell1-0/+17
Implement the MVE VMLADAV and VMLSLDAV insns. Like the VMLALDAV and VMLSLDAV insns already implemented, these accumulate multiplied vector elements; but they accumulate a 32-bit result rather than a 64-bit one. Note that these encodings overlap with what would be RdaHi=0b111 for VMLALDAV, VMLSLDAV, VRMLALDAVH and VRMLSLDAVH. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE narrowing movesPeter Maydell1-0/+20
Implement the MVE narrowing move insns VMOVN, VQMOVN and VQMOVUN. These take a double-width input, narrow it (possibly saturating) and store the result to either the top or bottom half of the output element. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VABAVPeter Maydell1-0/+7
Implement the MVE VABAV insn, which computes absolute differences between elements of two vectors and accumulates the result into a general purpose register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE integer min/max across vectorPeter Maydell1-0/+20
Implement the MVE integer min/max across vector insns VMAXV, VMINV, VMAXAV and VMINAV, which find the maximum from the vector elements and a general purpose register, and store the maximum back into the general purpose register. These insns overlap with VRMLALDAVH (they use what would be RdaHi=0b110). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE shift-by-scalarPeter Maydell1-0/+8
Implement the MVE instructions which perform shifts by a scalar. These are VSHL T2, VRSHL T2, VQSHL T1 and VQRSHL T2. They take the shift amount in a general purpose register and shift every element in the vector by that amount. Mostly we can reuse the helper functions for shift-by-immediate; we do need two new helpers for VQRSHL. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VMLASPeter Maydell1-0/+4
Implement the MVE VMLAS insn, which multiplies a vector by a vector and adds a scalar. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VPSELPeter Maydell1-0/+2
Implement the MVE VPSEL insn, which sets each byte of the destination vector Qd to the byte from either Qn or Qm depending on the value of the corresponding bit in VPR.P0. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE integer vector-vs-scalar comparisonsPeter Maydell1-0/+32
Implement the MVE integer vector comparison instructions that compare each element against a scalar from a general purpose register. These are "VCMP (vector)" encodings T4, T5 and T6 and "VPT (vector)" encodings T4, T5 and T6. We have to move the decodetree pattern for VPST, because it overlaps with VCMP T4 with size = 0b11. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE integer vector comparisonsPeter Maydell1-0/+32
Implement the MVE integer vector comparison instructions. These are "VCMP (vector)" encodings T1, T2 and T3, and "VPT (vector)" encodings T1, T2 and T3. These insns compare corresponding elements in each vector, and update the VPR.P0 predicate bits with the results of the comparison. VPT also sets the VPR.MASK01 and VPR.MASK23 fields -- it is effectively "VCMP then VPST". Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE incrementing/decrementing dup insnsPeter Maydell1-0/+12
Implement the MVE incrementing/decrementing dup insns VIDUP, VDDUP, VIWDUP and VDWDUP. These fill the elements of a vector with successively incrementing values, starting at the offset specified in a general purpose register. The final value of the offset is written back to this register. The wrapping variants take a second general purpose register which specifies the point where the count should wrap back to 0. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-08-25target/arm: Implement MVE VMULL (polynomial)Peter Maydell1-0/+5
Implement the MVE VMULL (polynomial) insn. Unlike Neon, this comes in two flavours: 8x8->16 and a 16x16->32. Also unlike Neon, the inputs are in either the low or the high half of each double-width element. The assembler for this insn indicates the size with "P8" or "P16", encoded into bit 28 as size = 0 or 1. We choose to follow the same encoding as VQDMULL and decode this into a->size as MO_16 or MO_32 indicating the size of the result elements. This then carries through to the helper function names where it then matches up with the existing pmull_h() which does an 8x8->16 operation and a new pmull_w() which does the 16x16->32. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2021-07-02target/arm: Implement MVE shifts by registerPeter Maydell1-0/+2
Implement the MVE shifts by register, which perform shifts on a single general-purpose register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-19-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE shifts by immediatePeter Maydell1-0/+3
Implement the MVE shifts by immediate, which perform shifts on a single general-purpose register. These patterns overlap with the long-shift-by-immediates, so we have to rearrange the grouping a little here. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-18-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE long shifts by registerPeter Maydell1-0/+6
Implement the MVE long shifts by register, which perform shifts on a pair of general-purpose registers treated as a 64-bit quantity, with the shift count in another general-purpose register, which might be either positive or negative. Like the long-shifts-by-immediate, these encodings sit in the space that was previously the UNPREDICTABLE MOVS/ORRS with Rm==13,15. Because LSLL_rr and ASRL_rr overlap with both MOV_rxri/ORR_rrri and also with CSEL (as one of the previously-UNPREDICTABLE Rm==13 cases), we have to move the CSEL pattern into the same decodetree group. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-17-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE long shifts by immediatePeter Maydell1-0/+3
The MVE extension to v8.1M includes some new shift instructions which sit entirely within the non-coprocessor part of the encoding space and which operate only on general-purpose registers. They take up the space which was previously UNPREDICTABLE MOVS and ORRS encodings with Rm == 13 or 15. Implement the long shifts by immediate, which perform shifts on a pair of general-purpose registers treated as a 64-bit quantity, with an immediate shift count between 1 and 32. Awkwardly, because the MOVS and ORRS trans functions do not UNDEF for the Rm==13,15 case, we need to explicitly emit code to UNDEF for the cases where v8.1M now requires that. (Trying to change MOVS and ORRS is too difficult, because the functions that generate the code are shared between a dozen different kinds of arithmetic or logical instruction for all A32, T16 and T32 encodings, and for some insns and some encodings Rm==13,15 are valid.) We make the helper functions we need for UQSHLL and SQSHLL take a 32-bit value which the helper casts to int8_t because we'll need these helpers also for the shift-by-register insns, where the shift count might be < 0 or > 32. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-16-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE VADDLVPeter Maydell1-0/+3
Implement the MVE VADDLV insn; this is similar to VADDV, except that it accumulates 32-bit elements into a 64-bit accumulator stored in a pair of general-purpose registers. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-15-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE VSHLCPeter Maydell1-0/+2
Implement the MVE VSHLC insn, which performs a shift left of the entire vector with carry in bits provided from a general purpose register and carry out bits written back to that register. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-14-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE saturating narrowing shiftsPeter Maydell1-0/+30
Implement the MVE saturating shift-right-and-narrow insns VQSHRN, VQSHRUN, VQRSHRN and VQRSHRUN. do_srshr() is borrowed from sve_helper.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-13-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE VSHRN, VRSHRNPeter Maydell1-0/+10
Implement the MVE shift-right-and-narrow insn VSHRN and VRSHRN. do_urshr() is borrowed from sve_helper.c. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-12-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE VSRI, VSLIPeter Maydell1-0/+8
Implement the MVE VSRI and VSLI insns, which perform a shift-and-insert operation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-11-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE VSHLLPeter Maydell1-0/+9
Implement the MVE VHLL (vector shift left long) insn. This has two encodings: the T1 encoding is the usual shift-by-immediate format, and the T2 encoding is a special case where the shift count is always equal to the element size. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-10-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE vector shift right by immediate insnsPeter Maydell1-0/+12
Implement the MVE vector shift right by immediate insns VSHRI and VRSHRI. As with Neon, we implement these by using helper functions which perform left shifts but allow negative shift counts to indicate right shifts. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-9-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE vector shift left by immediate insnsPeter Maydell1-0/+16
Implement the MVE shift-vector-left-by-immediate insns VSHL, VQSHL and VQSHLU. The size-and-immediate encoding here is the same as Neon, and we handle it the same way neon-dp.decode does. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-8-peter.maydell@linaro.org
2021-07-02target/arm: Implement MVE logical immediate insnsPeter Maydell1-0/+4
Implement the MVE logical-immediate insns (VMOV, VMVN, VORR and VBIC). These have essentially the same encoding as their Neon equivalents, and we implement the decode in the same way. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210628135835.6690-7-peter.maydell@linaro.org
2021-06-24target/arm: Implement MVE VADDVPeter Maydell1-0/+7
Implement the MVE VADDV insn, which performs an addition across vector lanes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20210617121628.20116-44-peter.maydell@linaro.org