aboutsummaryrefslogtreecommitdiff
path: root/opcodes/aarch64-tbl.h
AgeCommit message (Collapse)AuthorFilesLines
2024-07-18opcodes: aarch64: denote subclasses for insns of iclass dp_2srcIndu Bhagat1-24/+24
For detecting irg, add a subclass to identify it in the set of instructions of iclass dp_2src. opcodes/ * aarch64-tbl.h: Add subclass flag F_DP_TAG_ONLY for irg insn.
2024-07-18opcodes: aarch64: add flags to denote subclasses of uncond branchesIndu Bhagat1-19/+19
Use the two new subclass flags: F_BRANCH_CALL, F_BRANCH_RET, to indicate call to and return from subroutine respectively. opcodes/ * aarch64-tbl.h: Use the new F_BRANCH_* flags.
2024-07-18opcodes: aarch64: add flags to denote subclasses of arithmetic insnsIndu Bhagat1-15/+15
Use the three new subclass flags: F_ARITH_ADD, F_ARITH_SUB, F_ARITH_MOV, to indicate add, sub and mov ops respectively. These flags for subclasses will later be used for SCFI purposes to create appropriate ginsns. At this time, only those iclasses relevant to SCFI have the new subclass flags specified. For addg and subg insns, F_SUBCLASS_OTHER is more suitable because these operations do more than just simple add or sub. opcodes/ * aarch64-tbl.h: Use the new F_ARITH_* flags.
2024-07-18opcodes: aarch64: add flags to denote subclasses of ldst insnsIndu Bhagat1-43/+43
The existing iclass information tells us the general shape and purpose of the instructions. In some cases, however, we need to further disect the iclass on the basis of other finer-grain information. E.g., for the purpose of SCFI, we need to know whether a given insn with iclass of ldst_* is a load or a store. At the moment, specify subclasses for only those iclasses relevant to SCFI: ldst_imm9, ldst_pos, ldstpair_indexed, ldstpair_off and ldstnapair_offs. Some insns are best tagged with F_SUBCLASS_OTHER rather than F_LDST_LOAD or F_LDST_STORE: - stg* ops (as they store tag only), - prfm, - ldpsw, ldrsw (32-bit loads with signed extended value. Not useful for restore operations in context of SCFI.) - Use F_SUBCLASS_OTHER for all QL_LDST_R8 and QL_LDST_R16 operands. Also use F_SUBLASS_OTHER for strb/ldrb, strh/ldrh opcodes. These are not full loads and stores and cannot be allowed for register save / restore for the purpose of SCFI. opcodes/ * aarch64-tbl.h: Use the new F_LDST_* flags.
2024-07-12aarch64: Add support for sme2.1 zero instructions.Srinath Parvathaneni1-0/+18
This patch adds support for following sme2.1 zero instructions and the spec is available here [1]. 1. ZERO (single-vector). 2. ZERO (double-vector). 3. ZERO (quad-vector). The VECTOR GROUP symbols VGx2 and VGx4 are optional for the assembler for most of the sme and sve instructions. But for few of the sme2.1 zero instruction variants VECTOR GROUP symbols VGx2 and VGx4 are mandatory. To address this a bit "F_VG_REQ" is introduced in this patch, on setting F_VG_REQ bit in flags of aarch64_opcode forces the assembler to accept instruction operand only having VECTOR GROUP symbols. [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
2024-07-12aarch64: Add support for sme2.1 movaz instructions.Srinath Parvathaneni1-0/+19
This patch adds support for following sme2.1 movaz instructions and the spec is available here [1]. 1. MOVAZ (array to vector, two registers). 2. MOVAZ (array to vector, four registers). 3. MOVAZ (tile to vector, single). [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
2024-07-12aarch64: Add support for sme2.1 luti2 and luti4 instructions.Srinath Parvathaneni1-0/+10
This patch adds support for following sme2.1 luti2 and luti4 instructions, spec is available here [1] 1. LUTI2 (two registers) strided. 2. LUTI2 (four registers) strided. 3. LUTI4 (two registers) strided. 4. LUTI4 (four registers) strided. [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
2024-07-08aarch64: Add support for sve2p1 pmov instruction.srinath1-3/+51
This patch adds support for followign SVE2p1 instruction, spec is available here [1]. 1. PMOV (to vector) 2. PMOV (to predicate) Both pmov (to vector) and pmov (to predicate) have destination scalable vector register and source scalable vector register respectively as an operand with no suffix and optional index. To handle this case we have added 8 new operands in this patch. AARCH64_OPND_SVE_Zn0_INDEX, /* Zn[index], bits [9:5]. */ AARCH64_OPND_SVE_Zn1_17_INDEX, /* Zn[index], bits [9:5,17]. */ AARCH64_OPND_SVE_Zn2_18_INDEX, /* Zn[index], bits [9:5,18:17]. */ AARCH64_OPND_SVE_Zn3_22_INDEX, /* Zn[index], bits [9:5,18:17,22]. */ AARCH64_OPND_SVE_Zd0_INDEX, /* Zn[index], bits [4:0]. */ AARCH64_OPND_SVE_Zd1_17_INDEX, /* Zn[index], bits [4:0,17]. */ AARCH64_OPND_SVE_Zd2_18_INDEX, /* Zn[index], bits [4:0,18:17]. */ AARCH64_OPND_SVE_Zd3_22_INDEX, /* Zn[index], bits [4:0,18:17,22]. */ Since the index of the <Zd> operand is optional, the index part is dropped in disassembly in both the cases of "no index" or "zero index". As per spec: PMOV <Zd>{[<imm>]}, <Pn>.D PMOV <Pn>.D, <Zd>{[<imm>]} Example1: Assembly: pmov z5[0], p6.d Disassembly: pmov z5, p6.d Assembly: pmov z5, p6.d Disassembly: pmov z5, p6.d Example2: Assembly: pmov p4.b, z5[0] Disassembly: pmov p4.b, z5 Assembly: pmov p4.b, z5 Disassembly: pmov p4.b, z5 [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
2024-07-08aarch64: Add support for sve2p1 tbxq instruction.Srinath Parvathaneni1-0/+1
This patch adds support for SVE2p1 "tbxq" instruction, spec is available here [1]. [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
2024-07-08aarch64: Add support for sve2p1 zipq[1-2] instructions.Srinath Parvathaneni1-0/+2
This patch adds support for SVE2p1 "zipq1" and "zipq2" instructions, spec is available here [1]. [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
2024-07-08aarch64: Add support for sve2p1 uzpq[1-2] instructions.Srinath Parvathaneni1-0/+2
This patch adds support for SVE2p1 "uzpq1" and "uzpq2" instructions, spec is available here [1] [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
2024-07-08aarch64: Add support for sve2p1 tblq instruction.Srinath Parvathaneni1-0/+1
This patch adds support for SVE2p1 "tblq" instruction, spec is available here [1]. [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
2024-07-08aarch64: Add support for sve2p1 orqv instruction.Srinath Parvathaneni1-0/+1
This patch adds support for SVE2p1 "orqv" instruction, spec available here [1]. [1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
2024-06-26aarch64: FP8 scale and convert - Implement minor improvementsVictor Do Nascimento1-12/+12
Following feedback received shortly after the initial commit of the aarch64 instructions for scaling and converting fp8 instructions, this patch addresses the issues raised in the relevant feedback. This includes the following changes: * Standardize all FP8 qualifier-set names. This has resulted in the renaming of QL_V2FP8B8H to QL_V2_HB_LOWER and, likewise, QL_V28H16B to QL_V2_HB_FULL. * Update `FP8_INSN' aarch64_opcode_table[] entries to reflect the new standardized qualifier-set names mentioned above and, in the case of the "fcvtn" entries, also add a leading 0 to their opcode values so they are given as 8 hexadecimal digits in length to ensure consistency in formatting relative to other entries in the table. * Revise the added test-cases so that when checking operand fields in the disassembled binaries, all bits for these fields get tested to ensure they can be toggled on/off by the relevant operand arguments.
2024-06-25aarch64: Fix FEAT_B16B16 sve2 instruction constraints.Srinath Parvathaneni1-23/+23
This patch adds missing contraints to FEAT_B16B16 sve2 instructions bfclamp, bfmla and bfmls and add negative tests for all the bfloat instructions. The bfloat16-invalid.* testcases are renamed to bfloat16-1-invalid.* to maintain consistency in the testsuite. The bfloat16-1-invalid.* tests are modified so that "selected processor does not support" is generated by the assembler, since +b16b16 is not passed in the command line. The bfloat16-2-invalid.* testcase includes the wrong operands bfloat16 tests.
2024-06-25arch64: Fix the wrong constraint used for sve2p1 instructions.Srinath Parvathaneni1-13/+12
The current implementation for the following SVE2p1 instructions add a constraint in aarch64_opcode_table[] array, so that these instruction might be immediately preceded in program order by a MOVPRFX instruction. As per the spec these instruction does not immediately preceded in program order by a MOVPRFX instruction and to fix this issue, SVE2p1_INSNC macro is replaced with SVE2p1_INSN macro for the entries of these instructions in aarch64_opcode_table[] array. List of instructions updated: addqv, andqv, smaxqv, sminqv, umaxqv, uminqv, eorqv, faddqv, fmaxnmqv, fmaxqv, fminnmqv and fminqv.
2024-06-25aarch64: Fix sve2p1 ld[1-4]/st[1-4]q instruction operands.Srinath Parvathaneni1-26/+22
This patch fixes encoding and syntax for sve2p1 instructions ld[1-4]q/st[1-4]q as mentioned below, for the issues reported here. https://sourceware.org/pipermail/binutils/2024-February/132408.html 1) Previously all the ld[1-4]q/st[1-4]q instructions are wrongly added as predicated instructions and this issue is fixed in this patch by replacing "SVE2p1_INSNC" with "SVE2p1_INSN" macro. 2) Wrong first operand in all the ld[1-4]q/st[1-4]q instructions is fixed by replacing "SVE_Zt" with "SVE_ZtxN". 3) Wrong operand qualifiers in ld1q and st1q instructions are also fixed in this patch. 4) In ld1q/st1q the index in the second argument is optional and if index is xzr and is skipped in the assembly, the index field is ignored by the disassembler. Fixing above mentioned issues helps with following: 1) ld1q and st1q first register operand accepts enclosed figure braces. 2) ld2q, ld3q, ld4q, st2q, st3q, and st4q instructions accepts wrapping sequence of vector registers. For the instructions ld[2-4]q/st[2-4]q, tests for wrapping sequence of vector registers are added along with short-form of operands for non-wrapping sequence. I have added test using following logic: ld2q {Z0.Q, Z1.Q}, p0/Z, [x0, #0, MUL VL] //raw insn encoding (all zeroes) ld2q {Z31.Q, Z0.Q}, p0/Z, [x0, #0, MUL VL] // encoding of <Zt1> ld2q {Z0.Q, Z1.Q}, p7/Z, [x0, #0, MUL VL] // encoding of <Pg> ld2q {Z0.Q, Z1.Q}, p0/Z, [x30, #0, MUL VL] // encoding of <Xm> ld2q {Z0.Q, Z1.Q}, p0/Z, [x0, #-16, MUL VL] // encoding of <imm> (low value) ld2q {Z0.Q, Z1.Q}, p0/Z, [x0, #14, MUL VL] // encoding of <imm> (high value) ld2q {Z31.Q, Z0.Q}, p7/Z, [x30, #-16, MUL VL] // encoding of all fields (all ones) ld2q {Z30.Q, Z31.Q}, p1/Z, [x3, #-2, MUL VL] // random encoding. For all the above form of instructions the hyphenated form is preferred for disassembly if there are more than two registers in the list, and the register numbers are monotonically increasing in increments of one.
2024-06-25aarch64: Fix sve2p1 extq instruction operands.Srinath Parvathaneni1-4/+3
This patch fixes the syntax of sve2p1 "extq" instruction by modifying the operands count to 4. A new operand AARCH64_OPND_SVE_UIMM4 is defined to handle the 4th argument an 4-bit unsigned immediate of extq instruction. The instruction encoding is updated to use constraint C_SCAN_MOVPRFX, to enable "extq" instruction to immediately precede in program order by a MOVPRFX instruction. Also removed the unused operand AARCH64_OPND_SVE_Zm_imm4. This issues was reported here: https://sourceware.org/pipermail/binutils/2024-February/132408.html
2024-06-25aarch64: Fix sve2p1 dupq instruction operands.Srinath Parvathaneni1-5/+6
This patch fixes the syntax of sve2p1 "dupq" instruction by modifying the way 2nd operand does the encoding and decoding using the [<imm>] value. dupq makes use of already existing aarch64_ins_sve_index and aarch64_ext_sve_index inserter and extractor functions. The definitions of aarch64_ins_sve_index_imm (inserter) and aarch64_ext_sve_index_imm (extractor) is removed in this patch. This issues was reported here: https://sourceware.org/pipermail/binutils/2024-February/132408.html
2024-06-24aarch64: Add SME FP8 multiplication instructionsAndrew Carlotti1-0/+74
This includes: - FEAT_SME_F8F32 (+sme-f8f32) - FEAT_SME_F8F16 (+sme-f8f16) The FP16 addition/subtraction instructions originally added by FEAT_SME_F16F16 haven't been added to Binutils yet. They are also required to be enabled if FEAT_SME_F8F16 is present, so they are included in this patch.
2024-06-24aarch64: Add FP8 Neon and SVE multiplication instructionsAndrew Carlotti1-0/+112
This includes all the instructions under the following features: - FEAT_FP8FMA (+fp8fma) - FEAT_FP8DOT4 (+fp8dot4) - FEAT_FP8DOT2 (+fp8dot2) - FEAT_SSVE_FP8FMA (+ssve-fp8fma) - FEAT_SSVE_FP8DOT4 (+ssve-fp8dot4) - FEAT_SSVE_FP8DOT2 (+ssve-fp8dot2)
2024-06-24gas, aarch64: Add SME2 lutv2 extensionsaurabh.jha@arm.com1-0/+33
Introduces instructions for the SME2 lutv2 extension for AArch64. They are documented in the following document: * ARM DDI0602 For both luti4 instructions, we introduced an operand called SME_Znx2_BIT_INDEX. We use the existing function parse_vector_reg_list for parsing but modified that function so that it can accept operands without qualifiers and rejects instructions that have operands with qualifiers but are not supposed to have operands with qualifiers. For disassembly, we modified print_register_list so that it could accept register lists without qualifiers. For one luti4 instruction, we introduced a SME_Zdnx4_STRIDED. It is similar to SME_Ztx4_STRIDED and we could use existing code for parsing, encoding, and disassembly. For movt instruction, we introduced an operand called SME_ZT0_INDEX2_12. This is a ZT0 register with a bit index encoded in [13:12]. It is similar to SME_ZT0_INDEX. We also introduced an iclass named sme_size_12_b so that we can encode size bits [13:12] correctly when only 'b' is allowed as qualifier.
2024-06-12aarch64: add Branch Record Buffer extension instructionsClaudio Bantaloukas1-0/+15
The FEAT_BRBE extension provides two aliases of sys: - brb iall (Invalidates all Branch records in the Branch Record Buffer) - brb inj (Injects the Branch Record held in BRBINFINJ_EL1, BRBSRCINJ_EL1, and BRBTGTINJ_EL1 into the Branch Record Buffer) This patch adds: - the feature option "brbe" that must be added for the aliases to be available - a new operand flag AARCH64_OPND_Rt_IN_SYS_ALIASES that warns in a comment when Rt is set to the non default value 0b11111 (it is constrained unpredictable whether the instruction is undefined or behaves as if the Rt field is set to 0b11111). - a new operand flag AARCH64_OPND_BRBOP that encodes and decodes Op2 values from bit 5 - support for the two brb aliases above See: - https://developer.arm.com/documentation/ddi0602/2024-03/Base-Instructions/BRB--Branch-Record-Buffer--an-alias-of-SYS-?lang=en - https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Instructions/BRB-INJ--Branch-Record-Injection-into-the-Branch-Record-Buffer?lang=en - https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Instructions/BRB-IALL--Invalidate-the-Branch-Record-Buffer?lang=en
2024-05-28gas, aarch64: Add SVE2 lut extensionsaurabh.jha@arm.com1-1/+35
Introduces instructions for the SVE2 lut extension for AArch64. They are documented in the following links: * luti2: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/LUTI2--Lookup-table-read-with-2-bit-indices-?lang=en * luti4: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/LUTI4--Lookup-table-read-with-4-bit-indices-?lang=en These instructions use new SVE2 vector operands. They are called SVE_Zm1_23_INDEX, SVE_Zm2_22_INDEX, and Zm3_12_INDEX and they have 1 bit, 2 bit, and 3 bit indices respectively. The lsb and width of these new operands are the same as many existing operands but the convention is to give different names to fields that serve different purpose so we introduced new fields in aarch64-opc.c and aarch64-opc.h. We made a design choice for the second operand of the halfword variant of luti4 with two register tables. We could have either defined a new operand, like SVE_Znx2, or we could have use the existing operand SVE_ZnxN. With the new operand, we would need to implement constraints on register lists based on either operand or opcode flag. With existing operand, we could just existing constraint checks using opcode flag. We chose the second approach and went with SVE_ZnxN and added opcode flag to enforce lengths of vector register list operands. This way, we can reuse the existing constraint check logic.
2024-05-28gas, aarch64: Add AdvSIMD lut extensionsaurabh.jha@arm.com1-1/+37
Introduces instructions for the Advanced SIMD lut extension for AArch64. They are documented in the following links: * luti2: https://developer.arm.com/documentation/ddi0602/2024-03/SIMD-FP-Instructions/LUTI2--Lookup-table-read-with-2-bit-indices-?lang=en * luti4: https://developer.arm.com/documentation/ddi0602/2024-03/SIMD-FP-Instructions/LUTI4--Lookup-table-read-with-4-bit-indices-?lang=en These instructions needed definition of some new operands. We will first discuss operands for the third operand of the instructions and then discuss a vector register list operand needed for the second operand. The third operands are vectors with bit indices and without type qualifiers. They are called Em_INDEX1_14, Em_INDEX2_13, and Em_INDEX3_12 and they have 1 bit, 2 bit, and 3 bit indices respectively. For these new operands, we defined new parsing case branch. The lsb and width of these operands are the same as many existing but the convention is to give different names to fields that serve different purpose so we introduced new fields in aarch64-opc.c and aarch64-opc.h for these new operands. For the second operand of these instructions, we introduced a new operand called LVn_LUT. This represents a vector register list with stride 1. We defined new inserter and extractor for this new operand and it is encoded in FLD_Rn. We are enforcing the number of registers in the reglist using opcode flag rather than operand flag as this is what other SIMD vector register list operands are doing. The disassembly also uses opcode flag to print the correct number of registers.
2024-05-17aarch64: correct SVE2.1 ld2q (scalar plus scalar)Jan Beulich1-1/+1
It's opcode was wrong, as was e.g. easily visible from the inappropriate testcase expectation.
2024-05-17aarch64: correct SVE2.1 ld{3,4}q / st{3,4}q (scalar plus immediate)Jan Beulich1-4/+4
Like their byte, half, word, and doubleword counterparts their immediates are multiples of 3 / 4 respectively.
2024-05-16aarch64: fp8 convert and scale - add sme2 insn variantsVictor Do Nascimento1-0/+16
Add the SME2 variant of the FP8 convert and scale instructions, enabled at assembly-time using the `+sme2+fp8' architectural extension flag. More specifically, support is added for the following instructions: Multi-vector floating-point convert from FP8 to BFloat16 (in-order): ----------------------------------------------- - bf1cvt { <Zd1>.H-<Zd2>.H }, <Zn>.B - bf2cvt { <Zd1>.H-<Zd2>.H }, <Zn>.B Multi-vector floating-point convert from FP8 to deinterleaved BFloat16: ----------------------------------------------- - bf1cvtl { <Zd1>.H-<Zd2>.H }, <Zn>.B - bf2cvtl { <Zd1>.H-<Zd2>.H }, <Zn>.B Multi-vector floating-point convert from BFloat16 to packed FP8 format: ------------------------------------------------- - bfcvt <Zd>.B, { <Zn1>.H-<Zn2>.H } Multi-vector floating-point convert from FP8 to half-precision (in-order): ----------------------------------------------- - f1cvt { <Zd1>.H-<Zd2>.H }, <Zn>.B - f2cvt { <Zd1>.H-<Zd2>.H }, <Zn>.B Multi-vector floating-point convert from FP8 to deinterleaved half-precision: ----------------------------------------------- - f1cvtl { <Zd1>.H-<Zd2>.H }, <Zn>.B - f2cvtl { <Zd1>.H-<Zd2>.H }, <Zn>.B Multi-vector floating-point convert from half-precision to packed FP8 format: ------------------------------------------------------- fcvt_2h Multi-vector floating-point convert from single-precision to packed FP8 format: --------------------------------------------------------- fcvt_4s Multi-vector floating-point convert from single-precision to interleaved FP8 format: --------------------------------------------------------- - fcvtn <Zd>.B, { <Zn1>.S-<Zn4>.S } Multi-vector floating-point adjust exponent by vector: ------------------------------------------------------ - fscale { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, <Zm>.H - fscale { <Zdn1>.S-<Zdn2>.S }, { <Zdn1>.S-<Zdn2>.S }, <Zm>.S - fscale { <Zdn1>.D-<Zdn2>.D }, { <Zdn1>.D-<Zdn2>.D }, <Zm>.D Multi-vector floating-point adjust exponent: -------------------------------------------- - fscale { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H - <Zm2>.H } - fscale { <Zdn1>.S-<Zdn2>.S }, { <Zdn1>.S-<Zdn2>.S }, { <Zm1>.S - <Zm2>.S } - fscale { <Zdn1>.D-<Zdn2>.D }, { <Zdn1>.D-<Zdn2>.D }, { <Zm1>.D - <Zm2>.D }
2024-05-16aarch64: fp8 convert and scale - add sve2 insn variantsVictor Do Nascimento1-0/+20
Add the SVE2 variant of the FP8 convert and scale instructions, enabled at assembly-time using the `+sve2+fp8' architectural extension flag. More specifically, support is added for the following instructions: FP8 convert to BFloat16 (bottom/top): ------------------------------------- - bf1cvt Z<d>.H, Z<n>.B - bf2cvt Z<d>.H, Z<n>.B - bf1cvtlt Z<d>.H, Z<n>.B - bf2cvtlt Z<d>.H, Z<n>.B FP8 convert to half-precision (bottom/top): ------------------------------------------- - f1cvt Z<d>.H, Z<n>.B - f2cvt Z<d>.H, Z<n>.B - f1cvtlt Z<d>.H, Z<n>.B - f2cvtlt Z<d>.H, Z<n>.B BFloat16/half-precision convert, narrow and interleave to FP8: ------------------------------------------- - bfcvtn Z<d>.B, { Z<n>1.H - Z<n>2.H } - fcvtn Z<d>.B, { Z<n>1.H - Z<n>2.H } Single-precision convert, narrow and interleave to FP8 (bottom/top): ----------------------------------------------- - fcvtnb Z<d>.B, { Z<n>1.S - Z<n>2.S } - fcvtnt Z<d>.B, { Z<n>1.S - Z<n>2.S }
2024-05-16aarch64: fp8 convert and scale - Add advsimd insn variantsVictor Do Nascimento1-0/+41
Add the advanced SIMD variant of the FP8 convert and scale instructions, enabled at assembly-time using the `+fp8' architectural extension flag. More specifically, support is added for the following instructions: FP8 convert to BFloat16 (vector): --------------------------------- - bf1cvtl V<d>.8H, V<n>.8B - bf2cvtl V<d>.8H, V<n>.8B - bf1cvtl2 V<d>.8H, V<n>.16B - bf2cvtl2 V<d>.8H, V<n>.16B FP8 convert to half-precision (vector): --------------------------------------- - f1cvtl V<d>.8H, V<n>.8B - f2cvtl V<d>.8H, V<n>.8B - f1cvtl2 V<d>.8H, V<n>.16B - f2cvtl2 V<d>.8H, V<n>.16B Single-precision to FP8 convert and narrow (vector): ---------------------------------------------------- - fcvtn V<d>.8B, V<n>.4S, V<m>.4S - fcvtn2 V<d>.16B, V<n>.4S, V<m>.4S Half-precision to FP8 convert and narrow (vector): -------------------------------------------------- - fcvtn V<d>.8B, V<n>.4H, V<m>.4H - fcvtn V<d>.16B, V<n>.8H, V<m>.8H Floating-point adjust exponent by vector: ----------------------------------------- - fscale V<d>.4H, V<n>.4H, V<m>.4H - fscale V<d>.8H, V<n>.8H, V<m>.8H - fscale V<d>.2S, V<n>.2S, V<m>.2S - fscale V<d>.4S, V<n>.4S, V<m>.4S - fscale V<d>.2d, V<n>.2d, V<m>.2d
2024-05-16aarch64: fp8 convert and scale - add feature flags and related structuresVictor Do Nascimento1-0/+18
2024-04-03Arm64: check tied operand specifier in aarch64-genJan Beulich1-1/+1
Make sure that field actually matches the specified operands. Don't follow existing F_PSEUDO checking in using assertions, though. Print meaningful error messages, thus - while not having a line number available - at least providing some indication of where things are wrong. Fix SVE2.1's extq accordingly, but don't extend the testsuite there: There are further issues with its operands (SVE_Zm_imm4 doesn't look to be correct to use there, as that describes an indexed vector register, while here a separate vector register and immediate operand are to be specified).
2024-03-19gas, aarch64: Add faminmax extensionSaurabh Jha1-0/+30
2024-03-18aarch64: Add support for SVE ADDPT, SUBPT, MADPT, MLAPT instructionsYury Khrustalev1-0/+18
The following instructions are added in this patch: - ADDPT (predicated): Add checked pointer vectors (predicated). - ADDPT (unpredicated): Add checked pointer vectors (unpredicated). - SUBPT (predicated): Subtract checked pointer vectors (predicated). - SUBPT (unpredicated): Subtract checked pointer vectors (unpredicated). - MADPT: Multiply-add checked pointer vectors, writing multiplicand - MLAPT: Multiply-add checked pointer vectors, writing addend These instructions are part of Checked Pointer Arithmetic extension and are enabled when both CPA and SVE are enabled. To achieve this, both flag "+sve" and "+cpa" should be active. This patch adds assembler and disassembler support for these instructions with relevant checks. Tests are included as well. Regression tested on the aarch64-none-linux-gnu target and no regressions have been found.
2024-03-18aarch64: Add support for (M)ADDPT and (M)SUBPT instructionsYury Khrustalev1-1/+19
The following instructions are added in this patch: - ADDPT and SUBPT - Add/Subtract checked pointer - MADDPT and MSUBPT - Multiply Add/Subtract checked pointer These instructions are part of Checked Pointer Arithmetic extension. This patch adds assembler and disassembler support for these instructions with relevant checks. Tests are included as well. A new flag "+cpa" added to documentation. This flag enables CPA extension. Regression tested on the aarch64-none-linux-gnu target and no regressions have been found.
2024-03-18Arm64: check matching operands for predicated B16B16 insnsJan Beulich1-7/+7
Except for bfml{a,s} their 1st and 3rd operands need to match - pass the TIED macro argument accordingly. While doing that also slightly re-arrange table entries, such that all predicated insns are close together. At the same time change the existing test source to actually use non- matching operands for the respective bfml{a,s} forms.
2024-03-18Arm64: correct B16B16 indexed bf{mla,mls,mul}Jan Beulich1-3/+3
Their index is in bits 19, 20, and 22. Bit 11 in particular is already set in the base opcode. Note also how disassembler output didn't match assembler input in the respective testcase.
2024-02-29aarch64: Fix the 2nd operand in gcsstr and gcssttr instructions.Srinath Parvathaneni1-2/+2
The assembler wrongly expects plain register name instead of memory-form 2nd operand for gcsstr and gcssttr instructions. This patch fixes the issue.
2024-02-27aarch64: rename internals related to PAuth feature to use pauth in their ↵Matthieu Longo1-38/+38
naming for coherency Hi, Commits af1bd77 and 3f4ff08 introduced the Pointer Authentication feature with internal names that don't match the actual feature name pauth. The new feature PAuth_LR introduced in Armv9.5-A is an extension of the PAuth feature of Armv8.3-A. Using a different naming for it not based on the formerly "PAC" would create confusion. Regression tested on aarch64-none-elf, and no regression found. Ok for binutils-master? I don't have commit access so I need someone to commit on my behalf. Regards, Matthieu. From 58b38358b2788939d81f2df7f5fb4c64a31ae06e Mon Sep 17 00:00:00 2001 From: Matthieu Longo <matthieu.longo@arm.com> Date: Fri, 23 Feb 2024 11:30:40 +0000 Subject: [PATCH] aarch64: rename internals related to PAuth feature to use pauth in their naming for coherency Commits af1bd77 and 3f4ff08 introduced the Pointer Authentication feature with internal names that don't match the actual feature name pauth. The new feature PAuth_LR introduced in Armv9.5-A is an extension of the PAuth feature of Armv8.3-A. Using a different naming for it not based on the formerly "PAC" would create confusion.
2024-01-26aarch64: move SHA512 instructions to +sha3Andrew Carlotti1-5/+5
SHA512 instructions were added to the architecture at the same time as SHA3 instructions, but later than the SHA1 and SHA256 instructions. Furthermore, implementations must support either both or neither of the SHA512 and SHA3 instruction sets. However, SHA512 instructions were originally (and incorrectly) added to Binutils under the +sha2 flag. This patch moves SHA512 instructions under the +sha3 flag, which matches the architecture constraints and existing GCC and LLVM behaviour.
2024-01-15aarch64: rcpc3: Add FP load/store insnsVictor Do Nascimento1-0/+4
Along with the relevant unit-tests, this adds the following rcpc3 instructions: STL1 { <Vt>.D }[<index>], [<Xn|SP>] LDAP1 { <Vt>.D }[<index>], [<Xn|SP>] LDAPUR <Bt>, [<Xn|SP>{, #<simm>}] LDAPUR <Ht>, [<Xn|SP>{, #<simm>}] LDAPUR <St>, [<Xn|SP>{, #<simm>}] LDAPUR <Dt>, [<Xn|SP>{, #<simm>}] LDAPUR <Qt>, [<Xn|SP>{, #<simm>}] STLUR <Bt>, [<Xn|SP>{, #<simm>}] STLUR <Ht>, [<Xn|SP>{, #<simm>}] STLUR <St>, [<Xn|SP>{, #<simm>}] STLUR <Dt>, [<Xn|SP>{, #<simm>}] STLUR <Qt>, [<Xn|SP>{, #<simm>}] with `#<simm>' taking on a signed 8-bit integer value in the range [-256,255] and `index' the values 0 or 1. Co-authored-by: Srinath Parvathaneni <srinath.parvathaneni@arm.com>
2024-01-15aarch64: rcpc3: Add integer load/store insnsVictor Do Nascimento1-0/+5
Along with the relevant unit tests and updates to the existing regression tests, this adds support for the following novel rcpc3 insns: LDIAPP <Wt1>, <Wt2>, [<Xn|SP>] LDIAPP <Wt1>, <Wt2>, [<Xn|SP>], #8 LDIAPP <Xt1>, <Xt2>, [<Xn|SP>] LDIAPP <Xt1>, <Xt2>, [<Xn|SP>], #16 STILP <Wt1>, <Wt2>, [<Xn|SP>] STILP <Wt1>, <Wt2>, [<Xn|SP>, #-8]! STILP <Xt1>, <Xt2>, [<Xn|SP>] STILP <Xt1>, <Xt2>, [<Xn|SP>, #-16]! LDAPR <Wt>, [<Xn|SP>], #4 LDAPR <Xt>, [<Xn|SP>], #8 STLR <Wt>, [<Xn|SP>, #-4]! STLR <Xt>, [<Xn|SP>, #-8]!
2024-01-15aarch64: rcpc3: Define RCPC3_INSN macroVictor Do Nascimento1-0/+2
This patch adds the necessary macro for encoding FEAT_RCPC3-dependent instructions in Binutils.
2024-01-15aarch64: rcpc3: New RCPC3_ADDR operand typesVictor Do Nascimento1-1/+15
The particular choices of address indexing, along with their encoding for RCPC3 instructions lead to the requirement of a new set of operand descriptions, along with the relevant inserter/extractor set. That is, for the integer load/stores, there is only a single valid indexing offset quantity and offset mode is allowed - The value is always equivalent to the amount of data read/stored by the operation and the offset is post-indexed for Load-Acquire RCpc, and pre-indexed with writeback for Store-Release insns. This indexing quantity/mode pair is selected by the setting of a single bit in the instruction. To represent these insns, we add the following operand types: - AARCH64_OPND_RCPC3_ADDR_OPT_POSTIND - AARCH64_OPND_RCPC3_ADDR_OPT_PREIND_WB In the case of loads and stores involving SIMD/FP registers, the optional offset is encoded as an 8-bit signed immediate, but neither post-indexing or pre-indexing with writeback is available. This created the need for an operand type similar to AARCH64_OPND_ADDR_OFFSET, with the difference that FLD_index should not be checked. We thus introduce the AARCH64_OPND_RCPC3_ADDR_OFFSET operand, a variant of AARCH64_OPND_ADDR_OFFSET, w/o the FLD_index bitfield.
2024-01-15aarch64: rcpc3: Add +rcpc3 architectural feature support flagVictor Do Nascimento1-0/+4
Indicating the presence of the Armv8.2-a feature adding further support for the Release Consistency Model, the `+rcpc3' architectural extension flag is added to the list of possible `-march' options in Binutils, together with the necessary macro for encoding rcpc3 instructions.
2024-01-15aarch64: Add SVE2.1 Contiguous load/store instructions.Srinath Parvathaneni1-1/+33
Hi, This patch add support for SVE2.1 instructions ld1q, ld2q, ld3q and ld4q, st1q, st2q, st3q and st4q. Regression testing for aarch64-none-elf target and found no regressions. Ok for binutils-master? Regards, Srinath.
2024-01-15PATCH 5/6][Binutils] aarch64: Add SVE2.1 fmin and fmax instructions.Srinath Parvathaneni1-0/+12
Hi, This patch add support for SVE2.1 instruction faddqv, fmaxnmqv, fmaxqv, fminnmqv and fminqv. Regression testing for aarch64-none-elf target and found no regressions. Ok for binutils-master? Regards, Srinath.
2024-01-15aarch64: Add SVE2.1 dupq, eorqv and extq instructions.Srinath Parvathaneni1-0/+10
Hi, This patch add support for SVE2.1 instruction dupq, eorqv and extq. Regression testing for aarch64-none-elf target and found no regressions. Ok for binutils-master? Regards, Srinath.
2024-01-15aarch64: Add support for FEAT_SVE2p1.Srinath Parvathaneni1-0/+29
Hi, This patch add support for FEAT_SVE2p1 (SVE2.1 Extension) feature along with +sve2p1 optional flag to enabe this feature. Also support for following SVE2p1 instructions is added addqv, andqv, smaxqv, sminqv, umaxqv, uminqv and uminqv. Regression testing for aarch64-none-elf target and found no regressions. Ok for binutils-master? Regards, Srinath.
2024-01-15aarch64: Add support for FEAT_SME2p1 instructions.Srinath Parvathaneni1-0/+36
Hi, This patch add support for FEAT_SME2p1 and "movaz" instructions along with the optional flag +sme2p1. Following "movaz" instructions are add: Move and zero two ZA tile slices to vector registers. Move and zero four ZA tile slices to vector registers. Regression testing for aarch64-none-elf target and found no regressions. Ok for binutils-master? Regards, Srinath.