Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
When parsing immediate values, register names should not be
misinterpreted as symbols. However, for backwards compatibility we need
to permit some newer register names within older instructions. The
current mechanism for doing so depends on the list of explicit
architecture requirements for the instructions, which is fragile and
easy to forget, and grows increasingly messy as more architecture
features are added.
This patch add explicit flags to each opcode to indicate which set of
register names is disallowed in each instance. These flags are
mandatory for all opcodes with immediate operands, which ensures that
the choice of disallowed names will always be deliberate and explicit.
This patch should have no functional change.
|
|
FEAT_SVE_AES2 implements the SVE multi-vector Advanced Encryption
Standard and 128-bit destination element polynomial multiply long
instructions, when the PE is not in Streaming SVE mode.
|
|
FEAT_LSUI introduces unprivileged variants of load and store instructions so
that clearing PSTATE.PAN is never required in privileged software.
|
|
FEAT_PCDPHINT - Producer-consumer data placement hints - is an optional
ISA extension that provides hint instructions to indicate:
- a store in the current execution thread is generating data at a specific
location, which a thread of execution on one or more other observers is
waiting on.
- the thread of execution on the current PE will read a location that may not
yet have been written with the value to be consumed.
This extension introduces:
- STSHH, a hint instruction, with operands (policies) keep and strm
- PRFM *IR*, a new prefetch memory operand.
|
|
|
|
This patch add support for FEAT_PoPS feature which can be enabled
through +pops command line flag.
This patch also adds support for following DC instructions and the
spec can be found here [1].
1. "dc cigdvaps" enabled on passing +memtag+pops command line flags.
2. "dc civaps" enabled on passing +pops command line flag.
[1]: https://developer.arm.com/documentation/ddi0601/2025-03/AArch64-Instructions?lang=en
|
|
FEAT_LSFE - Large System Float Extension - implements A64 base atomic
floating-point in-memory instructions.
|
|
FEAT_SVE_F16F32MM introduces the SVE half-precision floating-point
matrix multiply-accumulate to single-precision instruction.
FEAT_F8F32MM introduces the Advanced SIMD 8-bit floating-point matrix
multiply-accumulate to single-precision instruction.
FEAT_F8F16MM introduces the Advanced SIMD 8-bit floating-point matrix
multiply-accumulate to half-precision instruction.
|
|
FEAT_CMPBR - Compare and branch instructions. This patch adds these
instructions:
- CB<CC> (register)
- CB<CC> (immediate)
- CBH<CC>
- CBB<CC>
where CC is one of the following:
- EQ
- NE
- GT
- GE
- LT
- LE
- HI
- HS
- LO
- LS
|
|
FEAT_OCCMO support was introduced, but the feature flags were missing.
This patch adds these flags, as well as splitting up the tests to test
occmo vs occmo+memtag operands.
|
|
FEAT_SVE_BFSCALE introduces the SVE BFSCALE instruction, when the PE is not in
Streaming SVE mode. If FEAT_SME2 is implemented, FEAT_SVE_BFSCALE also
introduces SME multi-vector Z-targeting BFloat16 scaling instructions, BFSCALE
and BFMUL.
|
|
FEAT_FPRCVT introduces new versions of previous instructions.
The instructions are used to convert between floating points and
Integers. These new versions take as operands SIMD&FP registers
for both the source and destination register. FEAT_FPRCVT also
enables the use of some existing AdvSIMD instructions in
streaming mode. However, no changes are needed in gas to support this.
|
|
Complete macros for feature bits for v9.1-A, v9.2-A, v9.3-A,
and v9.4-A.
|
|
Now that most of the effort of updating the number of feature words is
handled by macros, add an additional one, taking the number of
supported features to 192.
|
|
There are quite a few macros that need to be changed when we need to
increase the number of words in the features data structure. With
some macro trickery we can automate most of this so that a single
macro needs to be updated.
With C2X we could probably do even better by using recursion, but this
is still a much better situation than we had previously.
A static assertion is used to ensure that there is always enough space
in the flags macro for the number of feature bits we need to support.
|
|
|
|
Adjust parsing for AARCH64_OPND_SVE_ADDR_RR{_LSL*} operands to accept
implicit XZR offsets. Add new AARCH64_OPND_SVE_ADDR_RM{_LSL*} operands
to support instructions where an XZR offset is allowed but must be
specified explicitly. This allows the removal of the duplicate opcode
table entries using AARCH64_OPND_SVE_ADDR_R.
|
|
Many FEAT_SVE2p1 instructions need to be enabled by either of two
different features (one for streaming mode, and one for non-streaming
mode). This patch adds correct gating conditions for these
instructions.
There were also a few sve2p1 instructions missing altogether, so add
those as well.
The testsuite is modified to check for all alternative enablement
conditions. In many cases this is done by adding an alternative
assembler commands to existing test files. For some SME/SME2 tests,
only some of the instructions are enabled by +sve2p1, so these are
copied into a separate test. For original SVE2p1 tests, the non-SME2p1
instructions have been moved to a separate test file.
There are also new tests for the newly added instructions. These
include a couple of fixme comments relating to bad error reporting,
which should be investigated later.
|
|
This patch adds support for SME ZA-targeting non-widening BFloat16 instructions,
under tick FEAT_SME_B16B16 and command line flag "+sme-b16b16".
FEAT_SME_B16B16 implements FEAT_SME2 and FEAT_SVE_B16B16, in accordance with that
"+sme-b16b16" enables "+sme2" and "+sve-b16b16".
Also the test files related to FEAT_SME_B16B16 are prefixed with sme-b16b16*.
eg: sme-b16b16-1.s, sme-b16b16-1.d.
The spec for this feature and instructions is availabe here [1]:
[1]: https://developer.arm.com/documentation/ddi0602/2024-06/SME-Instructions?lang=en
|
|
In the current code, SVE2 Bfloat16 instructions are implemented with tick
FEAT_B16B16 and command line flag "+b16b16" and this feature was suspended
due to incomplete support.
In the new spec available here[1], FEAT_B16B16 is replaced with
FEAT_SVE_B16B16 and command line flag "+b16b16" is replace with "sve-b16b16".
Also the test files related to FEAT_SVE_B16B16 are prefixed with sve-b16b16*.
eg: sve-b16b16-sve2-1.s, sve-b16b16-sve2-1.d.
This patch supports the SVE Z-targeting non-widening BFloat16 instructions
with command line flag "+sve-b16b16+sve2".
[1]: https://developer.arm.com/documentation/ddi0602/2024-06/SVE-Instructions?lang=en
|
|
Rename to AARCH64_OPND_SME_ZT0_INDEX_MUL_VL.
|
|
|
|
This patch adds support for FEAT_SME_F16F16 feature (Non-widening
half-precision FP16 to FP16 arithmetic for SME2), which is enabled
using command line flags +sme-f16f16 to -march (which enables both
FEAT_SME2 and FEAT_SME_F16F16).
There are couple of instructions (fadd and fsub variants) which should
be allowed by the assembler on either passing +sme-f16f16 or +sme-f8f16.
Those instructions are already supported in the current assembler, this
patch adds tests for those instructions as well.
|
|
|
|
The current space optmization on enum aarch64_opn_qualifier forced its
encoding using an unsigned char. This "hard-coded" optimization has the
bad consequence of making the array of such enums being completely
unreadable when debugging with GDB because the enum type is lost along
the way.
Keeping this space optimization, and the enum type as well, is possible
when the declaration of the enum is tagged with attribute((packed)).
attribute((packed)) is a GNU extension, and is wrapped in the macro
ATTRIBUTE_PACKED (defined in ansidecl.h), and should be used instead.
|
|
|
|
The enum aarch64_opnd_qualifiers in include/opcode/aarch64.h needs to
stay in sync with the array of struct operand_qualifier_data which
defines various properties for the different type of operands. For
instance, for:
- registers: the size of the register, the number of elements.
- immediates: lower and upper bits to determine the range of values.
|
|
Enforce some checks on the newly added subclass flags:
- If a subclass is set of one insn of an iclass, every insn of that
iclass must have non-zero subclass field.
- For all other iclasses, the subclass bits are zero for all insns.
include/
* opcode/aarch64.h (enum aarch64_insn_class): Identify the
maximum iclass enum value.
opcodes/
* aarch64-gen.c (iclass_has_subclasses_p): New array of bool.
(read_table): Enforce checks on subclass flags.
|
|
The existing iclass information tells us the general shape and purpose
of the instructions. In some cases, however, we need to further disect
the iclass on the basis of other finer-grain information. E.g., for the
purpose of SCFI, we need to know whether a given insn with iclass of
ldst_* is a load or a store. Similarly, whether a particular arithmetic
insn is an add or sub or mov, etc.
This patch defines new flags to demarcate the insns. Also provide an
access function for subclass lookup.
Later, we will enforce (in aarch64-gen.c) that if an iclass has at least
one instruction with a non-zero subclass, all instructions of the iclass
must have a non-zero subclass information. If none of the defined
subclasses are applicable (or not required for SCFI purposes),
F_SUBCLASS_OTHER can be used for such instructions.
include/
* opcode/aarch64.h (F_SUBCLASS): New flag.
(F_SUBCLASS_OTHER): Likewise.
(F_LDST_LOAD): Likewise.
(F_LDST_STORE): Likewise.
(F_ARITH_ADD): Likewise.
(F_ARITH_SUB): Likewise.
(F_ARITH_MOV): Likewise.
(F_BRANCH_CALL): Likewise.
(F_BRANCH_RET): Likewise.
(F_DP_TAG_ONLY): Likewise.
(aarch64_opcode_subclass_p): New definition.
|
|
This patch adds support for following sme2.1 zero instructions and
the spec is available here [1].
1. ZERO (single-vector).
2. ZERO (double-vector).
3. ZERO (quad-vector).
The VECTOR GROUP symbols VGx2 and VGx4 are optional for the assembler
for most of the sme and sve instructions. But for few of the sme2.1
zero instruction variants VECTOR GROUP symbols VGx2 and VGx4 are mandatory.
To address this a bit "F_VG_REQ" is introduced in this patch, on setting
F_VG_REQ bit in flags of aarch64_opcode forces the assembler to accept
instruction operand only having VECTOR GROUP symbols.
[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
|
|
This patch adds support for following sme2.1 movaz instructions and
the spec is available here [1].
1. MOVAZ (array to vector, two registers).
2. MOVAZ (array to vector, four registers).
3. MOVAZ (tile to vector, single).
[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
|
|
This patch adds support for following sme2.1 luti2 and luti4 instructions, spec is
available here [1]
1. LUTI2 (two registers) strided.
2. LUTI2 (four registers) strided.
3. LUTI4 (two registers) strided.
4. LUTI4 (four registers) strided.
[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SME-Instructions?lang=en
|
|
This patch adds support for followign SVE2p1 instruction, spec is available here [1].
1. PMOV (to vector)
2. PMOV (to predicate)
Both pmov (to vector) and pmov (to predicate) have destination scalable vector
register and source scalable vector register respectively as an operand with no
suffix and optional index. To handle this case we have added 8 new operands in
this patch.
AARCH64_OPND_SVE_Zn0_INDEX, /* Zn[index], bits [9:5]. */
AARCH64_OPND_SVE_Zn1_17_INDEX, /* Zn[index], bits [9:5,17]. */
AARCH64_OPND_SVE_Zn2_18_INDEX, /* Zn[index], bits [9:5,18:17]. */
AARCH64_OPND_SVE_Zn3_22_INDEX, /* Zn[index], bits [9:5,18:17,22]. */
AARCH64_OPND_SVE_Zd0_INDEX, /* Zn[index], bits [4:0]. */
AARCH64_OPND_SVE_Zd1_17_INDEX, /* Zn[index], bits [4:0,17]. */
AARCH64_OPND_SVE_Zd2_18_INDEX, /* Zn[index], bits [4:0,18:17]. */
AARCH64_OPND_SVE_Zd3_22_INDEX, /* Zn[index], bits [4:0,18:17,22]. */
Since the index of the <Zd> operand is optional, the index part is
dropped in disassembly in both the cases of "no index" or "zero index".
As per spec: PMOV <Zd>{[<imm>]}, <Pn>.D
PMOV <Pn>.D, <Zd>{[<imm>]}
Example1:
Assembly: pmov z5[0], p6.d
Disassembly: pmov z5, p6.d
Assembly: pmov z5, p6.d
Disassembly: pmov z5, p6.d
Example2:
Assembly: pmov p4.b, z5[0]
Disassembly: pmov p4.b, z5
Assembly: pmov p4.b, z5
Disassembly: pmov p4.b, z5
[1]: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions?lang=en
|
|
AArch64 defines new registers for the feature step2 (Enhanced Software Step
Extension). step2 is an Armv9.5-A feature.
This patch also adds relevant tests. Regression tested on aarch64-none-elf,
and no regression found.
|
|
AArch64 defines new registers for the feature spmu2 (System Performance
Monitors Extension version 2). spmu2 is an Armv9.5-A feature.
This patch also adds relevant tests. Regression tested on aarch64-none-elf,
and no regression found.
|
|
AArch64 defines new registers for the feature e3dse (Delegated SError
exceptions for EL3): vdisr_el3 and vdisr_el3. e3dse is an Armv9.5-A
feature.
This patch also adds relevant tests. Regression tested on aarch64-none-elf,
and no regression found.
|
|
The new -march=armv9.5-a flag enables access to the
mandatory cpa, lut and faminmax extensions.
Existing test cases for features are extended to verify they
work without additional flags.
|
|
This patch fixes encoding and syntax for sve2p1 instructions ld[1-4]q/st[1-4]q
as mentioned below, for the issues reported here.
https://sourceware.org/pipermail/binutils/2024-February/132408.html
1) Previously all the ld[1-4]q/st[1-4]q instructions are wrongly added as
predicated instructions and this issue is fixed in this patch by replacing
"SVE2p1_INSNC" with "SVE2p1_INSN" macro.
2) Wrong first operand in all the ld[1-4]q/st[1-4]q instructions is fixed
by replacing "SVE_Zt" with "SVE_ZtxN".
3) Wrong operand qualifiers in ld1q and st1q instructions are also fixed in
this patch.
4) In ld1q/st1q the index in the second argument is optional and if index
is xzr and is skipped in the assembly, the index field is ignored by the
disassembler.
Fixing above mentioned issues helps with following:
1) ld1q and st1q first register operand accepts enclosed figure braces.
2) ld2q, ld3q, ld4q, st2q, st3q, and st4q instructions accepts wrapping
sequence of vector registers.
For the instructions ld[2-4]q/st[2-4]q, tests for wrapping sequence of vector
registers are added along with short-form of operands for non-wrapping sequence.
I have added test using following logic:
ld2q {Z0.Q, Z1.Q}, p0/Z, [x0, #0, MUL VL] //raw insn encoding (all zeroes)
ld2q {Z31.Q, Z0.Q}, p0/Z, [x0, #0, MUL VL] // encoding of <Zt1>
ld2q {Z0.Q, Z1.Q}, p7/Z, [x0, #0, MUL VL] // encoding of <Pg>
ld2q {Z0.Q, Z1.Q}, p0/Z, [x30, #0, MUL VL] // encoding of <Xm>
ld2q {Z0.Q, Z1.Q}, p0/Z, [x0, #-16, MUL VL] // encoding of <imm> (low value)
ld2q {Z0.Q, Z1.Q}, p0/Z, [x0, #14, MUL VL] // encoding of <imm> (high value)
ld2q {Z31.Q, Z0.Q}, p7/Z, [x30, #-16, MUL VL] // encoding of all fields (all ones)
ld2q {Z30.Q, Z31.Q}, p1/Z, [x3, #-2, MUL VL] // random encoding.
For all the above form of instructions the hyphenated form is preferred for
disassembly if there are more than two registers in the list, and the register
numbers are monotonically increasing in increments of one.
|
|
This patch fixes the syntax of sve2p1 "extq" instruction by modifying the operands
count to 4. A new operand AARCH64_OPND_SVE_UIMM4 is defined to handle the 4th
argument an 4-bit unsigned immediate of extq instruction. The instruction encoding
is updated to use constraint C_SCAN_MOVPRFX, to enable "extq" instruction to immediately
precede in program order by a MOVPRFX instruction. Also removed the unused operand
AARCH64_OPND_SVE_Zm_imm4.
This issues was reported here:
https://sourceware.org/pipermail/binutils/2024-February/132408.html
|
|
This patch fixes the syntax of sve2p1 "dupq" instruction by modifying the way
2nd operand does the encoding and decoding using the [<imm>] value.
dupq makes use of already existing aarch64_ins_sve_index and aarch64_ext_sve_index
inserter and extractor functions. The definitions of aarch64_ins_sve_index_imm (inserter)
and aarch64_ext_sve_index_imm (extractor) is removed in this patch.
This issues was reported here:
https://sourceware.org/pipermail/binutils/2024-February/132408.html
|
|
This patch fixes the mandatory feature bits in v9.4-a architectures,
by enabling FEAT_SVE2p1 for Armv9.4-A architecture by default.
|
|
This includes:
- FEAT_SME_F8F32 (+sme-f8f32)
- FEAT_SME_F8F16 (+sme-f8f16)
The FP16 addition/subtraction instructions originally added by
FEAT_SME_F16F16 haven't been added to Binutils yet. They are also
required to be enabled if FEAT_SME_F8F16 is present, so they are
included in this patch.
|
|
This includes all the instructions under the following features:
- FEAT_FP8FMA (+fp8fma)
- FEAT_FP8DOT4 (+fp8dot4)
- FEAT_FP8DOT2 (+fp8dot2)
- FEAT_SSVE_FP8FMA (+ssve-fp8fma)
- FEAT_SSVE_FP8DOT4 (+ssve-fp8dot4)
- FEAT_SSVE_FP8DOT2 (+ssve-fp8dot2)
|
|
Introduces instructions for the SME2 lutv2 extension for AArch64. They
are documented in the following document:
* ARM DDI0602
For both luti4 instructions, we introduced an operand called
SME_Znx2_BIT_INDEX. We use the existing function parse_vector_reg_list
for parsing but modified that function so that it can accept operands
without qualifiers and rejects instructions that have operands with
qualifiers but are not supposed to have operands with qualifiers.
For disassembly, we modified print_register_list so that it could
accept register lists without qualifiers.
For one luti4 instruction, we introduced a SME_Zdnx4_STRIDED. It is
similar to SME_Ztx4_STRIDED and we could use existing code for parsing,
encoding, and disassembly.
For movt instruction, we introduced an operand called SME_ZT0_INDEX2_12.
This is a ZT0 register with a bit index encoded in [13:12]. It is
similar to SME_ZT0_INDEX.
We also introduced an iclass named sme_size_12_b so that we can encode
size bits [13:12] correctly when only 'b' is allowed as qualifier.
|
|
FEAT_CSSC is mandatory in the architecture from Armv8.9.
|
|
The FEAT_BRBE extension provides two aliases of sys:
- brb iall (Invalidates all Branch records in the Branch Record Buffer)
- brb inj (Injects the Branch Record held in BRBINFINJ_EL1,
BRBSRCINJ_EL1, and BRBTGTINJ_EL1 into the Branch Record Buffer)
This patch adds:
- the feature option "brbe" that must be added for the aliases to be available
- a new operand flag AARCH64_OPND_Rt_IN_SYS_ALIASES that warns in a comment
when Rt is set to the non default value 0b11111 (it is constrained
unpredictable whether the instruction is undefined or behaves as if the Rt
field is set to 0b11111).
- a new operand flag AARCH64_OPND_BRBOP that encodes and decodes Op2 values
from bit 5
- support for the two brb aliases above
See:
- https://developer.arm.com/documentation/ddi0602/2024-03/Base-Instructions/BRB--Branch-Record-Buffer--an-alias-of-SYS-?lang=en
- https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Instructions/BRB-INJ--Branch-Record-Injection-into-the-Branch-Record-Buffer?lang=en
- https://developer.arm.com/documentation/ddi0601/2024-03/AArch64-Instructions/BRB-IALL--Invalidate-the-Branch-Record-Buffer?lang=en
|
|
Introduces instructions for the SVE2 lut extension for AArch64. They are documented in the following links:
* luti2: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/LUTI2--Lookup-table-read-with-2-bit-indices-?lang=en
* luti4: https://developer.arm.com/documentation/ddi0602/2024-03/SVE-Instructions/LUTI4--Lookup-table-read-with-4-bit-indices-?lang=en
These instructions use new SVE2 vector operands. They are called
SVE_Zm1_23_INDEX, SVE_Zm2_22_INDEX, and Zm3_12_INDEX and they have
1 bit, 2 bit, and 3 bit indices respectively.
The lsb and width of these new operands are the same as many existing
operands but the convention is to give different names to fields that
serve different purpose so we introduced new fields in aarch64-opc.c
and aarch64-opc.h.
We made a design choice for the second operand of the halfword variant of
luti4 with two register tables. We could have either defined a new operand,
like SVE_Znx2, or we could have use the existing operand SVE_ZnxN. With
the new operand, we would need to implement constraints on register
lists based on either operand or opcode flag. With existing operand, we
could just existing constraint checks using opcode flag. We chose
the second approach and went with SVE_ZnxN and added opcode flag to
enforce lengths of vector register list operands. This way, we can reuse
the existing constraint check logic.
|
|
Introduces instructions for the Advanced SIMD lut extension for AArch64. They are documented in the following links:
* luti2: https://developer.arm.com/documentation/ddi0602/2024-03/SIMD-FP-Instructions/LUTI2--Lookup-table-read-with-2-bit-indices-?lang=en
* luti4: https://developer.arm.com/documentation/ddi0602/2024-03/SIMD-FP-Instructions/LUTI4--Lookup-table-read-with-4-bit-indices-?lang=en
These instructions needed definition of some new operands. We will first
discuss operands for the third operand of the instructions and then
discuss a vector register list operand needed for the second operand.
The third operands are vectors with bit indices and without type
qualifiers. They are called Em_INDEX1_14, Em_INDEX2_13, and Em_INDEX3_12
and they have 1 bit, 2 bit, and 3 bit indices respectively. For these
new operands, we defined new parsing case branch. The lsb and width of
these operands are the same as many existing but the convention is to
give different names to fields that serve different purpose so we
introduced new fields in aarch64-opc.c and aarch64-opc.h for these new
operands.
For the second operand of these instructions, we introduced a new
operand called LVn_LUT. This represents a vector register list with
stride 1. We defined new inserter and extractor for this new operand and
it is encoded in FLD_Rn. We are enforcing the number of registers in the
reglist using opcode flag rather than operand flag as this is what other
SIMD vector register list operands are doing. The disassembly also uses
opcode flag to print the correct number of registers.
|
|
|