aboutsummaryrefslogtreecommitdiff
path: root/opcodes/i386-tbl.h
AgeCommit message (Collapse)AuthorFilesLines
2024-12-23Support Intel AVX10.2 minmax, vector copy and compare instructionsHaochen Jiang1-176/+378
In this patch, we will support AVX10.2 minmax, vector copy and compare instructions. This will finish the new instruction form support for AVX10.2. Most of them are new instructions forms except for vmovd and vmovw, which are extended usage from the old ones. gas/ChangeLog: * NEWS: Mention AVX10.2. * testsuite/gas/i386/i386.exp: Add AVX10.2 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-256-5-intel.d: New test. * testsuite/gas/i386/avx10_2-256-miscs.d: Ditto. * testsuite/gas/i386/avx10_2-256-miscs.s: Ditto. * testsuite/gas/i386/avx10_2-512-miscs-intel.d: Ditto. * testsuite/gas/i386/avx10_2-512-miscs.d: Ditto. * testsuite/gas/i386/avx10_2-512-miscs.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-miscs-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-miscs.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-miscs.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-miscs-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-miscs.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-miscs.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-len.h: Add EVEX_LEN_0F7E_P_1_W_1, EVEX_LEN_0FD6_P_2_W_0, EVEX_LEN_MAP5_6E and EVEX_LEN_MAP5_7E. * i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F2E, PREFIX_EVEX_0F2F, PREFIX_EVEX_0F3A52, PREFIX_EVEX_0F3A53, PREFIX_EVEX_MAP5_2E, PREFIX_EVEX_MAP5_2F, PREFIX_EVEX_MAP5_6E and PREFIX_EVEX_MAP5_7E. * i386-dis-evex-w.h: Adjust EVEX_W_0F3A42, EVEX_W_0F7E_P_1 and EVEX_W_0FD6. Add EVEX_W_MAP5_6E_P_1 and EVEX_W_MAP5_7E_P_1. * i386-dis-evex.h: Add and adjust table entries for AVX10.2. * i386-dis.c (PREFIX_EVEX_0F2E): New. (PREFIX_EVEX_0F2F): Ditto. (PREFIX_EVEX_0F3A52): Ditto. (PREFIX_EVEX_0F3A53): Ditto. (PREFIX_EVEX_MAP5_2E): Ditto. (PREFIX_EVEX_MAP5_2F): Ditto. (PREFIX_EVEX_MAP5_6E_L_0): Ditto. (PREFIX_EVEX_MAP5_7E_L_0): Ditto. (EVEX_LEN_0F7E_P_1_W_1): Ditto. (EVEX_LEN_0FD6_P_2_W_0): Ditto. (EVEX_LEN_MAP5_6E): Ditto. (EVEX_LEN_MAP5_7E): Ditto. (EVEX_W_MAP5_6E_P_1): Ditto. (EVEX_W_MAP5_7E_P_1): Ditto. * i386-opc.tbl: Add AVX10.2 instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto. Co-authored-by: Jun Zhang <jun.zhang@intel.com> Co-authored-by: Zewei Mo <zewei.mo@intel.com>
2024-12-18Support Intel SM4 AVX10.2 extensionHaochen Jiang1-9052/+9076
In this patch, we will support SM4 AVX10.2 extension part. It is a promotion from VEX encoding to EVEX encoding. The EVEX encoding is based on AVX10.2, which is the same as the upcoming MOVRS ISA. Thus, we decide to pull AVX10.2 out to CPU_COMMON_FLAGS. While I have also tried to merge the table like AVX/AVX512, I choose to just templatize the table. I am okay to go either way, but slightly prefer the templatizing one since probably SM4 would be the only ISA with AVX10.2 needs such VEX to EVEX extension (MOVRS does not need that). Also, it is a tendancy that we will directly provide EVEX encodings and no VEX encodings for vector instructions since AVX10. This will make the adding in gas/config/tc-i386.c not that worthy. gas/ChangeLog: * NEWS: Support Intel SM4 EVEX instructions. * config/tc-i386.c (_is_cpu): Handle AVX10.2. * testsuite/gas/i386/i386.exp: Run SM4 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-256-sm4-intel.d: Add SM4 tests. * testsuite/gas/i386/avx10_2-256-sm4.d: Ditto. * testsuite/gas/i386/avx10_2-256-sm4.s: Ditto. * testsuite/gas/i386/avx10_2-512-sm4-intel.d: Ditto. * testsuite/gas/i386/avx10_2-512-sm4.d: Ditto. * testsuite/gas/i386/avx10_2-512-sm4.s: Ditto. * testsuite/gas/i386/avx10_2-sm4-inval.l: Ditto. * testsuite/gas/i386/avx10_2-sm4-inval.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-sm4-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-sm4.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-sm4.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-sm4-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-sm4.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-sm4.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-sm4-inval.l: Ditto. * testsuite/gas/i386/x86-64-avx10_2-sm4-inval.s: Ditto. opcodes/ChangeLog: * i386-dis-evex.h: Add evex table entry for SM4. * i386-dis.h: Ditto. * i386-opc.h: (i386_cpu): Move AVX10.2 to CPU_FLAGS_COMMON. * i386-opc.tbl: Add SM4 EVEX instructions. * i386-init.h: Regenerated. * i386-tbl.h: Ditto.
2024-12-05Support Intel AVX10.2 satcvt instructionsHu, Lin11-1/+393
In this patch, we will support AVX10.2 satcvt instructions. All of them are new instruction forms. In current documentation, it is still VCVTTNEBF162I[,U]BS, but it will change to VCVTTBF162I[,U]BS eventually. In table part, we used temporary <sign> iterator to reduce redundancy. It definitely could be done for legacy cvt insns, but it is out of this patch's scope. gas/ChangeLog: * testsuite/gas/i386/i386.exp: Add AVX10.2 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-512-satcvt-intel.d: New test. * testsuite/gas/i386/avx10_2-512-satcvt.d: Ditto. * testsuite/gas/i386/avx10_2-512-satcvt.s: Ditto. * testsuite/gas/i386/avx10_2-256-satcvt-intel.d: Ditto. * testsuite/gas/i386/avx10_2-256-satcvt.d: Ditto. * testsuite/gas/i386/avx10_2-256-satcvt.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-satcvt-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-satcvt.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-satcvt.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-satcvt-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-satcvt.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-satcvt.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-prefix.h: Add PREFIX_EVEX_MAP5_68, PREFIX_EVEX_MAP5_69, PREFIX_EVEX_MAP5_6A, PREFIX_EVEX_MAP5_6B, PREFIX_EVEX_MAP5_6C, PREFIX_EVEX_MAP5_6D. * i386-dis-evex-w.h: Add EVEX_W_MAP5_6C_P_0, EVEX_W_MAP5_6C_P_2, EVEX_W_MAP5_6D_P_0, EVEX_W_MAP5_6D_P_2. * i386-dis-evex.h (prefix_table): Add PREFIX_EVEX_MAP5_68, * PREFIX_EVEX_MAP5_69, PREFIX_EVEX_MAP5_6A, PREFIX_EVEX_MAP5_6B. * i386-dis.c: (PREFIX_EVEX_MAP5_68): New. (PREFIX_EVEX_MAP5_69): Ditto. (PREFIX_EVEX_MAP5_6A): Ditto. (PREFIX_EVEX_MAP5_6B): Ditto. (PREFIX_EVEX_MAP5_6C): Ditto. (PREFIX_EVEX_MAP5_6D): Ditto. (EVEX_MAP5_6C_P_0): Ditto. (EVEX_MAP5_6C_P_2): Ditto. (EVEX_MAP5_6D_P_0): Ditto. (EVEX_MAP5_6D_P_2): Ditto. * i386-opc.tbl: Add AVX10.2 instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto. Co-authored-by: Zewei Mo <zewei.mo@intel.com> Co-authored-by: Haochen Jiang <haochen.jiang@intel.com> Co-authored-by: Levy Hsu <admin@levyhsu.com>
2024-12-03Support Intel AVX10.2 BF16 instructionsKong Lingling1-1/+413
In this patch, we will support AVX10.2 BF16 instructions. All of them are new instructions forms. In current documentation, it is still VSCALEFPBF16, but it will change to VSCALEFNEPBF16 eventually. In disassembler part, we added %XB to reduce W table pass since all of them get evex.w=0. gas/Changelog: * testsuite/gas/i386/i386.exp: Add AVX10.2 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-256-bf16-intel.d: New. * testsuite/gas/i386/avx10_2-256-bf16.d: Ditto. * testsuite/gas/i386/avx10_2-256-bf16.s: Ditto. * testsuite/gas/i386/avx10_2-512-bf16-intel.d: Ditto. * testsuite/gas/i386/avx10_2-512-bf16.d: Ditto. * testsuite/gas/i386/avx10_2-512-bf16.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-bf16-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-bf16.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-bf16.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-bf16-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-bf16.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-bf16.s: Ditto. opcodes/ * i386-dis-evex-prefix.h: Update PREFIX_EVEX_0F3A08, PREFIX_EVEX_0F3A26, PREFIX_EVEX_0F3A56, PREFIX_EVEX_0F3A66, PREFIX_EVEX_0F3AC2, PREFIX_EVEX_MAP5_2F, PREFIX_EVEX_MAP5_51, PREFIX_EVEX_MAP5_58, PREFIX_EVEX_MAP5_59, PREFIX_EVEX_MAP5_5C, PREFIX_EVEX_MAP5_5D, PREFIX_EVEX_MAP5_5E, PREFIX_EVEX_MAP5_5F. Add PREFIX_EVEX_MAP6_2C, PREFIX_EVEX_MAP6_4C, PREFIX_EVEX_MAP6_4E, PREFIX_EVEX_MAP6_98, PREFIX_EVEX_MAP6_9A, PREFIX_EVEX_MAP6_9C, PREFIX_EVEX_MAP6_9E, PREFIX_EVEX_MAP6_A8, PREFIX_EVEX_MAP6_AA, PREFIX_EVEX_MAP6_AC, PREFIX_EVEX_MAP6_AE, PREFIX_EVEX_MAP6_B8, PREFIX_EVEX_MAP6_BA, PREFIX_EVEX_MAP6_BC, PREFIX_EVEX_MAP6_BE. * i386-dis-evex.h (evex_table): Update PREFIX_EVEX_MAP6_2C, PREFIX_EVEX_MAP6_42, PREFIX_EVEX_MAP6_4C, PREFIX_EVEX_MAP6_4E, PREFIX_EVEX_MAP6_98, PREFIX_EVEX_MAP6_9A, PREFIX_EVEX_MAP6_9C, PREFIX_EVEX_MAP6_9E, PREFIX_EVEX_MAP6_A8, PREFIX_EVEX_MAP6_AA, PREFIX_EVEX_MAP6_AC, PREFIX_EVEX_MAP6_AE, PREFIX_EVEX_MAP6_B8, PREFIX_EVEX_MAP6_BA, PREFIX_EVEX_MAP6_BC, PREFIX_EVEX_MAP6_BE. * i386-dis.c (PREFIX_EVEX_MAP6_2C): New enum. (PREFIX_EVEX_MAP6_42): Ditto. (PREFIX_EVEX_MAP6_4C): Ditto. (PREFIX_EVEX_MAP6_4E): Ditto. (PREFIX_EVEX_MAP6_98): Ditto. (PREFIX_EVEX_MAP6_9A): Ditto. (PREFIX_EVEX_MAP6_9C): Ditto. (PREFIX_EVEX_MAP6_9E): Ditto. (PREFIX_EVEX_MAP6_A8): Ditto. (PREFIX_EVEX_MAP6_AA): Ditto. (PREFIX_EVEX_MAP6_AC): Ditto. (PREFIX_EVEX_MAP6_AE): Ditto. (PREFIX_EVEX_MAP6_B8): Ditto. (PREFIX_EVEX_MAP6_BA): Ditto. (PREFIX_EVEX_MAP6_BC): Ditto. (PREFIX_EVEX_MAP6_BE): Ditto. (putop): Handle %XB. * i386-opc.tbl: Add AVX10.2 instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto. Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
2024-11-29x86: SETcc doesn't permit W suffixJan Beulich1-30/+30
Accidentally I had removed No_wSuf when cloning the extra template.
2024-11-19Support x86 Intel MSR_IMMHu, Lin11-435/+455
gas/ChangeLog: * NEWS: Support x86 Intel MSR_IMM. * config/tc-i386.c (cpu_arch): Add MSR_IMM. (cpu_flags_match): Add MSR_IMM to APX_F related processing. (i386_assemble): WRMSRNS's first operand is imm32, so add MN_wrmsrns like MN_uwrmsr. * doc/c-i386.texi: Document .msr_imm. * testsuite/gas/i386/i386.exp: Run MSR_IMM tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/msr_imm-inval.l: New test. * testsuite/gas/i386/msr_imm-inval.s: Ditto. * testsuite/gas/i386/x86-64-msr_imm-intel.d: Ditto. * testsuite/gas/i386/x86-64-msr_imm.d: Ditto. * testsuite/gas/i386/x86-64-msr_imm.s: Ditto. opcodes/ChangeLog: * i386-dis.c: Add REG_VEX_MAP7_F6_L_0_W_0, PREFIX_VEX_MAP7_F6_L_0_W_0_R_0_X86_64, X86_64_VEX_MAP7_F6_L_0_W_0_R_0, VEX_LEN_MAP7_F6, VEX_W_MAP7_F6_L_0. (reg_table): New entry for MSR_IMM. (prefix_table): Ditto. (x86_64_table): Ditto. (vex_len_table): Ditto. (vex_w_table): Ditto. (map7_f6_opcode): New variable for MAP7. (get_valid_dis386): Support MAP7. * i386-gen.c (cpu_flags): Add MSR_IMM. * i386-init.h: Regenerated. * i386-mnem.h: Ditto. * i386-opc.h (i386_cpu_flags): Add cpumsr_imm. * i386-opc.tbl: Add MSR_IMM instructions. * i386-tbl.h: Regenerated.
2024-11-18x86: rename SPACE_{,E}VEX_MAP<N>Jan Beulich1-652/+652
Map7 already has dual purpose for USER-MSR (and is to gain more for MSR-IMM), while Map5 is about to gain VEX uses for AMX extensions. Drop the not really meaningful infixes and (in the opcode table) prefixes, retaining merely EVexMap4 for encoding EVex128 at the same time.
2024-11-18x86: VP2INTERSECT{D,Q} have mask register destination groupJan Beulich1-2/+2
Much like AVX512-{4FMAPS,4VNNIW} have a constraint on their register source, there's a constraint (need to be even) on the destination register here. Adjust "good" test cases accordingly, and add a new test case to check the warning.
2024-10-30x86/APX: support JMPABS also in assemblerJan Beulich1-254/+271
Without this APX support isn't really complete. For Intel syntax displacement form is needed, such that symbolic operands won't need prefixing by "offset". (The other form is actually not used at all in Intel syntax.) For the record: To restrict displacement form to Intel syntax is not something I actually agree with.
2024-10-29x86: use <xyz> for VFPCLASSP{S,D}Jan Beulich1-30/+30
Just like VFPCLASSPH does. While the order of generated table entries changes this way, the individual entries don't change.
2024-10-18x86: Regenerate missing table filesMayShao-oc1-1666/+1693
As soon as I committed Zhaoxin's patch, I realized that I did not include the regen file. Regenerate them and commit as obvious. opcodes/ChangeLog: * i386-tbl.h: Regenerated. * i386-mnem.h: Ditto. * i386-init.h: Ditto.
2024-10-16Support Intel AVX10.2 convert instructionsLiwei Xu1-1/+412
In this patch, we will support AVX10.2 convert instructions. All of them are new instruction forms. Among all the instructions, vcvtbiasph2[b,h]f8[,s] needs extra care. Since Operand 2 could indicate memory size, we do not need suffix under ATTmode. However, we could not fold all three templates but only XMM/YMM since the dst operand size are the same for them. Also, a new iterator <cvt8> is added to reduce redundancy. gas/ * testsuite/gas/i386/i386.exp: Add AVX10.2 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-256-cvt-intel.d: New. * testsuite/gas/i386/avx10_2-256-cvt.d: Ditto. * testsuite/gas/i386/avx10_2-256-cvt.s: Ditto. * testsuite/gas/i386/avx10_2-512-cvt-intel.d: Ditto. * testsuite/gas/i386/avx10_2-512-cvt.d: Ditto. * testsuite/gas/i386/avx10_2-512-cvt.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-cvt-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-cvt.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-cvt.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-cvt-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-cvt.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-cvt.s: Ditto. opcodes/ * i386-dis-evex-prefix.h: Add PREFIX_EVEX_0F3874, PREFIX_EVEX_MAP5_18, PREFIX_EVEX_MAP5_1B, PREFIX_EVEX_MAP5_1E and PREFIX_EVEX_MAP5_74. * i386-dis-evex.h: Add table pass for AVX10.2 instructions. * i386-dis.c (MOD_EVEX_0F38B1): New. (PREFIX_EVEX_0F3874): Ditto. (PREFIX_EVEX_MAP5_18): Ditto. (PREFIX_EVEX_MAP5_1B): Ditto. (PREFIX_EVEX_MAP5_1E): Ditto. (PREFIX_EVEX_MAP5_74): Ditto. * i386-opc.tbl: Add AVX10.2 instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto. Co-authored-by: Kong Lingling <lingling.kong@intel.com> Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
2024-10-11Support Intel AVX10.2 media instructionsHaochen Jiang1-162/+333
In disassembler part, for vnni instructions, we extended previous VEX part using %XE in disassembler to promote them to EVEX by reusing the original VEX table. For vmpsadbw, we will also use %XE. However, it is hard to reuse the VEX table, so we are using new ones. In assmbler part, we put the vnni table entries with previous vnni instructions since they are just promotion from AVX-VNNI-INT{8,16}. Since we will prefer VEX encoding, we need to use the different table order in template <vnni>, which prefers EVEX due to earlier introduction for AVX512_VNNI than AVX_VNNI. This means a new <vnni>. For vdpphps and vmpsadbw, we put them at the end of the table, with future AVX10.2 instructions. Nit: I will remove the arch requirement for avx_vnni_int{8,16} in evex-promote testcases after AVX10.2 implies AVX-VNNI-INT{8,16}. gas/Changelog: * testsuite/gas/i386/i386.exp: Add AVX10.2 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-256-1-intel.d: New. * testsuite/gas/i386/avx10_2-256-1.d: Ditto. * testsuite/gas/i386/avx10_2-256-1.s: Ditto. * testsuite/gas/i386/avx10_2-512-1-intel.d: Ditto. * testsuite/gas/i386/avx10_2-512-1.d: Ditto. * testsuite/gas/i386/avx10_2-512-1.s: Ditto. * testsuite/gas/i386/avx10_2-promote.d: Ditto. * testsuite/gas/i386/avx10_2-promote.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-1-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-1.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-256-1.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-1-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-1.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-512-1.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-promote.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-promote.s: Ditto. opcodes/Changelog: * i386-dis-evex-prefix.h: Adjust PREFIX_EVEX_0F3852. Add PREFIX_EVEX_0F3A42_W_0. * i386-dis-evex-w.h: Adjust EVEX_W_0F3A42. * i386-dis-evex.h: Add table pass for AVX10.2 instructions. * i386-dis.c: Adjust PREFIX_VEX_0F3850_W_0, PREFIX_VEX_0F3851_W_0, PREFIX_VEX_0F38D2_W_0 and PREFIX_VEX_0F38D3_W_0. * i386-opc.tbl: Add AVX10.2 instructions. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto. Co-authored-by: Lili Cui <lili.cui@intel.com>
2024-09-27x86: optimize {,V}INSERTPS with certain immediatesJan Beulich1-4/+4
They are equivalent to simple moves or xors, which are up to 3 bytes shorter to encode (and maybe/likely also cheaper to execute).
2024-09-27x86: optimize {,V}EXTRACT{F,I}{128,32x{4,8},64x{2,4}} with immediate 0Jan Beulich1-10/+10
They, too, are equivalent to simple moves, which are up to 3 bytes shorter to encode (and maybe also cheaper to execute).
2024-09-27x86: optimize {,V}EXTRACTPS with immediate 0Jan Beulich1-6/+6
They are equivalent to simple moves, which are up to 2 bytes shorter to encode (and maybe also cheaper to execute).
2024-09-26x86: templatize SIMD narrowing-move templatesJan Beulich1-9/+9
Once again to reduce redundancy.
2024-09-26x86: templatize SIMD sign-/zero-extension templatesJan Beulich1-191/+191
Yet again to reduce redundancy.
2024-09-26x86: templatize SIMD FP binary-logic templatesJan Beulich1-266/+266
Once more to reduce redundancy.
2024-09-26x86: further templatize FMA templatesJan Beulich1-333/+333
Further reduce redundancy, in preparation of the addition of counterparts for AVX10.2.
2024-09-26x86: templatize SIMD FP arithmetic templatesJan Beulich1-1084/+1084
Reduce redundancy, in preparation of the addition of further counterparts for AVX10.2. Provide the "ne" parameter needed there right away, even if unused for now.
2024-09-18x86/APX: Don't promote AVX/AVX2 instructions out of APX specH.J. Lu1-321/+197
V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} aren't promoted to support EGPR in APX spec. Don't promote them out of APX spec. This commit effectively reverted: ec3babb8c10 x86/APX: V{BROADCAST,EXTRACT,INSERT}{F,I}128 can also be expressed 5a635f1f59a x86/APX: VROUND{P,S}{S,D} encodings require AVX512{F,VL} eea4357967b x86/APX: VROUND{P,S}{S,D} can generally be encoded gas/ PR gas/32171 * testsuite/gas/i386/x86-64-apx-egpr-promote-inval.s: Add V{BROADCAST,EXTRACT,INSERT}{F,I}128 tests with EGPR. * testsuite/gas/i386/x86-64-apx-evex-promoted.s: Remove V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} tests with EGPR. * testsuite/gas/i386/x86-64-apx-egpr-inval.l: Updated. * testsuite/gas/i386/x86-64-apx-egpr-promote-inval.l: Likewise. * testsuite/gas/i386/x86-64-apx-evex-promoted-intel.d: Likewise. * testsuite/gas/i386/x86-64-apx-evex-promoted-wig.d: Likewise. * testsuite/gas/i386/x86-64-apx-evex-promoted.d: Likewise. opcodes/ PR gas/32171 * i386-opc.tbl: Remove V{BROADCAST,EXTRACT,INSERT}{F,I}128 and VROUND{P,S}{S,D} entries with EGPR. * i386-tbl.h: Regenerated. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-09-06x86/APX: use D for 2-operand CFCMOVccJan Beulich1-575/+275
There's no need to have 30 redundant templates when we can easily take care of the operand swapping like we do for various other insns.
2024-09-06x86/APX: optimize certain reg-only CFCMOVcc formsJan Beulich1-30/+30
Along the lines of 2513312930b2 ("x86/APX: apply NDD-to-legacy transformation to further CMOVcc forms") these can similarly be converted to the shorter legacy-encoded CMOVcc.
2024-09-06x86: templatize VNNI templatesJan Beulich1-20/+20
Reduce redundancy, in preparation of the addition of further counterparts for AVX10.2.
2024-09-02Support ymm rounding control for Intel AVX10.2Haochen Jiang1-179/+179
In the patch, in order to support ymm rounding for AVX10.2, we derive evex attribute for all cases instead of only for rc_none to encode U bit. Also changed some bad_opcode return due to the share of U bit with APX_F. gas/ChangeLog: * config/tc-i386.c (cpu_flags_match): Handle AVX10_2. (build_evex_prefix): Handle U bit. Derive evex attribute for all cases. (check_VecOperands): Handle AVX10.2 and ymm roundings. * doc/c-i386.texi: Document .avx10.2. * testsuite/gas/i386/i386.exp: Run AVX10.2 tests. * testsuite/gas/i386/x86-64.exp: Ditto. * testsuite/gas/i386/avx10_2-rounding-intel.d: New test. * testsuite/gas/i386/avx10_2-rounding-inval.l: Ditto. * testsuite/gas/i386/avx10_2-rounding-inval.s: Ditto. * testsuite/gas/i386/avx10_2-rounding.d: Ditto. * testsuite/gas/i386/avx10_2-rounding.s: Ditto. * testsuite/gas/i386/x86-64-avx10_2-rounding-intel.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-rounding.d: Ditto. * testsuite/gas/i386/x86-64-avx10_2-rounding.s: Ditto. opcodes/ChangeLog: * i386-dis.c (struct instr_info): Add U bit. (get_valid_dis386): Handle U bit. * i386-gen.c (isa_dependencies): Add AVX10.2. (cpu_flags): Ditto. * i386-init.h: Regenerated. * i386-opc.h (CpuAVX10_2): New. (i386_cpu_flags): Add cpuavx10_2. * i386-opc.tbl: Add rounding to old entries which do not permit rounding previously. Also eliminate the redundant RegXMM for vcvtps2uqq. * i386-tbl.h: Regenerated.
2024-08-30x86: limit RegRex64 useJan Beulich1-24/+24
The special property really only applies to the "extended" byte regs having legacy word/dword counterparts. While touching involved code also drop redundant byte checks from a conditional in establish_rex(): The other remaining RegRex64 uses only exist on registers which can't be used as register operands anyway. Hence RegRex64 as an attribute of a (valid) register operand implies that it's a byte reg.
2024-07-26x86/APX: optimize certain {nf}-form insns to BMI2 onesJan Beulich1-13/+13
..., as those leave EFLAGS untouched anyway. That's a shorter encoding, available as long as no eGPR is in use anywhere.
2024-07-04Support APX CFCMOVCui, Lili1-265/+1229
The CMOVcc instruction proposed by EVEX has four different forms, corresponding to the four possible combinations of EVEX.ND and EVEX.NF values. In the encoder part, when the CFCMOV template supports EVEX_NF, it means that it requires EVEX.NF to be 1. In the decoder part, CFCMOV_Fixup is used to reverse source and destination operands in the 2-operand case. gas/ChangeLog: * config/tc-i386.c (build_apx_evex_prefix): Set NF bit for cfcmov when the insn template supports EVEX_NF. * testsuite/gas/i386/x86-64-apx-inval.l: Add invalid tests for cfcmov. * testsuite/gas/i386/x86-64-apx-inval.s: Ditto. * testsuite/gas/i386/x86-64.exp: Add tests for cfcmov and cmov. * testsuite/gas/i386/x86-64-apx-cfcmov-intel.d: Ditto. * testsuite/gas/i386/x86-64-apx-cfcmov.d: Ditto. * testsuite/gas/i386/x86-64-apx-cfcmov.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-prefix.h: Add cfcmov instructions. * i386-dis.c (CFCMOV_Fixup): Special handling of cfcmov. (putop): Print 'cf' for cfcmov instructions. * i386-opc.h (EVEX_NF): New. * i386-opc.tbl: Add cfcmov instructions. * i386-mnem.h: Regerated. * i386-tbl.h: Regerated.
2024-06-28x86/APX: apply NDD-to-legacy transformation to further CMOVcc formsJan Beulich1-30/+30
With both sources being registers, these insns are almost commutative; the only extra adjustment needed is inversion of the encoded condition.
2024-06-28x86/APX: extend TEST-by-imm7 optimization to CTESTccJan Beulich1-29/+29
The same properties apply there.
2024-06-28x86/APX: optimize {nf}-form IMUL-by-power-of-2 to SHLJan Beulich1-6/+6
..., for differing only in the resulting EFLAGS, which are left untouched anyway. That's a shorter encoding, available as long as certain constraints on operands are met; see code comments. (SHL-by-1 forms may then be subject to further optimization that was introduced earlier.) Note that kind of as a side effect this also converts multiplication by 1 to shift by 0, which is a plain move or even no-op anyway. That could be further shrunk (as could be presence of shifts/rotates by 0 in the original code as well as a fair set of other {nf}-form insns), yet the expectation (for now) is that people won't write such code in the first place.
2024-06-28x86/APX: optimize certain {nf}-form insns to LEAJan Beulich1-4/+4
..., as that leaves EFLAGS untouched anyway. That's a shorter encoding, available as long as certain constraints on operand size and registers are met; see code comments. Note that this requires deferring to derive encoding_evex from {nf} presence, as in optimize_encoding() we want to avoid touching the insns when {evex} was also used. Note further that this requires want_disp32() to now also consider the opcode: We don't want to replace i.tm.mnem_off, for diagnostics to still report the original mnemonic (or else things can get confusing). While there, correct adjacent mis-indentation.
2024-06-28x86/APX: optimize {nf}-form rotate-by-width-less-1Jan Beulich1-4/+4
Unlike for the legacy forms, where there's a difference in the resulting EFLAGS.CF, for the NF variants the immediate can be got rid of in that case by switching to a 1-bit rotate in the opposite direction.
2024-06-28x86/APX: optimize {nf} forms of ADD/SUB with specific immediatesJan Beulich1-8/+8
Unlike for the legacy forms, where there's a difference in the resulting EFLAGS, for the NF variants we can safely replace ones using 0x80 by the respectively other insn while negating the immediate, saving 3 immediate bytes (just 1 though for 16-bit operand size). Similarly we can replace ones using 1 / -1 by INC/DEC (eliminating the immediate).
2024-06-21x86: optimize {,V}PEXTR{D,Q} with immediate of 0Jan Beulich1-10/+10
Such are equivalent to simple moves, which are up to 3 bytes shorter to encode (and perhaps also cheaper to execute).
2024-06-21x86: optimize left-shift-by-1Jan Beulich1-24/+24
These can be replaced by adds when acting on a register operand. While for the scalar forms there's no gain in encoding size, ADD generally has higher throughput than SHL. EFLAGS set by ADD are a superset of those set by SHL (AF in particular is undefined there). For the SIMD cases the transformation also reduced code size, by eliminating the 1-byte immediate from the resulting encoding. Note that this transformation is not applied by gcc13 (according to my observations), so would - as of now - even improve compiler generated code.
2024-06-19x86: Remove the secondary encoding for ctest.Cui, Lili1-569/+289
There are two encodings for each opcode F6/F7 in ctest, but the second one is never used, so remove it to reduce the size of opcode_tbl.h. opcodes/ChangeLog: * i386-opc.tbl: Removed the secondary insn template for ctest. * i386-tbl.h: Regenerated.
2024-06-18Support APX CCMP and CTESTCui, Lili1-290/+2081
CCMP and CTEST are two new sets of instructions for conditional CMP and TEST, SCC and OSZC flags are given as suffixes of CCMP or CTEST in the instruction mnemonic, e.g.: ccmp<cc> { dfv=sf , cf , of } %eax, %ecx also add {evex} cmp/test %eax, %ecx as an alias for ccmpt. For the encoder part, add function check_Scc_OszcOperation to parse '{ dfv=of , sf, sf, cf}', store scc in the lower 4 bits of base_opcode, and adjust base_opcode to its normal meaning in install_template. For the decoder part, add 'SC' and 'DF' macros to add scc and oszc flags suffixes. gas/ChangeLog: * config/tc-i386.c (OSZC_CF): New. (OSZC_ZF): Ditto. (OSZC_SF): Ditto. (OSZC_OF): Ditto. (set_oszc_flags): Set oszc flags and report error for using the same oszc flags twice. (check_Scc_OszcOperations): Handle SCC OSZC flags. (install_template): Add scc and oszc_flags. (build_apx_evex_prefix): Encode SCC and oszc flags bits. (parse_insn): Handle check_Scc_OszcOperations. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.d: Add ivalid test case. * testsuite/gas/i386/x86-64-apx-evex-promoted-bad.s: Ditto. * testsuite/gas/i386/x86-64.exp: Add test for ccmp and ctest. * testsuite/gas/i386/x86-64-apx-ccmp-ctest-intel.d: New test. * testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.l: Ditto. * testsuite/gas/i386/x86-64-apx-ccmp-ctest-inval.s: Ditto. * testsuite/gas/i386/x86-64-apx-ccmp-ctest.d: Ditto. * testsuite/gas/i386/x86-64-apx-ccmp-ctest.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-reg.h: Add ccmp and ctest. * i386-dis-evex.h: Ditto. * i386-dis.c (struct instr_info): add scc. (struct dis386): Add new micro 'NE','SC' and'DF'. (get_valid_dis386): Get scc value and move MAP4 invalid check to print_insn. (putop): Handle %NE, %SC and %DF. * i386-opc.h (SCC): New. * i386-opc.tbl: Add ccmp/ctest and evex format for cmp/test. * i386-mnem.h: Regenerated. * i386-tbl.h: Ditto.
2024-06-10x86/APX: convert ZU to operand constraintJan Beulich1-4216/+4216
Extremely rarely used attributes are inefficient when represented by a separate attribute. Convert it to an operand constraint, as already suggested during review. The collision with RegKludge is pretty simple to resolve.
2024-06-10x86/APX: support extended SETcc formJan Beulich1-311/+551
As indicated during review, spelling/readability-wise setz %eax is easier than setzuz %al _and_ properly specifies the full register that's being modified. Permit that form to be used, even if the spec writers are unwilling to formally mention it. While there also correct the non-ZU EVEX form: That ought to also permit memory operands.
2024-06-10x86/APX: add missing CPU requirement to imm+rm forms of <alu2> insnsJan Beulich1-14/+14
This was overlooked when the form was added by dd74a603376e ("Support APX NF").
2024-05-29x86/Intel: warn about undue mnemonic suffixesJan Beulich1-5570/+5570
Except for very few insns mnemonic suffixes aren't permitted in Intel syntax. Warn about such for now, indicating that they will be outright refused down the road. While fiddling with testcases to address fallout, drop a few things which should never have been tested as valid Intel syntax. Also add a previously missing line to simd-suffix.d.
2024-05-24x86: correct VCVT{,U}SI2SDJan Beulich1-8/+8
Properly reject inappropriate suffixes (No_lSuf / No_qSuf mistakenly omitted by cf665fee1d6c ["x86: re-work AVX512 embedded rounding / SAE"]), to avoid emitting bad or arbitrarily guessed instructions. Interestingly check_{long,qword}_suffix() don't help here, which perhaps is another indication that the way they work right now isn't quite appropriate. Sadly correcting just the templates breaks operand ambiguity detection, since so far that worked from a single template permitting more than one suffix. Here we have ambiguity though which can now be noticed only when taking all (matching) templates together. Therefore we need to determine further matching templates (see code comments for constraints), to then accumulate permitted suffixes across all of them.
2024-05-22Support APX zero-upperCui, Lili1-4339/+4898
This patch is to enable ZU for IMUL (opcodes 0x69 and 0x6B) and SETcc. Since the spec only recommends one form of setzu, I won't be adding set<cc>reg32/reg64 support in this patch. gas/ChangeLog: * config/tc-i386.c (build_apx_evex_prefix): Handle ZU. * testsuite/gas/i386/x86-64.exp: Added new tests for ZU. * testsuite/gas/i386/x86-64.exp: Added new tests for ZU. * testsuite/gas/i386/x86-64-apx-zu-intel.d: New test. * testsuite/gas/i386/x86-64-apx-zu-inval.l: Ditto. * testsuite/gas/i386/x86-64-apx-zu-inval.s: Ditto. * testsuite/gas/i386/x86-64-apx-zu.d: Ditto. * testsuite/gas/i386/x86-64-apx-zu.s: Ditto. opcodes/ChangeLog: * i386-dis-evex-prefix.h: Handle PREFIX_EVEX_MAP4_40 ~ PREFIX_EVEX_MAP4_4F. * i386-dis-evex.h: Ditto. * i386-dis.c (struct dis386): Add new micro 'ZU'. (putop): Handle %ZU. * i386-gen.c: Added ZU. * i386-opc.h: Ditto. * i386-opc.tbl: Added new templates to support ZU.
2024-05-06x86: Drop using extension_opcode to encode vvvv registerCui, Lili1-56/+56
gas/ChangeLog: * config/tc-i386.c (build_modrm_byte): Dropped the use of extension_opcode to encode the vvvv register. * testsuite/gas/i386/x86-64-sse2avx.d: Added new testcases. * testsuite/gas/i386/x86-64-sse2avx.s: Diito. opcodes/ChangeLog: * i386-opc.tbl: Added DstVVVV to some extension_opcode instructions. * i386-tbl.h: Regenerated.
2024-05-06x86: Drop SwapSourcesCui, Lili1-283/+283
gas/ChangeLog: * config/tc-i386.c (build_modrm_byte): Dropped the use of SWAP_SOURCES to encode the vvvv register. opcodes/ChangeLog: * i386-opc.h (SWAP_SOURCES): Dropped. (NO_DEFAULT_MASK): Adjusted the value. (ADDR_PREFIX_OP_REG): Ditto. (DISTINCT_DEST): Ditto. (IMPLICIT_STACK_OP): Ditto. (VexVVVV_SRC2): New. * i386-opc.tbl: Dropped SwapSources and replaced its VexVVVV with Src1VVVV. * i386-tbl.h: Regenerated.
2024-05-06x86: Use vexvvvv as the switch state to encode the vvvv registerCui, Lili1-86/+86
Use vexvvvv as the switch state, and replace VexVVVV with Src1VVVV. Src1VVVV means using VEX.vvvv encodes the first source register operand. The old logic did not check vexvvvv first, which made the logic here very complicated. gas/ChangeLog: * config/tc-i386.c (optimize_encoding): Replaced 1 with Src1VVVV. (build_modrm_byte): Used vexvvvv to encode the vvvv register. (s_insn): Replaced 1 with Src1VVVV. opcodes/ChangeLog: * i386-opc.h (VexVVVV_DST): Adjusted the value. (Src1VVVV): New. * i386-opc.tbl: Replaced part VexVVVV with Src1VVVV. * i386-tbl.h: Regenerated.
2024-05-03x86/APX: further extend SSE2AVX coverageJan Beulich1-225/+255
Since {vex}/{vex3} are respected on legacy mnemonics when -msse2avx is in use, {evex} should be respected, too. So far this is the case only for insns where eGPR-s can come into play. Extend coverage to insns with only %xmm register and possibly immediate operands.
2024-05-03x86/APX: extend SSE2AVX coverageJan Beulich1-452/+1684
Legacy encoded SIMD insns are converted to AVX ones in that mode. When eGPR-s are in use, i.e. with APX, convert to AVX10 insns (where available; there are quite a few which can't be converted). Note that LDDQU is represented as VMOVDQU32 (and the prior use of the sse3 template there needs dropping, to get the order right). Note further that in a few cases, due to the use of templates, AVX512VL is used when AVX512F would suffice. Since AVX10 is the main reference, this shouldn't be too much of a problem.