aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-10-20target/arm: Restrict tlb flush from vttbr_write to vmid changeRichard Henderson1-2/+2
Compare only the VMID field when considering whether we need to flush. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20221011031911.2408754-7-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20target/arm: Move ARMMMUIdx_Stage2 to a real tlb mmu_idxRichard Henderson3-49/+127
We had been marking this ARM_MMU_IDX_NOTLB, move it to a real tlb. Flush the tlb when invalidating stage 1+2 translations. Re-use alle1_tlbmask() for other instances of EL1&0 + Stage2. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20221011031911.2408754-6-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20target/arm: Add ARMMMUIdx_Phys_{S,NS}Richard Henderson3-4/+24
Not yet used, but add mmu indexes for 1-1 mapping to physical addresses. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221011031911.2408754-5-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20target/arm: Use probe_access_full for BTIRichard Henderson5-31/+20
Add a field to TARGET_PAGE_ENTRY_EXTRA to hold the guarded bit. In is_guarded_page, use probe_access_full instead of just guessing that the tlb entry is still present. Also handles the FIXME about executing from device memory. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221011031911.2408754-4-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20target/arm: Use probe_access_full for MTERichard Henderson5-86/+36
The CPUTLBEntryFull structure now stores the original pte attributes, as well as the physical address. Therefore, we no longer need a separate bit in MemTxAttrs, nor do we need to walk the tree of memory regions. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221011031911.2408754-3-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20target/arm: Enable TARGET_PAGE_ENTRY_EXTRARichard Henderson2-0/+15
Copy attrs and shareability, into the TLB. This will eventually be used by S1_ptw_translate to report stage1 translation failures, and by do_ats_write to fill in PAR_EL1. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20221011031911.2408754-2-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20target/arm: update the cortex-a15 MIDR to latest revAlex Bennée1-1/+3
QEMU doesn't model micro-architectural details which includes most chip errata. The ARM_ERRATA_798181 work around in the Linux kernel (see erratum_a15_798181_init) currently detects QEMU's cortex-a15 as broken and triggers additional expensive TLB flushes as a result. Change the MIDR to report what the latest silicon would (r4p0). We explicitly set the IMPDEF revidr bits to 0 because we don't need to set anything other than the silicon revision to indicate these flushes are not needed. This cuts about 5s from my Debian kernel boot with the latest 6.0rc1 kernel (29s->24s). Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Message-id: 20221010153225.506394-1-alex.bennee@linaro.org Cc: Arnd Bergmann <arnd@linaro.org> Cc: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Message-Id: <20220906172257.2776521-1-alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-20hw/char/pl011: fix baud rate calculationBaruch Siach1-1/+1
The PL011 TRM says that "UARTIBRD = 0 is invalid and UARTFBRD is ignored when this is the case". But the code looks at FBRD for the invalid case. Fix this. Signed-off-by: Baruch Siach <baruch@tkos.co.il> Message-id: 1408f62a2e45665816527d4845ffde650957d5ab.1665051588.git.baruchs-c@neureality.ai Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-10-18Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingStefan Hajnoczi32-2931/+5980
* configure: don't enable firmware for targets that are not built * configure: don't use strings(1) * scsi, target/i386: switch from device_legacy_reset() to device_cold_reset() * target/i386: AVX support for TCG * target/i386: fix SynIC SINT assertion failure on guest reset * target/i386: Use atomic operations for pte updates and other cleanups * tests/tcg: extend SSE tests to AVX * virtio-scsi: send "REPORTED LUNS CHANGED" sense data upon disk hotplug events # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmNOlOcUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroNuvwgAj/Z5pI9KU33XiWKFR3bZf2lHh21P # xmTzNtPmnP1WHDY1DNug/UB+BLg3c+carpTf5n3B8aKI4X3FfxGSJvYlXy4BONFD # XqYMH3OZB5GaR8Wza9trNYjDs/9hOZus/0R6Hqdl/T38PlMjf8mmayULJIGdcFcJ # WJvITVntbcCwwbpyJbRC5BNigG8ZXTNRoKBgtFVGz6Ox+n0YydwKX5qU5J7xRfCU # lW41LjZ0Fk5lonH16+xuS4WD5EyrNt8cMKCGsxnyxhI7nehe/OGnYr9l+xZJclrh # inQlSwJv0IpUJcrGCI4Xugwux4Z7ZXv3JQ37FzsdZcv/ZXpGonXMeXNJ9A== # =o6x7 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 18 Oct 2022 07:58:31 EDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (53 commits) target/i386: remove old SSE decoder target/i386: move 3DNow to the new decoder tests/tcg: extend SSE tests to AVX target/i386: Enable AVX cpuid bits when using TCG target/i386: implement VLDMXCSR/VSTMXCSR target/i386: implement XSAVE and XRSTOR of AVX registers target/i386: reimplement 0x0f 0x28-0x2f, add AVX target/i386: reimplement 0x0f 0x10-0x17, add AVX target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVX target/i386: reimplement 0x0f 0x38, add AVX target/i386: Use tcg gvec ops for pmovmskb target/i386: reimplement 0x0f 0x3a, add AVX target/i386: clarify (un)signedness of immediates from 0F3Ah opcodes target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVX target/i386: reimplement 0x0f 0x70-0x77, add AVX target/i386: reimplement 0x0f 0x78-0x7f, add AVX target/i386: reimplement 0x0f 0x50-0x5f, add AVX target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVX target/i386: reimplement 0x0f 0x60-0x6f, add AVX target/i386: Introduce 256-bit vector helpers ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-18Merge tag 'pull-ppc-20221017' of https://gitlab.com/danielhb/qemu into stagingStefan Hajnoczi37-370/+589
ppc patch queue for 2022-10-18: This queue contains improvements in the e500 and ppc4xx boards, changes in the maintainership of the project, a new QMP/HMP command and bug fixes: - Cedric is stepping back from qemu-ppc maintainership; - ppc4xx_sdram: QOMification and clean ups; - e500: add new types of flash and clean ups; - QMP/HMP: introduce dumpdtb command; - spapr_pci, booke doorbell interrupt and xvcmp* bit fixes; The 'dumpdtb' implementation is also making changes to RISC-V files that were acked by Alistair Francis and are being included in this queue. # -----BEGIN PGP SIGNATURE----- # # iHUEABYKAB0WIQQX6/+ZI9AYAK8oOBk82cqW3gMxZAUCY02qEgAKCRA82cqW3gMx # ZIadAQCYY9f+NFrSJBm3z4JjUaP+GmbgEjibjZW05diyKwbqzQEAjE1KXFCcd40D # 3Brs2Dm4YruaJCwb68vswVQAYteXaQ8= # =hl94 # -----END PGP SIGNATURE----- # gpg: Signature made Mon 17 Oct 2022 15:16:34 EDT # gpg: using EDDSA key 17EBFF9923D01800AF2838193CD9CA96DE033164 # gpg: Good signature from "Daniel Henrique Barboza <danielhb413@gmail.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 17EB FF99 23D0 1800 AF28 3819 3CD9 CA96 DE03 3164 * tag 'pull-ppc-20221017' of https://gitlab.com/danielhb/qemu: (38 commits) hw/riscv: set machine->fdt in spike_board_init() hw/riscv: set machine->fdt in sifive_u_machine_init() hw/ppc: set machine->fdt in spapr machine hw/ppc: set machine->fdt in pnv_reset() hw/ppc: set machine->fdt in pegasos2_machine_reset() hw/ppc: set machine->fdt in xilinx_load_device_tree() hw/ppc: set machine->fdt in sam460ex_load_device_tree() hw/ppc: set machine->fdt in bamboo_load_device_tree() hw/nios2: set machine->fdt in nios2_load_dtb() qmp/hmp, device_tree.c: introduce dumpdtb hw/ppc/spapr_pci.c: Use device_cold_reset() rather than device_legacy_reset() target/ppc: Fix xvcmp* clearing FI bit hw/ppc/e500: Remove if statement which is now always true hw/ppc/mpc8544ds: Add platform bus hw/ppc/mpc8544ds: Rename wrongly named method hw/ppc/e500: Reduce usage of sysbus API docs/system/ppc/ppce500: Add heading for networking chapter hw/gpio/meson: Introduce dedicated config switch for hw/gpio/mpc8xxx hw/ppc/meson: Allow e500 boards to be enabled separately ppc440_uc.c: Remove unneeded parenthesis ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-10-18target/i386: remove old SSE decoderPaolo Bonzini5-1907/+19
With all SSE (and AVX!) instructions now implemented in disas_insn_new, it's possible to remove gen_sse, as well as the helpers for instructions that now use gvec. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: move 3DNow to the new decoderPaolo Bonzini6-76/+74
This adds another kind of weirdness when you thought you had seen it all: an opcode byte that comes _after_ the address, not before. It's not worth adding a new X86_SPECIAL_* constant for it, but it's actually not unlike VCMP; so, forgive me for exploiting the similarity and just deciding to dispatch to the right gen_helper_* call in a single code generation function. In fact, the old decoder had a bug where s->rip_offset should have been set to 1 for 3DNow! instructions, and it's fixed now. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18tests/tcg: extend SSE tests to AVXPaolo Bonzini3-94/+112
Extracted from a patch by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Enable AVX cpuid bits when using TCGPaul Brook1-5/+5
Include AVX, AVX2 and VAES in the guest cpuid features supported by TCG. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-40-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: implement VLDMXCSR/VSTMXCSRPaolo Bonzini2-0/+45
These are exactly the same as the non-VEX version, but one has to be careful that only VEX.L=0 is allowed. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: implement XSAVE and XRSTOR of AVX registersPaolo Bonzini1-3/+75
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x28-0x2f, add AVXPaolo Bonzini3-0/+185
Here the code is a bit uglier due to the truncation and extension of registers to and from 32-bit. There is also a mistake in the manual with respect to the size of the memory operand of CVTPS2PI and CVTTPS2PI, reported by Ricky Zhou. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x10-0x17, add AVXPaolo Bonzini5-0/+264
These are mostly moves, and yet are a total pain. The main issue is that: 1) some instructions are selected by mod==11 (register operand) vs. mod=00/01/10 (memory operand) 2) stores to memory are two-operand operations, while the 3-register and load-from-memory versions operate on the entire contents of the destination; this makes it easier to separate the gen_* function for the store case 3) it's inefficient to load into xmm_T0 only to move the value out again, so the gen_* function for the load case is separated too The manual also has various mistakes in the operands here, for example the store case of MOVHPS operates on a 128-bit source (albeit discarding the bottom 64 bits) and therefore should be Mq,Vdq rather than Mq,Vq. Likewise for the destination and source of MOVHLPS. VUNPCK?PS and VUNPCK?PD are the same as VUNPCK?DQ and VUNPCK?QDQ, but encoded as prefixes rather than separate operands. The helpers can be reused however. For MOVSLDUP, MOVSHDUP and MOVDDUP I chose to reimplement them as helpers. I named the helper for MOVDDUP "movdldup" in preparation for possible future introduction of MOVDHDUP and to clarify the similarity with MOVSLDUP. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0xc2, 0xc4-0xc6, add AVXPaolo Bonzini3-0/+81
Nothing special going on here, for once. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x38, add AVXPaolo Bonzini6-8/+524
There are several special cases here: 1) extending moves have different widths for the helpers vs. for the memory loads, and the width for memory loads depends on VEX.L too. This is represented by X86_SPECIAL_AVXExtMov. 2) some instructions, such as variable-width shifts, select the vector element size via REX.W. 3) VSIB instructions (VGATHERxPy, VPGATHERxy) are also part of this group, and they have (among other things) two output operands. 3) the macros for 4-operand blends (which are under 0x0f 0x3a) have to be extended to support 2-operand blends. The 2-operand variant actually came a few years earlier, but it is clearer to implement them in the opposite order. X86_TYPE_WM, introduced earlier for unaligned loads, is reused for helpers that accept a Reg* but have a M argument. These three-byte opcodes also include AVX new instructions, for which the helpers were originally implemented by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Use tcg gvec ops for pmovmskbRichard Henderson1-5/+83
As pmovmskb is used by strlen et al, this is the third highest overhead sse operation at %0.8. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> [Reorganize to generate code for any vector size. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x3a, add AVXPaolo Bonzini5-1/+491
The more complicated operations here are insertions and extractions. Otherwise, there are just more entries than usual because the PS/PD/SS/SD variations are encoded in the opcode rater than in the prefixes. These three-byte opcodes also include AVX new instructions, whose implementation in the helpers was originally done by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: clarify (un)signedness of immediates from 0F3Ah opcodesPaolo Bonzini2-5/+5
Three-byte opcodes from the 0F3Ah area all have an immediate byte which is usually unsigned. Clarify in the helper code that it is unsigned; the new decoder treats immediates as signed by default, and seeing an intN_t in the prototype might give the wrong impression that one can use decode->immediate directly. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0xd0-0xd7, 0xe0-0xe7, 0xf0-0xf7, add AVXPaolo Bonzini4-11/+122
The more complicated ones here are d6-d7, e6-e7, f7. The others are trivial. For LDDQU, using gen_load_sse directly might corrupt the register if the second part of the load fails. Therefore, add a custom X86_TYPE_WM value; like X86_TYPE_W it does call gen_load(), but it also rejects a value of 11 in the ModRM field like X86_TYPE_M. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x70-0x77, add AVXPaolo Bonzini3-6/+293
This includes shifts by immediate, which use bits 3-5 of the ModRM byte as an opcode extension. With the exception of 128-bit shifts, they are implemented using gvec. This also covers VZEROALL and VZEROUPPER, which use the same opcode as EMMS. If we were wanting to optimize out gen_clear_ymmh then this would be one of the starting points. The implementation of the VZEROALL and VZEROUPPER helpers is by Paul Brook. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x78-0x7f, add AVXPaolo Bonzini3-0/+138
These are a mixed batch, including the first two horizontal (66 and F2 only) operations, more moves, and SSE4a extract/insert. Because SSE4a is pretty rare, I chose to leave the helper as they are, but it is possible to unify them by loading index and length from the source XMM register and generating deposit or extract TCG ops. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x50-0x5f, add AVXPaolo Bonzini3-1/+210
These are mostly floating-point SSE operations. The odd ones out are MOVMSK and CVTxx2yy, the others are straightforward. Unary operations are a bit special in AVX because they have 2 operands for PD/PS operands (VEX.vvvv must be 1111b), and 3 operands for SD/SS. They are handled using X86_OP_GROUP3 for compactness. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0xd8-0xdf, 0xe8-0xef, 0xf8-0xff, add AVXPaolo Bonzini3-1/+63
These are more simple integer instructions present in both MMX and SSE/AVX, with no holes that were later occupied by newer instructions. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: reimplement 0x0f 0x60-0x6f, add AVXPaolo Bonzini3-1/+262
These are both MMX and SSE/AVX instructions, except for vmovdqu. In both cases the inputs and output is in s->ptr{0,1,2}, so the only difference between MMX, SSE, and AVX is which helper to call. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Introduce 256-bit vector helpersPaolo Bonzini4-0/+14
The new implementation of SSE will cover AVX from the get go, because all the work for the helper functions is already done. We just need to build them. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: implement additional AVX comparison operatorsPaolo Bonzini2-0/+65
The new implementation of SSE will cover AVX from the get go, so include the 24 extra comparison operators that are only available with the VEX prefix. Based on a patch by Paul Brook <paul@nowt.org>. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: provide 3-operand versions of unary scalar helpersPaolo Bonzini3-25/+61
Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the old SSE decoder to be compatible with new-style helpers. The affected instructions are CVTSx2Sx, ROUNDSx, RSQRTSx, SQRTSx, RCPSx. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: support operand merging in binary scalar helpersPaolo Bonzini1-0/+16
Compared to Paul's implementation, the new decoder will use a different approach to implement AVX's merging of dst with src1 on scalar operations. Adjust the helpers to provide this functionality. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: extend helpers to support VEX.V 3- and 4- operand encodingsPaolo Bonzini3-238/+265
Add to the helpers all the operands that are needed to implement AVX. Extracted from a patch by Paul Brook <paul@nowt.org>. Message-Id: <20220424220204.2493824-26-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Prepare ops_sse_header.h for 256 bit AVXPaul Brook1-40/+76
Adjust all #ifdefs to match the ones in ops_sse.h. Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-23-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: move scalar 0F 38 and 0F 3A instruction to new decoderPaolo Bonzini3-289/+321
Because these are the only VEX instructions that QEMU supports, the new decoder is entered on the first byte of a valid VEX prefix, and VEX decoding only needs to be done in decode-new.c.inc. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: validate SSE prefixes directly in the decoding tablePaolo Bonzini2-0/+38
Many SSE and AVX instructions are only valid with specific prefixes (none, 66, F3, F2). Introduce a direct way to encode this in the decoding table to avoid using decode groups too much. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: validate VEX prefixes via the instructions' exception classesPaolo Bonzini4-12/+239
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: add AVX_EN hflagPaul Brook3-0/+16
Add a new hflag bit to determine whether AVX instructions are allowed Signed-off-by: Paul Brook <paul@nowt.org> Message-Id: <20220424220204.2493824-4-paul@nowt.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: add CPUID feature checks to new decoderPaolo Bonzini2-0/+75
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: add CPUID[EAX=7,ECX=0].ECX to DisasContextPaolo Bonzini1-0/+2
TCG will shortly implement VAES instructions, so add the relevant feature word to the DisasContext. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: add ALU load/writeback corePaolo Bonzini4-1/+212
Add generic code generation that takes care of preparing operands around calls to decode.e.gen in a table-driven manner, so that ALU operations need not take care of that. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: add core of new i386 decoderPaolo Bonzini4-8/+1020
The new decoder is based on three principles: - use mostly table-driven decoding, using tables derived as much as possible from the Intel manual. Centralizing the decode the operands makes it more homogeneous, for example all immediates are signed. All modrm handling is in one function, and can be shared between SSE and ALU instructions (including XMM<->GPR instructions). The SSE/AVX decoder will also not have duplicated code between the 0F, 0F38 and 0F3A tables. - keep the code as "non-branchy" as possible. Generally, the code for the new decoder is more verbose, but the control flow is simpler. Conditionals are not nested and have small bodies. All instruction groups are resolved even before operands are decoded, and code generation is separated as much as possible within small functions that only handle one instruction each. - keep address generation and (for ALU operands) memory loads and writeback as much in common code as possible. All ALU operations for example are implemented as T0=f(T0,T1). For non-ALU instructions, read-modify-write memory operations are rare, but registers do not have TCGv equivalents: therefore, the common logic sets up pointer temporaries with the operands, while load and writeback are handled by gvec or by helpers. These principles make future code review and extensibility simpler, at the cost of having a relatively large amount of code in the form of this patch. Even EVEX should not be _too_ hard to implement (it's just a crazy large amount of possibilities). This patch introduces the main decoder flow, and integrates the old decoder with the new one. The old decoder takes care of parsing prefixes and then optionally drops to the new one. The changes to the old decoder are minimal and allow it to be replaced incrementally with the new one. There is a debugging mechanism through a "LIMIT" environment variable. In user-mode emulation, the variable is the number of instructions decoded by the new decoder before permanently switching to the old one. In system emulation, the variable is the highest opcode that is decoded by the new decoder (this is less friendly, but it's the best that can be done without requiring deterministic execution). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: make rex_w available even in 32-bit modePaolo Bonzini1-5/+5
REX.W can be used even in 32-bit mode by AVX instructions, where it is retroactively renamed to VEX.W. Make the field available even in 32-bit mode but keep the REX_W() macro as it was; this way, that the handling of dflag does not use it by mistake and the AVX code more clearly points at the special VEX behavior of the bit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: make ldo/sto operations consistent with ldqPaolo Bonzini1-21/+22
ldq takes a pointer to the first byte to load the 64-bit word in; ldo takes a pointer to the first byte of the ZMMReg. Make them consistent, which will be useful in the new SSE decoder's load/writeback routines. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Define XMMReg and access macros, align ZMM registersRichard Henderson1-14/+44
This will be used for emission and endian adjustments of gvec operations. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220822223722.1697758-2-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Use probe_access_full for final stage2 translationRichard Henderson1-14/+28
Rather than recurse directly on mmu_translate, go through the same softmmu lookup that we did for the page table walk. This centralizes all knowledge of MMU_NESTED_IDX, with respect to setup of TranslationParams, to get_physical_address. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-10-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Use atomic operations for pte updatesRichard Henderson1-74/+168
Use probe_access_full in order to resolve to a host address, which then lets us use a host cmpxchg to update the pte. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/279 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-9-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Combine 5 sets of variables in mmu_translateRichard Henderson1-87/+91
We don't need one variable set per translation level, which requires copying into pte/pte_addr for huge pages. Standardize on pte/pte_addr for all levels. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-8-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-10-18target/i386: Use MMU_NESTED_IDX for vmload/vmsaveRichard Henderson3-138/+126
Use MMU_NESTED_IDX for each memory access, rather than just a single translation to physical. Adjust svm_save_seg and svm_load_seg to pass in mmu_idx. This removes the last use of get_hphys so remove it. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20221002172956.265735-7-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>