path: root/target/i386
Age  Commit message  (Author; files changed, -removed/+added)
2024-02-28  target/i386: leave the A20 bit set in the final NPT walk  (Paolo Bonzini; 1 file changed, -5/+7)
The A20 mask is only applied to the final memory access. Nested page tables are always walked with the raw guest-physical address. Unlike the previous patch, in this one the masking must be kept, but it was done too early. Cc: qemu-stable@nongnu.org Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit b5a9de3259f4c791bde2faff086dd5737625e41e) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
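The idea, in a minimal illustrative sketch (plain C; all names are hypothetical and this is not the QEMU code): the nested walk sees the raw guest-physical address, and the A20 mask is applied only to the address of the final access.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical translation state: only the A20 gate matters here. */
    typedef struct {
        bool a20_enabled;
    } WalkCtx;

    /* Clear address bit 20 only when the A20 gate is disabled. */
    static uint64_t a20_apply(const WalkCtx *ctx, uint64_t addr)
    {
        return ctx->a20_enabled ? addr : (addr & ~(1ULL << 20));
    }

    /* Placeholder nested (NPT) walk: it must see the raw guest-physical
     * address, so no masking happens here. */
    static uint64_t npt_translate(const WalkCtx *ctx, uint64_t gpa)
    {
        (void)ctx;
        return gpa;   /* identity mapping, for the sketch only */
    }

    /* The fix in miniature: mask only the final access, after the walk. */
    static uint64_t final_access_addr(const WalkCtx *ctx, uint64_t gpa)
    {
        return a20_apply(ctx, npt_translate(ctx, gpa));
    }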
2024-02-28  target/i386: remove unnecessary/wrong application of the A20 mask  (Paolo Bonzini; 1 file changed, -13/+8)
If ptw_translate() does a MMU_PHYS_IDX access, the A20 mask is already applied in get_physical_address(), which is called via probe_access_full() and x86_cpu_tlb_fill(). If ptw_translate() on the other hand does a MMU_NESTED_IDX access, the A20 mask must not be applied to the address that is looked up in the nested page tables; it must be applied only to the addresses that hold the NPT entries (which is achieved via MMU_PHYS_IDX, per the previous paragraph). Therefore, we can remove A20 masking from the computation of the page table entry's address, and let get_physical_address() or mmu_translate() apply it when they know they are returning a host-physical address. Cc: qemu-stable@nongnu.org Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit a28fe7dc1939333c81b895cdced81c69eb7c5ad0) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-02-28  target/i386: Fix physical address truncation  (Paolo Bonzini; 2 files changed, -7/+11)
The address translation logic in get_physical_address() will currently truncate physical addresses to 32 bits unless long mode is enabled. This is incorrect when using physical address extensions (PAE) outside of long mode, with the result that a 32-bit operating system using PAE to access memory above 4G will experience undefined behaviour.

The truncation code was originally introduced in commit 33dfdb5 ("x86: only allow real mode to access 32bit without LMA"), where it applied only to translations performed while paging is disabled (and so cannot affect guests using PAE).

Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX") rearranged the code such that the truncation also applied to the use of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use atomic operations for pte updates") brought this truncation into scope for page table entry accesses, and is the first commit for which a Windows 10 32-bit guest will reliably fail to boot if memory above 4G is present.

The truncation code however is not completely redundant. Even though the maximum address size for any executed instruction is 32 bits, helpers for operations such as BOUND, FSAVE or XSAVE may ask get_physical_address() to translate an address outside of the 32-bit range, if invoked with an argument that is close to the 4G boundary. Likewise for processor accesses, for example TSS or IDT accesses, when EFER.LMA==0.

So, move the address truncation in get_physical_address() so that it applies to 32-bit MMU indexes, but not to MMU_PHYS_IDX and MMU_NESTED_IDX.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Cc: qemu-stable@nongnu.org
Co-developed-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b1661801c184119a10ad6cbc3b80330fc22e7b2c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: drop unrelated change in target/i386/cpu.c)
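As a rough sketch of the rule described above (illustrative C only; the enum and helper names are invented and do not mirror QEMU's internals): truncate to 32 bits only for ordinary 32-bit MMU indexes, never for the physical/nested indexes used for page-table and VMCB data.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical MMU index classification. */
    typedef enum {
        MMU_IDX_USER32,
        MMU_IDX_KERNEL32,
        MMU_IDX_PHYS,     /* physical accesses (PTEs, VMCB, ...) */
        MMU_IDX_NESTED
    } MmuIdx;

    static bool mmu_idx_is_32bit(MmuIdx idx)
    {
        return idx == MMU_IDX_USER32 || idx == MMU_IDX_KERNEL32;
    }

    /* Outside long mode, helpers (BOUND, FSAVE, XSAVE) and TSS/IDT accesses
     * may pass addresses just above 4G, so 32-bit indexes still wrap; the
     * physical and nested indexes keep every address bit so PAE can reach
     * memory above 4G. */
    static uint64_t maybe_truncate(uint64_t addr, MmuIdx idx, bool lma)
    {
        if (!lma && mmu_idx_is_32bit(idx)) {
            addr = (uint32_t)addr;
        }
        return addr;
    }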
2024-02-28  target/i386: check validity of VMCB addresses  (Paolo Bonzini; 2 files changed, -6/+24)
MSR_VM_HSAVE_PA bits 0-11 are reserved, as are the bits above the maximum physical address width of the processor. Setting them to 1 causes a #GP (see "15.30.4 VM_HSAVE_PA MSR" in the AMD manual). The same is true of VMCB addresses passed to VMRUN/VMLOAD/VMSAVE, even though the manual is not clear on that. Cc: qemu-stable@nongnu.org Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit d09c79010ffd880dc69e7a21e3cfdef90b928fb8) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
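A hedged sketch of the check this describes (not the QEMU implementation; the function and parameter names are invented): the address must be 4K-aligned and must not exceed the guest's physical address width, otherwise #GP is the right response.

    #include <stdint.h>
    #include <stdbool.h>

    /* Returns true if a VM_HSAVE_PA / VMCB address is acceptable:
     * bits 11:0 are reserved (page alignment), and so are all bits at or
     * above the physical address width. */
    static bool vmcb_addr_valid(uint64_t addr, unsigned phys_bits)
    {
        uint64_t addr_mask = (1ULL << phys_bits) - 1;   /* phys_bits <= 52 on x86 */
        uint64_t reserved  = ~addr_mask | 0xfffULL;
        return (addr & reserved) == 0;
    }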
2024-02-28  target/i386: mask high bits of CR3 in 32-bit mode  (Paolo Bonzini; 1 file changed, -2/+2)
CR3 bits 63:32 are ignored in 32-bit mode (either legacy 2-level paging or PAE paging). Do this in mmu_translate() to remove the last place where get_physical_address() meaningfully drops the high bits of the address.

Cc: qemu-stable@nongnu.org
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 68fb78d7d5723066ec2cacee7d25d67a4143b42f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
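In other words (illustrative C; the helper name is hypothetical):

    #include <stdint.h>
    #include <stdbool.h>

    /* Outside long mode (legacy 2-level or PAE paging), CR3 bits 63:32 are
     * ignored, so drop them before using CR3 as the page-table base. */
    static uint64_t effective_cr3(uint64_t cr3, bool long_mode_active)
    {
        return long_mode_active ? cr3 : (uint32_t)cr3;
    }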
2024-02-20  target/i386: Generate an illegal opcode exception on cmp instructions with lock prefix  (Ziqiao Kong; 1 file changed, -5/+6)
As specified by Intel Manual Vol2 3-180, cmp instructions are not allowed to have a lock prefix and a `UD` should be raised. Without this patch, s1->T0 will be uninitialized and used in the case OP_CMPL.

Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com>
Message-ID: <20240215095015.570748-2-ziqiaokong@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 99d0dcd7f102c07a510200d768cae65e5db25d23)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
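A minimal sketch of the decode-time rule (illustrative only; the enum and helper below are hypothetical, not QEMU's decoder structures):

    #include <stdbool.h>

    typedef enum { OP_ADDL, OP_SUBL, OP_CMPL /* ... */ } AluOp;

    /* CMP never accepts a LOCK prefix, so the decoder should raise #UD
     * before any operand is loaded (avoiding later use of an
     * uninitialized temporary). */
    static bool must_raise_ud(AluOp op, bool has_lock_prefix)
    {
        return has_lock_prefix && op == OP_CMPL;
    }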
2024-02-20  i386/cpuid: Move leaf 7 to correct group  (Xiaoyao Li; 1 file changed, -1/+1)
CPUID leaf 7 was grouped together with SGX leaf 0x12 by commit b9edbadefb9e ("i386: Propagate SGX CPUID sub-leafs to KVM") by mistake. SGX leaf 0x12 has its own logic to check whether a subleaf (starting from 2) is valid or not, by checking whether bits 0:3 of the corresponding EAX are 1 or not. Leaf 7 follows the logic that EAX of subleaf 0 enumerates the maximum valid subleaf.

Fixes: b9edbadefb9e ("i386: Propagate SGX CPUID sub-leafs to KVM")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240125024016.2521244-4-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0729857c707535847d7fe31d3d91eb8b2a118e3c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
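The two subleaf-validity conventions contrasted above, as a hedged sketch (plain C with invented names; the bit test simply mirrors the description in this message):

    #include <stdint.h>
    #include <stdbool.h>

    /* Leaf 7: EAX of subleaf 0 enumerates the maximum valid subleaf. */
    static bool leaf7_subleaf_valid(uint32_t subleaf, uint32_t subleaf0_eax)
    {
        return subleaf <= subleaf0_eax;
    }

    /* SGX leaf 0x12: subleaves starting from 2 are valid only while
     * bits 0:3 of the corresponding EAX read back as 1. */
    static bool sgx12_subleaf_valid(uint32_t subleaf, uint32_t eax)
    {
        return subleaf < 2 || (eax & 0xf) == 1;
    }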
2024-02-20  i386/cpuid: Decrease cpuid_i when skipping CPUID leaf 1F  (Xiaoyao Li; 1 file changed, -0/+1)
The existing code misses a decrement of cpuid_i when it skips leaf 0x1F, so a blank CPUID entry (with leaf and subleaf as 0, and all fields stuffed with 0s) is left in the CPUID array. It conflicts with the correct CPUID leaf 0.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240125024016.2521244-2-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 10f92799af8ba3c3cef2352adcd4780f13fbab31)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
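A sketch of the bug class and the fix (the structures and loop shape here are hypothetical, not the actual CPUID setup code): the slot must be handed back when the leaf is skipped, otherwise a zero-filled entry remains.

    #include <stdint.h>
    #include <string.h>

    typedef struct {
        uint32_t function, index, eax, ebx, ecx, edx;
    } CpuidEntry;

    /* Grab the next slot, then give it back if the leaf is skipped;
     * returning the un-decremented index would leave a blank entry that
     * collides with real leaf 0. */
    static unsigned fill_leaf(CpuidEntry *entries, unsigned cpuid_i,
                              uint32_t leaf, int supported)
    {
        CpuidEntry *e = &entries[cpuid_i++];
        memset(e, 0, sizeof(*e));
        if (!supported) {
            return cpuid_i - 1;   /* the missing decrement */
        }
        e->function = leaf;
        /* ... fill eax/ebx/ecx/edx for the supported leaf ... */
        return cpuid_i;
    }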
2024-02-20  i386/cpu: Mask with XCR0/XSS mask for FEAT_XSAVE_XCR0_HI and FEAT_XSAVE_XSS_HI leafs  (Xiaoyao Li; 1 file changed, -2/+2)
The values of the FEAT_XSAVE_XCR0_HI and FEAT_XSAVE_XSS_HI leafs also need to be masked by the XCR0 and XSS mask respectively, to make them logically correct.

Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240115091325.1904229-3-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a11a365159b944e05be76f3ec3b98c8b38cb70fd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
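The masking itself is just the high half of the same AND (illustrative C; the helper names are hypothetical):

    #include <stdint.h>

    /* The *_LO leaf reports bits 31:0 and the *_HI leaf bits 63:32 of the
     * supported-component bitmap, so both halves get masked by XCR0/XSS. */
    static uint32_t xsave_lo_leaf(uint64_t feature_bits, uint64_t mask)
    {
        return (uint32_t)(feature_bits & mask);
    }

    static uint32_t xsave_hi_leaf(uint64_t feature_bits, uint64_t mask)
    {
        return (uint32_t)((feature_bits & mask) >> 32);
    }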
2024-02-20  i386/cpu: Clear FEAT_XSAVE_XSS_LO/HI leafs when CPUID_EXT_XSAVE is not available  (Xiaoyao Li; 1 file changed, -0/+2)
Leaf FEAT_XSAVE_XSS_LO and FEAT_XSAVE_XSS_HI also need to be cleared when CPUID_EXT_XSAVE is not set. Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features") Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Yang Weijiang <weijiang.yang@intel.com> Message-ID: <20240115091325.1904229-2-xiaoyao.li@intel.com> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit 81f5cad3858f27623b1b14467926032d229b76cc) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-01-20  target/i386: pcrel: store low bits of physical address in data[0]  (Paolo Bonzini; 2 files changed, -5/+16)
For PC-relative translation blocks, env->eip changes during the execution of a translation block; therefore, QEMU must be able to recover an instruction's PC just from the TranslationBlock struct and the instruction data. Because a TB will not span two pages, QEMU stores all the low bits of EIP in the instruction data and replaces them in x86_restore_state_to_opc. Bits 12 and higher (which may vary between executions of a PCREL TB, since these only use the physical address in the hash key) are kept unmodified from env->eip. The assumption is that these bits of EIP, unlike bits 0-11, will not change as the translation block executes.

Unfortunately, this is incorrect when the CS base is not aligned to a page. Then the linear address of the instructions (i.e. the one with the CS base added) indeed will never span two pages, but bits 12+ of EIP can actually change. For example, if the CS base is 0x80262200 and EIP = 0x6FF4, the first instruction in the translation block will be at linear address 0x802691F4. Even a very small TB will cross to EIP = 0x7xxx, while the linear addresses will remain comfortably within a single page.

The fix is simply to use the low bits of the linear address for data[0], since those don't change. Then x86_restore_state_to_opc uses tb->cs_base to compute a temporary linear address (referring to some unknown instruction in the TB, but with the correct values of bits 12 and higher); the low bits are replaced with data[0], and EIP is obtained by subtracting again the CS base.

Huge thanks to Mark Cave-Ayland for the image and initial debugging, and to Gitlab user @kjliew for help with bisecting another occurrence of (hopefully!) the same bug. It should be relatively easy to write a testcase that performs MMIO on an EIP with different bits 12+ than the first instruction of the translation block; any help is welcome.

Fixes: e3a79e0e878 ("target/i386: Enable TARGET_TB_PCREL", 2022-10-11)
Cc: qemu-stable@nongnu.org
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Richard Henderson <richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1759
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1964
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2012
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 729ba8e933f8af5800c3a92b37e630e9bdaa9f1e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
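Worked through with the numbers above (a self-contained illustration; recover_eip is a hypothetical helper, not the QEMU function):

    #include <stdint.h>
    #include <stdio.h>

    /* Splice the recorded low 12 bits (data[0]) into any linear address
     * known to lie inside the same TB, then subtract the CS base. */
    static uint32_t recover_eip(uint32_t approx_linear, uint32_t data0,
                                uint32_t cs_base)
    {
        uint32_t linear = (approx_linear & ~0xfffu) | (data0 & 0xfffu);
        return linear - cs_base;
    }

    int main(void)
    {
        /* CS base 0x80262200, EIP 0x6FF4: the first instruction sits at
         * linear address 0x802691F4. */
        uint32_t cs_base = 0x80262200;
        uint32_t linear  = 0x802691F4;
        printf("EIP = 0x%X\n", recover_eip(linear, linear & 0xfffu, cs_base));
        return 0;   /* prints EIP = 0x6FF4 */
    }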
2024-01-20  target/i386: fix incorrect EIP in PC-relative translation blocks  (guoguangyao; 1 file changed, -2/+2)
The PCREL patches introduced a bug when updating EIP in the !CF_PCREL case. Using s->pc in gen_update_eip_next() solves the problem.

Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22fbf ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Signed-off-by: guoguangyao <guoguangyao18@mails.ucas.ac.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240115020804.30272-1-guoguangyao18@mails.ucas.ac.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2926eab8969908bc068629e973062a0fb6ff3759)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-01-20  target/i386: Do not re-compute new pc with CF_PCREL  (Richard Henderson; 1 file changed, -4/+2)
With PCREL, we have a page-relative view of EIP, and an approximation of PC = EIP+CSBASE that is good enough to detect page crossings. If we try to recompute PC after masking EIP, we will mess up that approximation and write a corrupt value to EIP. We already handled masking properly for PCREL, so the fix in b5e0d5d2 was only needed for the !PCREL path. Cc: qemu-stable@nongnu.org Fixes: b5e0d5d22fbf ("target/i386: Fix 32-bit wrapping of pc/eip computation") Reported-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240101230617.129349-1-richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> (cherry picked from commit a58506b748b8988a95f4fa1a2420ac5c17038b30) Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-12-12  target/i386: Fix 32-bit wrapping of pc/eip computation  (Richard Henderson; 3 files changed, -10/+33)
In 32-bit mode, pc = eip + cs_base is also 32-bit, and must wrap. Failure to do so results in incorrect memory exceptions to the guest. Before 732d548732ed, this was implicitly done via truncation to target_ulong but only in qemu-system-i386, not qemu-system-x86_64. To fix this, we must add conditional zero-extensions. Since we have to test for 32 vs 64-bit anyway, note that cs_base is always zero in 64-bit mode. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2022 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20231212172510.103305-1-richard.henderson@linaro.org>
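The required extension, reduced to one line (illustrative C; the helper name is made up):

    #include <stdint.h>
    #include <stdbool.h>

    /* Outside 64-bit mode, pc = eip + cs_base must wrap to 32 bits; in
     * 64-bit mode cs_base is always zero, so no extension is needed. */
    static uint64_t compute_pc(uint64_t eip, uint64_t cs_base, bool code64)
    {
        return code64 ? eip + cs_base : (uint32_t)(eip + cs_base);
    }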
2023-12-06  i386/sev: Avoid SEV-ES crash due to missing MSR_EFER_LMA bit  (Michael Roth; 1 file changed, -0/+8)
Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors") added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it exposed a long-running bug in current KVM support for SEV-ES where the kernel assumes that MSR_EFER_LMA will be set explicitly by the guest kernel, in which case EFER write traps would result in KVM eventually seeing MSR_EFER_LMA get set and recording it in such a way that it would be subsequently visible when accessing it via KVM_GET_SREGS/etc.

However, guest kernels currently rely on MSR_EFER_LMA getting set automatically when MSR_EFER_LME is set and paging is enabled via CR0_PG_MASK. As a result, the EFER write traps don't actually expose the MSR_EFER_LMA bit, even though it is set internally, and when QEMU subsequently tries to pass this EFER value back to KVM via KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL, which is now considered fatal due to the aforementioned QEMU commit.

This can be addressed by inferring the MSR_EFER_LMA bit being set when paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure the expected bits are all present in subsequent handling on the host side.

Ultimately, this handling will be implemented in the host kernel, but to avoid breaking QEMU's SEV-ES support when using older host kernels, the same handling can be done in QEMU just after fetching the register values via KVM_GET_SREGS*. Implement that here.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Lara Lazier <laramglazier@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: <kvm@vger.kernel.org>
Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
Signed-off-by: Michael Roth <michael.roth@amd.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231206155821.1194551-1-michael.roth@amd.com>
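The synthesis step described above, as a hedged sketch (bit positions per the architecture; the helper itself is hypothetical, not QEMU's KVM code):

    #include <stdint.h>

    #define CR0_PG_BIT   (1ULL << 31)
    #define EFER_LME_BIT (1ULL << 8)
    #define EFER_LMA_BIT (1ULL << 10)

    /* After fetching registers via KVM_GET_SREGS*, infer EFER.LMA when
     * paging is enabled and EFER.LME is set, so the value later passed to
     * KVM_SET_SREGS* survives the kernel's sanity checks. */
    static uint64_t fixup_efer(uint64_t efer, uint64_t cr0)
    {
        if ((cr0 & CR0_PG_BIT) && (efer & EFER_LME_BIT)) {
            efer |= EFER_LMA_BIT;
        }
        return efer;
    }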
2023-11-17  target/i386/cpu: Improve error message for property "vendor"  (Markus Armbruster; 1 file changed, -1/+2)
Improve

    $ qemu-system-x86_64 -device max-x86_64-cpu,vendor=me
    qemu-system-x86_64: -device max-x86_64-cpu,vendor=me: Property '.vendor' doesn't take value 'me'

to

    qemu-system-x86_64: -device max-x86_64-cpu,vendor=0123456789abc: value of property 'vendor' must consist of exactly 12 characters

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031111059.3407803-8-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[Typo corrected]
2023-11-08  Merge tag 'misc-cpus-20231107' of https://github.com/philmd/qemu into staging  (Stefan Hajnoczi; 8 files changed, -109/+110)
Misc hardware patch queue

HW emulation:
- PMBus fixes and tests (Titus)
- IDE fixes and tests (Fiona)
- New ADM1266 sensor (Titus)
- Better error propagation in PCI-ISA i82378 (Philippe)
- Declare SD model QOM types using DEFINE_TYPES macro (Philippe)

Topology:
- Fix CPUState::nr_cores calculation (Zhuocheng Ding and Zhao Liu)

Monitor:
- Synchronize CPU state in 'info lapic' (Dongli Zhang)

QOM:
- Have 'cpu-qom.h' target-agnostic (Philippe)
- Move ArchCPUClass definition to each target's cpu.h (Philippe)
- Call object_class_is_abstract once in cpu_class_by_name (Philippe)

UI:
- Use correct key names in titles on MacOS / SDL2 (Adrian)

MIPS:
- Fix MSA BZ/BNZ and TX79 LQ/SQ opcodes (Philippe)

Nios2:
- Create IRQs *after* vCPU is realized (Philippe)

PPC:
- Restrict KVM objects to system emulation (Philippe)
- Move target-specific definitions out of 'cpu-qom.h' (Philippe)

S390X:
- Make hw/s390x/css.h and hw/s390x/sclp.h headers target agnostic (Philippe)

X86:
- HVF & KVM cleanups (Philippe)

Various targets:
- Use env_archcpu() to optimize (Philippe)

Misc:
- Few global variable shadowing removed (Philippe)
- Introduce cpu_exec_reset_hold and factor tcg_cpu_reset_hold out (Philippe)
- Remove few more 'softmmu' mentions (Philippe)
- Fix and cleanup in vl.c (Akihiko & Marc-André)
- Resource leak fix in dump (Zongmin Zhou)
- MAINTAINERS updates (Thomas, Daniel)

# gpg: Signature made Tue 07 Nov 2023 20:15:29 HKT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE

* tag 'misc-cpus-20231107' of https://github.com/philmd/qemu: (75 commits)
  dump: Add close fd on error return to avoid resource leak
  ui/sdl2: use correct key names in win title on mac
  MAINTAINERS: Add more guest-agent related files to the corresponding section
  MAINTAINERS: Add include/hw/xtensa/mx_pic.h to the XTFPGA machine section
  MAINTAINERS: update libvirt devel mailing list address
  MAINTAINERS: Add the CAN documentation file to the CAN section
  MAINTAINERS: Add include/hw/timer/tmu012.h to the SH4 R2D section
  hw/sd: Declare QOM types using DEFINE_TYPES() macro
  hw/i2c: pmbus: reset page register for out of range reads
  hw/i2c: pmbus: immediately clear faults on request
  tests/qtest: add tests for ADM1266
  hw/sensor: add ADM1266 device model
  hw/i2c: pmbus: add VCAP register
  hw/i2c: pmbus: add fan support
  hw/i2c: pmbus: add vout mode bitfields
  hw/i2c: pmbus add support for block receive
  tests/qtest: ahci-test: add test exposing reset issue with pending callback
  hw/ide: reset: cancel async DMA operation before resetting state
  hw/cpu: Update the comments of nr_cores and nr_dies
  system/cpus: Fix CPUState.nr_cores' calculation
  ...

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-11-07  hw/cpu: Update the comments of nr_cores and nr_dies  (Zhao Liu; 1 file changed, -0/+1)
In the comment for nr_threads, specify that it represents the number of threads in the "core" to avoid confusion. Also add a comment for nr_dies in CPUX86State.

Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20231024090323.1859210-5-zhao1.liu@linux.intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-11-07  system/cpus: Fix CPUState.nr_cores' calculation  (Zhuocheng Ding; 1 file changed, -5/+4)
From CPUState.nr_cores' comment, it represents the "number of cores within this CPU package". After 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology"), the meaning of smp.cores changed to "the number of cores in one die", but that commit missed updating CPUState.nr_cores' calculation, so CPUState.nr_cores became wrong and now fails to account for the numbers of clusters and dies.

At present, only i386 uses CPUState.nr_cores. But for i386, which supports the die level, the uses of CPUState.nr_cores are very confusing: early uses are based on the meaning of "cores per package" (before die was introduced into i386), and later uses are based on "cores per die" (after die's introduction). This difference is because commit a94e1428991f ("target/i386: Add CPUID.1F generation support for multi-dies PCMachine") misunderstood CPUState.nr_cores to mean "cores per die" when calculating CPUID.1FH.01H:EBX. After that, the changes in i386 all followed this wrong understanding.

With the influence of 003f230e37d7 and a94e1428991f, for i386 the result of CPUState.nr_cores is currently "cores per die", so the original uses of CPUState.cores based on the meaning of "cores per package" are wrong when multiple dies exist:

1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is incorrect because it expects "cpus per package" but now the result is "cpus per die".
2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H, EAX[bits 31:26] is incorrect because they expect "cpus per package" but now the result is "cpus per die". The error not only impacts the EAX calculation in the cache_info_passthrough case, but also impacts other cases of setting cache topology for Intel CPUs according to the cpu topology (specifically, the incoming parameter "num_cores" expects "cores per package" in encode_cache_cpuid4()).
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits 15:00] is incorrect because the EBX of 0BH.01H (core level) expects "cpus per package", which may differ from 1FH.01H (the reason is that 1FH can support more levels; for QEMU, 1FH also supports die, and 1FH.01H:EBX[bits 15:00] expects "cpus per die").
4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.80000001H is calculated, "cpus per package" is expected to be checked, but in fact it now checks "cpus per die". Though "cpus per die" also works for this code logic, this isn't consistent with AMD's APM.
5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.80000008H:ECX expects "cpus per package" but obtains "cpus per die".
6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c, MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per package", but these functions obtain "cpus per die" and "cores per die".

On the other hand, these uses are correct now (they were added in/after a94e1428991f):

1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die meets the actual meaning of CPUState.nr_cores ("cores per die").
2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.04H's calculation) considers the number of dies, so it's correct.
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits 15:00] needs "cpus per die" and gets the correct result, and CPUID.1FH.02H:EBX[bits 15:00] gets the correct "cpus per package".

When CPUState.nr_cores is correctly changed back to "cores per package", the above errors will be fixed without extra work, but the currently correct cases will go wrong and need special handling to pass the correct "cpus/cores per die" they want.

Fix CPUState.nr_cores' calculation to fit the original meaning "cores per package", as well as changing the calculation of topo_info.cores_per_die, vcpus_per_socket and CPUID.1FH.

Fixes: a94e1428991f ("target/i386: Add CPUID.1F generation support for multi-dies PCMachine")
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20231024090323.1859210-4-zhao1.liu@linux.intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
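The arithmetic of the fix, in an illustrative standalone form (hypothetical struct; it only restates that "cores per package" must account for every topology level below the package):

    #include <stdio.h>

    struct Topo {
        unsigned dies, clusters, cores, threads;  /* each per next level up */
    };

    static unsigned cores_per_package(const struct Topo *t)
    {
        return t->dies * t->clusters * t->cores;
    }

    static unsigned cpus_per_package(const struct Topo *t)
    {
        return cores_per_package(t) * t->threads;
    }

    int main(void)
    {
        struct Topo t = { .dies = 2, .clusters = 1, .cores = 4, .threads = 2 };
        printf("cores/package=%u cpus/package=%u\n",
               cores_per_package(&t), cpus_per_package(&t));   /* 8 and 16 */
        return 0;
    }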
2023-11-07  target: Move ArchCPUClass definition to 'cpu.h'  (Philippe Mathieu-Daudé; 2 files changed, -39/+38)
The OBJECT_DECLARE_CPU_TYPE() macro forward-declares each ArchCPUClass type. These forward declarations are sufficient for code in hw/ to use the QOM definitions. No need to expose these structure definitions. Keep each local to their target/ by moving them to the corresponding "cpu.h" header. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231013140116.255-13-philmd@linaro.org>
2023-11-07  target/i386/monitor: synchronize cpu state for lapic info  (Dongli Zhang; 1 file changed, -0/+5)
While the default "info lapic" always synchronizes cpu state ... mon_get_cpu() -> mon_get_cpu_sync(mon, true) -> cpu_synchronize_state(cpu) -> ioctl KVM_GET_LAPIC (taking KVM as example) ... the cpu state is not synchronized when the apic-id is available as argument. The cpu state should be synchronized when apic-id is available. Otherwise the "info lapic <apic-id>" always returns stale data. Reference: https://lore.kernel.org/all/20211028155457.967291-19-berrange@redhat.com/ Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Message-ID: <20231030085336.2681386-1-armbru@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231026211938.162815-1-dongli.zhang@oracle.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-11-07  target/i386/kvm: Correct comment in kvm_cpu_realize()  (Philippe Mathieu-Daudé; 1 file changed, -0/+1)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230918160257.30127-4-philmd@linaro.org>
2023-11-07  target/i386/hvf: Rename 'X86CPU *x86_cpu' variable as 'cpu'  (Philippe Mathieu-Daudé; 1 file changed, -9/+9)
Follow the naming used by other files in target/i386/. No functional changes. Suggested-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231020111136.44401-4-philmd@linaro.org>
2023-11-07  target/i386/hvf: Rename 'CPUState *cpu' variable as 'cs'  (Philippe Mathieu-Daudé; 1 file changed, -46/+46)
Follow the naming used by other files in target/i386/. No functional changes. Suggested-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231020111136.44401-3-philmd@linaro.org>
2023-11-07  target/i386/hvf: Use CPUState typedef  (Philippe Mathieu-Daudé; 1 file changed, -3/+3)
Follow C style guidelines and use CPUState forward declaration from "qemu/typedefs.h". No functional changes. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231020111136.44401-2-philmd@linaro.org>
2023-11-07  target/i386/hvf: Use env_archcpu() in simulate_[rdmsr/wrmsr]()  (Philippe Mathieu-Daudé; 3 files changed, -15/+14)
When CPUArchState* is available (here CPUX86State*), we can use the fast env_archcpu() macro to get ArchCPU* (here X86CPU*). The QOM cast X86_CPU() macro will be slower when building with --enable-qom-cast-debug. Pass CPUX86State* as argument to simulate_rdmsr / simulate_wrmsr instead of a CPUState* to avoid an extra cast. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Roman Bolshakov <roman@roolebo.dev> Tested-by: Roman Bolshakov <roman@roolebo.dev> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231009110239.66778-7-philmd@linaro.org>
2023-11-07  target/i386/hvf: Use x86_cpu in simulate_[rdmsr|wrmsr]()  (Philippe Mathieu-Daudé; 1 file changed, -2/+2)
We already have 'x86_cpu = X86_CPU(cpu)'. Use the variable instead of doing another QOM cast with X86_CPU(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Roman Bolshakov <roman@roolebo.dev> Tested-by: Roman Bolshakov <roman@roolebo.dev> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231009110239.66778-6-philmd@linaro.org>
2023-11-07  target: Declare FOO_CPU_TYPE_NAME/SUFFIX in 'cpu-qom.h'  (Philippe Mathieu-Daudé; 2 files changed, -2/+3)
Hegerogeneous code needs access to the FOO_CPU_TYPE_NAME() macro to resolve target CPU types. Move the declaration (along with the required FOO_CPU_TYPE_SUFFIX) to "cpu-qom.h". "target/foo/cpu-qom.h" is supposed to be target agnostic (include-able by any target). Add such mention in the header. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231013140116.255-7-philmd@linaro.org>
2023-11-07  target: Unify QOM style  (Philippe Mathieu-Daudé; 2 files changed, -4/+0)
Enforce the style described by commit 067109a11c ("docs/devel: mention the spacing requirement for QOM"): The first declaration of a storage or class structure should always be the parent and leave a visual space between that declaration and the new code. It is also useful to separate backing for properties (options driven by the user) and internal state to make navigation easier. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231013140116.255-2-philmd@linaro.org>
2023-11-07  hw/xen: add support for Xen primary console in emulated mode  (David Woodhouse; 1 file changed, -2/+21)
The primary console is special because the toolstack maps a page into the guest for its ring, and also allocates the guest-side event channel. The guest's grant table is even primed to export that page using a known grant ref#. Add support for all that in emulated mode, so that we can have a primary console.

For reasons unclear, the backends running under real Xen don't just use a mapping of the well-known GNTTAB_RESERVED_CONSOLE grant ref (which would also be in the ring-ref node in XenStore). Instead, the toolstack sets the ring-ref node of the primary console to the GFN of the guest page. The backend is expected to handle that special case and map it with foreignmem operations instead.

We don't have an implementation of foreignmem ops for emulated Xen mode, so just make it map GNTTAB_RESERVED_CONSOLE instead. This would probably work for real Xen too, but we can't work out how to make real Xen create a primary console of type "ioemu" to make QEMU drive it, so we can't test that; might as well leave it as it is for now under Xen.

Now at last we can boot the Xen PV shim and run PV kernels in QEMU.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-07  i386/xen: advertise XEN_HVM_CPUID_UPCALL_VECTOR in CPUID  (David Woodhouse; 1 file changed, -0/+4)
This will allow Linux guests (since v6.0) to use the per-vCPU upcall vector delivered as MSI through the local APIC. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-07  i386/xen: Ignore VCPU_SSHOTTMR_future flag in set_singleshot_timer()  (David Woodhouse; 1 file changed, -10/+10)
Upstream Xen now ignores this flag¹, since the only guest kernel ever to use it was buggy. ¹ https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=19c6cbd909 Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06  hw/xen: select kernel mode for per-vCPU event channel upcall vector  (David Woodhouse; 1 file changed, -0/+7)
A guest which has configured the per-vCPU upcall vector may set the HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero. For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support for HVMOP_set_evtchn_upcall_vector") will just do this after setting the vector:

        /* Trick toolstack to think we are enlightened. */
        if (!cpu)
                rc = xen_set_callback_via(1);

That's explicitly setting the delivery to GSI#1, but it's supposed to be overridden by the per-vCPU vector setting. This mostly works in Qemu *except* for the logic to enable the in-kernel handling of event channels, which falsely determines that the kernel cannot accelerate GSI delivery in this case.

Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has the vector set, and use that in xen_evtchn_set_callback_param() to enable the kernel acceleration features even when the param *appears* to be set to target a GSI.

Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to *zero* the event channel delivery is disabled completely. (Which is what that bizarre guest behaviour is working round in the first place.)

Cc: qemu-stable@nongnu.org
Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06  i386/xen: fix per-vCPU upcall vector for Xen emulation  (David Woodhouse; 1 file changed, -4/+4)
The per-vCPU upcall vector support had three problems. Firstly it was using the wrong hypercall argument and would always return -EFAULT when the guest tried to set it up. Secondly it was using the wrong ioctl() to pass the vector to the kernel and thus the *kernel* would always return -EINVAL. Finally, even when delivering the event directly from userspace with an MSI, it put the destination CPU ID into the wrong bits of the MSI address. Linux doesn't (yet) use this mode so it went without decent testing for a while. Cc: qemu-stable@nongnu.org Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-11-06  i386/xen: Don't advertise XENFEAT_supervisor_mode_kernel  (David Woodhouse; 1 file changed, -1/+0)
This confuses lscpu into thinking it's running in PVH mode. Cc: qemu-stable@nongnu.org Fixes: bedcc139248 ("i386/xen: implement HYPERVISOR_xen_version") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org>
2023-10-25  kvm: i8254: require KVM_CAP_PIT2 and KVM_CAP_PIT_STATE2  (Paolo Bonzini; 2 files changed, -8/+0)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: require KVM_CAP_SET_IDENTITY_MAP_ADDR  (Paolo Bonzini; 1 file changed, -13/+7)
This was introduced in KVM in Linux 2.6.32, we can require it unconditionally. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: require KVM_CAP_ADJUST_CLOCK  (Paolo Bonzini; 2 files changed, -6/+1)
This was introduced in KVM in Linux 2.6.33, we can require it unconditionally. KVM_CLOCK_TSC_STABLE was only added in Linux 4.9, for now do not require it (though it would allow the removal of some pretty yucky code). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: require KVM_CAP_MCE  (Paolo Bonzini; 1 file changed, -10/+4)
This was introduced in KVM in Linux 2.6.34, we can require it unconditionally. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: require KVM_CAP_SET_VCPU_EVENTS and KVM_CAP_X86_ROBUST_SINGLESTEP  (Paolo Bonzini; 1 file changed, -90/+2)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: require KVM_CAP_XSAVE  (Paolo Bonzini; 1 file changed, -68/+2)
This was introduced in KVM in Linux 2.6.36, and could already be used at the time to save/restore FPU data even on older processors. We can require it unconditionally and stop using KVM_GET/SET_FPU.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: require KVM_CAP_DEBUGREGS  (Paolo Bonzini; 1 file changed, -8/+1)
This was introduced in KVM in Linux 2.6.35, we can require it unconditionally. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: i386: move KVM_CAP_IRQ_ROUTING detection to kvm_arch_required_capabilities  (Paolo Bonzini; 1 file changed, -5/+1)
Simple code cleanup. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  kvm: require KVM_CAP_SIGNAL_MSI  (Paolo Bonzini; 1 file changed, -0/+1)
This was introduced in KVM in Linux 3.5, we can require it unconditionally in kvm_irqchip_send_msi(). However, not all architectures have to implement it so check it only in x86, the only architecture that ever had MSI injection but not KVM_CAP_SIGNAL_MSI. ARM uses it to detect the presence of the ITS emulation in the kernel, introduced in Linux 4.8. Assume that it's there and possibly fail when realizing the arm-its-kvm device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  target/i386: check CPUID_PAE to determine 36 bit processor address space  (Ani Sinha; 1 file changed, -1/+1)
PAE mode in x86 supports a 36-bit address space. Check the PAE CPUID bit on the guest processor and set phys_bits to 36 if the PAE feature is set. This is in addition to checking the presence of the PSE36 CPUID feature for setting 36-bit phys_bits.

Signed-off-by: Ani Sinha <anisinha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20230912120650.371781-1-anisinha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
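A simplified sketch of the decision (illustrative only; real phys_bits selection considers more than these two bits, and the helper name is invented):

    #include <stdint.h>

    #define CPUID_FEAT_PAE   (1u << 6)    /* CPUID.01H:EDX bit 6 */
    #define CPUID_FEAT_PSE36 (1u << 17)   /* CPUID.01H:EDX bit 17 */

    /* Either PAE or PSE36 implies a 36-bit physical address space. */
    static unsigned guest_phys_bits(uint32_t cpuid_01_edx)
    {
        return (cpuid_01_edx & (CPUID_FEAT_PAE | CPUID_FEAT_PSE36)) ? 36 : 32;
    }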
2023-10-25  target/i386: validate VEX.W for AVX instructions  (Paolo Bonzini; 2 files changed, -42/+108)
Instructions in VEX exception class 6 generally look at the value of VEX.W. Note that the manual places some instructions incorrectly in class 4, for example VPERMQ which has no non-VEX encoding and no legacy SSE analogue. AMD does a mess of its own, as documented in the comment that this patch adds. Most of them are checked for VEX.W=0, and are listed in the manual (though with an omission) in table 2-16; VPERMQ and VPERMPD check for VEX.W=1, which is only listed in the instruction description. Others, such as VPSRLV, VPSLLV and the FMA3 instructions, use VEX.W to switch between a 32-bit and 64-bit operation. Fix more of the class 4/class 6 mismatches, and implement the check for VEX.W in TCG. Acked-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
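One way to picture the per-instruction constraint (a hedged sketch; the enum below is invented and much coarser than the real decode tables):

    #include <stdbool.h>

    /* Most class-6 instructions demand VEX.W=0, VPERMQ/VPERMPD demand
     * VEX.W=1, and a few (VPSRLV, VPSLLV, FMA3) use W to pick the 32- or
     * 64-bit form. */
    typedef enum { VEXW_ANY, VEXW_0, VEXW_1 } VexWReq;

    static bool vex_w_ok(VexWReq req, bool vex_w)
    {
        switch (req) {
        case VEXW_0: return !vex_w;
        case VEXW_1: return vex_w;
        default:     return true;   /* W selects operand size instead */
        }
    }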
2023-10-25  target/i386: group common checks in the decoding phase  (Paolo Bonzini; 3 files changed, -37/+85)
In preparation for adding more similar checks, move the VEX.L=0 check and several X86_SPECIAL_* checks to a new field, where each bit represent a common check on unused bits, or a restriction on the processor mode. Likewise, many SVM intercepts can be checked during the decoding phase, the main exception being the selective CR0 write, MSR and IOIO intercepts. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25  target/i386: implement SHA instructions  (Paolo Bonzini; 6 files changed, -1/+209)
The implementation was validated with OpenSSL and with the test vectors in https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs. The instructions provide a ~25% improvement on hashing a 64 MiB file: runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on the host goes down from 5.8 billion to 4.8 billion with slightly better IPC too. Good job Intel. ;) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-22  target/i386: Use tcg_gen_ext_tl  (Richard Henderson; 1 file changed, -25/+3)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-10-22  target/i386: Use i128 for 128 and 256-bit loads and stores  (Richard Henderson; 1 file changed, -34/+29)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>