Age | Commit message (Collapse) | Author | Files | Lines |
|
The A20 mask is only applied to the final memory access. Nested
page tables are always walked with the raw guest-physical address.
Unlike the previous patch, in this one the masking must be kept, but
it was done too early.
Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b5a9de3259f4c791bde2faff086dd5737625e41e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
If ptw_translate() does a MMU_PHYS_IDX access, the A20 mask is already
applied in get_physical_address(), which is called via probe_access_full()
and x86_cpu_tlb_fill().
If ptw_translate() on the other hand does a MMU_NESTED_IDX access,
the A20 mask must not be applied to the address that is looked up in
the nested page tables; it must be applied only to the addresses that
hold the NPT entries (which is achieved via MMU_PHYS_IDX, per the
previous paragraph).
Therefore, we can remove A20 masking from the computation of the page
table entry's address, and let get_physical_address() or mmu_translate()
apply it when they know they are returning a host-physical address.
Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a28fe7dc1939333c81b895cdced81c69eb7c5ad0)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
The address translation logic in get_physical_address() will currently
truncate physical addresses to 32 bits unless long mode is enabled.
This is incorrect when using physical address extensions (PAE) outside
of long mode, with the result that a 32-bit operating system using PAE
to access memory above 4G will experience undefined behaviour.
The truncation code was originally introduced in commit 33dfdb5 ("x86:
only allow real mode to access 32bit without LMA"), where it applied
only to translations performed while paging is disabled (and so cannot
affect guests using PAE).
Commit 9828198 ("target/i386: Add MMU_PHYS_IDX and MMU_NESTED_IDX")
rearranged the code such that the truncation also applied to the use
of MMU_PHYS_IDX and MMU_NESTED_IDX. Commit 4a1e9d4 ("target/i386: Use
atomic operations for pte updates") brought this truncation into scope
for page table entry accesses, and is the first commit for which a
Windows 10 32-bit guest will reliably fail to boot if memory above 4G
is present.
The truncation code however is not completely redundant. Even though the
maximum address size for any executed instruction is 32 bits, helpers for
operations such as BOUND, FSAVE or XSAVE may ask get_physical_address()
to translate an address outside of the 32-bit range, if invoked with an
argument that is close to the 4G boundary. Likewise for processor
accesses, for example TSS or IDT accesses, when EFER.LMA==0.
So, move the address truncation in get_physical_address() so that it
applies to 32-bit MMU indexes, but not to MMU_PHYS_IDX and MMU_NESTED_IDX.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2040
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Cc: qemu-stable@nongnu.org
Co-developed-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit b1661801c184119a10ad6cbc3b80330fc22e7b2c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
(Mjt: drop unrelated change in target/i386/cpu.c)
|
|
MSR_VM_HSAVE_PA bits 0-11 are reserved, as are the bits above the
maximum physical address width of the processor. Setting them to
1 causes a #GP (see "15.30.4 VM_HSAVE_PA MSR" in the AMD manual).
The same is true of VMCB addresses passed to VMRUN/VMLOAD/VMSAVE,
even though the manual is not clear on that.
Cc: qemu-stable@nongnu.org
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit d09c79010ffd880dc69e7a21e3cfdef90b928fb8)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
CR3 bits 63:32 are ignored in 32-bit mode (either legacy 2-level
paging or PAE paging). Do this in mmu_translate() to remove
the last where get_physical_address() meaningfully drops the high
bits of the address.
Cc: qemu-stable@nongnu.org
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Fixes: 4a1e9d4d11c ("target/i386: Use atomic operations for pte updates", 2022-10-18)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 68fb78d7d5723066ec2cacee7d25d67a4143b42f)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
lock prefix
target/i386: As specified by Intel Manual Vol2 3-180, cmp instructions
are not allowed to have lock prefix and a `UD` should be raised. Without
this patch, s1->T0 will be uninitialized and used in the case OP_CMPL.
Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com>
Message-ID: <20240215095015.570748-2-ziqiaokong@gmail.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 99d0dcd7f102c07a510200d768cae65e5db25d23)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
CPUID leaf 7 was grouped together with SGX leaf 0x12 by commit
b9edbadefb9e ("i386: Propagate SGX CPUID sub-leafs to KVM") by mistake.
SGX leaf 0x12 has its specific logic to check if subleaf (starting from 2)
is valid or not by checking the bit 0:3 of corresponding EAX is 1 or
not.
Leaf 7 follows the logic that EAX of subleaf 0 enumerates the maximum
valid subleaf.
Fixes: b9edbadefb9e ("i386: Propagate SGX CPUID sub-leafs to KVM")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20240125024016.2521244-4-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 0729857c707535847d7fe31d3d91eb8b2a118e3c)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
Existing code misses a decrement of cpuid_i when skip leaf 0x1F.
There's a blank CPUID entry(with leaf, subleaf as 0, and all fields
stuffed 0s) left in the CPUID array.
It conflicts with correct CPUID leaf 0.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by:Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240125024016.2521244-2-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 10f92799af8ba3c3cef2352adcd4780f13fbab31)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
FEAT_XSAVE_XSS_HI leafs
The value of FEAT_XSAVE_XCR0_HI leaf and FEAT_XSAVE_XSS_HI leaf also
need to be masked by XCR0 and XSS mask respectively, to make it
logically correct.
Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240115091325.1904229-3-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a11a365159b944e05be76f3ec3b98c8b38cb70fd)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
Leaf FEAT_XSAVE_XSS_LO and FEAT_XSAVE_XSS_HI also need to be cleared
when CPUID_EXT_XSAVE is not set.
Fixes: 301e90675c3f ("target/i386: Enable support for XSAVES based features")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Yang Weijiang <weijiang.yang@intel.com>
Message-ID: <20240115091325.1904229-2-xiaoyao.li@intel.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 81f5cad3858f27623b1b14467926032d229b76cc)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
For PC-relative translation blocks, env->eip changes during the
execution of a translation block, Therefore, QEMU must be able to
recover an instruction's PC just from the TranslationBlock struct and
the instruction data with. Because a TB will not span two pages, QEMU
stores all the low bits of EIP in the instruction data and replaces them
in x86_restore_state_to_opc. Bits 12 and higher (which may vary between
executions of a PCREL TB, since these only use the physical address in
the hash key) are kept unmodified from env->eip. The assumption is that
these bits of EIP, unlike bits 0-11, will not change as the translation
block executes.
Unfortunately, this is incorrect when the CS base is not aligned to a page.
Then the linear address of the instructions (i.e. the one with the
CS base addred) indeed will never span two pages, but bits 12+ of EIP
can actually change. For example, if CS base is 0x80262200 and EIP =
0x6FF4, the first instruction in the translation block will be at linear
address 0x802691F4. Even a very small TB will cross to EIP = 0x7xxx,
while the linear addresses will remain comfortably within a single page.
The fix is simply to use the low bits of the linear address for data[0],
since those don't change. Then x86_restore_state_to_opc uses tb->cs_base
to compute a temporary linear address (referring to some unknown
instruction in the TB, but with the correct values of bits 12 and higher);
the low bits are replaced with data[0], and EIP is obtained by subtracting
again the CS base.
Huge thanks to Mark Cave-Ayland for the image and initial debugging,
and to Gitlab user @kjliew for help with bisecting another occurrence
of (hopefully!) the same bug.
It should be relatively easy to write a testcase that performs MMIO on
an EIP with different bits 12+ than the first instruction of the translation
block; any help is welcome.
Fixes: e3a79e0e878 ("target/i386: Enable TARGET_TB_PCREL", 2022-10-11)
Cc: qemu-stable@nongnu.org
Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Cc: Richard Henderson <richard.henderson@linaro.org>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1759
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1964
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2012
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 729ba8e933f8af5800c3a92b37e630e9bdaa9f1e)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
The PCREL patches introduced a bug when updating EIP in the !CF_PCREL case.
Using s->pc in func gen_update_eip_next() solves the problem.
Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22fbf ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Signed-off-by: guoguangyao <guoguangyao18@mails.ucas.ac.cn>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240115020804.30272-1-guoguangyao18@mails.ucas.ac.cn>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 2926eab8969908bc068629e973062a0fb6ff3759)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
With PCREL, we have a page-relative view of EIP, and an
approximation of PC = EIP+CSBASE that is good enough to
detect page crossings. If we try to recompute PC after
masking EIP, we will mess up that approximation and write
a corrupt value to EIP.
We already handled masking properly for PCREL, so the
fix in b5e0d5d2 was only needed for the !PCREL path.
Cc: qemu-stable@nongnu.org
Fixes: b5e0d5d22fbf ("target/i386: Fix 32-bit wrapping of pc/eip computation")
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20240101230617.129349-1-richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit a58506b748b8988a95f4fa1a2420ac5c17038b30)
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
|
In 32-bit mode, pc = eip + cs_base is also 32-bit, and must wrap.
Failure to do so results in incorrect memory exceptions to the guest.
Before 732d548732ed, this was implicitly done via truncation to
target_ulong but only in qemu-system-i386, not qemu-system-x86_64.
To fix this, we must add conditional zero-extensions.
Since we have to test for 32 vs 64-bit anyway, note that cs_base
is always zero in 64-bit mode.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2022
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231212172510.103305-1-richard.henderson@linaro.org>
|
|
Commit 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
added error checking for KVM_SET_SREGS/KVM_SET_SREGS2. In doing so, it
exposed a long-running bug in current KVM support for SEV-ES where the
kernel assumes that MSR_EFER_LMA will be set explicitly by the guest
kernel, in which case EFER write traps would result in KVM eventually
seeing MSR_EFER_LMA get set and recording it in such a way that it would
be subsequently visible when accessing it via KVM_GET_SREGS/etc.
However, guest kernels currently rely on MSR_EFER_LMA getting set
automatically when MSR_EFER_LME is set and paging is enabled via
CR0_PG_MASK. As a result, the EFER write traps don't actually expose the
MSR_EFER_LMA bit, even though it is set internally, and when QEMU
subsequently tries to pass this EFER value back to KVM via
KVM_SET_SREGS* it will fail various sanity checks and return -EINVAL,
which is now considered fatal due to the aforementioned QEMU commit.
This can be addressed by inferring the MSR_EFER_LMA bit being set when
paging is enabled and MSR_EFER_LME is set, and synthesizing it to ensure
the expected bits are all present in subsequent handling on the host
side.
Ultimately, this handling will be implemented in the host kernel, but to
avoid breaking QEMU's SEV-ES support when using older host kernels, the
same handling can be done in QEMU just after fetching the register
values via KVM_GET_SREGS*. Implement that here.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: Lara Lazier <laramglazier@gmail.com>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>
Cc: <kvm@vger.kernel.org>
Fixes: 7191f24c7fcf ("accel/kvm/kvm-all: Handle register access errors")
Signed-off-by: Michael Roth <michael.roth@amd.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20231206155821.1194551-1-michael.roth@amd.com>
|
|
Improve
$ qemu-system-x86_64 -device max-x86_64-cpu,vendor=me
qemu-system-x86_64: -device max-x86_64-cpu,vendor=me: Property '.vendor' doesn't take value 'me'
to
qemu-system-x86_64: -device max-x86_64-cpu,vendor=0123456789abc: value of property 'vendor' must consist of exactly 12 characters
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031111059.3407803-8-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
[Typo corrected]
|
|
Misc hardware patch queue
HW emulation:
- PMBus fixes and tests (Titus)
- IDE fixes and tests (Fiona)
- New ADM1266 sensor (Titus)
- Better error propagation in PCI-ISA i82378 (Philippe)
- Declare SD model QOM types using DEFINE_TYPES macro (Philippe)
Topology:
- Fix CPUState::nr_cores calculation (Zhuocheng Ding and Zhao Liu)
Monitor:
- Synchronize CPU state in 'info lapic' (Dongli Zhang)
QOM:
- Have 'cpu-qom.h' target-agnostic (Philippe)
- Move ArchCPUClass definition to each target's cpu.h (Philippe)
- Call object_class_is_abstract once in cpu_class_by_name (Philippe)
UI:
- Use correct key names in titles on MacOS / SDL2 (Adrian)
MIPS:
- Fix MSA BZ/BNZ and TX79 LQ/SQ opcodes (Philippe)
Nios2:
- Create IRQs *after* vCPU is realized (Philippe)
PPC:
- Restrict KVM objects to system emulation (Philippe)
- Move target-specific definitions out of 'cpu-qom.h' (Philippe)
S390X:
- Make hw/s390x/css.h and hw/s390x/sclp.h headers target agnostic (Philippe)
X86:
- HVF & KVM cleanups (Philippe)
Various targets:
- Use env_archcpu() to optimize (Philippe)
Misc:
- Few global variable shadowing removed (Philippe)
- Introduce cpu_exec_reset_hold and factor tcg_cpu_reset_hold out (Philippe)
- Remove few more 'softmmu' mentions (Philippe)
- Fix and cleanup in vl.c (Akihiko & Marc-André)
- Resource leak fix in dump (Zongmin Zhou)
- MAINTAINERS updates (Thomas, Daniel)
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmVKKmEACgkQ4+MsLN6t
# wN4xHQ//X/enH4C7K3VP/tSinDiwmXN2o61L9rjqSDQkBaCtktZx4c8qKSDL7V4S
# vwzmvvBn3biMXQwZNVJo9d0oz2qoaF9tI6Ao0XDHAan9ziagfG9YMqWhkCfj077Q
# jLdCqkUuMJBvQgXGB1a6UgCme8PQx7h0oqjbCNfB0ZBls24b5DiEjO87LE4OTbTi
# zKRhYEpZpGwIVcy+1dAsbaBpGFP06sr1doB9Wz4c06eSx7t0kFSPk6U4CyOPrGXh
# ynyCxPwngxIXmarY8gqPs3SBs7oXsH8Q/ZOHr1LbuXhwSuw/0zBQU9aF7Ir8RPan
# DB79JjPrtxTAhICKredWT79v9M18D2/1MpONgg4vtx5K2FzGYoAJULCHyfkHMRSM
# L6/H0ZQPHvf7w72k9EcSQIhd0wPlMqRmfy37/8xcLiw1h4l/USx48QeKaeFWeSEu
# DgwSk+R61HbrKvQz/U0tF98zUEyBaQXNrKmyzht0YE4peAtpbPNBeRHkd0GMae/Z
# HOmkt8QlFQ0T14qSK7mSHaSJTUzRvFGD01cbuCDxVsyCWWsesEikXBACZLG5RCRY
# Rn1WeX1H9eE3kKi9iueLnhzcF9yM5XqFE3f6RnDzY8nkg91lsTMSQgFcIpv6uGyp
# 3WOTNSC9SoFyI3x8pCWiKOGytPUb8xk+PnOA85wYvVmT+7j6wus=
# =OVdQ
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 07 Nov 2023 20:15:29 HKT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'misc-cpus-20231107' of https://github.com/philmd/qemu: (75 commits)
dump: Add close fd on error return to avoid resource leak
ui/sdl2: use correct key names in win title on mac
MAINTAINERS: Add more guest-agent related files to the corresponding section
MAINTAINERS: Add include/hw/xtensa/mx_pic.h to the XTFPGA machine section
MAINTAINERS: update libvirt devel mailing list address
MAINTAINERS: Add the CAN documentation file to the CAN section
MAINTAINERS: Add include/hw/timer/tmu012.h to the SH4 R2D section
hw/sd: Declare QOM types using DEFINE_TYPES() macro
hw/i2c: pmbus: reset page register for out of range reads
hw/i2c: pmbus: immediately clear faults on request
tests/qtest: add tests for ADM1266
hw/sensor: add ADM1266 device model
hw/i2c: pmbus: add VCAP register
hw/i2c: pmbus: add fan support
hw/i2c: pmbus: add vout mode bitfields
hw/i2c: pmbus add support for block receive
tests/qtest: ahci-test: add test exposing reset issue with pending callback
hw/ide: reset: cancel async DMA operation before resetting state
hw/cpu: Update the comments of nr_cores and nr_dies
system/cpus: Fix CPUState.nr_cores' calculation
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
|
In the nr_threads' comment, specify it represents the
number of threads in the "core" to avoid confusion.
Also add comment for nr_dies in CPUX86State.
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20231024090323.1859210-5-zhao1.liu@linux.intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
|
|
From CPUState.nr_cores' comment, it represents "number of cores within
this CPU package".
After 003f230e37d7 ("machine: Tweak the order of topology members in
struct CpuTopology"), the meaning of smp.cores changed to "the number of
cores in one die", but this commit missed to change CPUState.nr_cores'
calculation, so that CPUState.nr_cores became wrong and now it
misses to consider numbers of clusters and dies.
At present, only i386 is using CPUState.nr_cores.
But as for i386, which supports die level, the uses of CPUState.nr_cores
are very confusing:
Early uses are based on the meaning of "cores per package" (before die
is introduced into i386), and later uses are based on "cores per die"
(after die's introduction).
This difference is due to that commit a94e1428991f ("target/i386: Add
CPUID.1F generation support for multi-dies PCMachine") misunderstood
that CPUState.nr_cores means "cores per die" when calculated
CPUID.1FH.01H:EBX. After that, the changes in i386 all followed this
wrong understanding.
With the influence of 003f230e37d7 and a94e1428991f, for i386 currently
the result of CPUState.nr_cores is "cores per die", thus the original
uses of CPUState.cores based on the meaning of "cores per package" are
wrong when multiple dies exist:
1. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.01H:EBX[bits 23:16] is
incorrect because it expects "cpus per package" but now the
result is "cpus per die".
2. In cpu_x86_cpuid() of target/i386/cpu.c, for all leaves of CPUID.04H:
EAX[bits 31:26] is incorrect because they expect "cpus per package"
but now the result is "cpus per die". The error not only impacts the
EAX calculation in cache_info_passthrough case, but also impacts other
cases of setting cache topology for Intel CPU according to cpu
topology (specifically, the incoming parameter "num_cores" expects
"cores per package" in encode_cache_cpuid4()).
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.0BH.01H:EBX[bits
15:00] is incorrect because the EBX of 0BH.01H (core level) expects
"cpus per package", which may be different with 1FH.01H (The reason
is 1FH can support more levels. For QEMU, 1FH also supports die,
1FH.01H:EBX[bits 15:00] expects "cpus per die").
4. In cpu_x86_cpuid() of target/i386/cpu.c, when CPUID.80000001H is
calculated, here "cpus per package" is expected to be checked, but in
fact, now it checks "cpus per die". Though "cpus per die" also works
for this code logic, this isn't consistent with AMD's APM.
5. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.80000008H:ECX expects
"cpus per package" but it obtains "cpus per die".
6. In simulate_rdmsr() of target/i386/hvf/x86_emu.c, in
kvm_rdmsr_core_thread_count() of target/i386/kvm/kvm.c, and in
helper_rdmsr() of target/i386/tcg/sysemu/misc_helper.c,
MSR_CORE_THREAD_COUNT expects "cpus per package" and "cores per
package", but in these functions, it obtains "cpus per die" and
"cores per die".
On the other hand, these uses are correct now (they are added in/after
a94e1428991f):
1. In cpu_x86_cpuid() of target/i386/cpu.c, topo_info.cores_per_die
meets the actual meaning of CPUState.nr_cores ("cores per die").
2. In cpu_x86_cpuid() of target/i386/cpu.c, vcpus_per_socket (in CPUID.
04H's calculation) considers number of dies, so it's correct.
3. In cpu_x86_cpuid() of target/i386/cpu.c, CPUID.1FH.01H:EBX[bits
15:00] needs "cpus per die" and it gets the correct result, and
CPUID.1FH.02H:EBX[bits 15:00] gets correct "cpus per package".
When CPUState.nr_cores is correctly changed to "cores per package" again
, the above errors will be fixed without extra work, but the "currently"
correct cases will go wrong and need special handling to pass correct
"cpus/cores per die" they want.
Fix CPUState.nr_cores' calculation to fit the original meaning "cores
per package", as well as changing calculation of topo_info.cores_per_die,
vcpus_per_socket and CPUID.1FH.
Fixes: a94e1428991f ("target/i386: Add CPUID.1F generation support for multi-dies PCMachine")
Fixes: 003f230e37d7 ("machine: Tweak the order of topology members in struct CpuTopology")
Signed-off-by: Zhuocheng Ding <zhuocheng.ding@intel.com>
Co-developed-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Tested-by: Babu Moger <babu.moger@amd.com>
Tested-by: Yongwei Ma <yongwei.ma@intel.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20231024090323.1859210-4-zhao1.liu@linux.intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
|
|
The OBJECT_DECLARE_CPU_TYPE() macro forward-declares each
ArchCPUClass type. These forward declarations are sufficient
for code in hw/ to use the QOM definitions. No need to expose
these structure definitions. Keep each local to their target/
by moving them to the corresponding "cpu.h" header.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231013140116.255-13-philmd@linaro.org>
|
|
While the default "info lapic" always synchronizes cpu state ...
mon_get_cpu()
-> mon_get_cpu_sync(mon, true)
-> cpu_synchronize_state(cpu)
-> ioctl KVM_GET_LAPIC (taking KVM as example)
... the cpu state is not synchronized when the apic-id is available as
argument.
The cpu state should be synchronized when apic-id is available. Otherwise
the "info lapic <apic-id>" always returns stale data.
Reference:
https://lore.kernel.org/all/20211028155457.967291-19-berrange@redhat.com/
Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Message-ID: <20231030085336.2681386-1-armbru@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-ID: <20231026211938.162815-1-dongli.zhang@oracle.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
|
|
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230918160257.30127-4-philmd@linaro.org>
|
|
Follow the naming used by other files in target/i386/.
No functional changes.
Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20231020111136.44401-4-philmd@linaro.org>
|
|
Follow the naming used by other files in target/i386/.
No functional changes.
Suggested-by: Zhao Liu <zhao1.liu@intel.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20231020111136.44401-3-philmd@linaro.org>
|
|
Follow C style guidelines and use CPUState forward
declaration from "qemu/typedefs.h".
No functional changes.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20231020111136.44401-2-philmd@linaro.org>
|
|
When CPUArchState* is available (here CPUX86State*), we can
use the fast env_archcpu() macro to get ArchCPU* (here X86CPU*).
The QOM cast X86_CPU() macro will be slower when building with
--enable-qom-cast-debug.
Pass CPUX86State* as argument to simulate_rdmsr / simulate_wrmsr
instead of a CPUState* to avoid an extra cast.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Roman Bolshakov <roman@roolebo.dev>
Tested-by: Roman Bolshakov <roman@roolebo.dev>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20231009110239.66778-7-philmd@linaro.org>
|
|
We already have 'x86_cpu = X86_CPU(cpu)'. Use the variable
instead of doing another QOM cast with X86_CPU().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Roman Bolshakov <roman@roolebo.dev>
Tested-by: Roman Bolshakov <roman@roolebo.dev>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20231009110239.66778-6-philmd@linaro.org>
|
|
Hegerogeneous code needs access to the FOO_CPU_TYPE_NAME()
macro to resolve target CPU types. Move the declaration
(along with the required FOO_CPU_TYPE_SUFFIX) to "cpu-qom.h".
"target/foo/cpu-qom.h" is supposed to be target agnostic
(include-able by any target). Add such mention in the
header.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Acked-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20231013140116.255-7-philmd@linaro.org>
|
|
Enforce the style described by commit 067109a11c ("docs/devel:
mention the spacing requirement for QOM"):
The first declaration of a storage or class structure should
always be the parent and leave a visual space between that
declaration and the new code. It is also useful to separate
backing for properties (options driven by the user) and internal
state to make navigation easier.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20231013140116.255-2-philmd@linaro.org>
|
|
The primary console is special because the toolstack maps a page into
the guest for its ring, and also allocates the guest-side event channel.
The guest's grant table is even primed to export that page using a known
grant ref#. Add support for all that in emulated mode, so that we can
have a primary console.
For reasons unclear, the backends running under real Xen don't just use
a mapping of the well-known GNTTAB_RESERVED_CONSOLE grant ref (which
would also be in the ring-ref node in XenStore). Instead, the toolstack
sets the ring-ref node of the primary console to the GFN of the guest
page. The backend is expected to handle that special case and map it
with foreignmem operations instead.
We don't have an implementation of foreignmem ops for emulated Xen mode,
so just make it map GNTTAB_RESERVED_CONSOLE instead. This would probably
work for real Xen too, but we can't work out how to make real Xen create
a primary console of type "ioemu" to make QEMU drive it, so we can't
test that; might as well leave it as it is for now under Xen.
Now at last we can boot the Xen PV shim and run PV kernels in QEMU.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
|
|
This will allow Linux guests (since v6.0) to use the per-vCPU upcall
vector delivered as MSI through the local APIC.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
|
|
Upstream Xen now ignores this flag¹, since the only guest kernel ever to
use it was buggy.
¹ https://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=19c6cbd909
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
|
|
A guest which has configured the per-vCPU upcall vector may set the
HVM_PARAM_CALLBACK_IRQ param to fairly much anything other than zero.
For example, Linux v6.0+ after commit b1c3497e604 ("x86/xen: Add support
for HVMOP_set_evtchn_upcall_vector") will just do this after setting the
vector:
/* Trick toolstack to think we are enlightened. */
if (!cpu)
rc = xen_set_callback_via(1);
That's explicitly setting the delivery to GSI#1, but it's supposed to be
overridden by the per-vCPU vector setting. This mostly works in Qemu
*except* for the logic to enable the in-kernel handling of event channels,
which falsely determines that the kernel cannot accelerate GSI delivery
in this case.
Add a kvm_xen_has_vcpu_callback_vector() to report whether vCPU#0 has
the vector set, and use that in xen_evtchn_set_callback_param() to
enable the kernel acceleration features even when the param *appears*
to be set to target a GSI.
Preserve the Xen behaviour that when HVM_PARAM_CALLBACK_IRQ is set to
*zero* the event channel delivery is disabled completely. (Which is
what that bizarre guest behaviour is working round in the first place.)
Cc: qemu-stable@nongnu.org
Fixes: 91cce756179 ("hw/xen: Add xen_evtchn device for event channel emulation")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
|
|
The per-vCPU upcall vector support had three problems. Firstly it was
using the wrong hypercall argument and would always return -EFAULT when
the guest tried to set it up. Secondly it was using the wrong ioctl() to
pass the vector to the kernel and thus the *kernel* would always return
-EINVAL. Finally, even when delivering the event directly from userspace
with an MSI, it put the destination CPU ID into the wrong bits of the
MSI address.
Linux doesn't (yet) use this mode so it went without decent testing
for a while.
Cc: qemu-stable@nongnu.org
Fixes: 105b47fdf2d0 ("i386/xen: implement HVMOP_set_evtchn_upcall_vector")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
|
|
This confuses lscpu into thinking it's running in PVH mode.
Cc: qemu-stable@nongnu.org
Fixes: bedcc139248 ("i386/xen: implement HYPERVISOR_xen_version")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was introduced in KVM in Linux 2.6.32, we can require it unconditionally.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was introduced in KVM in Linux 2.6.33, we can require it
unconditionally. KVM_CLOCK_TSC_STABLE was only added in Linux 4.9,
for now do not require it (though it would allow the removal of some
pretty yucky code).
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was introduced in KVM in Linux 2.6.34, we can require it unconditionally.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was introduced in KVM in Linux 2.6.36, and could already be used at
the time to save/restore FPU data even on older processor. We can require
it unconditionally and stop using KVM_GET/SET_FPU.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was introduced in KVM in Linux 2.6.35, we can require it unconditionally.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Simple code cleanup.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This was introduced in KVM in Linux 3.5, we can require it unconditionally
in kvm_irqchip_send_msi(). However, not all architectures have to implement
it so check it only in x86, the only architecture that ever had MSI injection
but not KVM_CAP_SIGNAL_MSI.
ARM uses it to detect the presence of the ITS emulation in the kernel,
introduced in Linux 4.8. Assume that it's there and possibly fail when
realizing the arm-its-kvm device.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
PAE mode in x86 supports 36 bit address space. Check the PAE CPUID on the
guest processor and set phys_bits to 36 if PAE feature is set. This is in
addition to checking the presence of PSE36 CPUID feature for setting 36 bit
phys_bits.
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-ID: <20230912120650.371781-1-anisinha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Instructions in VEX exception class 6 generally look at the value of
VEX.W. Note that the manual places some instructions incorrectly in
class 4, for example VPERMQ which has no non-VEX encoding and no legacy
SSE analogue. AMD does a mess of its own, as documented in the comment
that this patch adds.
Most of them are checked for VEX.W=0, and are listed in the manual
(though with an omission) in table 2-16; VPERMQ and VPERMPD check for
VEX.W=1, which is only listed in the instruction description. Others,
such as VPSRLV, VPSLLV and the FMA3 instructions, use VEX.W to switch
between a 32-bit and 64-bit operation.
Fix more of the class 4/class 6 mismatches, and implement the check for
VEX.W in TCG.
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In preparation for adding more similar checks, move the VEX.L=0 check
and several X86_SPECIAL_* checks to a new field, where each bit represent
a common check on unused bits, or a restriction on the processor mode.
Likewise, many SVM intercepts can be checked during the decoding phase,
the main exception being the selective CR0 write, MSR and IOIO intercepts.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The implementation was validated with OpenSSL and with the test vectors in
https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/sha.rs.
The instructions provide a ~25% improvement on hashing a 64 MiB file:
runtime goes down from 1.8 seconds to 1.4 seconds; instruction count on
the host goes down from 5.8 billion to 4.8 billion with slightly better
IPC too. Good job Intel. ;)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|