path: root/hw/virtio
Age | Commit message | Author | Files | Lines
2025-12-09 | vhost: Always initialize cached vring data | Hanna Czenczek | 1 | -15/+23
vhost_virtqueue_start() can exit early if the descriptor ring address is 0, assuming the virtqueue isn’t ready to start. In this case, all cached vring information (size, physical address, pointer) is left as-is. This is OK at first startup, when that info is still initialized to 0, but after a reset, it will retain old (outdated) information. vhost_virtqueue_start() must make sure these values are (re-)set properly before exiting. (When using an IOMMU, these outdated values can stall the device: vhost_dev_start() deliberately produces an IOMMU miss event for each used vring. If used_phys contains an outdated value, the resulting lookup may fail, forcing the device to be stopped.) Cc: qemu-stable@nongnu.org Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20251208113008.153249-1-hreitz@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
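The idea of the fix can be sketched as follows. This is a minimal model under stated assumptions, not the QEMU code: `struct cached_vring`, `struct virtqueue_cache`, `vring_cache_reset()` and `virtqueue_start()` are hypothetical names that only illustrate clearing the cached size, physical address and pointer before the early exit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the cached per-ring data. */
struct cached_vring {
    uint64_t size;
    uint64_t phys;
    void *ptr;
};

/* Hypothetical per-virtqueue cache: descriptor, available and used rings. */
struct virtqueue_cache {
    struct cached_vring desc, avail, used;
};

static void vring_cache_reset(struct virtqueue_cache *vq)
{
    static const struct cached_vring empty = { 0, 0, NULL };

    vq->desc = vq->avail = vq->used = empty;
}

/* Returns 1 if the ring was started, 0 if it is not ready yet. */
static int virtqueue_start(struct virtqueue_cache *vq, uint64_t desc_addr,
                           uint64_t used_addr, uint64_t ring_size)
{
    assert(vq != NULL);

    /* Clear first, so the early exit below cannot leave stale values. */
    vring_cache_reset(vq);

    if (desc_addr == 0) {
        return 0; /* virtqueue not ready; cache is now all zero */
    }

    vq->desc.phys = desc_addr;
    vq->desc.size = ring_size;
    vq->used.phys = used_addr;
    vq->used.size = ring_size;
    return 1;
}
```

In this sketch, a second call with a zero descriptor address (as after a device reset) leaves `used.phys` at 0 instead of the value from the previous run.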
2025-11-25 | hw/virtio: Use error_setg_file_open() for a better error message | Markus Armbruster | 1 | -2/+1
The error message changes from

    vhost-vsock: failed to open vhost device: REASON

to

    Could not open '/dev/vhost-vsock': REASON

I think the exact file name is more useful to know than the file's purpose.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20251121121438.1249498-8-armbru@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-11-21 | qmp: Fix a typo for a USO feature | Jack Wang | 1 | -1/+1
There is a copy & paste error, USO6 should be there. Fixes: 58f81689789f ("qmp: update virtio feature maps, vhost-user-gpio introspection") Signed-off-by: Jack Wang <jinpu.wang@ionos.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-11-09 | vhost-user: make vhost_set_vring_file() synchronous | German Maglione | 1 | -1/+23
QEMU sends all of VHOST_USER_SET_VRING_KICK, _CALL, and _ERR without setting the NEED_REPLY flag, i.e. by the time the respective vhost_user_set_vring_*() function returns, it is completely up to chance whether the back-end has already processed the request and switched over to the new FD for interrupts.

At least for vhost_user_set_vring_call(), that is a problem: It is called through vhost_virtqueue_mask(), which is generally used in the VirtioDeviceClass.guest_notifier_mask() implementation, which is in turn called by virtio_pci_one_vector_unmask().

The fact that we do not wait for the back-end to install the FD leads to a race there: Masking interrupts is implemented by redirecting interrupts to an internal event FD that is not connected to the guest. Unmasking then re-installs the guest-connected IRQ FD, then checks if there are pending interrupts left on the masked event FD, and if so, issues an interrupt to the guest.

Because guest_notifier_mask() (through vhost_user_set_vring_call()) doesn't wait for the back-end to switch over to the actual IRQ FD, it's possible we check for pending interrupts while the back-end is still using the masked event FD, and then we will lose interrupts that occur before the back-end finally does switch over.

Fix this by setting NEED_REPLY on those VHOST_USER_SET_VRING_* messages, so when we get that reply, we know that the back-end is now using the new FD.

We have a few reports of a virtiofs mount hanging:
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/101
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/133
- https://gitlab.com/virtio-fs/virtiofsd/-/issues/213

This is quite a difficult bug to reproduce, even for the reporters. It only happens in production, every few weeks, and/or on 1 in 300 VMs. So, we are not 100% sure this fixes that issue. However, we think this is still a bug, and at least we have one report that claims this fixed the issue:

https://gitlab.com/virtio-fs/virtiofsd/-/issues/133#note_2743209419

Fixes: 5f6f6664bf24 ("Add vhost-user as a vhost backend.")
Signed-off-by: German Maglione <gmaglione@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251022162405.318672-1-gmaglione@redhat.com>
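The unmask race described in this commit message can be illustrated with a small single-threaded model. Everything here is a hypothetical simplification: `struct backend_model` and its helpers are not vhost-user code, event FDs are plain counters, and "waiting for the reply" is modeled as letting the back-end process the queued request before the front-end continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy back-end: signals whichever event counter is currently installed. */
struct backend_model {
    int *active;   /* "FD" the back-end currently signals */
    int *queued;   /* SET_VRING_CALL request not yet processed, if any */
};

static void backend_process(struct backend_model *b)
{
    if (b->queued != NULL) {
        b->active = b->queued;
        b->queued = NULL;
    }
}

static void backend_interrupt(struct backend_model *b)
{
    assert(b->active != NULL);
    (*b->active)++;
}

/* Returns how many interrupts end up stranded on the masked event FD. */
static int lost_interrupts_after_unmask(bool need_reply)
{
    int masked_evt = 0, guest_evt = 0;
    struct backend_model b = { &masked_evt, NULL };

    /* Unmask: ask the back-end to switch to the guest-connected FD. */
    b.queued = &guest_evt;
    if (need_reply) {
        backend_process(&b);   /* a reply implies the switch already happened */
    }

    /* Front-end forwards anything pending on the masked FD, exactly once. */
    guest_evt += masked_evt;
    masked_evt = 0;

    /* An interrupt fires right after that check. */
    backend_interrupt(&b);
    backend_process(&b);       /* without NEED_REPLY the switch happens only now */

    return masked_evt;
}
```

Under these assumptions, `lost_interrupts_after_unmask(false)` strands one interrupt on the masked counter, while `lost_interrupts_after_unmask(true)` strands none, which is the effect the commit attributes to NEED_REPLY.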
2025-11-09 | vhost-user: fix shared object lookup handler logic | Albert Esteve | 1 | -27/+13
Refactor the backend_read() function and add a reply_ack variable so that handlers can force whether a reply is sent, without depending on the VHOST_USER_NEED_REPLY_MASK flag.

This fixes an issue with the vhost_user_backend_handle_shared_object_lookup() logic, as the error path was not closing the backend channel correctly. So, we can remove the reply call from within the handler, make sure it returns early on errors as other handlers do, and set the reply_ack variable in backend_read() to true to ensure that it will send a response, thus keeping the original intent.

Fixes: 1609476662 ("vhost-user: add shared_object msg")
Cc: qemu-stable@nongnu.org
Signed-off-by: Albert Esteve <aesteve@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251017072011.1874874-2-aesteve@redhat.com>
2025-11-03 | migration: Fix regression of passing error_fatal into vmstate_load_state() | Arun Menon | 3 | -7/+33
error_fatal is passed to the vmstate_load_state() and vmstate_save_state() functions. This was introduced in commit c632ffbd74. It makes QEMU exit(1) on error, and therefore does not allow the error to be propagated back to the caller.

To maintain consistency with prior error handling, i.e. either propagating the error to the caller or reporting it, we must set the error within a local Error object instead of using error_fatal.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Arun Menon <armenon@redhat.com>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20251028-solve_error_fatal_regression-v2-1-dab24c808a28@redhat.com
[peterx: always uninit var ret, per Akihiko]
[peterx: touchups on line ordering, spacings etc.]
Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-28 | char: rename CharBackend->CharFrontend | Marc-André Lureau | 2 | -11/+11
The actual backend is "Chardev", CharBackend is the frontend side of it (whatever talks to the backend), let's rename it for readability. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Link: https://lore.kernel.org/r/20251022074612.1258413-1-marcandre.lureau@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-21 | hw/virtio: Compile virtio-mem.c once | Philippe Mathieu-Daudé | 1 | -1/+1
Remove unused "system/ram_addr.h" header. This file doesn't use any target specific definitions anymore, compile it once by moving it to system_virtio_ss[]. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20250502214551.80401-6-philmd@linaro.org>
2025-10-21 | hw/virtio/virtio-mem: Convert VIRTIO_MEM_HAS_LEGACY_GUESTS to runtime | Philippe Mathieu-Daudé | 1 | -33/+43
Check legacy guests support at runtime: instead of evaluating the VIRTIO_MEM_HAS_LEGACY_GUESTS definition at compile time, call target_arch() to detect which target is being run at runtime. Register virtio_mem_legacy_guests_properties[] at runtime. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20250502214551.80401-5-philmd@linaro.org>
2025-10-21 | hw/virtio/virtio-mem: Convert VIRTIO_MEM_USABLE_EXTENT to runtime | Philippe Mathieu-Daudé | 1 | -8/+16
Use target_arch() to check at runtime which target architecture is being run. Note, since TARGET_ARM is defined for TARGET_AARCH64, we check for both ARM & AARCH64 enum values. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20250502214551.80401-4-philmd@linaro.org>
2025-10-09 | Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging | Richard Henderson | 1 | -10/+11
* i386: fix migration issues in 10.1
* target/i386/mshv: new accelerator
* rust: use glib-sys-rs
* rust: fixes for docker tests

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmjnaOwUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNsFQf/WXKxZLLnItHwDz3UdwjzewPWpz5N
# fpS0E4C03J8pACDgyfl7PQl47P7NlJ08Ig2Lc5l3Z9KiAKgh0orR7Cqd0BY5f9lo
# uk4FgXfXpQyApywAlctadrTfcH8sRv2tMaP6EJ9coLtJtHW9RUGFPaZeMsqrjpAl
# TpwAXPYNDDvvy1ih1LPh5DzOPDXE4pin2tDa94gJei56gY95auK4zppoNYLdB3kR
# GOyR4QK43/yhuxPHOmQCZOE3HK2XrKgMZHWIjAovjZjZFiJs49FaHBOpRfFpsUlG
# PB3UbIMtu69VY20LqbbyInPnyATRQzqIGnDGTErP6lfCGTKTy2ulQYWvHA==
# =KM5O
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 09 Oct 2025 12:49:00 AM PDT
# gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg: issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [unknown]
# gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1
# Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (35 commits)
  rust: fix path to rust_root_crate.sh
  tests/docker: make --enable-rust overridable with EXTRA_CONFIGURE_OPTS
  MAINTAINERS: Add maintainers for mshv accelerator
  docs: Add mshv to documentation
  target/i386/mshv: Use preallocated page for hvcall
  qapi/accel: Allow to query mshv capabilities
  accel/mshv: Handle overlapping mem mappings
  target/i386/mshv: Implement mshv_vcpu_run()
  target/i386/mshv: Write MSRs to the hypervisor
  target/i386/mshv: Integrate x86 instruction decoder/emulator
  target/i386/mshv: Register MSRs with MSHV
  target/i386/mshv: Register CPUID entries with MSHV
  target/i386/mshv: Set local interrupt controller state
  target/i386/mshv: Implement mshv_arch_put_registers()
  target/i386/mshv: Implement mshv_get_special_regs()
  target/i386/mshv: Implement mshv_get_standard_regs()
  target/i386/mshv: Implement mshv_store_regs()
  target/i386/mshv: Add CPU create and remove logic
  accel/mshv: Add vCPU signal handling
  accel/mshv: Add vCPU creation and execution loop
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-10-08 | hw/intc: Generalize APIC helper names from kvm_* to accel_* | Magnus Kulke | 1 | -10/+11
Rename APIC helper functions to use an accel_* prefix instead of kvm_* to support use by accelerators other than KVM. This is a preparatory step for integrating MSHV support with common APIC logic. Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com> Link: https://lore.kernel.org/r/20250916164847.77883-5-magnuskulke@linux.microsoft.com [Remove dead definition of mshv_msi_via_irqfd_enabled. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-10-07 | hw: Remove unnecessary 'system/ram_addr.h' header | Philippe Mathieu-Daudé | 1 | -1/+0
None of these files requires any definition exposed by "system/ram_addr.h"; remove its inclusion.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-Id: <20251001175448.18933-7-philmd@linaro.org>
2025-10-07 | hw/virtio/virtio: Replace legacy cpu_physical_memory_map() call | Philippe Mathieu-Daudé | 1 | -4/+6
Propagate VirtIODevice::dma_as to virtqueue_undo_map_desc() in order to replace the legacy cpu_physical_memory_unmap() call by address_space_unmap(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251002084203.63899-18-philmd@linaro.org>
2025-10-07 | hw/virtio/vhost: Replace legacy cpu_physical_memory_*map() calls | Philippe Mathieu-Daudé | 1 | -2/+5
Use VirtIODevice::dma_as address space to convert the legacy cpu_physical_memory_[un]map() calls to address_space_[un]map(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251002084203.63899-17-philmd@linaro.org>
2025-10-07 | system/ramblock: Move ram_block_discard_*_range() declarations | Philippe Mathieu-Daudé | 2 | -0/+2
Keep RAM blocks API in the same header: "system/ramblock.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20251002032812.26069-4-philmd@linaro.org>
2025-10-06 | Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging | Richard Henderson | 14 | -148/+371
virtio,pci,pc: features, fixes

users can now control VM bit in smbios.
vhost-user-device is now user-creatable.
intel_iommu now supports PRI
virtio-net now supports GSO over UDP tunnel
ghes now supports error injection
amd iommu now supports dma remapping for vfio
better error messages for virtio
small fixes all over the place.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# -----BEGIN PGP SIGNATURE-----
#
# iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmji0s0PHG1zdEByZWRo
# YXQuY29tAAoJECgfDbjSjVRpuH4H/09h70IqAWZGHIWKGmmGGtdKOj3g54KuI0Ss
# mGECEsHvvBexOy670Qy8jdgXfaW4UuNui8BiOnJnGsBX8Y0dy+/yZori3KhkXkaY
# D57Ap9agkpHem7Vw0zgNsAF2bzDdlzTiQ6ns5oDnSq8yt82onCb5WGkWTGkPs/jL
# Gf8Jv+Ddcpt5SU4/hHPYC8pUhl7z4xPOOyl0Qp1GG21Pxf5v4sGFcWuGGB7UEPSQ
# MjZeoM0rSnLDtNg18sGwD5RPLQs13TbtgsVwijI79c3w3rcSpPNhGR5OWkdRCIYF
# 8A0Nhq0Yfo0ogTht7yt1QNPf/ktJkuoBuGVirvpDaix2tCBECes=
# =Zvq/
# -----END PGP SIGNATURE-----
# gpg: Signature made Sun 05 Oct 2025 01:19:25 PM PDT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [unknown]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [unknown]
# gpg: WARNING: The key's User ID is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

* tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (75 commits)
  virtio: improve virtqueue mapping error messages
  pci: Fix wrong parameter passing to pci_device_get_iommu_bus_devfn()
  intel_iommu: Simplify caching mode check with VFIO device
  intel_iommu: Enable Enhanced Set Root Table Pointer Support (ESRTPS)
  vdpa-dev: add get_vhost() callback for vhost-vdpa device
  amd_iommu: HATDis/HATS=11 support
  intel-iommu: Move dma_translation to x86-iommu
  amd_iommu: Refactor amdvi_page_walk() to use common code for page walk
  amd_iommu: Do not assume passthrough translation when DTE[TV]=0
  amd_iommu: Toggle address translation mode on devtab entry invalidation
  amd_iommu: Add dma-remap property to AMD vIOMMU device
  amd_iommu: Set all address spaces to use passthrough mode on reset
  amd_iommu: Toggle memory regions based on address translation mode
  amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL
  amd_iommu: Add replay callback
  amd_iommu: Unmap all address spaces under the AMD IOMMU on reset
  amd_iommu: Use iova_tree records to determine large page size on UNMAP
  amd_iommu: Sync shadow page tables on page invalidation
  amd_iommu: Add basic structure to support IOMMU notifier updates
  amd_iommu: Add a page walker to sync shadow page tables on invalidation
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2025-10-05 | virtio: improve virtqueue mapping error messages | Alessandro Ratti | 1 | -3/+12
Improve error reporting when virtqueue ring mapping fails by including a device identifier in the error message.

Introduce a helper qdev_get_printable_name() in qdev-core, which returns either:
- the device ID, if explicitly provided (e.g. -device ...,id=foo)
- the QOM path from qdev_get_dev_path(dev) otherwise
- "<unknown device>" as a fallback when no identifier is present

This makes it easier to identify which device triggered the error in multi-device setups or when debugging complex guest configurations.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/230
Buglink: https://bugs.launchpad.net/qemu/+bug/1919021
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alessandro Ratti <alessandro@0x65c.net>
Message-Id: <20250924093138.559872-2-alessandro@0x65c.net>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
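The fallback order described for qdev_get_printable_name() can be sketched as below. This is an illustration only: `printable_name()` and `format_map_error()` are hypothetical helpers taking plain strings, and the message text is made up; the real helper takes a device object and its exact wording is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Pick the most specific identifier available (hypothetical helper). */
static const char *printable_name(const char *id, const char *dev_path)
{
    if (id != NULL && id[0] != '\0') {
        return id;                 /* explicit id=... from the command line */
    }
    if (dev_path != NULL && dev_path[0] != '\0') {
        return dev_path;           /* path reported for the device */
    }
    return "<unknown device>";     /* nothing identifies the device */
}

/* Build an example error message that names the failing device. */
static void format_map_error(char *buf, size_t len,
                             const char *id, const char *dev_path)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "failed to map virtqueue ring for device %s",
             printable_name(id, dev_path));
}
```

With an explicit ID the message names that ID; without one it falls back to the path, and only then to the placeholder string.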
2025-10-05 | vdpa-dev: add get_vhost() callback for vhost-vdpa device | Li Zhaoxin | 1 | -0/+7
Commit c255488d67 "virtio: add vhost support for virtio devices" added the get_vhost() callback, but did not implement it for vhost-vdpa devices. As a result, querying the status of a vdpa device with the x-query-virtio-status QMP command crashes QEMU, because vdpa does not implement get_vhost(). Therefore, in order to obtain the status of the virtio device under vhost-vdpa, add a get_vhost() implementation for the vdpa device.

Co-developed-by: Miao Kezhan <miaokezhan@baidu.com>
Signed-off-by: Miao Kezhan <miaokezhan@baidu.com>
Signed-off-by: Li Zhaoxin <lizhaoxin04@baidu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <2778f817cb6740a15ecb37927804a67288b062d1.1758860411.git.lizhaoxin04@baidu.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-05 | virtio: support irqfd in virtio_notify_config() | Stefan Hajnoczi | 1 | -1/+6
virtio_error() calls virtio_notify_config() to inject a VIRTIO Configuration Change Notification. This doesn't work from IOThreads because the BQL is not held and the interrupt code path requires the BQL. Follow the same approach as virtio_notify() and use ->config_notifier (an irqfd) when called from the IOThread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-4-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-05 | virtio: unify virtio_notify_irqfd() and virtio_notify() | Stefan Hajnoczi | 2 | -16/+13
The difference between these two functions: - virtio_notify() uses the interrupt code path (MSI or classic IRQs) - virtio_notify_irqfd() uses guest notifiers (irqfds) virtio_notify() can only be called with the BQL held because the interrupt code path requires the BQL. Device models use virtio_notify_irqfd() from IOThreads since the BQL is not held. The two functions can be unified by pushing down the if (qemu_in_iothread()) check from virtio-blk and virtio-scsi into core virtio code. This is in preparation for the next commit that will add irqfd support to virtio_notify_config() and where it's unattractive to introduce another irqfd-only API for device model callers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-3-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
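Pushing the IOThread check down into one notify function might look roughly like this. The sketch is a toy model: `struct vdev_model` and `notify()` are hypothetical, the two delivery paths are reduced to counters, and the `in_iothread` flag stands in for the qemu_in_iothread() check mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy device: counts how often each delivery path was used. */
struct vdev_model {
    unsigned irq_injected;     /* interrupt code path (needs the BQL) */
    unsigned irqfd_signalled;  /* guest notifier path (safe without the BQL) */
};

/* Single entry point; callers no longer choose between two functions. */
static void notify(struct vdev_model *dev, bool in_iothread)
{
    assert(dev != NULL);

    if (in_iothread) {
        dev->irqfd_signalled++;   /* BQL not held: use the irqfd */
    } else {
        dev->irq_injected++;      /* BQL held: inject directly */
    }
}
```

The design point is that device models call one function from any context, and the choice of path lives in a single place.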
2025-10-05 | vhost: use virtio_config_get_guest_notifier() | Stefan Hajnoczi | 1 | -4/+7
There is a getter function so avoid accessing the ->config_notifier field directly. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-2-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-05 | hw/virtio: rename vhost-user-device and make user creatable | Alex Bennée | 4 | -18/+18
We didn't make the device user creatable in the first place because we were worried users might get confused. Rename the device to make its nature as a test device even more explicit. While we are at it add a Kconfig variable so it can be skipped for those that want to thin out their build configuration even further. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-ID: <20250820195632.1956795-1-alex.bennee@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901105948.982583-1-alex.bennee@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04 | vhost-backend: implement extended features support | Paolo Abeni | 1 | -11/+51
Leverage the kernel extended features manipulation ioctls(), if available, and fallback to old ops otherwise. Error out when setting extended features but kernel support is not available. Note that extended support for get/set backend features is not needed, as the only feature that can be changed belongs to the 64 bit range. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <150daade3d59e77629276920e014ee8e5fc12121.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
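The fallback rule in this message can be sketched as a small decision function. All names are assumptions for illustration: `struct kernel_model`, its `has_ex_ioctl` flag and `backend_set_features()` are hypothetical, no real ioctl is issued, and the error value is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Toy kernel back-end: records which features were applied. */
struct kernel_model {
    int has_ex_ioctl;      /* does the kernel offer the extended interface? */
    uint64_t applied[2];   /* bits 0-63 and bits 64-127 */
};

static int backend_set_features(struct kernel_model *k, const uint64_t f[2])
{
    assert(k != NULL && f != NULL);

    if (k->has_ex_ioctl) {
        /* Extended interface available: pass the full 128 bits. */
        k->applied[0] = f[0];
        k->applied[1] = f[1];
        return 0;
    }
    if (f[1] != 0) {
        /* Extended bits requested but only the old interface exists. */
        return -ENOTSUP;   /* error value chosen for the sketch only */
    }
    /* Old interface: only the low 64 bits can be set. */
    k->applied[0] = f[0];
    k->applied[1] = 0;
    return 0;
}
```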
2025-10-04 | qmp: update virtio features map to support extended features | Paolo Abeni | 3 | -30/+67
Extend the VirtioDeviceFeatures struct with an additional u64 to track unknown features in the 64-127 bit range and decode the full virtio features spaces for vhost and virtio devices. Also add entries for the soon-to-be-supported virtio net GSO over UDP features. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <e51969f94d89045b333f1bc5ef5fca9e12fc371a.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04 | vhost: add support for negotiating extended features | Paolo Abeni | 1 | -20/+48
Similar to virtio infra, vhost core maintains the features status in the full extended format and allows the devices to implement extended version of the getter/setter. Note that 'protocol_features' are not extended: they are only used by vhost-user, and the latter device is not going to implement extended features soon. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <a0062c3b1847fb2baedd6cd8f6ef13b051d6beb2.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04 | virtio-pci: implement support for extended features | Paolo Abeni | 1 | -9/+67
Extend the features configuration space to 128 bits. If the virtio device supports any extended features, allow the common read/write operation to access all of it, otherwise keep exposing only the lower 64 bits.

On migration, save the 128 bit version of the features only if the upper bits are non zero. Rely on reset to clear all the feature space before load.

Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <c0b81601f65b41ca8310eba8f05e2dcf3702de89.1758549625.git.pabeni@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
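One way to picture the read side is a feature space read through 32-bit windows chosen by a select value, as the virtio common configuration does with its feature select register. The sketch below is an assumption-laden model, not the virtio-pci code: `feature_window()` and `FEATURE_WORDS` are hypothetical, and returning zero for windows outside the exposed range is a modeling choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_WORDS 2   /* assumption: 2 x 64 bits = 128 feature bits */

/*
 * Return the 32-bit slice of the feature space chosen by `select`.
 * Windows 0-1 cover bits 0-63; windows 2-3 cover bits 64-127 and are
 * only exposed when the device has extended features.
 */
static uint32_t feature_window(const uint64_t features[FEATURE_WORDS],
                               unsigned select, bool has_extended)
{
    unsigned max_windows = has_extended ? 4 : 2;

    assert(features != NULL);

    if (select >= max_windows) {
        return 0;   /* outside the exposed range: read as zero in this model */
    }
    return (uint32_t)(features[select / 2] >> (32 * (select % 2)));
}
```

A device without extended features keeps behaving as before: only windows 0 and 1 return data.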
2025-10-04 | virtio: add support for negotiating extended features | Paolo Abeni | 2 | -6/+19
The virtio specification allows for a device feature space of 128 bits and more. Soon we are going to use some of the 'extended' feature bits for the virtio net driver.

Add support for extended features negotiation on a per-device basis. Devices willing to negotiate extended features need to implement a new pair of features getter/setter; the core will conditionally use them instead of the basic ones.

Note that 'bad_features' doesn't need to be extended, as it is bound to the 64-bit limit.

Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <9bb29d70adc3f2b8c7756d4e3cd076cffee87826.1758549625.git.pabeni@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04 | virtio: serialize extended features state | Paolo Abeni | 1 | -31/+57
If the driver uses any of the extended features (i.e. 64 or above), store the extended features range (64-127 bits). At load time, let legacy features initialize the full features range and pass it to the set helper; sub-states loading will have filled-up the extended part as needed. This is one of the few spots that need explicitly to know and set in stone the extended features array size; add a build bug to prevent breaking the migration should such size change again in the future: more serialization plumbing will be needed. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <d5d9d398675bee6c4c7d7308c5d3d5d3c6d17d87.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
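The save/load scheme can be sketched as follows, assuming a two-word feature array. `struct saved_features`, `save_features()` and `load_features()` are hypothetical names, and the optional part is modeled as a flag rather than a real migration sub-state; the `_Static_assert` plays the role of the build-time check mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_WORDS 2   /* assumption: bits 0-63 and bits 64-127 */

/* Fail the build if the array grows: the stream format below assumes 2 words. */
_Static_assert(FEATURE_WORDS == 2, "serialization must be updated");

/* Toy migration stream: legacy part always, extended part only if needed. */
struct saved_features {
    uint64_t legacy;
    bool has_ext;
    uint64_t ext;
};

static struct saved_features save_features(const uint64_t f[FEATURE_WORDS])
{
    struct saved_features s = { f[0], f[1] != 0, f[1] };

    return s;
}

static void load_features(const struct saved_features *s,
                          uint64_t f[FEATURE_WORDS])
{
    assert(s != NULL && f != NULL);

    /* The legacy value initializes the full range ... */
    f[0] = s->legacy;
    f[1] = 0;
    /* ... and the optional extended part fills in the upper bits. */
    if (s->has_ext) {
        f[1] = s->ext;
    }
}
```

A stream saved from a device using only bits 0-63 carries no extended part, so it stays loadable by code that knows nothing about the upper range.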
2025-10-03 | migration: Remove error variant of vmstate_save_state() function | Arun Menon | 3 | -4/+6
This commit removes the redundant vmstate_save_state_with_err() function.

Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure.

This change unifies error handling by
- updating vmstate_save_state() to accept an Error **errp argument.
- making vmstate_save_state_v() set errors directly within the errp object, eliminating the need for two separate functions.

All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability.

Since vmstate_save_state() only calls vmstate_save_state_v(), it also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed.

Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Arun Menon <armenon@redhat.com>
Tested-by: Fabiano Rosas <farosas@suse.de>
Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03 | migration: push Error **errp into vmstate_load_state() | Arun Menon | 3 | -4/+8
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-09-19 | treewide: use qemu_set_blocking instead of g_unix_set_fd_nonblocking | Vladimir Sementsov-Ogievskiy | 1 | -6/+2
Instead of open-coded g_unix_set_fd_nonblocking() calls, use QEMU wrapper qemu_set_blocking(). Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [DB: fix missing closing ) in tap-bsd.c, remove now unused GError var] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2025-09-19 | util: drop qemu_socket_set_nonblock() | Vladimir Sementsov-Ogievskiy | 1 | -1/+4
Use common qemu_set_blocking() instead.

Note that pre-patch the behavior of the Win32 and Linux implementations is inconsistent: we ignore failure for Win32, and assert success for Linux.

How do we convert the callers?

1. Most callers call qemu_socket_set_nonblock() on a freshly created socket fd, in conditions where we may simply report an error. It seems correct to switch to error handling both for Windows (pre-patch the error is ignored) and Linux (pre-patch we assert success). Anyway, we normally don't expect errors in these cases.

   Still, in tests let's use &error_abort for simplicity.

What are the exceptions?

2. hw/virtio/vhost-user.c - we are inside #ifdef CONFIG_LINUX, so no damage in switching to error handling from assertion.

3. io/channel-socket.c: here we convert both old calls to qemu_socket_set_nonblock() and qemu_socket_set_block() to one new call. Pre-patch we assert success for Linux in qemu_socket_set_nonblock(), and ignore all other errors here. So, for Windows the switch is a bit dangerous: we may get new errors or crashes (when error_abort is passed) in cases where we have silently ignored the error before (was it correct in all such cases, if there were any?). Still, there is no way to a stricter API other than taking this risk.

4. util/vhost-user-server - compiled only for Linux (see util/meson.build), so we are safe, switching from assertion to &error_abort.

Note: In qga/channel-posix.c we use g_warning(), where g_printerr() would actually be a better choice. Still, let's for now follow the common style of qga, where g_warning() is commonly used to print such messages, with no calls to g_printerr(). Converting everything to use g_printerr() is better left to another series.

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2025-08-01 | vhost: Do not abort on log-stop error | Hanna Czenczek | 1 | -1/+2
Failing to stop logging in a vhost device is not exactly fatal. We can log such an error, but there is no need to abort the whole qemu process because of it. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20250724125928.61045-3-hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-08-01vhost: Do not abort on log-start errorHanna Czenczek1-1/+2
Commit 3688fec8923 ("memory: Add Error** argument to .log_global_start() handler") enabled vhost_log_global_start() to return a proper error, but did not change it to do so; instead, it still aborts the whole process on error. This crash can be reproduced by e.g. killing a virtiofsd daemon before initiating migration. In such a case, qemu should not crash, but just make the attempted migration fail. Buglink: https://issues.redhat.com/browse/RHEL-94534 Reported-by: Tingting Mao <timao@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20250724125928.61045-2-hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
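The two log-start/log-stop changes share one idea: a failing backend call should produce an error the caller can act on, not a process abort. The sketch below illustrates that shape only. All names are hypothetical, and the backend is reduced to a boolean "alive" flag standing in for a daemon such as virtiofsd.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a backend request that fails when the
 * backend process is gone. */
static int backend_set_log_sketch(bool enable, bool backend_alive)
{
    (void)enable;
    return backend_alive ? 0 : -1;
}

/*
 * Returns true on success. On failure, writes a message into errbuf and
 * returns false so the caller (e.g. a migration attempt) can fail
 * cleanly. The pre-patch equivalent of the failure path aborted.
 */
static bool log_global_start_sketch(bool backend_alive,
                                    char *errbuf, size_t len)
{
    int r = backend_set_log_sketch(true, backend_alive);

    if (r < 0) {
        snprintf(errbuf, len, "failed to start dirty logging: %d", r);
        return false;
    }
    return true;
}
```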
2025-08-01virtio: fix off-by-one and invalid access in virtqueue_ordered_fillJonah Palmer1-6/+16
Commit b44135daa372 introduced virtqueue_ordered_fill for VIRTIO_F_IN_ORDER support but had a few issues: * Conditional while loop used 'steps <= max_steps' but should've been 'steps < max_steps' since reaching steps == max_steps would indicate that we didn't find an element, which is an error. Without this change, the code would attempt to read invalid data at an index outside of our search range. * Incremented 'steps' using the next chain's ndescs instead of the current one. This patch corrects the loop bounds and synchronizes 'steps' and index increments. We also add a defensive sanity check against malicious or invalid descriptor counts to avoid a potential infinite loop and DoS. Fixes: b44135daa372 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support") Reported-by: terrynini <terrynini38514@gmail.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250721150208.2409779-1-jonah.palmer@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
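The loop-bound fix can be illustrated with a simplified search over a ring of in-flight elements. This is a sketch under assumptions, not the code from virtqueue_ordered_fill(): the struct, the function name, and the exact form of the sanity check are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an in-flight element. */
struct elem_sketch {
    unsigned int index;   /* head descriptor index identifying the element */
    unsigned int ndescs;  /* number of descriptors in this element's chain */
};

/*
 * Walk the ring from slot `start`, advancing by the CURRENT element's
 * ndescs, and cover at most `max_steps` descriptors.
 * Returns the slot holding `want`, or -1 if it is not found.
 */
static int find_elem_sketch(const struct elem_sketch *ring,
                            unsigned int ring_size, unsigned int start,
                            unsigned int max_steps, unsigned int want)
{
    unsigned int i = start % ring_size;
    unsigned int steps = 0;

    /* '<' rather than '<=': steps == max_steps means "not found". */
    while (steps < max_steps) {
        unsigned int n = ring[i].ndescs;

        if (ring[i].index == want) {
            return (int)i;
        }
        /* Defensive check: a zero or oversized count would loop forever
         * or step outside the search range. */
        if (n == 0 || n > max_steps - steps) {
            return -1;
        }
        steps += n;               /* keep steps and i in sync */
        i = (i + n) % ring_size;
    }
    return -1;
}
```

With `<=` the loop would run once more after the search range is exhausted and inspect a slot that was never part of it.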
2025-07-16Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi5-51/+62
into staging virtio,pci,pc: features, fixes, tests SPCR acpi table can now be disabled vhost-vdpa can now report hashing capability to guest PPTT acpi table now tells guest vCPUs are identical vhost-user-blk now shuts down faster loongarch64 now supports bios-tables-test intel_iommu now supports ATS cxl now supports DCD Fabric Management Command Set arm now supports acpi pci hotplug fixes, cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ # pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv # joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i # TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg # h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z # ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc= # =sktm # -----END PGP SIGNATURE----- # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. 
Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits) hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add hw/cxl: Create helper function to create DC Event Records from extents hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config hw/mem: cxl_type3: Add DC Region bitmap lock hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info hw/cxl: fix DC extent capacity tracking tests: virt: Update expected ACPI tables for virt test hw/acpi/aml-build: Build a root node in the PPTT table hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes tests: virt: Allow changes to PPTT test table qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests hw/arm/virt: Let virt support pci hotplug/unplug GED event ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: net/vhost-vdpa.c vhost_vdpa_set_steering_ebpf() was removed, resolve the context conflict.
2025-07-15hw/virtio: Build various files oncePhilippe Mathieu-Daudé2-10/+11
Now that various VirtIO files don't use target specific API anymore, we can move them to the system_ss[] source set to build them once. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250708215320.70426-9-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-15qemu: Declare all load/store helper in 'qemu/bswap.h'Philippe Mathieu-Daudé1-0/+1
Restrict "exec/tswap.h" to the tswap*() methods, move the load/store helpers with the other ones declared in "qemu/bswap.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250708215320.70426-8-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-15qemu: Convert target_words_bigendian() to TargetInfo APIPhilippe Mathieu-Daudé1-1/+1
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250708215320.70426-6-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-07-14vhost: add a helper for force stopping a deviceDaniil Tatianin1-13/+39
This adds an ability to skip GET_VRING_BASE during device stop entirely, and thus the expensive drain operation that this call entails as well, which may be useful during a non-graceful shutdown in case the guest operating system hangs or refuses to react to a previously requested ACPI shutdown for whatever reason. Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Message-Id: <20250609212547.2859224-3-d-tatianin@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
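The idea of a forced stop can be sketched as a stop routine with a flag that bypasses the backend round trip. Everything below is hypothetical (types, names, and the choice to keep the last recorded index on a forced stop); it only illustrates the control flow the commit message describes.

```c
#include <stdbool.h>

struct vq_sketch {
    unsigned int last_avail_idx; /* index last recorded for this queue */
    bool started;
};

/* Counts calls to the hypothetical stand-in for GET_VRING_BASE, which
 * in a real backend also drains in-flight requests. */
static unsigned int backend_calls_sketch;

static unsigned int backend_get_vring_base_sketch(unsigned int backend_idx)
{
    backend_calls_sketch++;
    return backend_idx;
}

static void vq_stop_sketch(struct vq_sketch *vq, unsigned int backend_idx,
                           bool force)
{
    if (!force) {
        /* Graceful stop: ask the backend where it stopped. */
        vq->last_avail_idx = backend_get_vring_base_sketch(backend_idx);
    }
    /* Forced stop: skip the query; the recorded index is left as-is
     * (an assumption of this sketch). */
    vq->started = false;
}
```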
2025-07-14vhost: Fix used memslot tracking when destroying a vhost deviceDavid Hildenbrand1-27/+10
When we unplug a vhost device, we end up calling vhost_dev_cleanup() where we do a memory_listener_unregister(). This memory_listener_unregister() call will end up disconnecting the listener from the address space through listener_del_address_space(). In that process, we effectively communicate the removal of all memory regions from that listener, resulting in region_del() + commit() callbacks getting triggered. So in case of vhost, we end up calling vhost_commit() with no remaining memory slots (0). In vhost_commit() we end up overwriting the global variables used_memslots / used_shared_memslots, used for detecting the number of free memslots. With used_memslots / used_shared_memslots set to 0 by vhost_commit() during device removal, we'll later assume that the other vhost devices still have plenty of memslots left when calling vhost_get_free_memslots(). Let's fix it by simply removing the global variables and depending only on the actual per-device count. Easy to reproduce by adding two vhost-user devices to a VM and then hot-unplugging one of them. While at it, detect unexpected underflows in vhost_get_free_memslots() and issue a warning. Reported-by: yuanminghao <yuanmh12@chinatelecom.cn> Link: https://lore.kernel.org/qemu-devel/20241121060755.164310-1-yuanmh12@chinatelecom.cn/ Fixes: 2ce68e4cf5be ("vhost: add vhost_has_free_slot() interface") Cc: Igor Mammedov <imammedo@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20250603111336.1858888-1-david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
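A rough sketch of the per-device accounting described above: the number of free memslots is derived from each remaining device's own limit and usage, so removing one device cannot reset a shared counter. The struct and function names are hypothetical and do not mirror vhost_get_free_memslots().

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-device bookkeeping. */
struct dev_sketch {
    unsigned int limit; /* memslots the backend supports */
    unsigned int used;  /* memslots this device currently uses */
};

/*
 * Free memslots across all devices: the minimum of (limit - used).
 * An unexpected underflow is reported and treated as zero free slots.
 * Returns UINT_MAX when there are no devices.
 */
static unsigned int free_memslots_sketch(const struct dev_sketch *devs,
                                         size_t n)
{
    unsigned int free_slots = UINT_MAX;

    for (size_t i = 0; i < n; i++) {
        unsigned int cur;

        if (devs[i].used > devs[i].limit) {
            fprintf(stderr, "warning: memslot underflow on device %zu\n", i);
            cur = 0;
        } else {
            cur = devs[i].limit - devs[i].used;
        }
        if (cur < free_slots) {
            free_slots = cur;
        }
    }
    return free_slots;
}
```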
2025-07-14virtio-net: Add queues for RSS during migrationAkihiko Odaki1-7/+7
virtio_net_pre_load_queues() inspects vdev->guest_features to tell if VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features are set at that point and VIRTIO_NET_F_RSS uses bit 60 while VIRTIO_NET_F_MQ uses bit 22. Instead of inferring the required number of queues from vdev->guest_features, use the number loaded from the vm state. This change also has the nice side effect of removing a duplicate peer queue pair change by circumventing virtio_net_set_multiqueue(). Also update the comment in include/hw/virtio/virtio.h to prevent an implementation of pre_load_queues() from accidentally referring to any fields being loaded during migration in the future. Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing") Tested-by: Lei Yang <leiyang@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
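Why the MQ check worked while the RSS check did not can be seen with plain bit arithmetic. The sketch below uses the bit positions quoted in the commit message (22 and 60); the macro and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define MQ_BIT_SKETCH  22 /* VIRTIO_NET_F_MQ, per the commit message */
#define RSS_BIT_SKETCH 60 /* VIRTIO_NET_F_RSS, per the commit message */

/* Test a single feature bit in a 64-bit feature word. */
static bool has_feature_sketch(uint64_t features, unsigned int bit)
{
    return (features >> bit) & 1;
}

/* Model a feature word in which only the low 32 bits have been loaded. */
static uint64_t low32_only_sketch(uint64_t features)
{
    return features & UINT64_C(0xffffffff);
}
```

With both bits set in the full word, the half-loaded word still shows bit 22 but has lost bit 60, so any decision based on RSS at that point sees it as disabled.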
2025-06-23memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()Chenyi Qiang1-11/+10
Update ReplayRamDiscard() function to return the result and unify ReplayRamPopulate() and ReplayRamDiscard() into ReplayRamDiscardState() at the same time, since their definitions are identical. This unification simplifies related structures, such as VirtIOMEMReplayData, making them cleaner. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-4-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-06-23memory: Change memory_region_set_ram_discard_manager() to return the resultChenyi Qiang1-13/+17
Modify memory_region_set_ram_discard_manager() to return -EBUSY if a RamDiscardManager is already set in the MemoryRegion. The caller must handle this failure, such as having virtio-mem undo its actions and fail the realize() process. Opportunistically move the call earlier to avoid complex error handling. This change is beneficial when introducing a new RamDiscardManager instance besides virtio-mem. After ram_block_coordinated_discard_require(true) unlocks all RamDiscardManager instances, only one instance is allowed to be set for one MemoryRegion at present. Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Tested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-3-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
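The "one manager per region, -EBUSY otherwise" contract can be sketched as follows. The types and names are simplified stand-ins chosen for illustration, and allowing a NULL argument to clear the manager is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory region with one manager slot. */
struct region_sketch {
    void *rdm; /* currently registered manager, or NULL */
};

/*
 * Register `rdm` for `mr`. Fails with -EBUSY if a manager is already
 * set; passing NULL clears the slot. The caller must handle failure,
 * for example by undoing earlier setup and failing its realize step.
 */
static int region_set_rdm_sketch(struct region_sketch *mr, void *rdm)
{
    if (mr->rdm && rdm) {
        return -EBUSY;
    }
    mr->rdm = rdm;
    return 0;
}
```

Calling the setter early in a realize-style function, as the commit message suggests, means little has to be unwound when it fails.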
2025-06-23memory: Export a helper to get intersection of a MemoryRegionSection with a ↵Chenyi Qiang1-27/+5
given range Rename the helper to memory_region_section_intersect_range() to make it more generic. Meanwhile, define the @end as Int128 and replace the related operations with Int128_* format since the helper is exported as a wider API. Suggested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-2-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
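Clipping a section to a range is a small interval intersection. The sketch below uses plain 64-bit arithmetic and assumes offset + size never overflows, which is exactly the case the real helper avoids by computing the end as an Int128. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a section: just an offset and a size. */
struct section_sketch {
    uint64_t offset;
    uint64_t size;
};

/*
 * Clip `s` to the range [offset, offset + size).
 * Returns false (leaving `s` untouched) if the intersection is empty.
 * Assumes neither range wraps around the 64-bit address space.
 */
static bool section_intersect_range_sketch(struct section_sketch *s,
                                           uint64_t offset, uint64_t size)
{
    uint64_t start = s->offset > offset ? s->offset : offset;
    uint64_t s_end = s->offset + s->size;
    uint64_t r_end = offset + size;
    uint64_t end = s_end < r_end ? s_end : r_end;

    if (end <= start) {
        return false;
    }
    s->offset = start;
    s->size = end - start;
    return true;
}
```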
2025-06-12hw/virtio/virtio: avoid cost of -ftrivial-auto-var-init in hot pathStefan Hajnoczi1-4/+4
Since commit 7ff9ff039380 ("meson: mitigate against use of uninitialize stack for exploits") the -ftrivial-auto-var-init=zero compiler option is used to zero local variables. While this reduces security risks associated with uninitialized stack data, it introduced a measurable bottleneck in the virtqueue_split_pop() and virtqueue_packed_pop() functions. These virtqueue functions are in the hot path. They are called for each element (request) that is popped from a VIRTIO device's virtqueue. Using __attribute__((uninitialized)) on large stack variables in these functions improves fio randread bs=4k iodepth=64 performance from 304k to 332k IOPS (+9%). This issue was found using perf-top(1). virtqueue_split_pop() was one of the top CPU consumers and the "annotate" feature showed that the memory zeroing instructions at the beginning of the functions were hot. Fixes: 7ff9ff039380 ("meson: mitigate against use of uninitialize stack for exploits") Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20250610123709.835102-3-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
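A sketch of how a large stack array can opt out of automatic zeroing when building with -ftrivial-auto-var-init=zero. The macro name UNINIT_SKETCH and the function are hypothetical; the attribute itself is a compiler feature, so the macro expands to nothing on compilers that lack it.

```c
#include <stddef.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Hypothetical macro: skip automatic zero-initialization where the
 * compiler supports the attribute, otherwise do nothing. */
#if __has_attribute(uninitialized)
#define UNINIT_SKETCH __attribute__((uninitialized))
#else
#define UNINIT_SKETCH
#endif

#define MAX_SEGS_SKETCH 1024

/*
 * Hot-path style function: the array is fully written before it is
 * read, so zeroing all of it on every call would be wasted work.
 */
static size_t sum_segments_sketch(size_t nsegs)
{
    UNINIT_SKETCH size_t lens[MAX_SEGS_SKETCH];
    size_t total = 0;

    if (nsegs > MAX_SEGS_SKETCH) {
        nsegs = MAX_SEGS_SKETCH;
    }
    for (size_t i = 0; i < nsegs; i++) {
        lens[i] = i + 1;          /* write every element that is used */
    }
    for (size_t i = 0; i < nsegs; i++) {
        total += lens[i];         /* read only what was written */
    }
    return total;
}
```

The attribute is only safe when every element read has been written first, which is why it is applied selectively to specific variables rather than globally.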
2025-06-05vfio: return mr from vfio_get_xlat_addrSteve Sistare1-2/+7
Modify memory_get_xlat_addr and vfio_get_xlat_addr to return the memory region that the translated address is found in. This will be needed by CPR in a subsequent patch to map blocks using IOMMU_IOAS_MAP_FILE. Also return the xlat offset, so we can simplify the interface by removing the out parameters that can be trivially derived from mr and xlat. Lastly, rename the functions to memory_translate_iotlb() and vfio_translate_iotlb(). Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: John Levon <john.levon@nutanix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1747661203-136490-1-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-02Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi3-39/+86
into staging virtio,pci,pc: features, fixes, tests vhost will now no longer set a call notifier if unused some work towards loongarch testing based on bios-tables-test some core pci work for SVM support in vtd vhost vdpa init has been optimized for response time to QMP A couple more fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmg97ZUPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpRBsH/0Fx4NNMaynXmVOgV1rMFirTydhQG5NSdeJv # i1RHd25Rne/RXH0CL71UPuOPADWh6bv9iZTg6RU6g7TwI8K9v3M0R71RlPLh1Lh1 # x7fifWNSNXVi18fM9/j+mIg7I2Ye0AaqveezRJWGzqoOxQKKlVI2xspKZBCCkygd # i2tgtR1ORB6+ji6wVoTDPlL42X5Jef5MUT3XOcRR5biHm0JfqxxQKVM83mD+5yMI # 0YqjT2BVRzo5rGN7mSuf7tQ50xI6I0wI1+eoWeKHRbg08f709M8TZRDKuVh24Evg # 9WnIhKLTzRVdCNLNbw9h9EhxoANpWCyvmnn6GCfkJui40necFHY= # =0lO6 # -----END PGP SIGNATURE----- # gpg: Signature made Mon 02 Jun 2025 14:29:41 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. 
Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (26 commits) hw/i386/pc_piix: Fix RTC ISA IRQ wiring of isapc machine vdpa: move memory listener register to vhost_vdpa_init vdpa: move iova_tree allocation to net_vhost_vdpa_init vdpa: reorder listener assignment vdpa: add listener_registered vdpa: set backend capabilities at vhost_vdpa_init vdpa: reorder vhost_vdpa_set_backend_cap vdpa: check for iova tree initialized at net_client_start vhost: Don't set vring call if guest notifier is unused tests/qtest/bios-tables-test: Use MiB macro rather hardcode value tests/data/uefi-boot-images: Add ISO image for LoongArch system uefi-test-tools:: Add LoongArch64 support pci: Add a PCI-level API for PRI pci: Add a pci-level API for ATS pci: Add a pci-level initialization function for IOMMU notifiers memory: Store user data pointer in the IOMMU notifiers pci: Add an API to get IOMMU's min page size and virtual address width pci: Cache the bus mastering status in the device pcie: Helper functions to check to check if PRI is enabled pcie: Add a helper to declare the PRI capability for a pcie device ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-06-02vdpa: move memory listener register to vhost_vdpa_initEugenio Pérez1-7/+28
Current memory operations like pinning may take a lot of time at the destination. Currently they are done after the source of the migration is stopped, and before the workload is resumed at the destination. This is a period where neither traffic can flow, nor the VM workload can continue (downtime). We can do better as we know the memory layout of the guest RAM at the destination from the moment that all devices are initialized. So moving that operation allows QEMU to communicate the maps to the kernel while the workload is still running in the source, so Linux can start mapping them. As a small drawback, there is a time in the initialization where QEMU cannot respond to QMP etc. By some testing, this time is about 0.2 seconds. This may be further reduced (or increased) depending on the vdpa driver and the platform hardware, and it is dominated by the cost of memory pinning. This matches the time that we move out of the so-called downtime window. The downtime is measured as the elapsed trace time between the last vhost_vdpa_suspend on the source and the last vhost_vdpa_set_vring_enable_one on the destination. In other words, from "guest CPUs freeze" to the instant the final Rx/Tx queue-pair is able to start moving data. Using ConnectX-6 Dx (MLX5) NICs in vhost-vDPA mode with 8 queue-pairs, the series reduces guest-visible downtime during back-to-back live migrations by more than half: - 39G VM: 4.72s -> 2.09s (-2.63s, ~56% improvement) - 128G VM: 14.72s -> 5.83s (-8.89s, ~60% improvement) Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-8-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
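The improvement percentages quoted above follow from (before - after) / before. A small helper, with a hypothetical name, makes the arithmetic explicit:

```c
/*
 * Percentage reduction when going from `before` to `after`.
 * Returns 0 if `before` is not positive.
 */
static double reduction_percent_sketch(double before, double after)
{
    if (before <= 0.0) {
        return 0.0;
    }
    return (before - after) / before * 100.0;
}
```

For the 39G VM this is 2.63 / 4.72, a little under 56%, and for the 128G VM it is 8.89 / 14.72, a little over 60%, consistent with the rounded figures in the commit message.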