aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2025-06-11vfio/container: register container for cprSteve Sistare2-0/+17
Register a legacy container for cpr-transfer, replacing the generic CPR register call with a more specific legacy container register call. Add a blocker if the kernel does not support VFIO_UPDATE_VADDR or VFIO_UNMAP_ALL. This is mostly boiler plate. The fields to to saved and restored are added in subsequent patches. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-4-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11migration: lower handler prioritySteve Sistare1-1/+5
Define a vmstate priority that is lower than the default, so its handlers run after all default priority handlers. Since 0 is no longer the default priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT. CPR for vfio will use this to install handlers for containers that run after handlers for the devices that they contain. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11migration: cpr helpersSteve Sistare1-0/+5
Add the cpr_incoming_needed, cpr_open_fd, and cpr_resave_fd helpers, for use when adding cpr support for vfio and iommufd. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11vfio: mark posted writes in region write callbacksJohn Levon2-2/+3
For vfio-user, the region write implementation needs to know if the write is posted; add the necessary plumbing to support this. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-5-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11vfio: add per-region fd supportJohn Levon1-2/+5
For vfio-user, each region has its own fd rather than sharing vbasedev's. Add the necessary plumbing to support this, and use the correct fd in vfio_region_mmap(). Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-4-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-11hw/vfio/ap: Storing event information for an AP configuration change eventRorie Reyes1-0/+39
These functions can be invoked by the function that handles interception of the CHSC SEI instruction for requests indicating the accessibility of one or more adjunct processors has changed. Signed-off-by: Rorie Reyes <rreyes@linux.ibm.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Link: https://lore.kernel.org/qemu-devel/20250609164418.17585-4-rreyes@linux.ibm.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-10hw/riscv/riscv-iommu: Remove definition of RISCVIOMMU[Pci|Sys]ClassZhenzhong Duan1-4/+2
RISCVIOMMUPciClass and RISCVIOMMUSysClass are defined with missed parent class, class_init on them may corrupt their parent class fields. It's lucky that parent_realize and parent_phases are not initialized or used until now, so just remove the definitions. They can be added back when really necessary. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250606092406.229833-6-zhenzhong.duan@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-06-10hw/gpio/aspeed: Fix definition of AspeedGPIOClassZhenzhong Duan1-1/+1
AspeedGPIOClass's parent is SysBusDeviceClass rather than SysBusDevice. This isn't catastrophic only because sizeof(SysBusDevice) > sizeof(SysBusDeviceClass). Fixes: 4b7f956862dc ("hw/gpio: Add basic Aspeed GPIO model for AST2400 and AST2500") Closes: https://lists.gnu.org/archive/html/qemu-devel/2025-06/msg00586.html Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-ID: <20250606092406.229833-4-zhenzhong.duan@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-06-10hw/virtio/virtio-pmem: Fix definition of VirtIOPMEMClassZhenzhong Duan1-1/+1
VirtIOPMEMClass's parent is VirtioDeviceClass rather than VirtIODevice. This isn't catastrophic only because sizeof(VirtIODevice) > sizeof(VirtioDeviceClass). Fixes: 5f503cd9f388 ("virtio-pmem: add virtio device") Closes: https://lists.gnu.org/archive/html/qemu-devel/2025-06/msg00586.html Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-ID: <20250606092406.229833-3-zhenzhong.duan@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-06-10hw/virtio/virtio-mem: Fix definition of VirtIOMEMClassZhenzhong Duan1-1/+1
Parent of VirtIOMEMClass is VirtioDeviceClass rather than VirtIODevice. This isn't catastrophic only because sizeof(VirtIODevice) > sizeof(VirtioDeviceClass). Fixes: 910b25766b33 ("virtio-mem: Paravirtualized memory hot(un)plug") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250606092406.229833-2-zhenzhong.duan@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-06-10hw/core/cpu: Move CacheType to general cpu.hZhao Liu1-0/+6
I386 has already defined cache types in target/i386/cpu.h. Move CacheType to hw/core/cpu.h, so that ARM and other architectures could use it. Cc: Alireza Sanaee <alireza.sanaee@huawei.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250605132722.3597593-1-zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-06-10accel/hvf: Fix TYPE_HVF_ACCEL instance sizePhilippe Mathieu-Daudé1-0/+1
Fixes: c97d6d2cdf9 ("i386: hvf: add code base from Google repo") Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250606164418.98655-7-philmd@linaro.org>
2025-06-10hw/core/resetcontainer: Consolidate OBJECT_DECLARE_SIMPLE_TYPEZhao Liu1-1/+1
The QOM type of ResettableContainer is defined by OBJECT_DEFINE_SIMPLE_TYPE_WITH_INTERFACES, which means it doesn't need the class! Therefore, use OBJECT_DECLARE_SIMPLE_TYPE to declare the type, then there's no need for class definition. Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250514084957.2221975-8-zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-06-10hw/loongarch/virt: Remove global variables about memmap tablesBibo Mao2-3/+2
Global variables memmap_table and memmap_entries stores UEFI memory map table informations. It can be moved into structure LoongArchVirtMachineState. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20250430094738.1556670-3-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2025-06-10hw/loongarch/virt: Remove global variables about initrdBibo Mao1-0/+2
Global variables initrd_offset and initrd_size records loading information about initrd, it can be moved to structure loongarch_boot_info. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20250430094738.1556670-2-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2025-06-10hw/intc/loongarch_extioi: Fix typo issue about register EXTIOI_COREISR_ENDBibo Mao1-1/+1
Interrupt controller extioi supports 256 vectors, register EXTIOI_COREISR records pending interrupt status with bitmap method. Size of EXTIOI_COREISR is 256 / 8 = 0x20 bytes, EXTIOI_COREISR_END should be EXTIOI_COREISR_START + 0x20 rather than 0xB20. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20250605092848.1550985-1-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2025-06-07include/gdbstub: fix include guard in commands.hAlex Bennée1-1/+1
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-ID: <20250603110204.838117-15-alex.bennee@linaro.org>
2025-06-07include/exec: fix assert in size_memopAlex Bennée1-2/+2
We can handle larger sized memops now, expand the range of the assert. Fixes: 4b473e0c60 (tcg: Expand MO_SIZE to 3 bits) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-ID: <20250603110204.838117-14-alex.bennee@linaro.org>
2025-06-06Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingStefan Hajnoczi7-20/+98
* futex: support Windows * qemu-thread: Avoid futex abstraction for non-Linux * migration, hw/display/apple-gfx: replace QemuSemaphore with QemuEvent * rust: bindings for Error * hpet, rust/hpet: return errors from realize if properties are incorrect * rust/hpet: Drop BqlCell wrapper for num_timers * target/i386: Emulate ftz and denormal flag bits correctly * i386/kvm: Prefault memory on page state change # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmhC4AgUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroP09wf+K9e0TaaZRxTsw7WU9pXsDoYPzTLd # F5CkBZPY770X1JW75f8Xw5qKczI0t6s26eFK1NUZxYiDVWzW/lZT6hreCUQSwzoS # b0wlAgPW+bV5dKlKI2wvnadrgDvroj4p560TS+bmRftiu2P0ugkHHtIJNIQ+byUQ # sWdhKlUqdOXakMrC4H4wDyIgRbK4CLsRMbnBHBUENwNJYJm39bwlicybbagpUxzt # w4mgjbMab0jbAd2hVq8n+A+1sKjrroqOtrhQLzEuMZ0VAwocwuP2Adm6gBu9kdHV # tpa8RLopninax3pWVUHnypHX780jkZ8E7zk9ohaaK36NnWTF4W/Z41EOLw== # =Vs6V # -----END PGP SIGNATURE----- # gpg: Signature made Fri 06 Jun 2025 08:33:12 EDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (31 commits) tests/tcg/x86_64/fma: add test for exact-denormal output target/i386: Wire up MXCSR.DE and FPUS.DE correctly target/i386: Use correct type for get_float_exception_flags() values target/i386: Detect flush-to-zero after rounding hw/display/apple-gfx: Replace QemuSemaphore with QemuEvent migration/postcopy: Replace QemuSemaphore with QemuEvent migration/colo: Replace QemuSemaphore with QemuEvent migration: Replace QemuSemaphore with QemuEvent qemu-thread: Document QemuEvent qemu-thread: Use futex if available for QemuLockCnt qemu-thread: Use futex for QemuEvent on Windows qemu-thread: Avoid futex abstraction for non-Linux qemu-thread: Replace __linux__ with CONFIG_LINUX futex: Support Windows futex: Check value after qemu_futex_wait() i386/kvm: Prefault memory on page state change rust: make TryFrom macro more resilient docs: update Rust module status rust/hpet: Drop BqlCell wrapper for num_timers rust/hpet: return errors from realize if properties are incorrect ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-06-06qemu-thread: Document QemuEventAkihiko Odaki1-0/+10
Document QemuEvent to help choose an appropriate synchronization primitive. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-12-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-06qemu-thread: Use futex if available for QemuLockCntAkihiko Odaki1-1/+1
This unlocks the futex-based implementation of QemuLockCnt to Windows. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-6-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-06qemu-thread: Use futex for QemuEvent on WindowsAkihiko Odaki3-16/+10
Use the futex-based implementation of QemuEvent on Windows to remove code duplication and remove the overhead of event object construction and destruction. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250526-event-v4-6-5b784cc8e1de@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-06qemu-thread: Replace __linux__ with CONFIG_LINUXAkihiko Odaki1-1/+1
scripts/checkpatch.pl warns for __linux__ saying "architecture specific defines should be avoided". Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250526-event-v4-4-5b784cc8e1de@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-06futex: Support WindowsAkihiko Odaki1-12/+41
Windows supports futex-like APIs since Windows 8 and Windows Server 2012. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-2-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-06futex: Check value after qemu_futex_wait()Akihiko Odaki1-0/+9
futex(2) - Linux manual page https://man7.org/linux/man-pages/man2/futex.2.html > Note that a wake-up can also be caused by common futex usage patterns > in unrelated code that happened to have previously used the futex > word's memory location (e.g., typical futex-based implementations of > Pthreads mutexes can cause this under some conditions). Therefore, > callers should always conservatively assume that a return value of 0 > can mean a spurious wake-up, and use the futex word's value (i.e., > the user-space synchronization scheme) to decide whether to continue > to block or not. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Link: https://lore.kernel.org/r/20250529-event-v5-1-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-06i386/kvm: Prefault memory on page state changeTom Lendacky1-0/+1
A page state change is typically followed by an access of the page(s) and results in another VMEXIT in order to map the page into the nested page table. Depending on the size of page state change request, this can generate a number of additional VMEXITs. For example, under SNP, when Linux is utilizing lazy memory acceptance, memory is typically accepted in 4M chunks. A page state change request is submitted to mark the pages as private, followed by validation of the memory. Since the guest_memfd currently only supports 4K pages, each page validation will result in VMEXIT to map the page, resulting in 1024 additional exits. When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the size of the page state change in order to pre-map the pages and avoid the additional VMEXITs. This helps speed up boot times. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-05util/error: make func optionalPaolo Bonzini1-0/+2
The function name is not available in Rust, so make it optional. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-05util/error: allow non-NUL-terminated err->srcPaolo Bonzini1-1/+8
Rust makes the current file available as a statically-allocated string, but without a NUL terminator. Allow this by storing an optional maximum length in the Error. Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-05util/error: expose Error definition to Rust codePaolo Bonzini1-0/+26
This is used to preserve the file and line in a roundtrip from C Error to Rust and back to C. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-06-05Merge tag 'pull-vfio-20250605' of https://github.com/legoater/qemu into stagingStefan Hajnoczi5-15/+174
vfio queue: * Fixed OpRegion detection in IGD * Added prerequisite rework for IOMMU nesting support * Added prerequisite rework for vfio-user * Added prerequisite rework for VFIO live update * Modified memory_get_xlat_addr() to return a MemoryRegion # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmhBWBgACgkQUaNDx8/7 # 7KEg8RAAyNuFzKs1dUHc24QeApjnd56PF3DhmjT++19hh2VH/CpKeJkWnNuWQupo # 7yqQmYuxYMrHrh0ncdv5S+sIU2fHGMuxsL4d129H3/BPaAr92Zgtk2ID5deFTG9c # Ns5sC/1Z6UyRqgh5PRxDmkfMVxyJ73dofTWyAQGNwwt5ASV876JEApMSO4smGpyy # cu0tpya6WVaYp/Ry2MjpK1N6utr1pJgzIVWQ2ww595OtaqQMa9OD5Sepafp5kf+y # ZqihINpMY9eGuu4olDQYcaUKThH0DAWR4Eb6ndgG9gOSh0M2YI0YygvG9q9giQzA # WXlmM2e9ZVAULl2Y8Eb4PVybyk3U9eDK3MzI9PzKBLNdROjJNwNK9ahjtFgPWN9H # cIYnBEnTP2d1e4BOtJIoQRXdDFOQHqzzEPwFhqMLEnzu1beVRnnt8SiYPKV/pEO0 # ZEAzWka7WN27DDoqgSNzc8ptIzbM6yO66dvLwOhXyr+WyqVaiehxhvfZiEbpeIWa # 6LuCnyJkgEcAX7I7BaqZxAVvBqwR0z0TElfxadAj6YXgjVEUTahaBV+6M7bBDoid # BlXTFBrdhlTOjrzV0LkZe9ac9VbxPc9fW/uGoYntD0cRsuWqpDpgNoDlmHDjVudz # b4TCVksIsfrVkNqQclXfYuSNMZV0KwBADD1wVqZ42nyx1KcgqMQ= # =tHwb # -----END PGP SIGNATURE----- # gpg: Signature made Thu 05 Jun 2025 04:40:56 EDT # gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1 # gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full] # gpg: aka "Cédric Le Goater <clg@kaod.org>" [full] # Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1 * tag 'pull-vfio-20250605' of https://github.com/legoater/qemu: vfio: move vfio-cpr.h vfio: vfio_find_ram_discard_listener MAINTAINERS: Add reviewer for CPR vfio/iommufd: Save vendor specific device info vfio/iommufd: Implement [at|de]tach_hwpt handlers vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFD backends/iommufd: Add a helper to invalidate user-managed HWPT vfio/container: pass MemoryRegion to DMA operations vfio: return mr from vfio_get_xlat_addr vfio/igd: Fix incorrect error propagation in vfio_pci_igd_opregion_detect() vfio/iommufd: Add comment emphasizing no movement of hiod->realize() call vfio: refactor out IRQ signalling setup vfio: move config space read into vfio_pci_config_setup() vfio: move more cleanup into vfio_pci_put_device() vfio: add more VFIOIOMMUClass docs vfio/igd: OpRegion not found fix error typo Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-06-05vfio: move vfio-cpr.hSteve Sistare1-0/+18
Move vfio-cpr.h to include/hw/vfio, because it will need to be included by other files there. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1748546679-154091-9-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05vfio: vfio_find_ram_discard_listenerSteve Sistare1-0/+3
Define vfio_find_ram_discard_listener as a subroutine so additional calls to it may be added in a subsequent patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1748546679-154091-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05vfio/iommufd: Save vendor specific device infoZhenzhong Duan1-0/+15
Some device information returned by ioctl(IOMMU_GET_HW_INFO) are vendor specific. Save them as raw data in a union supporting different vendors, then vendor IOMMU can query the raw data with its fixed format for capability directly. Because IOMMU_GET_HW_INFO is only supported in linux, so declare those capability related structures with CONFIG_LINUX. Suggested-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-5-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05vfio/iommufd: Add properties and handlers to TYPE_HOST_IOMMU_DEVICE_IOMMUFDZhenzhong Duan1-0/+50
Enhance HostIOMMUDeviceIOMMUFD object with 3 new members, specific to the iommufd BE + 2 new class functions. IOMMUFD BE includes IOMMUFD handle, devid and hwpt_id. IOMMUFD handle and devid are used to allocate/free ioas and hwpt. hwpt_id is used to re-attach IOMMUFD backed device to its default VFIO sub-system created hwpt, i.e., when vIOMMU is disabled by guest. These properties are initialized in hiod::realize() after attachment. 2 new class functions are [at|de]tach_hwpt(). They are used to attach/detach hwpt. VFIO and VDPA can have different implementions, so implementation will be in sub-class instead of HostIOMMUDeviceIOMMUFD, e.g., in HostIOMMUDeviceIOMMUFDVFIO. Add two wrappers host_iommu_device_iommufd_[at|de]tach_hwpt to wrap the two functions. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-3-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05backends/iommufd: Add a helper to invalidate user-managed HWPTZhenzhong Duan1-0/+4
This helper passes cache invalidation request from guest to invalidate stage-1 page table cache in host hardware. Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250604062115.4004200-2-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05vfio/container: pass MemoryRegion to DMA operationsJohn Levon1-4/+5
Pass through the MemoryRegion to DMA operation handlers of vfio containers. The vfio-user container will need this later, to translate the vaddr into an offset for the dma map vfio-user message; CPR will also will need this. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/qemu-devel/20250521215534.2688540-1-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05vfio: return mr from vfio_get_xlat_addrSteve Sistare1-10/+9
Modify memory_get_xlat_addr and vfio_get_xlat_addr to return the memory region that the translated address is found in. This will be needed by CPR in a subsequent patch to map blocks using IOMMU_IOAS_MAP_FILE. Also return the xlat offset, so we can simplify the interface by removing the out parameters that can be trivially derived from mr and xlat. Lastly, rename the functions to to memory_translate_iotlb() and vfio_translate_iotlb(). Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: John Levon <john.levon@nutanix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1747661203-136490-1-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-05vfio: add more VFIOIOMMUClass docsJohn Levon1-3/+72
Add some additional doc comments for these class methods. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250520162530.2194548-1-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-06-04block: mark bdrv_drained_begin() and friends as GRAPH_UNLOCKEDFiona Ebner2-3/+3
All of bdrv_drain_all_begin(), bdrv_drain_all() and bdrv_drained_begin() poll and are not allowed to be called with the block graph lock held. Mark the function as such. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-20-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of quorum_del_child()Fiona Ebner1-0/+7
The quorum_del_child() callback runs under the graph lock, so it is not allowed to drain. It is only called as the .bdrv_del_child() callback, which is only called in the bdrv_del_child() function, which also runs under the graph lock. The bdrv_del_child() function is called by qmp_x_blockdev_change(). A drained section was already introduced there by commit "block: move drain out of quorum_add_child()". This finally finishes moving out the drain to places that are not under the graph lock started in "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-17-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of quorum_add_child()Fiona Ebner1-0/+7
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The quorum_add_child() callback runs under the graph lock, so it is not allowed to drain. It is only called as the .bdrv_add_child() callback, which is only called in the bdrv_add_child() function, which also runs under the graph lock. The bdrv_add_child() function is called by qmp_x_blockdev_change(), where a drained section is introduced. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-15-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_root_attach_child()Fiona Ebner1-0/+2
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The function bdrv_root_attach_child() runs under the graph lock, so it is not allowed to drain. It is called by: 1. blk_insert_bs(), where a drained section is introduced. 2. block_job_add_bdrv(), which holds the graph lock itself. block_job_add_bdrv() is called by: 1. mirror_start_job() 2. stream_start() 3. commit_start() 4. backup_job_create() 5. block_job_create() 6. In the test_blockjob_common_drain_node() unit test In all callers, a drained section is introduced. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-13-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_try_change_aio_context()Fiona Ebner1-2/+6
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". Convert the function to a _locked() version that has to be called with the graph lock held and add a convenience wrapper that has to be called with the graph unlocked, which drains and takes the lock itself. Since bdrv_try_change_aio_context() is global state code, the wrapper is too. Callers are adapted to use the appropriate variant, depending on whether the caller already holds the lock. In the test_set_aio_context() unit test, prior drains can be removed, because draining already happens inside the new wrapper. Note that bdrv_attach_child_common_abort(), bdrv_attach_child_common() and bdrv_root_unref_child() hold the graph lock and are not actually allowed to drain either. This will be addressed in the following commits. Functions like qmp_blockdev_mirror() query the nodes to act on before draining and locking. In theory, draining could invalidate those nodes. This kind of issue is not addressed by these commits. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-10-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_change_aio_context() and mark GRAPH_RDLOCKFiona Ebner1-0/+12
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED. Note that even if bdrv_drained_begin() were already marked as GRAPH_UNLOCKED, TSA would not complain about the instance in bdrv_change_aio_context() before this change, because it is preceded by a bdrv_graph_rdunlock_main_loop() call. It is not correct to release the lock here, and in case the caller holds a write lock, it wouldn't actually release the lock. In combination with block-stream, there is a deadlock that can happen because of this [0]. In particular, it can happen that main thread IO thread 1. acquires write lock in blk_co_do_preadv_part(): 2. have non-zero blk->in_flight 3. try to acquire read lock 4. begin drain Steps 3 and 4 might be switched. Draining will poll and get stuck, because it will see the non-zero in_flight counter. But the IO thread will not make any progress either, because it cannot acquire the read lock. After this change, all paths to bdrv_change_aio_context() drain: bdrv_change_aio_context() is called by: 1. bdrv_child_cb_change_aio_ctx() which is only called via the change_aio_ctx() callback, see below. 2. bdrv_child_change_aio_context(), see below. 3. bdrv_try_change_aio_context(), where a drained section is introduced. The change_aio_ctx() callback is called by: 1. bdrv_attach_child_common_abort(), where a drained section is introduced. 2. bdrv_attach_child_common(), where a drained section is introduced. 3. bdrv_parent_change_aio_context(), see below. bdrv_child_change_aio_context() is called by: 1. bdrv_change_aio_context(), i.e. recursive, so being in a drained section is invariant. 2. child_job_change_aio_ctx(), which is only called via the change_aio_ctx() callback, see above. bdrv_parent_change_aio_context() is called by: 1. bdrv_change_aio_context(), i.e. recursive, so being in a drained section is invariant. This resolves all code paths. Note that bdrv_attach_child_common() and bdrv_attach_child_common_abort() hold the graph write lock and callers of bdrv_try_change_aio_context() might too, so they are not actually allowed to drain either. This will be addressed in the following commits. More granular draining is not trivially possible, because bdrv_change_aio_context() can recursively call itself e.g. via bdrv_child_change_aio_context(). [0]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/ Reported-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-9-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: mark bdrv_child_change_aio_context() GRAPH_RDLOCKFiona Ebner1-3/+4
This is a small step in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED. More concretely, it is in preparation to move the drain out of bdrv_change_aio_context() and marking that function as GRAPH_RDLOCK. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-8-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)Fiona Ebner1-3/+3
This is a small step in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED. More concretely, it is in preparation to move the drain out of bdrv_change_aio_context() and marking that function as GRAPH_RDLOCK. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-7-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-02Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi6-2/+359
into staging virtio,pci,pc: features, fixes, tests vhost will now no longer set a call notifier if unused some work towards loongarch testing based on bios-tables-test some core pci work for SVM support in vtd vhost vdpa init has been optimized for response time to QMP A couple more fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmg97ZUPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpRBsH/0Fx4NNMaynXmVOgV1rMFirTydhQG5NSdeJv # i1RHd25Rne/RXH0CL71UPuOPADWh6bv9iZTg6RU6g7TwI8K9v3M0R71RlPLh1Lh1 # x7fifWNSNXVi18fM9/j+mIg7I2Ye0AaqveezRJWGzqoOxQKKlVI2xspKZBCCkygd # i2tgtR1ORB6+ji6wVoTDPlL42X5Jef5MUT3XOcRR5biHm0JfqxxQKVM83mD+5yMI # 0YqjT2BVRzo5rGN7mSuf7tQ50xI6I0wI1+eoWeKHRbg08f709M8TZRDKuVh24Evg # 9WnIhKLTzRVdCNLNbw9h9EhxoANpWCyvmnn6GCfkJui40necFHY= # =0lO6 # -----END PGP SIGNATURE----- # gpg: Signature made Mon 02 Jun 2025 14:29:41 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (26 commits) hw/i386/pc_piix: Fix RTC ISA IRQ wiring of isapc machine vdpa: move memory listener register to vhost_vdpa_init vdpa: move iova_tree allocation to net_vhost_vdpa_init vdpa: reorder listener assignment vdpa: add listener_registered vdpa: set backend capabilities at vhost_vdpa_init vdpa: reorder vhost_vdpa_set_backend_cap vdpa: check for iova tree initialized at net_client_start vhost: Don't set vring call if guest notifier is unused tests/qtest/bios-tables-test: Use MiB macro rather hardcode value tests/data/uefi-boot-images: Add ISO image for LoongArch system uefi-test-tools:: Add LoongArch64 support pci: Add a PCI-level API for PRI pci: Add a pci-level API for ATS pci: Add a pci-level initialization function for IOMMU notifiers memory: Store user data pointer in the IOMMU notifiers pci: Add an API to get IOMMU's min page size and virtual address width pci: Cache the bus mastering status in the device pcie: Helper functions to check to check if PRI is enabled pcie: Add a helper to declare the PRI capability for a pcie device ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-06-02vdpa: move iova_tree allocation to net_vhost_vdpa_initEugenio Pérez1-1/+15
As we are moving to keep the mapping through all the vdpa device life instead of resetting it at VirtIO reset, we need to move all its dependencies to the initialization too. In particular devices with x-svq=on need a valid iova_tree from the beginning. Simplify the code also consolidating the two creation points: the first data vq in case of SVQ active and CVQ start in case only CVQ uses it. Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Suggested-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-7-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-06-02vdpa: add listener_registeredEugenio Pérez1-0/+6
Check if the listener has been registered or not, so it needs to be registered again at start. Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-5-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-06-02vhost: Don't set vring call if guest notifier is unusedHuaitong Han1-0/+1
The vring call fd is set even when the guest does not use MSI-X (e.g., in the case of virtio PMD), leading to unnecessary CPU overhead for processing interrupts. The commit 96a3d98d2c("vhost: don't set vring call if no vector") optimized the case where MSI-X is enabled but the queue vector is unset. However, there's an additional case where the guest uses INTx and the INTx_DISABLED bit in the PCI config is set, meaning that no interrupt notifier will actually be used. In such cases, the vring call fd should also be cleared to avoid redundant interrupt handling. Fixes: 96a3d98d2c("vhost: don't set vring call if no vector") Reported-by: Zhiyuan Yuan <yuanzhiyuan@chinatelecom.cn> Signed-off-by: Jidong Xia <xiajd@chinatelecom.cn> Signed-off-by: Huaitong Han <hanht2@chinatelecom.cn> Message-Id: <20250522100548.212740-1-hanht2@chinatelecom.cn> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>