aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio
AgeCommit message (Collapse)AuthorFilesLines
2025-03-11vfio/igd: Do not include GTT stolen size in etc/igd-bdsm-sizeTomita Moeko1-25/+3
Though GTT Stolen Memory (GSM) is right below Data Stolen Memory (DSM) in host address space, direct access to GSM is prohibited, and it is not mapped to guest address space. Both host and guest accesses GSM indirectly through the second half of MMIO BAR0 (GTTMMADR). Guest firmware only need to reserve a memory region for DSM and program the BDSM register with the base address of that region, that's actually what both SeaBIOS[1] and IgdAssignmentDxe does now. [1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332 Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-3-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11vfio/igd: Remove GTT write quirk in IO BAR 4Tomita Moeko1-190/+1
The IO BAR4 of IGD devices contains a pair of 32-bit address/data registers, MMIO_Index (0x0) and MMIO_Data (0x4), which provide access to the MMIO BAR0 (GTTMMADR) from IO space. These registers are probably only used by the VBIOS, and are not documented by intel. The observed layout of MMIO_Index register is: 31 2 1 0 +-------------------------------------------------------------------+ | Offset | Rsvd | Sel | +-------------------------------------------------------------------+ - Offset: Byte offset in specified region, 4-byte aligned. - Sel: Region selector 0: MMIO register region (first half of MMIO BAR0) 1: GTT region (second half of MMIO BAR0). Pre Gen11 only. Currently, QEMU implements a quirk that adjusts the guest Data Stolen Memory (DSM) region address to be (addr - host BDSM + guest BDSM) when programming GTT entries via IO BAR4, assuming guest still programs GTT with host DSM address, which is not the case. Guest's BDSM register is emulated and initialized to 0 at startup by QEMU, then SeaBIOS programs its value[1]. As result, the address programmed to GTT entries by VBIOS running in guest are valid GPA, and this unnecessary adjustment brings inconsistency. [1] https://gitlab.com/qemu-project/seabios/-/blob/1.12-stable/src/fw/pciinit.c#L319-332 Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-2-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-07Merge tag 'pull-vfio-20250306' of https://github.com/legoater/qemu into stagingStefan Hajnoczi12-58/+1021
vfio queue: * Added property documentation * Added Minor fixes * Implemented basic PCI PM capability backing * Promoted new IGD maintainer * Deprecated vfio-plaform * Extended VFIO migration with multifd support # -----BEGIN PGP SIGNATURE----- # # iQIyBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmfJrZoACgkQUaNDx8/7 # 7KFE2A/0Dmief9u/dDJIKGIDa+iawcf4hu8iX4v5pB0DlGniT3rgK8WMGnhDpPxq # Q4wsKfo+JJ2q6msInrT7Ckqyydu9nQztI3vwmfMuWxLhTMyH28K96ptwPqIZBjOx # rPTEXfnVX4W3tpn1+48S+vefWVa/gkBkIvv7RpK18rMBXv1kDeyOvc/d2dbAt7ft # zJc4f8gH3jfQzGwmnYVZU1yPrZN7p6zhYR/AD3RQOY97swgZIEyYxXhOuTPiCuEC # zC+2AMKi9nmnCG6x/mnk7l2yJXSlv7lJdqcjYZhJ9EOIYfiUGTREYIgQbARcafE/ # 4KSg2QR35BoUd4YrmEWxXJCRf3XnyWXDY36dDKVhC0OHng1F/U44HuL4QxwoTIay # s1SP/DHcvDiPAewVTvdgt7Iwfn9xGhcQO2pkrxBoNLB5JYwW+R6mG7WXeDv1o3GT # QosTu1fXZezQqFd4v6+q5iRNS2KtBZLTspwAmVdywEFUs+ZLBRlC+bodYlinZw6B # Yl/z0LfAEh4J55QmX2espbp8MH1+mALuW2H2tgSGSrTBX1nwxZFI5veFzPepgF2S # eTx69BMjiNMwzIjq1T7e9NpDCceiW0fXDu7IK1MzYhqg1nM9lX9AidhFTeiF2DB2 # EPb3ljy/8fyxcPKa1T9X47hQaSjbMwofaO8Snoh0q0jokY246Q== # =hIBw # -----END PGP SIGNATURE----- # gpg: Signature made Thu 06 Mar 2025 22:13:46 HKT # gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1 # gpg: Good signature from "Cédric Le Goater <clg@redhat.com>" [full] # gpg: aka "Cédric Le Goater <clg@kaod.org>" [full] # Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1 * tag 'pull-vfio-20250306' of https://github.com/legoater/qemu: (42 commits) hw/core/machine: Add compat for x-migration-multifd-transfer VFIO property vfio/migration: Make x-migration-multifd-transfer VFIO property mutable vfio/migration: Add x-migration-multifd-transfer VFIO property vfio/migration: Multifd device state transfer support - send side vfio/migration: Multifd device state transfer support - config loading support migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile vfio/migration: Multifd device state transfer support - load thread vfio/migration: Multifd device state transfer support - received buffers queuing vfio/migration: Setup and cleanup multifd transfer in these general methods vfio/migration: Multifd setup/cleanup functions and associated VFIOMultifd vfio/migration: Multifd device state transfer - add support checking function vfio/migration: Multifd device state transfer support - basic types vfio/migration: Move migration channel flags to vfio-common.h header file vfio/migration: Add vfio_add_bytes_transferred() vfio/migration: Convert bytes_transferred counter to atomic vfio/migration: Add load_device_config_state_start trace event migration: Add save_live_complete_precopy_thread handler migration/multifd: Add multifd_device_state_supported() migration/multifd: Make MultiFDSendData a struct migration/multifd: Device state transfer support - send side ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-03-06qdev: Change values of PropertyInfo member @type to be QAPI typesMarkus Armbruster1-1/+1
PropertyInfo member @type is externally visible via QMP device-list-properties and qom-list-properies. Its meaning is not documented at its definition. It gets passed as @type argument to object_property_add() and object_class_property_add(). This argument's documentation isn't of much help, either: * @type: the type name of the property. This namespace is pretty loosely * defined. Sub namespaces are constructed by using a prefix and then * to angle brackets. For instance, the type 'virtio-net-pci' in the * 'link' namespace would be 'link<virtio-net-pci>'. The two QMP commands document it as # @type: the type of the property. This will typically come in one of # four forms: # # 1) A primitive type such as 'u8', 'u16', 'bool', 'str', or # 'double'. These types are mapped to the appropriate JSON # type. # # 2) A child type in the form 'child<subtype>' where subtype is a # qdev device type name. Child properties create the # composition tree. # # 3) A link type in the form 'link<subtype>' where subtype is a # qdev device type name. Link properties form the device model # graph. "Typically come in one of four forms" followed by three items inspires the level of trust that is appropriate here. Clean up a bunch of funnies: * qdev_prop_fdc_drive_type.type is "FdcDriveType". Its .enum_table refers to QAPI type "FloppyDriveType". So use that. * qdev_prop_reserved_region is "reserved_region". Its only user is an array property called "reserved-regions". Its .set() visits str. So change @type to "str". * trng_prop_fault_event_set.type is "uint32:bits". Its .set() visits uint32, so change @type to "uint32". If we believe mentioning it's actually bits is useful, the proper place would be .description. * ccw_loadparm.type is "ccw_loadparm". It's users are properties called "loadparm". Its .set() visits str. So change @type to "str". * qdev_prop_nv_gpudirect_clique.type is "uint4". Its set() visits uint8, so change @type to "uint8". If we believe mentioning the range is useful, the proper place would be .description. * s390_pci_fid_propinfo.type is "zpci_fid". Its .set() visits uint32. So change type to that, and move the "zpci_fid" to .description. This is admittedly a lousy description, but it's still an improvement; for instance, output of -device zpci,help changes from fid=<zpci_fid> to fid=<uint32> - zpci_fid * Similarly for a raft of PropertyInfo in target/riscv/cpu.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250227085601.4140852-5-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [Commit message typo fixed]
2025-03-06qdev: Rename PropertyInfo member @name to @typeMarkus Armbruster1-1/+1
PropertyInfo member @name becomes ObjectProperty member @type, while Property member @name becomes ObjectProperty member @name. Rename the former. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250227085601.4140852-4-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [One missed instance of @type fixed]
2025-03-06vfio/migration: Make x-migration-multifd-transfer VFIO property mutableMaciej S. Szmigiero2-3/+21
DEFINE_PROP_ON_OFF_AUTO() property isn't runtime-mutable so using it would mean that the source VM would need to decide upfront at startup time whether it wants to do a multifd device state transfer at some point. Source VM can run for a long time before being migrated so it is desirable to have a fallback mechanism to the old way of transferring VFIO device state if it turns to be necessary. This brings this property to the same mutability level as ordinary migration parameters, which too can be adjusted at the run time. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/f2f2d66bda477da3e6cb8c0311006cff36e8651d.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Add x-migration-multifd-transfer VFIO propertyMaciej S. Szmigiero2-1/+24
This property allows configuring whether to transfer the particular device state via multifd channels when live migrating that device. It defaults to AUTO, which means that VFIO device state transfer via multifd channels is attempted in configurations that otherwise support it. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/d6dbb326e3d53c7104d62c96c9e3dd64e1c7b940.1741124640.git.maciej.szmigiero@oracle.com [ clg: Added documentation ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd device state transfer support - send sideMaciej S. Szmigiero4-6/+166
Implement the multifd device state transfer via additional per-device thread inside save_live_complete_precopy_thread handler. Switch between doing the data transfer in the new handler and doing it in the old save_state handler depending if VFIO multifd transfer is enabled or not. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/4d727e2e0435e0022d50004e474077632830e08d.1741124640.git.maciej.szmigiero@oracle.com [ clg: - Reordered savevm_vfio_handlers - Updated save_live_complete_precopy* documentation ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd device state transfer support - config loading supportMaciej S. Szmigiero2-3/+55
Load device config received via multifd using the existing machinery behind vfio_load_device_config_state(). Also, make sure to process the relevant main migration channel flags. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/5dbd3f3703ec1097da2cf82a7262233452146fee.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd device state transfer support - load threadMaciej S. Szmigiero4-0/+247
Add a thread which loads the VFIO device state buffers that were received via multifd. Each VFIO device that has multifd device state transfer enabled has one such thread, which is created using migration core API qemu_loadvm_start_load_thread(). Since it's important to finish loading device state transferred via the main migration channel (via save_live_iterate SaveVMHandler) before starting loading the data asynchronously transferred via multifd the thread doing the actual loading of the multifd transferred data is only started from switchover_start SaveVMHandler. switchover_start handler is called when MIG_CMD_SWITCHOVER_START sub-command of QEMU_VM_COMMAND is received via the main migration channel. This sub-command is only sent after all save_live_iterate data have already been posted so it is safe to commence loading of the multifd-transferred device state upon receiving it - loading of save_live_iterate data happens synchronously in the main migration thread (much like the processing of MIG_CMD_SWITCHOVER_START) so by the time MIG_CMD_SWITCHOVER_START is processed all the proceeding data must have already been loaded. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/9abe612d775aaf42e31646796acd2363c723a57a.1741124640.git.maciej.szmigiero@oracle.com [ clg: - Reordered savevm_vfio_handlers - Added switchover_start documentation ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd device state transfer support - received buffers queuingMaciej S. Szmigiero4-0/+171
The multifd received data needs to be reassembled since device state packets sent via different multifd channels can arrive out-of-order. Therefore, each VFIO device state packet carries a header indicating its position in the stream. The raw device state data is saved into a VFIOStateBuffer for later in-order loading into the device. The last such VFIO device state packet should have VFIO_DEVICE_STATE_CONFIG_STATE flag set and carry the device config state. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/e3bff515a8d61c582b94b409eb12a45b1a143a69.1741124640.git.maciej.szmigiero@oracle.com [ clg: - Reordered savevm_vfio_handlers - Added load_state_buffer documentation ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Setup and cleanup multifd transfer in these general methodsMaciej S. Szmigiero1-2/+22
Wire VFIO multifd transfer specific setup and cleanup functions into general VFIO load/save setup and cleanup methods. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/b1f864a65fafd4fdab1f89230df52e46ae41f2ac.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd setup/cleanup functions and associated VFIOMultifdMaciej S. Szmigiero2-0/+48
Add multifd setup/cleanup functions and an associated VFIOMultifd data structure that will contain most of the receive-side data together with its init/cleanup methods. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/c0520523053b1087787152ddf2163257d3030be0.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd device state transfer - add support checking functionMaciej S. Szmigiero2-0/+8
Add vfio_multifd_transfer_supported() function that tells whether the multifd device state transfer is supported. Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/8ce50256f341b3d47342bb217cb5fbb2deb14639.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Multifd device state transfer support - basic typesMaciej S. Szmigiero4-0/+52
Add basic types and flags used by VFIO multifd device state transfer support. Since we'll be introducing a lot of multifd transfer specific code, add a new file migration-multifd.c to home it, wired into main VFIO migration code (migration.c) via migration-multifd.h header file. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/4eedd529e6617f80f3d6a66d7268a0db2bc173fa.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Move migration channel flags to vfio-common.h header fileMaciej S. Szmigiero1-17/+0
This way they can also be referenced in other translation units than migration.c. Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/26a940f6b22c1b685818251b7a3ddbbca601b1d6.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Add vfio_add_bytes_transferred()Maciej S. Szmigiero1-1/+6
This way bytes_transferred can also be incremented in other translation units than migration.c. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/d1fbc27ac2417b49892f354ba20f6c6b3f7209f8.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Convert bytes_transferred counter to atomicMaciej S. Szmigiero1-4/+4
So it can be safety accessed from multiple threads. This variable type needs to be changed to unsigned long since 32-bit host platforms lack the necessary addition atomics on 64-bit variables. Using 32-bit counters on 32-bit host platforms should not be a problem in practice since they can't realistically address more memory anyway. Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/dc391771d2d9ad0f311994f0cb9e666da564aeaf.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/migration: Add load_device_config_state_start trace eventMaciej S. Szmigiero2-2/+5
And rename existing load_device_config_state trace event to load_device_config_state_end for consistency since it is triggered at the end of loading of the VFIO device config state. This way both the start and end points of particular device config loading operation (a long, BQL-serialized operation) are known. Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/1b6c5a2097e64c272eb7e53f9e4cca4b79581b38.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio-platform: Deprecate all forms of vfio-platform devicesEric Auger3-0/+5
As an outcome of KVM forum 2024 "vfio-platform: live and let die?" talk, let's deprecate vfio-platform devices. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250305124225.952791-1-eric.auger@redhat.com [ clg: Fixed spelling in vfio-amd-xgbe section ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06hw/vfio/pci: Re-order pre-resetAlex Williamson1-9/+9
We want the device in the D0 power state going into reset, but the config write can enable the BARs in the address space, which are then removed from the address space once we clear the memory enable bit in the command register. Re-order to clear the command bit first, so the power state change doesn't enable the BARs. Cc: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-6-alex.williamson@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/pci: Delete local pm_capAlex Williamson2-6/+4
This is now redundant to PCIDevice.pm_cap. Cc: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-4-alex.williamson@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06pci: Use PCI PM capability initializerAlex Williamson1-1/+6
Switch callers directly initializing the PCI PM capability with pci_add_capability() to use pci_pm_init(). Cc: Dmitry Fleytman <dmitry.fleytman@gmail.com> Cc: Akihiko Odaki <akihiko.odaki@daynix.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Stefan Weil <sw@weilnetz.de> Cc: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com> Cc: Keith Busch <kbusch@kernel.org> Cc: Klaus Jensen <its@irrelevant.dk> Cc: Jesper Devantier <foss@defmacro.it> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250225215237.3314011-3-alex.williamson@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio/ccw: Replace warn_once_pfch() with warn_report_once()Cédric Le Goater1-10/+2
Use the common helper warn_report_once() instead of implementing its own. Cc: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/qemu-devel/20250214161936.1720039-1-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06vfio: Add property documentationCédric Le Goater4-0/+173
Investigate the git history to uncover when and why the VFIO properties were introduced and update the models. This is mostly targeting vfio-pci device, since vfio-platform, vfio-ap and vfio-ccw devices are simpler. Sort the properties based on the QEMU version in which they were introduced. Cc: Tony Krowiak <akrowiak@linux.ibm.com> Cc: Eric Farman <farman@linux.ibm.com> Cc: Eric Auger <eric.auger@redhat.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Anthony Krowiak <akrowiak@linux.ibm.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> # vfio-ccw Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250217173455.449983-1-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-22Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi2-0/+2
into staging virtio,pc,pci: features, fixes, cleanups Features: SR-IOV emulation for pci virtio-mem-pci support for s390 interleave support for cxl big endian support for vdpa svq new QAPI events for vhost-user Also vIOMMU reset order fixups are in. Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4 # 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE # fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS # BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ # twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg # XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw= # =tej8 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 21 Feb 2025 20:21:31 HKT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits) docs/devel/reset: Document reset expectations for DMA and IOMMU hw/vfio/common: Add a trace point in vfio_reset_handler hw/arm/smmuv3: Move reset to exit phase hw/i386/intel-iommu: Migrate to 3-phase reset hw/virtio/virtio-iommu: Migrate to 3-phase reset vhost-user-snd: correct the calculation of config_size net: vhost-user: add QAPI events to report connection state hw/virtio/virtio-nsm: Respond with correct length vdpa: Fix endian bugs in shadow virtqueue MAINTAINERS: add more files to `vhost` cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0() vhost-iova-tree: Update documentation vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees vhost-iova-tree: Implement an IOVA-only tree amd_iommu: Use correct bitmask to set capability BAR amd_iommu: Use correct DTE field for interrupt passthrough hw/virtio: reset virtio balloon stats on machine reset mem/cxl_type3: support 3, 6, 12 and 16 interleave ways hw/mem/cxl_type3: Ensure errp is set on realization failure hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-02-21hw/vfio/common: Add a trace point in vfio_reset_handlerEric Auger2-0/+2
To ease the debug of reset sequence, let's add a trace point in vfio_reset_handler() Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20250218182737.76722-5-eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-02-16hw/vfio: Have VFIO_PLATFORM devices inherit from DYNAMIC_SYS_BUS_DEVICEPhilippe Mathieu-Daudé3-7/+1
Do not explain why VFIO_PLATFORM devices are user_creatable, have them inherit TYPE_DYNAMIC_SYS_BUS_DEVICE, to make explicit that they can optionally be plugged on TYPE_PLATFORM_BUS_DEVICE. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alexander Graf <graf@amazon.com> Message-Id: <20250125181343.59151-5-philmd@linaro.org>
2025-02-11vfio: Remove superfluous error report in vfio_listener_region_add()Cédric Le Goater1-6/+4
For pseries machines, commit 567b5b309abe ("vfio/pci: Relax DMA map errors for MMIO regions") introduced 2 error reports to notify the user of MMIO mapping errors. Consolidate both code paths under one. Cc: Harsh Prateek Bora <harshpb@linux.ibm.com> Cc: Shivaprasad G Bhat <sbhat@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-8-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio: Remove reports of DMA mapping errors in backendsCédric Le Goater1-2/+0
Currently, the mapping handlers of the IOMMU backends, VFIO IOMMU Type 1 aka. legacy and IOMMUFD, return an errno and also report an error. This can lead to excessive log messages at runtime for recurring DMA mapping errors. Since these errors are already reported by the callers in the vfio_container_dma_un/map() routines, simply remove them and allow the callers to handle the reporting. The mapping handler of the IOMMUFD backend has a comment suggesting MMIO region mapping failures return EFAULT. I am not sure this is entirely true, so keep the EFAULT case until the conditions are clarified. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-7-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio: Improve error reporting when MMIO region mapping failsCédric Le Goater1-1/+16
When the IOMMU address space width is smaller than the physical address width, a MMIO region of a device can fail to map because the region is outside the supported IOVA ranges of the VM. In this case, PCI peer-to-peer transactions on BARs are not supported. This can occur with the 39-bit IOMMU address space width, as can be the case on some Intel consumer processors, or when using a vIOMMU device with default settings. The current error message is unclear, improve it and also change the error report to a warning because it is a non fatal condition for the VM. To prevent excessive log messages, restrict these recurring DMA mapping errors to a single warning at runtime. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-6-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio: Introduce vfio_get_vfio_device()Cédric Le Goater1-0/+10
This helper will be useful in the listener handlers to extract the VFIO device from a memory region using memory_region_owner(). At the moment, we only care for PCI passthrough devices. If the need arises, we will add more. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-5-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio: Rephrase comment in vfio_listener_region_add() error pathCédric Le Goater1-5/+10
Rephrase a bit the ending comment about how errors are handled depending on the phase in which vfio_listener_region_add() is called. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-4-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/pci: Replace "iommu_device" by "vIOMMU"Cédric Le Goater1-1/+1
This is to be consistent with other reported errors related to vIOMMU devices. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250206131438.1505542-3-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/iommufd: Fix SIGSEV in iommufd_cdev_attach()Zhenzhong Duan1-2/+3
When iommufd_cdev_ram_block_discard_disable() fails for whatever reason, errp should be set or else SIGSEV is triggered in vfio_realize() when error_prepend() is called. By this chance, use the same error message for both legacy and iommufd backend. Fixes: 5ee3dc7af785 ("vfio/iommufd: Implement the iommufd backend") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20250116102307.260849-1-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/igd: use VFIOConfigMirrorQuirk for mirrored registersTomita Moeko1-94/+31
With the introduction of config_offset field, VFIOConfigMirrorQuirk can now be used for those mirrored register in igd bar0. This eliminates the need for the macro intoduced in 1a2623b5c9e7 ("vfio/igd: add macro for declaring mirrored registers"). Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250104154219.7209-4-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/pci: introduce config_offset field in VFIOConfigMirrorQuirkTomita Moeko2-1/+4
Device may only expose a specific portion of PCI config space through a region in a BAR, such behavior is seen in igd GGC and BDSM mirrors in BAR0. To handle these, config_offset is introduced to allow mirroring arbitrary region in PCI config space. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250104154219.7209-3-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/pci: declare generic quirks in a new header fileTomita Moeko2-51/+75
Declare generic vfio_generic_{window_address,window_data,mirror}_quirk in newly created pci_quirks.h so that they can be used elsewhere, like igd.c. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250104154219.7209-2-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/igd: Fix potential overflow in igd_gtt_memory_size()Cédric Le Goater1-1/+1
The risk is mainly theoretical since the applied bit mask will keep the 'ggms' shift value below 3. Nevertheless, let's use a 64 bit integer type and resolve the coverity issue. Resolves: Coverity CID 1585908 Fixes: 1e1eac5f3dcd ("vfio/igd: canonicalize memory size calculations") Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250107130604.669697-1-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-10qapi: Move include/qapi/qmp/ to include/qobject/Daniel P. Berrangé1-1/+1
The general expectation is that header files should follow the same file/path naming scheme as the corresponding source file. There are various historical exceptions to this practice in QEMU, with one of the most notable being the include/qapi/qmp/ directory. Most of the headers there correspond to source files in qobject/. This patch corrects most of that inconsistency by creating include/qobject/ and moving the headers for qobject/ there. This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h: scripts/get_maintainer.pl now reports "QAPI" instead of "No maintainers found". Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20241118151235.2665921-2-armbru@redhat.com> [Rebased]
2025-01-09hw/pci: Use -1 as the default value for rombarAkihiko Odaki1-3/+2
vfio_pci_size_rom() distinguishes whether rombar is explicitly set to 1 by checking dev->opts, bypassing the QOM property infrastructure. Use -1 as the default value for rombar to tell if the user explicitly set it to 1. The property is also converted from unsigned to signed. -1 is signed so it is safe to give it a new meaning. The values in [2 ^ 31, 2 ^ 32) become invalid, but nobody should have typed these values by chance. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250104-reuse-v18-13-c349eafd8673@daynix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-12-26vfio/migration: Rename vfio_devices_all_dirty_tracking()Avihai Horon1-2/+2
vfio_devices_all_dirty_tracking() is used to check if dirty page log sync is needed. However, besides checking the dirty page tracking status, it also checks the pre_copy_dirty_page_tracking flag. Rename it to vfio_devices_log_sync_needed() which reflects its purpose more accurately and makes the code clearer as there are already several helpers with similar names. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-5-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/migration: Refactor vfio_devices_all_running_and_mig_active() logicAvihai Horon2-33/+9
During DMA unmap with vIOMMU, vfio_devices_all_running_and_mig_active() is used to check whether a dirty page log sync of the unmapped pages is required. Such log sync is needed during migration pre-copy phase, and the current logic detects it by checking if migration is active and if the VFIO devices are running. However, recently there has been an effort to simplify the migration status API and reduce it to a single migration_is_running() function. To accommodate this, refactor vfio_devices_all_running_and_mig_active() logic so it won't use migration_is_active(). Do it by simply checking if dirty tracking has been started using internal VFIO flags. This should be equivalent to the previous logic as during migration dirty tracking is active and when the guest is stopped there shouldn't be DMA unmaps coming from it. As a side effect, now that migration status is no longer used, DMA unmap log syncs are untied from migration. This will make calc-dirty-rate more accurate as now it will also include VFIO dirty pages that were DMA unmapped. Also rename the function to properly reflect its new logic and extract common code from vfio_devices_all_dirty_tracking(). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-4-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/migration: Refactor vfio_devices_all_dirty_tracking() logicAvihai Horon1-1/+16
During dirty page log sync, vfio_devices_all_dirty_tracking() is used to check if dirty tracking has been started in order to avoid errors. The current logic checks if migration is in ACTIVE or DEVICE states to ensure dirty tracking has been started. However, recently there has been an effort to simplify the migration status API and reduce it to a single migration_is_running() function. To accommodate this, refactor vfio_devices_all_dirty_tracking() logic so it won't use migration_is_active() and migration_is_device(). Instead, use internal VFIO dirty tracking flags. As a side effect, now that migration status is no longer used to detect dirty tracking status, VFIO log syncs are untied from migration. This will make calc-dirty-rate more accurate as now it will also include VFIO dirty pages. While at it, as VFIODevice->dirty_tracking is now used to detect dirty tracking status, add a comment that states how it's protected. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-3-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/container: Add dirty tracking started flagAvihai Horon1-1/+11
Add a flag to VFIOContainerBase that indicates whether dirty tracking has been started for the container or not. This will be used in the following patches to allow dirty page syncs only if dirty tracking has been started. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-2-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: add x-igd-gms option back to set DSM region size for guestTomita Moeko1-0/+26
DSM region is likely to store framebuffer in Windows, a small DSM region may cause display issues (e.g. half of the screen is black). Since 971ca22f041b ("vfio/igd: don't set stolen memory size to zero"), the x-igd-gms option was functionally removed, QEMU uses host's original value, which is determined by DVMT Pre-Allocated option in Intel FSP of host bios. However, some vendors do not expose this config item to users. In such cases, x-igd-gms option can be used to manually set the data stolen memory size for guest. So this commit brings this option back, keeping its old behavior. When it is not specified, QEMU uses host's value. When DVMT Pre-Allocated option is available in host BIOS, user should set DSM region size there instead of using x-igd-gms option. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-11-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: emulate BDSM in mmio bar0 for gen 6-10 devicesTomita Moeko1-8/+18
A recent commit in i915 driver [1] claims the BDSM register at 0x1080c0 of mmio bar0 has been there since gen 6. Mirror this register to the 32 bit BDSM register at 0x5c in pci config space for gen6-10 devices. [1] https://patchwork.freedesktop.org/patch/msgid/20240202224340.30647-7-ville.syrjala@linux.intel.com Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-10-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: emulate GGC register in mmio bar0Tomita Moeko1-2/+11
The GGC register at 0x50 of pci config space is a mirror of the same register at 0x108040 of mmio bar0 [1]. i915 driver also reads that register from mmio bar0 instead of config space. As GGC is programmed and emulated by qemu, the mmio address should also be emulated, in the same way of BDSM register. [1] 4.1.28, 12th Generation Intel Core Processors Datasheet Volume 2 https://www.intel.com/content/www/us/en/content-details/655259 Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-9-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: add macro for declaring mirrored registersTomita Moeko1-24/+36
igd devices have multipe registers mirroring mmio address and pci config space, more than a single BDSM register. To support this, the read/write functions are made common and a macro is defined to simplify the declaration of MemoryRegionOps. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-8-tomitamoeko@gmail.com [ clg : Fixed conversion specifier on 32-bit platform ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: add Alder/Raptor/Rocket/Ice/Jasper Lake device idsTomita Moeko1-0/+5
All gen 11 and 12 igd devices have 64 bit BDSM register at 0xC0 in its config space, add them to the list to support igd passthrough on Alder/ Raptor/Rocket/Ice/Jasper Lake platforms. Tested legacy mode of igd passthrough works properly on both linux and windows guests with AlderLake-S GT1 (8086:4680). Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-7-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>