aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio/pci.c
AgeCommit message (Collapse)AuthorFilesLines
2024-12-15hw/vfio: Constify all PropertyRichard Henderson1-2/+2
Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-09-13hw: Use device_class_set_legacy_reset() instead of opencodingPeter Maydell1-1/+1
Use device_class_set_legacy_reset() instead of opencoding an assignment to DeviceClass::reset. This change was produced with: spatch --macro-file scripts/cocci-macro-file.h \ --sp-file scripts/coccinelle/device-reset.cocci \ --keep-comments --smpl-spacing --in-place --dir hw Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
2024-09-10qapi/common: Drop temporary 'prefix'Markus Armbruster1-5/+5
Recent commit "qapi: Smarter camel_to_upper() to reduce need for 'prefix'" added a temporary 'prefix' to delay changing the generated code. Revert it. This improves OffAutoPCIBAR's generated enumeration constant prefix from OFF_AUTOPCIBAR to OFF_AUTO_PCIBAR. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240904111836.3273842-5-armbru@redhat.com>
2024-07-23vfio/common: Allow disabling device dirty page trackingJoao Martins1-0/+3
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole tracking of VF pre-copy phase of dirty page tracking, though it means that it will only be used at the start of the switchover phase. Add an option that disables the VF dirty page tracking, and fall back into container-based dirty page tracking. This also allows to use IOMMU dirty tracking even on VFs with their own dirty tracker scheme. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdevJoao Martins1-3/+8
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by skipping HostIOMMUDevice initialization in the presence of mdevs, and skip setting an iommu device when it is known to be an mdev. Cc: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/pci: Extract mdev check into an helperJoao Martins1-9/+3
In preparation to skip initialization of the HostIOMMUDevice for mdev, extract the checks that validate if a device is an mdev into helpers. A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev to check if it's mdev or not. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-06-24vfio/container: Remove VFIOContainerBase::opsCédric Le Goater1-2/+2
Instead, use VFIO_IOMMU_GET_CLASS() to get the class pointer. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-06-24vfio/pci: Pass HostIOMMUDevice to vIOMMUZhenzhong Duan1-5/+14
With HostIOMMUDevice passed, vIOMMU can check compatibility with host IOMMU, call into IOMMUFD specific methods, etc. Originally-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
2024-05-22vfio: Use g_autofree in all call site of vfio_get_region_info()Zhenzhong Duan1-10/+3
There are some exceptions when pointer to vfio_region_info is reused. In that case, the pointed memory is freed manually. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci-quirks: Make vfio_add_*_cap() return boolZhenzhong Duan1-2/+1
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Include below functions: vfio_add_virt_caps() vfio_add_nv_gpudirect_cap() vfio_add_vmd_shadow_cap() Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci-quirks: Make vfio_pci_igd_opregion_init() return boolZhenzhong Duan1-2/+1
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Use g_autofree for vfio_region_info pointerZhenzhong Duan1-2/+1
Pointer opregion is freed after vfio_pci_igd_opregion_init(). Use 'g_autofree' to avoid the g_free() calls. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Make capability related functions return boolZhenzhong Duan1-41/+36
The functions operating on capability don't have a consistent return style. Below functions are in bool-valued functions style: vfio_msi_setup() vfio_msix_setup() vfio_add_std_cap() vfio_add_capabilities() Below two are integer-valued functions: vfio_add_vendor_specific_cap() vfio_setup_pcie_cap() But the returned integer is only used for check succeed/failure. Change them all to return bool so now all capability related functions follow the coding standand in qapi/error.h to return bool. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Make vfio_populate_vga() return boolZhenzhong Duan1-6/+5
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Make vfio_intx_enable() return boolZhenzhong Duan1-11/+8
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Make vfio_populate_device() return a boolZhenzhong Duan1-11/+10
Since vfio_populate_device() takes an 'Error **' argument, best practices suggest to return a bool. See the qapi/error.h Rules section. By this chance, pass errp directly to vfio_populate_device() to avoid calling error_propagate(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Make vfio_pci_relocate_msix() and vfio_msix_early_setup() return a ↵Zhenzhong Duan1-17/+16
bool Since vfio_pci_relocate_msix() and vfio_msix_early_setup() takes an 'Error **' argument, best practices suggest to return a bool. See the qapi/error.h Rules section. By this chance, pass errp directly to vfio_msix_early_setup() to avoid calling error_propagate(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci: Make vfio_intx_enable_kvm() return a boolZhenzhong Duan1-7/+8
Since vfio_intx_enable_kvm() takes an 'Error **' argument, best practices suggest to return a bool. See the qapi/error.h Rules section. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/helpers: Make vfio_device_get_name() return boolZhenzhong Duan1-1/+1
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/helpers: Make vfio_set_irq_signaling() return boolZhenzhong Duan1-19/+21
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/display: Make vfio_display_*() return boolZhenzhong Duan1-2/+1
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16vfio: Make VFIOIOMMUClass::attach_device() and its wrapper return boolZhenzhong Duan1-3/+2
Make VFIOIOMMUClass::attach_device() and its wrapper function vfio_attach_device() return bool. This is to follow the coding standand to return bool if 'Error **' is used to pass error. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16vfio/pci: Use g_autofree in vfio_realizeZhenzhong Duan1-4/+3
Local pointer name is allocated before vfio_attach_device() call and freed after the call. Same for tmp when calling realpath(). Use 'g_autofree' to avoid the g_free() calls. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16vfio/migration: Emit VFIO migration QAPI eventAvihai Horon1-0/+2
Emit VFIO migration QAPI event when a VFIO device changes its migration state. This can be used by management applications to get updates on the current state of the VFIO device for their own purposes. A new per VFIO device capability, "migration-events", is added so events can be enabled only for the required devices. It is disabled by default. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16vfio/pci: migration: Skip config space check for Vendor Specific Information ↵Vinayak Kale1-0/+26
in VSC during restore/load In case of migration, during restore operation, qemu checks config space of the pci device with the config space in the migration stream captured during save operation. In case of config space data mismatch, restore operation is failed. config space check is done in function get_pci_config_device(). By default VSC (vendor-specific-capability) in config space is checked. Due to qemu's config space check for VSC, live migration is broken across NVIDIA vGPU devices in situation where source and destination host driver is different. In this situation, Vendor Specific Information in VSC varies on the destination to ensure vGPU feature capabilities exposed to the guest driver are compatible with destination host. If a vfio-pci device is migration capable and vfio-pci vendor driver is OK with volatile Vendor Specific Info in VSC then qemu should exempt config space check for Vendor Specific Info. It is vendor driver's responsibility to ensure that VSC is consistent across migration. Here consistency could mean that VSC format should be same on source and destination, however actual Vendor Specific Info may not be byte-to-byte identical. This patch skips the check for Vendor Specific Information in VSC for VFIO-PCI device by clearing pdev->cmask[] offsets. Config space check is still enforced for 3 byte VSC header. If cmask[] is not set for an offset, then qemu skips config space check for that offset. VSC check is skipped for machine types >= 9.1. The check would be enforced on older machine types (<= 9.0). Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Vinayak Kale <vkale@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-16vfio/migration: Add Error** argument to .vfio_save_config() handlerCédric Le Goater1-2/+3
Use vmstate_save_state_with_err() to improve error reporting in the callers and store a reported error under the migration stream. Add documentation while at it. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-03-12hw/vfio/pci: Fix missing ERRP_GUARD() for error_prepend()Zhao Liu1-0/+2
As the comment in qapi/error, passing @errp to error_prepend() requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: ... * - It should not be passed to error_prepend(), error_vprepend() or * error_append_hint(), because that doesn't work with &error_fatal. * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. ERRP_GUARD() could avoid the case when @errp is &error_fatal, the user can't see this additional information, because exit() happens in error_setg earlier than information is added [1]. In hw/vfio/pci.c, there are 2 functions passing @errp to error_prepend() without ERRP_GUARD(): - vfio_add_std_cap() - vfio_realize() The @errp of vfio_add_std_cap() is also from vfio_realize(). And vfio_realize(), as a PCIDeviceClass.realize method, its @errp is from DeviceClass.realize so that there is no guarantee that the @errp won't point to @error_fatal. To avoid the issue like [1] said, add missing ERRP_GUARD() at their beginning. [1]: Issue description in the commit message of commit ae7c80a7bd73 ("error: New macro ERRP_GUARD()"). Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240311033822.3142585-24-zhao1.liu@linux.intel.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-03-09hw/vfio/pci.c: Make some structure staticFrediano Ziglio1-2/+2
Not used outside C module. Signed-off-by: Frediano Ziglio <frediano.ziglio@cloud.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-01-29vfio/pci: Clear MSI-X IRQ index alwaysCédric Le Goater1-3/+5
When doing device assignment of a physical device, MSI-X can be enabled with no vectors enabled and this sets the IRQ index to VFIO_PCI_MSIX_IRQ_INDEX. However, when MSI-X is disabled, the IRQ index is left untouched if no vectors are in use. Then, when INTx is enabled, the IRQ index value is considered incompatible (set to MSI-X) and VFIO_DEVICE_SET_IRQS fails. QEMU complains with : qemu-system-x86_64: vfio 0000:08:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Invalid argument To avoid that, unconditionaly clear the IRQ index when MSI-X is disabled. Buglink: https://issues.redhat.com/browse/RHEL-21293 Fixes: 5ebffa4e87e7 ("vfio/pci: use an invalid fd to enable MSI-X") Cc: Jing Liu <jing2.liu@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-01-05vfio/container: Introduce a VFIOIOMMU QOM interfaceCédric Le Goater1-1/+1
VFIOContainerBase was not introduced as an abstract QOM object because it felt unnecessary to expose all the IOMMU backends to the QEMU machine and human interface. However, we can still abstract the IOMMU backend handlers using a QOM interface class. This provides more flexibility when referencing the various implementations. Simply transform the VFIOIOMMUOps struct in an InterfaceClass and do some initial name replacements. Next changes will start converting VFIOIOMMUOps. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-30hw/vfio: Constify VMStateRichard Henderson1-3/+3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20231221031652.119827-60-richard.henderson@linaro.org>
2023-12-19vfio: Introduce a helper function to initialize VFIODeviceZhenzhong Duan1-4/+2
Introduce a helper function to replace the common code to initialize VFIODevice in pci, platform, ap and ccw VFIO device. No functional change intended. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Move VFIODevice initializations in vfio_instance_initZhenzhong Duan1-4/+6
Some of the VFIODevice initializations is in vfio_realize, move all of them in vfio_instance_init. No functional change intended. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Make vfio cdev pre-openable by passing a file handleZhenzhong Duan1-12/+16
This gives management tools like libvirt a chance to open the vfio cdev with privilege and pass FD to qemu. This way qemu never needs to have privilege to open a VFIO or iommu cdev node. Together with the earlier support of pre-opening /dev/iommu device, now we have full support of passing a vfio device to unprivileged qemu by management tool. This mode is no more considered for the legacy backend. So let's remove the "TODO" comment. Add helper functions vfio_device_set_fd() and vfio_device_get_name() to set fd and get device name, they will also be used by other vfio devices. There is no easy way to check if a device is mdev with FD passing, so fail the x-balloon-allowed check unconditionally in this case. There is also no easy way to get BDF as name with FD passing, so we fake a name by VFIO_FD[fd]. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Allow the selection of a given iommu backendEric Auger1-0/+6
Now we support two types of iommu backends, let's add the capability to select one of them. This depends on whether an iommufd object has been linked with the vfio-pci device: If the user wants to use the legacy backend, it shall not link the vfio-pci device with any iommufd object: -device vfio-pci,host=0000:02:00.0 This is called the legacy mode/backend. If the user wants to use the iommufd backend (/dev/iommu) it shall pass an iommufd object id in the vfio-pci device options: -object iommufd,id=iommufd0 -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Introduce a vfio pci hot reset interfaceZhenzhong Duan1-162/+6
Legacy vfio pci and iommufd cdev have different process to hot reset vfio device, expand current code to abstract out pci_hot_reset callback for legacy vfio, this same interface will also be used by iommufd cdev vfio device. Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it into container.c. vfio_pci_[pre/post]_reset and vfio_pci_host_match are exported so they could be called in legacy and iommufd pci_hot_reset callback. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_infoZhenzhong Duan1-17/+37
This helper will be used by both legacy and iommufd backends. No functional changes intended. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03vfio/pci: Fix buffer overrun when writing the VF tokenCédric Le Goater1-1/+1
qemu_uuid_unparse() includes a trailing NUL when writing the uuid string and the buffer size should be UUID_FMT_LEN + 1 bytes. Use the recently added UUID_STR_LEN which defines the correct size. Fixes: CID 1522913 Fixes: 2dca1b37a760 ("vfio/pci: add support for VF token") Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: "Denis V. Lunev" <den@openvz.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-18hw/vfio: add ramfb migration supportMarc-André Lureau1-0/+47
Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on. Turn it off by default on machines <= 8.1 for compatibility reasons. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> [ clg: - checkpatch fixes - improved warn_report() in vfio_realize() ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-18vfio/pci: Remove vfio_detach_device from vfio_realize error pathEric Auger1-9/+7
In vfio_realize, on the error path, we currently call vfio_detach_device() after a successful vfio_attach_device. While this looks natural, vfio_instance_finalize also induces a vfio_detach_device(), and it seems to be the right place instead as other resources are released there which happen to be a prerequisite to a successful UNSET_CONTAINER. So let's rely on the finalize vfio_detach_device call to free all the relevant resources. Fixes: a28e06621170 ("vfio/pci: Introduce vfio_[attach/detach]_device") Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-18vfio/pci: Introduce vfio_[attach/detach]_deviceEric Auger1-52/+14
We want the VFIO devices to be able to use two different IOMMU backends, the legacy VFIO one and the new iommufd one. Introduce vfio_[attach/detach]_device which aim at hiding the underlying IOMMU backend (IOCTLs, datatypes, ...). Once vfio_attach_device completes, the device is attached to a security context and its fd can be used. Conversely When vfio_detach_device completes, the device has been detached from the security context. At the moment only the implementation based on the legacy container/group exists. Let's use it from the vfio-pci device. Subsequent patches will handle other devices. We also take benefit of this patch to properly free vbasedev->name on failure. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: enable MSI-X in interrupt restoring on dynamic allocationJing Liu1-0/+17
During migration restoring, vfio_enable_vectors() is called to restore enabling MSI-X interrupts for assigned devices. It sets the range from 0 to nr_vectors to kernel to enable MSI-X and the vectors unmasked in guest. During the MSI-X enabling, all the vectors within the range are allocated according to the VFIO_DEVICE_SET_IRQS ioctl. When dynamic MSI-X allocation is supported, we only want the guest unmasked vectors being allocated and enabled. Use vector 0 with an invalid fd to get MSI-X enabled, after that, all the vectors can be allocated in need. Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: use an invalid fd to enable MSI-XJing Liu1-8/+36
Guests typically enable MSI-X with all of the vectors masked in the MSI-X vector table. To match the guest state of device, QEMU enables MSI-X by enabling vector 0 with userspace triggering and immediately release. However the release function actually does not release it due to already using userspace mode. It is no need to enable triggering on host and rely on the mask bit to avoid spurious interrupts. Use an invalid fd (i.e. fd = -1) is enough to get MSI-X enabled. After dynamic MSI-X allocation is supported, the interrupt restoring also need use such way to enable MSI-X, therefore, create a function for that. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: enable vector on dynamic MSI-X allocationJing Liu1-18/+28
The vector_use callback is used to enable vector that is unmasked in guest. The kernel used to only support static MSI-X allocation. When allocating a new interrupt using "static MSI-X allocation" kernels, QEMU first disables all previously allocated vectors and then re-allocates all including the new one. The nr_vectors of VFIOPCIDevice indicates that all vectors from 0 to nr_vectors are allocated (and may be enabled), which is used to loop all the possibly used vectors when e.g., disabling MSI-X interrupts. Extend the vector_use function to support dynamic MSI-X allocation when host supports the capability. QEMU therefore can individually allocate and enable a new interrupt without affecting others or causing interrupts lost during runtime. Utilize nr_vectors to calculate the upper bound of enabled vectors in dynamic MSI-X allocation mode since looping all msix_entries_nr is not efficient and unnecessary. Signed-off-by: Jing Liu <jing2.liu@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: detect the support of dynamic MSI-X allocationJing Liu1-2/+14
Kernel provides the guidance of dynamic MSI-X allocation support of passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to guide user space. Fetch the flags from host to determine if dynamic MSI-X allocation is supported. Originally-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: rename vfio_put_device to vfio_pci_put_deviceZhenzhong Duan1-2/+2
vfio_put_device() is a VFIO PCI specific function, rename it with 'vfio_pci' prefix to avoid confusing. No functional change. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-18spapr: Remove support for NVIDIA V100 GPU with NVLink2Cédric Le Goater1-14/+0
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits : 562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") This was 2.5 years ago. Do the same in QEMU with a revert of commit ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-07-10vfio/pci: Enable AtomicOps completers on root portsAlex Williamson1-0/+78
Dynamically enable Atomic Ops completer support around realize/exit of vfio-pci devices reporting host support for these accesses and adhering to a minimal configuration standard. While the Atomic Ops completer bits in the root port device capabilities2 register are read-only, the PCIe spec does allow RO bits to change to reflect hardware state. We take advantage of that here around the realize and exit functions of the vfio-pci device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Robin Voetter <robin@streamhpc.com> Tested-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio: Fix null pointer dereference bug in vfio_bars_finalize()Avihai Horon1-1/+3
vfio_realize() has the following flow: 1. vfio_bars_prepare() -- sets VFIOBAR->size. 2. msix_early_setup(). 3. vfio_bars_register() -- allocates VFIOBAR->mr. After vfio_bars_prepare() is called msix_early_setup() can fail. If it does fail, vfio_bars_register() is never called and VFIOBAR->mr is not allocated. In this case, vfio_bars_finalize() is called as part of the error flow to free the bars' resources. However, vfio_bars_finalize() calls object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and thus we get a null pointer dereference. Fix it by checking VFIOBAR->mr in vfio_bars_finalize(). Fixes: 89d5202edc50 ("vfio/pci: Allow relocating MSI-X MMIO") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio/migration: Return bool type for vfio_migration_realize()Zhenzhong Duan1-2/+1
Make vfio_migration_realize() adhere to the convention of other realize() callbacks(like qdev_realize) by returning bool instead of int. Suggested-by: Cédric Le Goater <clg@redhat.com> Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>