aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio/pci.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-29vfio/pci: Clear MSI-X IRQ index alwaysCédric Le Goater1-3/+5
When doing device assignment of a physical device, MSI-X can be enabled with no vectors enabled and this sets the IRQ index to VFIO_PCI_MSIX_IRQ_INDEX. However, when MSI-X is disabled, the IRQ index is left untouched if no vectors are in use. Then, when INTx is enabled, the IRQ index value is considered incompatible (set to MSI-X) and VFIO_DEVICE_SET_IRQS fails. QEMU complains with : qemu-system-x86_64: vfio 0000:08:00.0: Failed to set up TRIGGER eventfd signaling for interrupt INTX-0: VFIO_DEVICE_SET_IRQS failure: Invalid argument To avoid that, unconditionaly clear the IRQ index when MSI-X is disabled. Buglink: https://issues.redhat.com/browse/RHEL-21293 Fixes: 5ebffa4e87e7 ("vfio/pci: use an invalid fd to enable MSI-X") Cc: Jing Liu <jing2.liu@intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-01-05vfio/container: Introduce a VFIOIOMMU QOM interfaceCédric Le Goater1-1/+1
VFIOContainerBase was not introduced as an abstract QOM object because it felt unnecessary to expose all the IOMMU backends to the QEMU machine and human interface. However, we can still abstract the IOMMU backend handlers using a QOM interface class. This provides more flexibility when referencing the various implementations. Simply transform the VFIOIOMMUOps struct in an InterfaceClass and do some initial name replacements. Next changes will start converting VFIOIOMMUOps. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-30hw/vfio: Constify VMStateRichard Henderson1-3/+3
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20231221031652.119827-60-richard.henderson@linaro.org>
2023-12-19vfio: Introduce a helper function to initialize VFIODeviceZhenzhong Duan1-4/+2
Introduce a helper function to replace the common code to initialize VFIODevice in pci, platform, ap and ccw VFIO device. No functional change intended. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Move VFIODevice initializations in vfio_instance_initZhenzhong Duan1-4/+6
Some of the VFIODevice initializations is in vfio_realize, move all of them in vfio_instance_init. No functional change intended. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Make vfio cdev pre-openable by passing a file handleZhenzhong Duan1-12/+16
This gives management tools like libvirt a chance to open the vfio cdev with privilege and pass FD to qemu. This way qemu never needs to have privilege to open a VFIO or iommu cdev node. Together with the earlier support of pre-opening /dev/iommu device, now we have full support of passing a vfio device to unprivileged qemu by management tool. This mode is no more considered for the legacy backend. So let's remove the "TODO" comment. Add helper functions vfio_device_set_fd() and vfio_device_get_name() to set fd and get device name, they will also be used by other vfio devices. There is no easy way to check if a device is mdev with FD passing, so fail the x-balloon-allowed check unconditionally in this case. There is also no easy way to get BDF as name with FD passing, so we fake a name by VFIO_FD[fd]. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Allow the selection of a given iommu backendEric Auger1-0/+6
Now we support two types of iommu backends, let's add the capability to select one of them. This depends on whether an iommufd object has been linked with the vfio-pci device: If the user wants to use the legacy backend, it shall not link the vfio-pci device with any iommufd object: -device vfio-pci,host=0000:02:00.0 This is called the legacy mode/backend. If the user wants to use the iommufd backend (/dev/iommu) it shall pass an iommufd object id in the vfio-pci device options: -object iommufd,id=iommufd0 -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Introduce a vfio pci hot reset interfaceZhenzhong Duan1-162/+6
Legacy vfio pci and iommufd cdev have different process to hot reset vfio device, expand current code to abstract out pci_hot_reset callback for legacy vfio, this same interface will also be used by iommufd cdev vfio device. Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it into container.c. vfio_pci_[pre/post]_reset and vfio_pci_host_match are exported so they could be called in legacy and iommufd pci_hot_reset callback. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-12-19vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_infoZhenzhong Duan1-17/+37
This helper will be used by both legacy and iommufd backends. No functional changes intended. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-11-03vfio/pci: Fix buffer overrun when writing the VF tokenCédric Le Goater1-1/+1
qemu_uuid_unparse() includes a trailing NUL when writing the uuid string and the buffer size should be UUID_FMT_LEN + 1 bytes. Use the recently added UUID_STR_LEN which defines the correct size. Fixes: CID 1522913 Fixes: 2dca1b37a760 ("vfio/pci: add support for VF token") Cc: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: "Denis V. Lunev" <den@openvz.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-18hw/vfio: add ramfb migration supportMarc-André Lureau1-0/+47
Add a "VFIODisplay" subsection whenever "x-ramfb-migrate" is turned on. Turn it off by default on machines <= 8.1 for compatibility reasons. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> [ clg: - checkpatch fixes - improved warn_report() in vfio_realize() ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-18vfio/pci: Remove vfio_detach_device from vfio_realize error pathEric Auger1-9/+7
In vfio_realize, on the error path, we currently call vfio_detach_device() after a successful vfio_attach_device. While this looks natural, vfio_instance_finalize also induces a vfio_detach_device(), and it seems to be the right place instead as other resources are released there which happen to be a prerequisite to a successful UNSET_CONTAINER. So let's rely on the finalize vfio_detach_device call to free all the relevant resources. Fixes: a28e06621170 ("vfio/pci: Introduce vfio_[attach/detach]_device") Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Eric Auger <eric.auger@redhat.com> Tested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-18vfio/pci: Introduce vfio_[attach/detach]_deviceEric Auger1-52/+14
We want the VFIO devices to be able to use two different IOMMU backends, the legacy VFIO one and the new iommufd one. Introduce vfio_[attach/detach]_device which aim at hiding the underlying IOMMU backend (IOCTLs, datatypes, ...). Once vfio_attach_device completes, the device is attached to a security context and its fd can be used. Conversely When vfio_detach_device completes, the device has been detached from the security context. At the moment only the implementation based on the legacy container/group exists. Let's use it from the vfio-pci device. Subsequent patches will handle other devices. We also take benefit of this patch to properly free vbasedev->name on failure. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: enable MSI-X in interrupt restoring on dynamic allocationJing Liu1-0/+17
During migration restoring, vfio_enable_vectors() is called to restore enabling MSI-X interrupts for assigned devices. It sets the range from 0 to nr_vectors to kernel to enable MSI-X and the vectors unmasked in guest. During the MSI-X enabling, all the vectors within the range are allocated according to the VFIO_DEVICE_SET_IRQS ioctl. When dynamic MSI-X allocation is supported, we only want the guest unmasked vectors being allocated and enabled. Use vector 0 with an invalid fd to get MSI-X enabled, after that, all the vectors can be allocated in need. Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: use an invalid fd to enable MSI-XJing Liu1-8/+36
Guests typically enable MSI-X with all of the vectors masked in the MSI-X vector table. To match the guest state of device, QEMU enables MSI-X by enabling vector 0 with userspace triggering and immediately release. However the release function actually does not release it due to already using userspace mode. It is no need to enable triggering on host and rely on the mask bit to avoid spurious interrupts. Use an invalid fd (i.e. fd = -1) is enough to get MSI-X enabled. After dynamic MSI-X allocation is supported, the interrupt restoring also need use such way to enable MSI-X, therefore, create a function for that. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: enable vector on dynamic MSI-X allocationJing Liu1-18/+28
The vector_use callback is used to enable vector that is unmasked in guest. The kernel used to only support static MSI-X allocation. When allocating a new interrupt using "static MSI-X allocation" kernels, QEMU first disables all previously allocated vectors and then re-allocates all including the new one. The nr_vectors of VFIOPCIDevice indicates that all vectors from 0 to nr_vectors are allocated (and may be enabled), which is used to loop all the possibly used vectors when e.g., disabling MSI-X interrupts. Extend the vector_use function to support dynamic MSI-X allocation when host supports the capability. QEMU therefore can individually allocate and enable a new interrupt without affecting others or causing interrupts lost during runtime. Utilize nr_vectors to calculate the upper bound of enabled vectors in dynamic MSI-X allocation mode since looping all msix_entries_nr is not efficient and unnecessary. Signed-off-by: Jing Liu <jing2.liu@intel.com> Tested-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: detect the support of dynamic MSI-X allocationJing Liu1-2/+14
Kernel provides the guidance of dynamic MSI-X allocation support of passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to guide user space. Fetch the flags from host to determine if dynamic MSI-X allocation is supported. Originally-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-10-05vfio/pci: rename vfio_put_device to vfio_pci_put_deviceZhenzhong Duan1-2/+2
vfio_put_device() is a VFIO PCI specific function, rename it with 'vfio_pci' prefix to avoid confusing. No functional change. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-09-18spapr: Remove support for NVIDIA V100 GPU with NVLink2Cédric Le Goater1-14/+0
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits : 562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") This was 2.5 years ago. Do the same in QEMU with a revert of commit ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-07-10vfio/pci: Enable AtomicOps completers on root portsAlex Williamson1-0/+78
Dynamically enable Atomic Ops completer support around realize/exit of vfio-pci devices reporting host support for these accesses and adhering to a minimal configuration standard. While the Atomic Ops completer bits in the root port device capabilities2 register are read-only, the PCIe spec does allow RO bits to change to reflect hardware state. We take advantage of that here around the realize and exit functions of the vfio-pci device. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Robin Voetter <robin@streamhpc.com> Tested-by: Robin Voetter <robin@streamhpc.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio: Fix null pointer dereference bug in vfio_bars_finalize()Avihai Horon1-1/+3
vfio_realize() has the following flow: 1. vfio_bars_prepare() -- sets VFIOBAR->size. 2. msix_early_setup(). 3. vfio_bars_register() -- allocates VFIOBAR->mr. After vfio_bars_prepare() is called msix_early_setup() can fail. If it does fail, vfio_bars_register() is never called and VFIOBAR->mr is not allocated. In this case, vfio_bars_finalize() is called as part of the error flow to free the bars' resources. However, vfio_bars_finalize() calls object_unparent() for VFIOBAR->mr after checking only VFIOBAR->size, and thus we get a null pointer dereference. Fix it by checking VFIOBAR->mr in vfio_bars_finalize(). Fixes: 89d5202edc50 ("vfio/pci: Allow relocating MSI-X MMIO") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio/migration: Return bool type for vfio_migration_realize()Zhenzhong Duan1-2/+1
Make vfio_migration_realize() adhere to the convention of other realize() callbacks(like qdev_realize) by returning bool instead of int. Suggested-by: Cédric Le Goater <clg@redhat.com> Suggested-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio/migration: Remove print of "Migration disabled"Zhenzhong Duan1-1/+0
Property enable_migration supports [on/off/auto]. In ON mode, error pointer is passed to errp and logged. In OFF mode, we doesn't need to log "Migration disabled" as it's intentional. In AUTO mode, we should only ever see errors or warnings if the device supports migration and an error or incompatibility occurs while further probing or configuring it. Lack of support for migration shoundn't generate an error or warning. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio/migration: Free resources when vfio_migration_realize failsZhenzhong Duan1-0/+1
When vfio_realize() succeeds, hot unplug will call vfio_exitfn() to free resources allocated in vfio_realize(); when vfio_realize() fails, vfio_exitfn() is never called and we need to free resources in vfio_realize(). In the case that vfio_migration_realize() fails, e.g: with -only-migratable & enable-migration=off, we see below: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off 0000:81:11.1: Migration disabled Error: disallowing migration blocker (--only-migratable) for: 0000:81:11.1: Migration is disabled for VFIO device If we hotplug again we should see same log as above, but we see: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,enable-migration=off Error: vfio 0000:81:11.1: device is already attached That's because some references to VFIO device isn't released. For resources allocated in vfio_migration_realize(), free them by jumping to out_deinit path with calling a new function vfio_migration_deinit(). For resources allocated in vfio_realize(), free them by jumping to de-register path in vfio_realize(). Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: a22651053b59 ("vfio: Make vfio-pci device migration capable") Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio/migration: Change vIOMMU blocker from global to per deviceZhenzhong Duan1-1/+0
Contrary to multiple device blocker which needs to consider already-attached devices to unblock/block dynamically, the vIOMMU migration blocker is a device specific config. Meaning it only needs to know whether the device is bypassing or not the vIOMMU (via machine property, or per pxb-pcie::bypass_iommu), and does not need the state of currently present devices. For this reason, the vIOMMU global migration blocker can be consolidated into the per-device migration blocker, allowing us to remove some unnecessary code. This change also makes vfio_mig_active() more accurate as it doesn't check for global blocker. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-07-10vfio/pci: Disable INTx in vfio_realize error pathZhenzhong Duan1-0/+3
When vfio realize fails, INTx isn't disabled if it has been enabled. This may confuse host side with unhandled interrupt report. Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/pci: Free leaked timer in vfio_realize error pathZhenzhong Duan1-0/+3
When vfio_realize fails, the mmap_timer used for INTx optimization isn't freed. As this timer isn't activated yet, the potential impact is just a piece of leaked memory. Fixes: ea486926b07d ("vfio-pci: Update slow path INTx algorithm timer related") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/pci: Fix a segfault in vfio_realizeZhenzhong Duan1-1/+3
The kvm irqchip notifier is only registered if the device supports INTx, however it's unconditionally removed in vfio realize error path. If the assigned device does not support INTx, this will cause QEMU to crash when vfio realize fails. Change it to conditionally remove the notifier only if the notify hook is setup. Before fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Connection closed by foreign host. After fix: (qemu) device_add vfio-pci,host=81:11.1,id=vfio1,bus=root1,xres=1 Error: vfio 0000:81:11.1: xres and yres properties require display=on (qemu) Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/migration: Make VFIO migration non-experimentalAvihai Horon1-2/+2
The major parts of VFIO migration are supported today in QEMU. This includes basic VFIO migration, device dirty page tracking and precopy support. Thus, at this point in time, it seems appropriate to make VFIO migration non-experimental: remove the x prefix from enable_migration property, change it to ON_OFF_AUTO and let the default value be AUTO. In addition, make the following adjustments: 1. When enable_migration is ON and migration is not supported, fail VFIO device realization. 2. When enable_migration is AUTO (i.e., not explicitly enabled), require device dirty tracking support. This is because device dirty tracking is currently the only method to do dirty page tracking, which is essential for migrating in a reasonable downtime. Setting enable_migration to ON will not require device dirty tracking. 3. Make migration error and blocker messages more elaborate. 4. Remove error prints in vfio_migration_query_flags(). 5. Rename trace_vfio_migration_probe() to trace_vfio_migration_realize(). Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30vfio/pci: Call vfio_prepare_kvm_msi_virq_batch() in MSI retry pathShameer Kolothum1-2/+2
When vfio_enable_vectors() returns with less than requested nr_vectors we retry with what kernel reported back. But the retry path doesn't call vfio_prepare_kvm_msi_virq_batch() and this results in, qemu-system-aarch64: vfio: Error: Failed to enable 4 MSI vectors, retry with 1 qemu-system-aarch64: ../hw/vfio/pci.c:602: vfio_commit_kvm_msi_virq_batch: Assertion `vdev->defer_kvm_irq_routing' failed Fixes: dc580d51f7dd ("vfio: defer to commit kvm irq routing when enable msi/msix") Reviewed-by: Longpeng <longpeng2@huawei.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-05-24vfio/pci: Fix a use-after-free issueZhenzhong Duan1-1/+1
vbasedev->name is freed wrongly which leads to garbage VFIO trace log. Fix it by allocating a dup of vbasedev->name and then free the dup. Fixes: 2dca1b37a760 ("vfio/pci: add support for VF token") Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-05-09vfio/pci: Static Resizable BAR capabilityAlex Williamson1-1/+53
The PCI Resizable BAR (ReBAR) capability is currently hidden from the VM because the protocol for interacting with the capability does not support a mechanism for the device to reject an advertised supported BAR size. However, when assigned to a VM, the act of resizing the BAR requires adjustment of host resources for the device, which absolutely can fail. Linux does not currently allow us to reserve resources for the device independent of the current usage. The only writable field within the ReBAR capability is the BAR Size register. The PCIe spec indicates that when written, the device should immediately begin to operate with the provided BAR size. The spec however also notes that software must only write values corresponding to supported sizes as indicated in the capability and control registers. Writing unsupported sizes produces undefined results. Therefore, if the hypervisor were to virtualize the capability and control registers such that the current size is the only indicated available size, then a write of anything other than the current size falls into the category of undefined behavior, where we can essentially expose the modified ReBAR capability as read-only. This may seem pointless, but users have reported that virtualizing the capability in this way not only allows guest software to expose related features as available (even if only cosmetic), but in some scenarios can resolve guest driver issues. Additionally, no regressions in behavior have been reported for this change. A caveat here is that the PCIe spec requires for compatibility that devices report support for a size in the range of 1MB to 512GB, therefore if the current BAR size falls outside that range we revert to hiding the capability. Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230505232308.2869912-1-alex.williamson@redhat.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-05-09vfio/pci: add support for VF tokenMinwoo Im1-1/+12
VF token was introduced [1] to kernel vfio-pci along with SR-IOV support [2]. This patch adds support VF token among PF and VF(s). To passthu PCIe VF to a VM, kernel >= v5.7 needs this. It can be configured with UUID like: -device vfio-pci,host=DDDD:BB:DD:F,vf-token=<uuid>,... [1] https://lore.kernel.org/linux-pci/158396393244.5601.10297430724964025753.stgit@gimli.home/ [2] https://lore.kernel.org/linux-pci/158396044753.5601.14804870681174789709.stgit@gimli.home/ Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Minwoo Im <minwoo.im@samsung.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Link: https://lore.kernel.org/r/20230320073522epcms2p48f682ecdb73e0ae1a4850ad0712fd780@epcms2p4 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07vfio/migration: Rename entry pointsAlex Williamson1-3/+3
Pick names that align with the section drivers should use them from, avoiding the confusion of calling a _finalize() function from _exit() and generalizing the actual _finalize() to handle removing the viommu blocker. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/167820912978.606734.12740287349119694623.stgit@omen Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2023-03-07vfio/migration: Block migration with vIOMMUJoao Martins1-0/+1
Migrating with vIOMMU will require either tracking maximum IOMMU supported address space (e.g. 39/48 address width on Intel) or range-track current mappings and dirty track the new ones post starting dirty tracking. This will be done as a separate series, so add a live migration blocker until that is fixed. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20230307125450.62409-14-joao.m.martins@oracle.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06vfio/pci: Use vbasedev local variable in vfio_realize()Eric Auger1-24/+25
Using a VFIODevice handle local variable to improve the code readability. no functional change intended Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20220502094223.36384-3-yi.l.liu@intel.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06hw/vfio/pci: fix vfio_pci_hot_reset_result trace pointEric Auger1-1/+1
"%m" format specifier is not interpreted by the trace infrastructure and thus "%m" is output instead of the actual errno string. Fix it by outputting strerror(errno). Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/r/20220502094223.36384-2-yi.l.liu@intel.com [aw: replace commit log as provided by Eric] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06vfio: defer to commit kvm irq routing when enable msi/msixLongpeng(Mike)1-33/+97
In migration resume phase, all unmasked msix vectors need to be setup when loading the VF state. However, the setup operation would take longer if the VM has more VFs and each VF has more unmasked vectors. The hot spot is kvm_irqchip_commit_routes, it'll scan and update all irqfds that are already assigned each invocation, so more vectors means need more time to process them. vfio_pci_load_config vfio_msix_enable msix_set_vector_notifiers for (vector = 0; vector < dev->msix_entries_nr; vector++) { vfio_msix_vector_do_use vfio_add_kvm_msi_virq kvm_irqchip_commit_routes <-- expensive } We can reduce the cost by only committing once outside the loop. The routes are cached in kvm_state, we commit them first and then bind irqfd for each vector. The test VM has 128 vcpus and 8 VF (each one has 65 vectors), we measure the cost of the vfio_msix_enable for each VF, and we can see 90+% costs can be reduce. VF Count of irqfds[*] Original With this patch 1st 65 8 2 2nd 130 15 2 3rd 195 22 2 4th 260 24 3 5th 325 36 2 6th 390 44 3 7th 455 51 3 8th 520 58 4 Total 258ms 21ms [*] Count of irqfds How many irqfds that already assigned and need to process in this round. The optimization can be applied to msi type too. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-6-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06Revert "vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration"Longpeng(Mike)1-17/+3
Commit ecebe53fe993 ("vfio: Avoid disabling and enabling vectors repeatedly in VFIO migration") avoids inefficiently disabling and enabling vectors repeatedly and lets the unmasked vectors be enabled one by one. But we want to batch multiple routes and defer the commit, and only commit once outside the loop of setting vector notifiers, so we cannot enable the vectors one by one in the loop now. Revert that commit and we will take another way in the next patch, it can not only avoid disabling/enabling vectors repeatedly, but also satisfy our requirement of defer to commit. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-5-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06vfio: simplify the failure path in vfio_msi_enableLongpeng(Mike)1-14/+2
Use vfio_msi_disable_common to simplify the error handling in vfio_msi_enable. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-4-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06vfio: move re-enabling INTX out of the common helperLongpeng(Mike)1-6/+11
Move re-enabling INTX out, and the callers should decide to re-enable it or not. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-3-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-05-06vfio: simplify the conditional statements in vfio_msi_enableLongpeng(Mike)1-2/+2
It's unnecessary to test against the specific return value of VFIO_DEVICE_SET_IRQS, since any positive return is an error indicating the number of vectors we should retry with. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Link: https://lore.kernel.org/r/20220326060226.1892-2-longpeng2@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2022-04-06Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau1-5/+5
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-21Use g_new() & friends where that makes obvious senseMarkus Armbruster1-2/+2
g_new(T, n) is neater than g_malloc(sizeof(T) * n). It's also safer, for two reasons. One, it catches multiplication overflowing size_t. Two, it returns T * rather than void *, which lets the compiler catch more type errors. This commit only touches allocations with size arguments of the form sizeof(T). Patch created mechanically with: $ spatch --in-place --sp-file scripts/coccinelle/use-g_new-etc.cocci \ --macro-file scripts/cocci-macro-file.h FILES... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20220315144156.1595462-4-armbru@redhat.com> Reviewed-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
2022-03-15kvm/msi: do explicit commit when adding msi routesLongpeng(Mike)1-1/+4
We invoke the kvm_irqchip_commit_routes() for each addition to MSI route table, which is not efficient if we are adding lots of routes in some cases. This patch lets callers invoke the kvm_irqchip_commit_routes(), so the callers can decide how to optimize. [1] https://lists.gnu.org/archive/html/qemu-devel/2021-11/msg00967.html Signed-off-by: Longpeng <longpeng2@huawei.com> Message-Id: <20220222141116.2091-3-longpeng2@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-11-01vfio/pci: Add support for mmapping sub-page MMIO BARs after live migrationKunkun Jiang1-1/+18
We can expand MemoryRegions of sub-page MMIO BARs in vfio_pci_write_config() to improve IO performance for some devices. However, the MemoryRegions of destination VM are not expanded any more after live migration. Because their addresses have been updated in vmstate_load_state() (vfio_pci_load_config) and vfio_sub_page_bar_update_mapping() will not be called. This may result in poor performance after live migration. So iterate BARs in vfio_pci_load_config() and try to update sub-page BARs. Reported-by: Nianyao Tang <tangnianyao@huawei.com> Reported-by: Qixin Gan <ganqixin@huawei.com> Signed-off-by: Kunkun Jiang <jiangkunkun@huawei.com> Link: https://lore.kernel.org/r/20211027090406.761-2-jiangkunkun@huawei.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-10-15qdev: Base object creation on QDict rather than QemuOptsKevin Wolf1-2/+2
QDicts are both what QMP natively uses and what the keyval parser produces. Going through QemuOpts isn't useful for either one, so switch the main device creation function to QDicts. By sharing more code with the -object/object-add code path, we can even reduce the code size a bit. This commit doesn't remove the detour through QemuOpts from any code path yet, but it allows the following commits to do so. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211008133442.141332-15-kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Peter Krempa <pkrempa@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-09-16hw/vfio: Fix typo in commentsCai Huoqing1-3/+3
Fix typo in comments: *programatically ==> programmatically *disconecting ==> disconnecting *mulitple ==> multiple *timout ==> timeout *regsiter ==> register *forumula ==> formula Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-07-14vfio/pci: Add pba_offset PCI quirk for BAIDU KUNLUN AI processorCai Huoqing1-0/+8
Fix pba_offset initialization value for BAIDU KUNLUN Virtual Function device. The KUNLUN hardware returns an incorrect value for the VF PBA offset, and add a quirk to instead return a hardcoded value of 0xb400. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210713093743.942-1-caihuoqing@baidu.com [aw: comment & whitespace tuning] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-07-14vfio/pci: Change to use vfio_pci_is()Cai Huoqing1-2/+2
Make use of vfio_pci_is() helper function. Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210713014831.742-1-caihuoqing@baidu.com [aw: commit log wording] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>