path: root/hw/vfio/pci.c
Age | Commit message | Author | Files | Lines
2020-01-24 | qdev: set properties with device_class_set_props() | Marc-André Lureau | 1 | -2/+2
The following patch will need to handle properties registration during class_init time. Let's use a device_class_set_props() setter.

spatch --macro-file scripts/cocci-macro-file.h --sp-file ./scripts/coccinelle/qdev-set-props.cocci --keep-comments --in-place --dir .

@@
typedef DeviceClass;
DeviceClass *d;
expression val;
@@
- d->props = val
+ device_class_set_props(d, val)

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20200110153039.1379601-20-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
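The commit anticipates extra class_init-time work. Routing every assignment through a setter gives that work a single place to live. Below is a minimal, self-contained sketch of the pattern. `DemoProperty`, `DemoDeviceClass` and `demo_class_set_props()` are hypothetical stand-ins, not QEMU's real `Property`/`DeviceClass` definitions, and the property-counting step is invented to illustrate the kind of bookkeeping a setter can take on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's Property and DeviceClass. */
typedef struct DemoProperty {
    const char *name;
} DemoProperty;

typedef struct DemoDeviceClass {
    DemoProperty *props_;   /* only ever written through the setter below */
    size_t nprops;          /* extra bookkeeping a bare assignment can't do */
} DemoDeviceClass;

/* Setter: replaces "dc->props_ = props" and can grow extra work later. */
static void demo_class_set_props(DemoDeviceClass *dc, DemoProperty *props)
{
    dc->props_ = props;
    dc->nprops = 0;
    /* Property arrays end with an entry whose name is NULL. */
    for (DemoProperty *p = props; p && p->name; p++) {
        dc->nprops++;
    }
}

static DemoProperty demo_props[] = {
    { "host" },
    { "sysfsdev" },
    { NULL },               /* terminator */
};
```

Because callers never touch the field directly, a later change such as registering each property at class_init time only needs to edit `demo_class_set_props()`.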
2020-01-06 | vfio/pci: Don't remove irqchip notifier if not registered | Peter Xu | 1 | -1/+3
The kvm irqchip notifier is only registered if the device supports INTx, but it is removed unconditionally. If the assigned device does not support INTx, this will cause QEMU to crash when unplugging the device from the system. Change it to remove the notifier only if the notify hook is set up.

CC: Eduardo Habkost <ehabkost@redhat.com>
CC: David Gibson <david@gibson.dropbear.id.au>
CC: Alex Williamson <alex.williamson@redhat.com>
Cc: qemu-stable@nongnu.org # v4.2
Reported-by: yanghliu@redhat.com
Debugged-by: Eduardo Habkost <ehabkost@redhat.com>
Fixes: c5478fea27ac ("vfio/pci: Respond to KVM irqchip change notifier")
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1782678
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
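The fix guards teardown with the same state that setup creates. The sketch below assumes a hypothetical `DemoNotifier` whose `notify` hook is filled in only when registration happens. QEMU's real notifier and KVM irqchip APIs are not used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified notifier: a callback plus list membership. */
typedef struct DemoNotifier {
    void (*notify)(struct DemoNotifier *n, void *data);
    bool on_list;
} DemoNotifier;

static int removals;

static void demo_notifier_remove(DemoNotifier *n)
{
    /* Removing a notifier that was never added models the crash. */
    assert(n->on_list);
    n->on_list = false;
    removals++;
}

static void demo_irqchip_change(DemoNotifier *n, void *data)
{
    (void)n;
    (void)data;
}

/* Register only when the device supports INTx. */
static void demo_realize(DemoNotifier *n, bool has_intx)
{
    if (has_intx) {
        n->notify = demo_irqchip_change;
        n->on_list = true;
    }
}

/* The fix: remove the notifier only if its hook was set up. */
static void demo_exit(DemoNotifier *n)
{
    if (n->notify) {
        demo_notifier_remove(n);
    }
}
```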
2019-11-26 | vfio/pci: Respond to KVM irqchip change notifier | David Gibson | 1 | -6/+19
VFIO PCI devices already respond to the pci intx routing notifier, in order to update kernel irqchip mappings when routing is updated. However this won't handle the case where the irqchip itself is replaced by a different model while retaining the same routing. This case can happen on the pseries machine type due to PAPR feature negotiation. To handle that case, add a handler for the irqchip change notifier, which does much the same thing as the routing notifier, but is unconditional, rather than being a no-op when the routing hasn't changed. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-11-26 | vfio/pci: Split vfio_intx_update() | David Gibson | 1 | -17/+22
This splits the vfio_intx_update() function into one part doing the actual reconnection with the KVM irqchip (vfio_intx_update(), now taking an argument with the new routing) and vfio_intx_routing_notifier() which handles calls to the pci device intx routing notifier and calling vfio_intx_update() when necessary. This will make adding support for the irqchip change notifier easier. Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Acked-by: Alex Williamson <alex.williamson@redhat.com>
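The split separates "should we reconnect?" from "reconnect". The routing notifier can then stay a no-op when the route is unchanged, while an irqchip change notifier reconnects unconditionally. Below is a rough sketch with hypothetical types (`DemoRoute`, `DemoDev`). The function names only mirror the roles named in the commit message; the real code operates on QEMU's PCI INTx route and KVM irqchip structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified INTx route. */
typedef struct DemoRoute {
    int mode;   /* e.g. enabled or disabled */
    int irq;
} DemoRoute;

typedef struct DemoDev {
    DemoRoute route;
    int reconnects;   /* reconnections to the (mock) KVM irqchip */
} DemoDev;

static bool demo_route_changed(const DemoRoute *old, const DemoRoute *new_route)
{
    return old->mode != new_route->mode || old->irq != new_route->irq;
}

/* Does the actual reconnection; takes the new routing as an argument. */
static void demo_intx_update(DemoDev *d, DemoRoute route)
{
    d->route = route;
    d->reconnects++;
}

/* Routing notifier: a no-op when the routing has not changed. */
static void demo_intx_routing_notifier(DemoDev *d, DemoRoute route)
{
    if (demo_route_changed(&d->route, &route)) {
        demo_intx_update(d, route);
    }
}

/* Irqchip change notifier: same routing, new irqchip, so always reconnect. */
static void demo_irqchip_change(DemoDev *d)
{
    demo_intx_update(d, d->route);
}
```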
2019-11-18 | vfio: don't ignore return value of migrate_add_blocker | Jens Freimann | 1 | -1/+1
When an error occurs in migrate_add_blocker(), it sets a negative return value and uses the error pointer we pass in. Instead of just looking at the error pointer, check for a negative return value. This also avoids a Coverity error, because the return value is set but never used. This fixes CID 1407219.

Reported-by: Coverity (CID 1407219)
Fixes: f045a0104c8c ("vfio: unplug failover primary device before migration")
Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
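Under the convention described here, a negative return value signals failure and the error object is filled in as well. Below is a hedged sketch of a caller that branches on the return value. A hypothetical `demo_add_blocker()` stands in for `migrate_add_blocker()`, and `DemoError` stands in for QEMU's `Error`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct DemoError {
    const char *msg;
} DemoError;

static DemoError demo_blocked = { "migration blocked" };

/* Same contract as described above: negative return AND *errp set. */
static int demo_add_blocker(int allow, DemoError **errp)
{
    if (!allow) {
        *errp = &demo_blocked;
        return -1;
    }
    return 0;
}

/* The caller tests the return value rather than only inspecting the error. */
static int demo_realize(int allow, DemoError **errp)
{
    DemoError *err = NULL;
    int ret = demo_add_blocker(allow, &err);

    if (ret < 0) {
        *errp = err;   /* hand the error to our own caller */
        return ret;
    }
    return 0;
}
```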
2019-11-18 | hw/vfio/pci: Fix double free of migration_blocker | Michal Privoznik | 1 | -0/+2
When a user tries to hotplug a VFIO device but the operation fails somewhere in the middle (in my testing it failed because RLIMIT_MEMLOCK forbade more memory allocation), a double free occurs. In vfio_realize(), vdev->migration_blocker is allocated. Then something goes wrong, which causes control to jump to the 'error' label, where the error is freed. But the pointer is left pointing to invalid memory. Later, when vfio_instance_finalize() is called, the memory is freed again.

In my testing the second hunk was sufficient to fix the bug, but I figured the first hunk doesn't hurt either.

==169952== Invalid read of size 8
==169952==    at 0xA47DCD: error_free (error.c:266)
==169952==    by 0x4E0A18: vfio_instance_finalize (pci.c:3040)
==169952==    by 0x8DF74C: object_deinit (object.c:606)
==169952==    by 0x8DF7BE: object_finalize (object.c:620)
==169952==    by 0x8E0757: object_unref (object.c:1074)
==169952==    by 0x45079C: memory_region_unref (memory.c:1779)
==169952==    by 0x45376B: do_address_space_destroy (memory.c:2793)
==169952==    by 0xA5C600: call_rcu_thread (rcu.c:283)
==169952==    by 0xA427CB: qemu_thread_start (qemu-thread-posix.c:519)
==169952==    by 0x80A8457: start_thread (in /lib64/libpthread-2.29.so)
==169952==    by 0x81C96EE: clone (in /lib64/libc-2.29.so)
==169952==  Address 0x143137e0 is 0 bytes inside a block of size 48 free'd
==169952==    at 0x4A342BB: free (vg_replace_malloc.c:530)
==169952==    by 0xA47E05: error_free (error.c:270)
==169952==    by 0x4E0945: vfio_realize (pci.c:3025)
==169952==    by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==    by 0x689B9A: device_set_realized (qdev.c:876)
==169952==    by 0x8E2C80: property_set_bool (object.c:2080)
==169952==    by 0x8E0EF6: object_property_set (object.c:1272)
==169952==    by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==    by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==    by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)
==169952==    by 0x5E81E5: qmp_device_add (qdev-monitor.c:798)
==169952==    by 0x9E18A8: do_qmp_dispatch (qmp-dispatch.c:132)
==169952==  Block was alloc'd at
==169952==    at 0x4A35476: calloc (vg_replace_malloc.c:752)
==169952==    by 0x51B1158: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.6)
==169952==    by 0xA47357: error_setv (error.c:61)
==169952==    by 0xA475D9: error_setg_internal (error.c:97)
==169952==    by 0x4DF8C2: vfio_realize (pci.c:2737)
==169952==    by 0x76A4FF: pci_qdev_realize (pci.c:2099)
==169952==    by 0x689B9A: device_set_realized (qdev.c:876)
==169952==    by 0x8E2C80: property_set_bool (object.c:2080)
==169952==    by 0x8E0EF6: object_property_set (object.c:1272)
==169952==    by 0x8E3FC8: object_property_set_qobject (qom-qobject.c:26)
==169952==    by 0x8E11DB: object_property_set_bool (object.c:1338)
==169952==    by 0x5E7BDD: qdev_device_add (qdev-monitor.c:673)

Fixes: f045a0104c8c ("vfio: unplug failover primary device before migration")
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
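This is the usual pattern for an owned pointer that may be released on more than one path: free it, then reset it to NULL, so that any later free is a no-op. Below is a minimal sketch with a hypothetical `DemoDev`. Plain `malloc()`/`free()` stand in for QEMU's `Error` object and `error_free()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device owning a heap-allocated blocker object. */
typedef struct DemoDev {
    char *migration_blocker;
} DemoDev;

/* Free the blocker and clear the pointer so a second call is harmless. */
static void demo_clear_blocker(DemoDev *d)
{
    free(d->migration_blocker);   /* free(NULL) does nothing */
    d->migration_blocker = NULL;
}

/* Realize path that allocates the blocker and then fails later on. */
static int demo_realize(DemoDev *d)
{
    d->migration_blocker = malloc(16);
    if (!d->migration_blocker) {
        return -1;
    }
    strcpy(d->migration_blocker, "blocked");
    /* ... a later step fails, e.g. because of RLIMIT_MEMLOCK ... */
    demo_clear_blocker(d);        /* the 'error' label */
    return -1;
}

/* Finalize runs afterwards; safe because the pointer was reset. */
static void demo_finalize(DemoDev *d)
{
    demo_clear_blocker(d);
}
```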
2019-10-29 | vfio: unplug failover primary device before migration | Jens Freimann | 1 | -6/+20
As usual, block all vfio-pci devices from being migrated, but make an exception for failover primary devices. This is achieved by setting unmigratable to 0 but also adding a migration blocker for all vfio-pci devices except failover primary devices. These will be unplugged before migration happens by the migration handler of the corresponding virtio-net standby device.

Signed-off-by: Jens Freimann <jfreimann@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20191029114905.6856-12-jfreimann@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2019-10-10 | hw/vfio/pci: fix double free in vfio_msi_disable | Evgeny Yakovlev | 1 | -0/+1
The following guest behaviour pattern leads to a double free in VFIO PCI:

1. Guest enables MSI interrupts.
   vfio_msi_enable is called, but fails in vfio_enable_vectors. In our case this was because the VFIO GPU device was in D3 state. The unhappy path in vfio_msi_enable will g_free(vdev->msi_vectors) but not set this pointer to NULL.
2. Guest still sees MSI as enabled after that, because the emulated config write is done in vfio_pci_write_config unconditionally before calling vfio_msi_enable.
3. Guest disables MSI interrupts.
   vfio_msi_disable is called and tries to g_free(vdev->msi_vectors) in vfio_msi_disable_common => double free.

Signed-off-by: Evgeny Yakovlev <wrfsh@yandex-team.ru>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
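Step 1 leaves a dangling pointer and step 3 frees it again. Clearing the pointer on the failure path breaks that chain. Below is a simplified sketch with hypothetical names (`DemoDev`, `demo_msi_enable()`, `demo_msi_disable()`). Here `calloc()`/`free()` stand in for GLib's allocator and `g_free()`; like `g_free()`, `free(NULL)` does nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified MSI state. */
typedef struct DemoVector {
    int virq;
} DemoVector;

typedef struct DemoDev {
    DemoVector *msi_vectors;
    int device_in_d3;   /* makes enabling the vectors fail, as in the report */
} DemoDev;

static int demo_enable_vectors(DemoDev *d)
{
    return d->device_in_d3 ? -1 : 0;
}

static int demo_msi_enable(DemoDev *d, int nr)
{
    d->msi_vectors = calloc((size_t)nr, sizeof(*d->msi_vectors));
    if (!d->msi_vectors) {
        return -1;
    }
    if (demo_enable_vectors(d) < 0) {
        free(d->msi_vectors);
        d->msi_vectors = NULL;   /* the fix: don't leave a dangling pointer */
        return -1;
    }
    return 0;
}

/* The guest later disables MSI; freeing a NULL pointer is harmless. */
static void demo_msi_disable(DemoDev *d)
{
    free(d->msi_vectors);
    d->msi_vectors = NULL;
}
```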
2019-09-19 | vfio: fix a typo | Chen Zhang | 1 | -2/+2
Signed-off-by: Chen Zhang <tgfbeta@me.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <8E5A9C27-C76D-46CF-85B0-79121A00B05F@me.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-08-16 | sysemu: Split sysemu/runstate.h off sysemu/sysemu.h | Markus Armbruster | 1 | -0/+1
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence:

* It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits).

* It pulls in more than a dozen additional headers.

Split stuff related to run state management into its own header sysemu/runstate.h.

Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects.

Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 | Include hw/qdev-properties.h less | Markus Armbruster | 1 | -0/+1
In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16 | Include qemu/main-loop.h less | Markus Armbruster | 1 | -0/+1
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 | Include hw/hw.h exactly where needed | Markus Armbruster | 1 | -0/+1
In my "build everything" tree, changing hw/hw.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h).

The previous commits have left only the declaration of hw_error() in hw/hw.h. This permits dropping most of its inclusions. Touching it now recompiles fewer than 200 objects.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Message-Id: <20190812052359.30071-19-armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 | Include migration/vmstate.h less | Markus Armbruster | 1 | -0/+1
In my "build everything" tree, changing migration/vmstate.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). hw/hw.h supposedly includes it for convenience. Several other headers include it just to get VMStateDescription. The previous commit made that unnecessary. Include migration/vmstate.h only where it's still needed. Touching it now recompiles only some 1600 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-16-armbru@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-07-02 | vfio/pci: Trace vfio_set_irq_signaling() failure in vfio_msix_vector_release() | Eric Auger | 1 | -2/+5
Report an error in case we fail to set a trigger action on any VFIO_PCI_MSIX_IRQ_INDEX subindex. This might be useful in debugging a device that is not working properly. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Coverity (CID 1402196) Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 | vfio/common: Introduce vfio_set_irq_signaling helper | Eric Auger | 1 | -166/+51
The code used to assign an interrupt index/subindex to an eventfd is duplicated many times. Let's introduce a helper that allows setting or unsetting the signaling for an ACTION_TRIGGER, ACTION_MASK or ACTION_UNMASK action.

In the error message, we now use errno in case of any VFIO_DEVICE_SET_IRQS ioctl failure.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Li Qiang <liq3ea@gmail.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-13 | vfio/pci: Allow MSI-X relocation to fixup bogus PBA | Alex Williamson | 1 | -1/+1
The MSI-X relocation code can sometimes be used to work around bogus MSI-X capabilities, but this test for whether the PBA is outside of the specified BAR causes the device to error before we can apply a relocation. Let it proceed if we intend to relocate MSI-X anyway. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
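In effect, the change makes the out-of-bounds PBA check non-fatal when a relocation is requested. Below is a hedged sketch of that decision using a hypothetical `DemoMsix` struct. The field names and the exact comparison are assumptions, not the QEMU source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the MSI-X PBA location check. */
typedef struct DemoMsix {
    uint64_t pba_offset;   /* offset of the PBA within its BAR */
    uint64_t bar_size;     /* size of the BAR the PBA claims to live in */
    bool relocate;         /* MSI-X will be relocated anyway */
} DemoMsix;

/* Returns 0 if setup may continue, -1 if the device must be rejected. */
static int demo_check_pba(const DemoMsix *m)
{
    if (m->pba_offset >= m->bar_size) {
        /* Bogus PBA: only fatal when no relocation will replace it. */
        if (!m->relocate) {
            return -1;
        }
    }
    return 0;
}
```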
2019-06-13 | vfio/pci: Hide Resizable BAR capability | Alex Williamson | 1 | -0/+1
The resizable BAR capability is currently exposed read-only from the kernel and we don't yet implement a protocol for virtualizing it to the VM. Exposing it to the guest read-only introduces poor behavior as the guest has no reason to test that a control register write is accepted by the hardware. This can lead to cases where the guest OS assumes the BAR has been resized, but it hasn't. This has been observed when assigning AMD Vega GPUs. Note, this does not preclude future enablement of resizable BARs, but it's currently incorrect to expose this capability as read-only, so better to not expose it at all. Reported-by: James Courtier-Dutton <james.dutton@gmail.com> Tested-by: James Courtier-Dutton <james.dutton@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-06-12 | Include qemu/module.h where needed, drop it from qemu-common.h | Markus Armbruster | 1 | -0/+1
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
2019-06-06 | hw/vfio/pci: Use the QOM DEVICE() macro to access DeviceState.qdev | Philippe Mathieu-Daudé | 1 | -2/+2
Rather than looking inside the definition of a DeviceState with "s->qdev", use the QOM preferred style: "DEVICE(s)".

This patch was generated using the following Coccinelle script:

// Use DEVICE() macros to access DeviceState.qdev
@use_device_macro_to_access_qdev@
expression obj;
identifier dev;
@@
-&obj->dev.qdev
+DEVICE(obj)

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190528164020.32250-10-philmd@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 | pci: msix: move 'MSIX_CAP_LENGTH' to header file | Li Qiang | 1 | -2/+0
'MSIX_CAP_LENGTH' is defined in two .c files. Move it to hw/pci/msix.h to reduce duplicated code.

CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-5-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-05-22 | vfio: pci: make "vfio-pci-nohotplug" as MACRO | Li Qiang | 1 | -2/+4
QOMConventions recommends using TYPE_FOO for a TypeInfo's name. Though "vfio-pci-nohotplug" is not used in other parts, we should make this change for consistency.

CC: qemu-trivial@nongnu.org
Signed-off-by: Li Qiang <liq3ea@163.com>
Message-Id: <20190521151543.92274-2-liq3ea@163.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-04-26 | spapr: Support NVIDIA V100 GPU with NVLink2 | Alexey Kardashevskiy | 1 | -0/+14
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services, which include an ATS shootdown (ATSD) register exported via the NVLink bridge device.

This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD.

This adds additional steps to sPAPR PHB setup:

1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings;
2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest;
3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;
4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB, as the GPU driver uses it for link discovery.

This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero.

This puts new memory nodes in a separate NUMA node, as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup, which assigns NUMA ids from 255 downwards, this adds new NUMA nodes after the user-configured nodes, or from 1 if none were configured.

This adds a requirement similar to EEH: one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU, so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform, as it does not mix NVLink bridges or GPUs from different NPUs in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[aw: for vfio portions]
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
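Two of the rules above are small enough to state as code: where the new GPU NUMA nodes start, and when ATSD must be disabled. This sketch rests on stated assumptions: `demo_first_gpu_node()` and `demo_atsd_allowed()` are hypothetical helpers, and IOMMU groups are modeled as plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * New GPU RAM nodes go after the user-configured NUMA nodes, or start
 * at 1 if the user configured none (hypothetical helper).
 */
static int demo_first_gpu_node(int nb_user_nodes)
{
    return nb_user_nodes > 0 ? nb_user_nodes : 1;
}

/*
 * ATSD stays enabled only if every device on the vPHB shares a single
 * IOMMU group (hypothetical helper; groups are plain integers here).
 */
static bool demo_atsd_allowed(const int *groups, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (groups[i] != groups[0]) {
            return false;  /* per the commit, a warning is printed here */
        }
    }
    return true;
}
```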
2019-04-18 | vfio: Report warnings with warn_report(), not error_printf() | Markus Armbruster | 1 | -6/+13
Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190417190641.26814-8-armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com>
2019-03-11 | vfio/display: add xres + yres properties | Gerd Hoffmann | 1 | -0/+12
This allows configuring the display resolution which the vgpu should use. The information will be passed to the guest using EDID, so the mdev driver must support the vfio edid region for this to work.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-01-24 | trace: forbid use of %m in trace event format strings | Daniel P. Berrangé | 1 | -1/+1
The '%m' format instructs glibc's printf()/syslog() implementation to insert the contents of strerror(errno). Since this is a glibc extension, it should generally be avoided in QEMU due to the need for portability to a variety of platforms.

Even though vfio is Linux-only code that could otherwise use "%m", it must still be avoided in trace-events files, because several of the backends do not use the format string and so this error information is invisible to them.

The errno string value should be given as an explicit trace argument instead, making it accessible to all backends. This also allows it to work correctly with future patches that use the format string with systemtap's simple printf code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-id: 20190123120016.4538-4-berrange@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
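In practice, the caller captures `errno` and passes `strerror()` of it as an ordinary `%s` argument. Below is a small sketch in which a hypothetical `demo_trace_failure()` formats into a buffer, standing in for a generated trace function. It uses only standard C.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical trace helper. Instead of "%m" (a glibc extension that
 * backends ignoring the format string never see), the errno text is
 * passed as a normal string argument.
 */
static int demo_trace_failure(char *buf, size_t len, const char *name, int err)
{
    return snprintf(buf, len, "%s: %s", name, strerror(err));
}
```

At a call site, save `errno` immediately after the failing call (for example `int err = errno;`). Then hand `err` to the trace function, so later library calls cannot clobber it.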
2018-12-19 | vfio/pci: Remove PCIe Link Status emulation | Alex Williamson | 1 | -6/+0
Now that the downstream port will virtually negotiate itself to the link status of the downstream device, we can remove this emulation. It's not clear that it was ever terribly useful anyway.

Tested-by: Geoffrey McRae <geoff@hostfission.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-12-19 | pcie: Create enums for link speed and width | Alex Williamson | 1 | -1/+2
In preparation for reporting higher virtual link speeds and widths, create enums and macros to help us manage them. Cc: Marcel Apfelbaum <marcel.apfelbaum@gmail.com> Tested-by: Geoffrey McRae <geoff@hostfission.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-10-19 | vfio: Clean up error reporting after previous commit | Markus Armbruster | 1 | -4/+4
The previous commit changed vfio's warning messages from

    vfio warning: DEV-NAME: Could not frobnicate

to

    warning: vfio DEV-NAME: Could not frobnicate

To match this change, change error messages from

    vfio error: DEV-NAME: On fire

to

    vfio DEV-NAME: On fire

Note the loss of "error". If we think marking error messages that way is a good idea, we should mark *all* error messages, i.e. make error_report() print it.

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Message-Id: <20181017082702.5581-7-armbru@redhat.com>
2018-10-19 | vfio: Use warn_report() & friends to report warnings | Markus Armbruster | 1 | -7/+7
The vfio code reports warnings like

    error_report(WARN_PREFIX "Could not frobnicate", DEV-NAME);

where WARN_PREFIX is defined so the message comes out as

    vfio warning: DEV-NAME: Could not frobnicate

This usage predates the introduction of warn_report() & friends in commit 97f40301f1d. It's time to convert to that interface. Since these functions already prefix the message with "warning: ", replace WARN_PREFIX by VFIO_MSG_PREFIX, so the messages come out like

    warning: vfio DEV-NAME: Could not frobnicate

The next commit will replace ERR_PREFIX.

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-6-armbru@redhat.com>
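A tiny stand-in reproduces the resulting message layout. In this sketch, `demo_warn_report()` is hypothetical and only mimics the behaviour described above, namely that it adds "warning: " itself. The prefix value "vfio %s: " is inferred from the sample output, not copied from QEMU's headers.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Prefix in the style described above (value inferred, not QEMU's). */
#define DEMO_VFIO_MSG_PREFIX "vfio %s: "

static char demo_last[256];   /* last formatted message, for inspection */

/* Hypothetical stand-in for warn_report(): adds "warning: " itself. */
static void demo_warn_report(const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(demo_last, sizeof(demo_last), "warning: ");

    va_start(ap, fmt);
    vsnprintf(demo_last + n, sizeof(demo_last) - (size_t)n, fmt, ap);
    va_end(ap);
}
```

Call sites pass the device name as the first argument, which the prefix's `%s` consumes: `demo_warn_report(DEMO_VFIO_MSG_PREFIX "Could not frobnicate", name);`.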
2018-10-19 | error: Fix use of error_prepend() with &error_fatal, &error_abort | Markus Armbruster | 1 | -2/+1
From include/qapi/error.h:

 * Pass an existing error to the caller with the message modified:
 *     error_propagate(errp, err);
 *     error_prepend(errp, "Could not frobnicate '%s': ", name);

Fei Li pointed out that doing error_propagate() first doesn't work well when @errp is &error_fatal or &error_abort: the error_prepend() is never reached.

Since I doubt fixing the documentation will stop people from getting it wrong, introduce error_propagate_prepend(), in the hope that it lures people away from using its constituents in the wrong order. Update the instructions in error.h accordingly.

Convert existing error_prepend() next to error_propagate to error_propagate_prepend(). If any of these get reached with &error_fatal or &error_abort, the error messages improve. I didn't check whether that's the case anywhere.

Cc: Fei Li <fli@suse.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20181017082702.5581-2-armbru@redhat.com>
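A toy error model shows why the order matters. In it, propagating into the "fatal" destination reports immediately. Everything here (`DemoError`, `demo_fatal_sink`, `demo_propagate_prepend()`) is hypothetical. Real QEMU exits at that point instead of returning, which is why a later `error_prepend()` is never reached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, heavily simplified error model. */
typedef struct DemoError {
    char msg[128];
} DemoError;

static DemoError demo_fatal_sink;   /* stands in for &error_fatal */
static char demo_reported[256];     /* what the "fatal" handler printed */

/* Propagating into the fatal sink reports at once (real code exits). */
static void demo_propagate(DemoError *dst, const DemoError *src)
{
    if (dst == &demo_fatal_sink) {
        snprintf(demo_reported, sizeof(demo_reported), "%s", src->msg);
        return;
    }
    *dst = *src;
}

static void demo_prepend(DemoError *err, const char *prefix)
{
    char tmp[128];

    snprintf(tmp, sizeof(tmp), "%s%s", prefix, err->msg);
    snprintf(err->msg, sizeof(err->msg), "%s", tmp);
}

/* Prepend first, then propagate: the prefix survives even for the sink. */
static void demo_propagate_prepend(DemoError *dst, DemoError *src,
                                   const char *prefix)
{
    demo_prepend(src, prefix);
    demo_propagate(dst, src);
}
```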
2018-10-15 | vfio-pci: make vfio-pci device more QOM conventional | Li Qiang | 1 | -14/+15
Define a TYPE_VFIO_PCI and drop DO_UPCAST. Signed-off-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-10-15 | hw/vfio/display: add ramfb support | Gerd Hoffmann | 1 | -0/+25
So we have a boot display when using a vgpu as primary display.

ramfb depends on a fw_cfg file. fw_cfg files cannot be added and removed at runtime, therefore a ramfb-enabled vfio device can't be hotplugged.

Add a nohotplug variant of the vfio-pci device (as child class). Add the ramfb property to the nohotplug variant only. So to enable the vgpu display with boot support, use this:

    -device vfio-pci-nohotplug,display=on,ramfb=on,sysfsdev=...

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-08-23 | vfio/pci: Handle subsystem realpath() returning NULL | Alex Williamson | 1 | -1/+1
Fix error reported by Coverity where realpath can return NULL, resulting in a segfault in strcmp(). This should never happen given that we're working through regularly structured sysfs paths, but trivial enough to easily avoid. Fixes: 238e91728503 ("vfio/ccw/pci: Allow devices to opt-in for ballooning") Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
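The defensive pattern is to test `realpath()`'s result before using it. Below is a minimal POSIX sketch. `demo_same_real_path()` is a hypothetical helper, not the QEMU function. It relies on the POSIX.1-2008 behaviour of `realpath(path, NULL)`, which allocates the result.

```c
#define _XOPEN_SOURCE 700   /* expose realpath() under strict compilers */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: does path resolve to exactly "expected"?
 * realpath() returns NULL on failure (e.g. a missing path), so the
 * result must be checked before it reaches strcmp().
 */
static bool demo_same_real_path(const char *path, const char *expected)
{
    char *resolved = realpath(path, NULL);   /* caller must free() */
    bool same;

    if (!resolved) {
        return false;
    }
    same = strcmp(resolved, expected) == 0;
    free(resolved);
    return same;
}
```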
2018-08-17 | vfio/ccw/pci: Allow devices to opt-in for ballooning | Alex Williamson | 1 | -1/+25
If a vfio assigned device makes use of a physical IOMMU, then memory ballooning is necessarily inhibited due to the page pinning, lack of page level granularity at the IOMMU, and sufficient notifiers to both remove the page on balloon inflation and add it back on deflation. However, not all devices are backed by a physical IOMMU. In the case of mediated devices, if a vendor driver is well synchronized with the guest driver, such that only pages actively used by the guest driver are pinned by the host mdev vendor driver, then there should be no overlap between pages available for the balloon driver and pages actively in use by the device. Under these conditions, ballooning should be safe. vfio-ccw devices are always mediated devices and always operate under the constraints above. Therefore we can consider all vfio-ccw devices as balloon compatible. The situation is far from straightforward with vfio-pci. These devices can be physical devices with physical IOMMU backing or mediated devices where it is unknown whether a physical IOMMU is in use or whether the vendor driver is well synchronized to the working set of the guest driver. The safest approach is therefore to assume all vfio-pci devices are incompatible with ballooning, but allow user opt-in should they have further insight into mediated devices. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
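The policy described above reduces to a small decision table. Below is a hedged sketch using a hypothetical `demo_balloon_compatible()`. It does not show how the user actually opts in (the property name), nor any additional checks QEMU may perform.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device kinds relevant to the ballooning decision. */
typedef enum {
    DEMO_VFIO_CCW,
    DEMO_VFIO_PCI,
} DemoBusType;

static bool demo_balloon_compatible(DemoBusType type, bool user_opt_in)
{
    switch (type) {
    case DEMO_VFIO_CCW:
        /* Always mediated; only the guest driver's working set is pinned. */
        return true;
    case DEMO_VFIO_PCI:
        /* Assume incompatible unless the user explicitly opts in. */
        return user_opt_in;
    }
    return false;
}
```

Defaulting vfio-pci to "incompatible" follows the commit's reasoning: QEMU cannot tell, in general, whether a physical IOMMU or a well-synchronized vendor driver is involved.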
2018-07-11 | vfio/pci: do not set the PCIDevice 'has_rom' attribute | Cédric Le Goater | 1 | -1/+0
PCI devices needing a ROM allocate an optional MemoryRegion with pci_add_option_rom(). pci_del_option_rom() does the cleanup when the device is destroyed. The only action taken by this routine is to call vmstate_unregister_ram(), which clears the id string of the optional ROM RAMBlock and now also flags the RAMBlock as non-migratable. This was recently added by commit b895de502717 ("migration: discard non-migratable RAMBlocks").

VFIO devices do their own loading of the PCI option ROM in vfio_pci_size_rom(). The memory region is switched to an I/O region and the PCI attribute 'has_rom' is set, but the RAMBlock of the ROM region is not allocated. When the associated PCI device is deleted, pci_del_option_rom() calls vmstate_unregister_ram(), which tries to flag a NULL RAMBlock, leading to a SEGV.

It seems that 'has_rom' was set to have memory_region_destroy() called, but since commit 469b046ead06 ("memory: remove memory_region_destroy") this is not necessary anymore, as the MemoryRegion is freed automagically.

Remove the PCIDevice 'has_rom' attribute setting in vfio.

Fixes: b895de502717 ("migration: discard non-migratable RAMBlocks")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-07-02 | hw/vfio: Use the IEC binary prefix definitions | Philippe Mathieu-Daudé | 1 | -1/+2
It eases code review: the unit is explicit.

Patch generated using:

  $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/

and modified manually.

Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20180625124238.25339-38-f4bug@amsat.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
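Here is a minimal sketch of what the change looks like at a use site. The macros are written out locally so the example stands alone; QEMU provides its own definitions, which may differ in form. The 16 MiB value is an arbitrary example, not taken from pci.c.

```c
#include <assert.h>
#include <stdint.h>

/* IEC binary prefixes, defined here only to keep the example standalone. */
#define KiB (INT64_C(1) << 10)
#define MiB (INT64_C(1) << 20)
#define GiB (INT64_C(1) << 30)

/* Before: the unit is implicit in the arithmetic. */
static const int64_t demo_size_old = 16 * 1024 * 1024;

/* After: the unit is explicit. */
static const int64_t demo_size_new = 16 * MiB;
```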
2018-06-05 | vfio/pci: Default display option to "off" | Alex Williamson | 1 | -1/+1
Commit a9994687cb9b ("vfio/display: core & wireup") added display support to vfio-pci with the default being "auto", which breaks existing VMs when the vGPU requires GL support but had no previous requirement for a GL compatible configuration. "Off" is the safer default as we impose no new requirements to VM configurations. Fixes: a9994687cb9b ("vfio/display: core & wireup") Cc: qemu-stable@nongnu.org Cc: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05 | vfio/quirks: Enable ioeventfd quirks to be handled by vfio directly | Alex Williamson | 1 | -0/+2
With vfio ioeventfd support, we can program vfio-pci to perform a specified BAR write when an eventfd is triggered. This allows the KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding userspace handling for these events. On the same micro-benchmark where the ioeventfd got us to almost 90% of performance versus disabling the GeForce quirks, this gets us to within 95%. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: ioeventfd quirk accelerationAlex Williamson1-0/+2
The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found in device MMIO space. Normally PCI config space is considered a slow path and further optimization is unnecessary; however, NVIDIA uses a register here to enable the MSI interrupt to re-trigger. Exiting to QEMU for this MSI-ACK handling can therefore rate limit our interrupt handling. Fortunately the MSI-ACK write is easily detected since the quirk MemoryRegion otherwise has very few accesses, so simply looking for consecutive writes with the same data is sufficient; in this case, a threshold of 10 consecutive writes with the same data and size is arbitrarily chosen. We configure the KVM ioeventfd with data match, so there's no risk of triggering for the wrong data or size, but we do risk that pathological driver behavior might consume all of QEMU's file descriptors, so we cap ourselves at 10 ioeventfds for this purpose. In support of the above, generic ioeventfd infrastructure is added for vfio quirks. This automatically initializes an ioeventfd list per quirk, disables and frees ioeventfds on exit, and allows ioeventfds marked as dynamic to be dropped on device reset. The rationale for this latter feature is that useful ioeventfds may depend on specific driver behavior and since we necessarily place a cap on our use of ioeventfds, a machine reset is a reasonable point at which to assume a new driver and re-profile. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
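The detection heuristic described above can be sketched as a small state machine. This is a hypothetical illustration, not QEMU's code: struct write_tracker, tracker_write() and tracker_reset() are invented names, and actually registering the KVM ioeventfd is left to the caller. The sketch counts consecutive writes with identical address, data and size, signals once the count reaches 10, caps the number of dynamic ioeventfds at 10, and forgets everything on reset so a new driver can be re-profiled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; thresholds follow the commit message. */
#define HIT_THRESHOLD   10  /* consecutive identical writes before acting */
#define MAX_IOEVENTFDS  10  /* cap so a misbehaving driver cannot exhaust fds */

struct write_tracker {
    uint64_t last_addr;
    uint64_t last_data;
    unsigned last_size;
    unsigned hits;        /* consecutive identical writes seen so far */
    unsigned ioeventfds;  /* dynamic ioeventfds installed so far */
};

/*
 * Record a guest write to the quirk region.  Returns true when the caller
 * should install a data-matching ioeventfd for (addr, data, size).  In a
 * real setup, matching writes stop reaching userspace once that ioeventfd
 * exists, so the same triple is not expected to trip the threshold twice.
 */
static bool tracker_write(struct write_tracker *t, uint64_t addr,
                          uint64_t data, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);

    if (t->hits && addr == t->last_addr && data == t->last_data &&
        size == t->last_size) {
        t->hits++;
    } else {
        /* Any different write restarts the run. */
        t->last_addr = addr;
        t->last_data = data;
        t->last_size = size;
        t->hits = 1;
    }

    if (t->hits == HIT_THRESHOLD && t->ioeventfds < MAX_IOEVENTFDS) {
        t->ioeventfds++;
        return true;
    }
    return false;
}

/* On device reset, drop the dynamic ioeventfds and start profiling afresh. */
static void tracker_reset(struct write_tracker *t)
{
    t->hits = 0;
    t->ioeventfds = 0;
}
```

Because the ioeventfd is configured with data match, only the exact (address, data, size) triple that tripped the threshold would be handled in KVM; in this sketch any other write simply restarts the count.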
2018-06-05vfio/quirks: Add quirk reset callbackAlex Williamson1-0/+2
Quirks can be self-modifying; provide a hook to allow them to clean up on device reset if desired. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-04-27ui: introduce vfio_display_resetTina Zhang1-0/+4
During guest OS reboot, the guest framebuffer is invalid. It will cause bugs if the invalid guest framebuffer is still used by the host. This patch introduces vfio_display_reset, which is invoked during vfio display reset. The vfio_display_reset function releases the invalid display resource, disables scanout mode and replaces the invalid surface with QemuConsole's DisplaySurface. This patch can fix the GPU hang issue caused by gd_egl_draw during guest OS reboot. Changes v3->v4: - Move dma-buf based display check into the vfio_display_reset(). (Gerd) Changes v2->v3: - Limit vfio_display_reset to dma-buf based vfio display. (Gerd) Changes v1->v2: - Use dpy_gfx_update_full() to update the screen after reset. (Gerd) - Remove dpy_gfx_switch_surface(). (Gerd) Signed-off-by: Tina Zhang <tina.zhang@intel.com> Message-id: 1524820266-27079-3-git-send-email-tina.zhang@intel.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
2018-03-13ppc/spapr, vfio: Turn off MSIX emulation for VFIO devicesAlexey Kardashevskiy1-0/+13
This adds a way for the platform to tell VFIO not to emulate MSIX, so that MMIO memory regions do not get split into chunks in the flatview and the entire page can be registered as a KVM memory slot, making direct MMIO access possible for the guest. This enables mapping the entire MSIX BAR to the guest for the pseries platform in order to achieve the maximum MMIO performance for certain devices. Tested on: LSI Logic / Symbios Logic SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-13vfio-pci: Allow mmap of MSIX BARAlexey Kardashevskiy1-0/+9
At the moment we unconditionally avoid mapping MSIX data of a BAR and emulate the MSIX table in QEMU. However, 1) this is not always necessary, as a platform may provide a paravirt interface for MSIX configuration; and 2) it can slow down MMIO access, since registers are emulated in QEMU whenever frequently accessed registers share the same system page with MSIX data; this is particularly a problem for systems with a page size bigger than 4KB. A new capability - VFIO_REGION_INFO_CAP_MSIX_MAPPABLE - has been added to the kernel [1] which tells userspace that mapping of the MSIX data is now possible. This makes use of it, so from now on QEMU tries mapping the entire BAR as a whole and emulates MSIX on top of that. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a32295c612c57990d17fb0f41e7134394b2f35f6 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
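Why larger pages make MSIX emulation costly can be seen with a little page arithmetic. The sketch below uses hypothetical helpers, msix_trap_range() and reg_is_trapped(), to compute the page-aligned window that has to stay trapped when the MSIX table is emulated and the host can only mmap whole pages. The table offset 0x2000 and size 0x100 used in the note afterwards are illustrative values, not taken from a real device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open byte range [start, end) within a BAR. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: the page-aligned part of a BAR that cannot be
 * mmapped when the MSIX table at [table_off, table_off + table_size)
 * is emulated and mappings must cover whole pages of size `page`.
 */
static struct range msix_trap_range(uint64_t table_off, uint64_t table_size,
                                    uint64_t page)
{
    assert(page != 0 && (page & (page - 1)) == 0);  /* power of two */
    assert(table_size > 0);

    struct range r = {
        .start = table_off & ~(page - 1),                            /* round down */
        .end = (table_off + table_size + page - 1) & ~(page - 1),    /* round up */
    };
    return r;
}

/* True if a register at reg_off shares a trapped page with the table. */
static bool reg_is_trapped(uint64_t reg_off, uint64_t table_off,
                           uint64_t table_size, uint64_t page)
{
    struct range r = msix_trap_range(table_off, table_size, page);
    return reg_off >= r.start && reg_off < r.end;
}
```

With 4 KiB pages, a table at 0x2000 of size 0x100 traps only [0x2000, 0x3000), so a register at offset 0x0 stays mmapped; with 64 KiB pages the window grows to [0x0, 0x10000) and that register is trapped too, which is the cost that mapping the whole BAR is meant to avoid.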
2018-03-13vfio/display: core & wireupGerd Hoffmann1-0/+10
Infrastructure for display support. Must be enabled using 'display' property. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed By: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-03-06use g_path_get_basename instead of basenameJulia Suvorova1-1/+1
basename(3) and dirname(3) may modify their argument and may return pointers to statically allocated memory which may be overwritten by subsequent calls. g_path_get_basename and g_path_get_dirname have no such issues, and are therefore preferable. Signed-off-by: Julia Suvorova <jusual@mail.ru> Message-Id: <1519888086-4207-1-git-send-email-jusual@mail.ru> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
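The difference matters when the argument must survive the call, for example when a device name is derived from a sysfs path that is still needed afterwards. As a dependency-free sketch, path_basename_dup() below is a hypothetical stand-in modelled on the documented behaviour of g_path_get_basename() (empty input yields ".", trailing slashes are ignored, an all-slash path yields "/", and the result is newly allocated). It is not glib's implementation, and it uses malloc()/free() rather than g_malloc()/g_free().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of s into a new NUL-terminated buffer (NULL on OOM). */
static char *dup_range(const char *s, size_t len)
{
    char *out = malloc(len + 1);
    if (out != NULL) {
        memcpy(out, s, len);
        out[len] = '\0';
    }
    return out;
}

/*
 * Hypothetical stand-in modelled on g_path_get_basename(): the input is
 * only read, never written, and the caller owns (and frees) the result.
 */
static char *path_basename_dup(const char *path)
{
    assert(path != NULL);
    size_t end = strlen(path);

    if (end == 0) {
        return dup_range(".", 1);          /* empty path */
    }
    while (end > 0 && path[end - 1] == '/') {
        end--;                             /* ignore trailing slashes */
    }
    if (end == 0) {
        return dup_range("/", 1);          /* path was only slashes */
    }
    size_t start = end;
    while (start > 0 && path[start - 1] != '/') {
        start--;                           /* walk back to the last separator */
    }
    return dup_range(path + start, end - start);
}
```

Each call returns an independent buffer, so, unlike a static buffer returned by basename(3), one result cannot be overwritten by a later call, and the input string is left untouched.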
2018-02-13Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell1-1/+4
virtio,vhost,pci,pc: features, fixes and cleanups - new stats in virtio balloon - virtio eventfd rework for boot speedup - vhost memory rework for boot speedup - fixes and cleanups all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Tue 13 Feb 2018 16:29:55 GMT # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (22 commits) virtio-balloon: include statistics of disk/file caches acpi-test: update FADT lpc: drop pcie host dependency tests: acpi: fix FADT not being compared to reference table hw/pci-bridge: fix pcie root port's IO hints capability libvhost-user: Support across-memory-boundary access libvhost-user: Fix resource leak virtio-balloon: unref the memory region before continuing pci: removed the is_express field since a uniform interface was inserted virtio-blk: enable multiple vectors when using multiple I/O queues pci/bus: let it has higher migration priority pci-bridge/i82801b11: clear bridge registers on platform reset vhost: Move log_dirty check vhost: Merge and delete unused callbacks vhost: Clean out old vhost_set_memory and friends vhost: Regenerate region list from changed sections list vhost: Merge sections added to temporary list vhost: Simplify ring verification checks vhost: Build temporary section list and deref after commit virtio: improve virtio devices initialization time ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-02-09Move include qemu/option.h from qemu-common.h to actual usersMarkus Armbruster1-0/+1
qemu-common.h includes qemu/option.h, but most places that include the former don't actually need the latter. Drop the include, and add it to the places that actually need it. While there, drop superfluous includes of both headers, and separate #include from file comment with a blank line. This cleanup makes the number of objects depending on qemu/option.h drop from 4545 (out of 4743) to 284 in my "build everything" tree. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20180201111846.21846-20-armbru@redhat.com> [Semantic conflict with commit bdd6a90a9e in block/nvme.c resolved]
2018-02-08pci: removed the is_express field since a uniform interface was insertedYoni Bettan1-1/+4
According to Eduardo Habkost's commit fd3b02c889, all PCIe devices now implement INTERFACE_PCIE_DEVICE, so we don't need the is_express field anymore. Devices that implement only INTERFACE_PCIE_DEVICE (is_express == 1) or devices that implement only INTERFACE_CONVENTIONAL_PCI_DEVICE (is_express == 0) were not affected by the change. The only devices that were affected are those that are hybrid and also had (is_express == 1) - therefore only: - hw/vfio/pci.c - hw/usb/hcd-xhci.c - hw/xen/xen_pt.c For those 3 I made sure that QEMU_PCI_CAP_EXPRESS is on in instance_init(). Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Yoni Bettan <ybettan@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-02-06vfio/pci: Add option to disable GeForce quirksAlex Williamson1-0/+2
These quirks are necessary for GeForce, but not for Quadro/GRID/Tesla assignment. Leaving them enabled is fully functional and provides the most compatibility, but due to the unique NVIDIA MSI ACK behavior[1], it also introduces latency in re-triggering the MSI interrupt. This overhead is typically negligible, but has been shown to adversely affect some (very) high interrupt rate applications. This adds the vfio-pci device option "x-no-geforce-quirks=" which can be set to "on" to disable this additional overhead. A follow-on optimization for GeForce might be to make use of an ioeventfd to allow KVM to trigger an irqfd in the kernel vfio-pci driver, avoiding the bounce through userspace to handle this device write. [1] Background: the NVIDIA driver has been observed to issue a write to the MMIO mirror of PCI config space in BAR0 in order to allow the MSI interrupt for the device to retrigger. Older reports indicated a write of 0xff to the (read-only) MSI capability ID register, while more recently a write of 0x0 is observed at config space offset 0x704, non-architected, extended config space of the device (BAR0 offset 0x88704). Virtualization of this range is only required for GeForce. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>