aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio
AgeCommit message (Collapse)AuthorFilesLines
2024-12-26vfio/igd: add Gemini Lake and Comet Lake device idsTomita Moeko1-0/+2
Both Gemini Lake and Comet Lake are gen 9 devices. Many user reports on internet shows legacy mode of igd passthrough works as qemu treats them as gen 8 devices by default before e433f208973f ("vfio/igd: return an invalid generation for unknown devices"). Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-6-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: canonicalize memory size calculationsTomita Moeko1-44/+57
Add helper functions igd_gtt_memory_size() and igd_stolen_size() for calculating GTT stolen memory and Data stolen memory size in bytes, and use macros to replace the hardware-related magic numbers for better readability. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-5-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: align generation with i915 kernel driverTomita Moeko1-22/+23
Define the igd device generations according to i915 kernel driver to avoid confusion, and adjust comment placement to clearly reflect the relationship between ids and devices. The condition of how GTT stolen memory size is calculated is changed accordingly as GGMS is in multiple of 2 starting from gen 8. Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-4-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: remove unsupported device idsTomita Moeko1-10/+0
Since e433f208973f ("vfio/igd: return an invalid generation for unknown devices"), the default return of igd_gen() was changed to unsupported. There is no need to filter out those unsupported devices. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Link: https://lore.kernel.org/r/20241206122749.9893-3-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26vfio/igd: fix GTT stolen memory size calculation for gen 8+Tomita Moeko1-2/+2
On gen 8 and later devices, the GTT stolen memory size when GGMS equals 0 is 0 (no preallocated memory) rather than 1MB [1]. [1] 3.1.13, 5th Generation Intel Core Processor Family Datasheet Vol. 2 https://www.intel.com/content/www/us/en/content-details/330835 Fixes: c4c45e943e51 ("vfio/pci: Intel graphics legacy mode assignment") Reported-By: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20241206122749.9893-2-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-21Merge tag 'exec-20241220' of https://github.com/philmd/qemu into stagingStefan Hajnoczi11-18/+18
Accel & Exec patch queue - Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander) - Add '-d invalid_mem' logging option (Zoltan) - Create QOM containers explicitly (Peter) - Rename sysemu/ -> system/ (Philippe) - Re-orderning of include/exec/ headers (Philippe) Move a lot of declarations from these legacy mixed bag headers: . "exec/cpu-all.h" . "exec/cpu-common.h" . "exec/cpu-defs.h" . "exec/exec-all.h" . "exec/translate-all" to these more specific ones: . "exec/page-protection.h" . "exec/translation-block.h" . "user/cpu_loop.h" . "user/guest-host.h" . "user/page-protection.h" # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe/// # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo= # =cjz8 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits) util/qemu-timer: fix indentation meson: Do not define CONFIG_DEVICES on user emulation system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header system/numa: Remove unnecessary 'exec/cpu-common.h' header hw/xen: Remove unnecessary 'exec/cpu-common.h' header target/mips: Drop left-over comment about Jazz machine target/mips: Remove tswap() calls in semihosting uhi_fstat_cb() target/xtensa: Remove tswap() calls in semihosting simcall() helper accel/tcg: Un-inline translator_is_same_page() accel/tcg: Include missing 'exec/translation-block.h' header accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h' accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h' qemu/coroutine: Include missing 'qemu/atomic.h' header exec/translation-block: Include missing 'qemu/atomic.h' header accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h' exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined target/sparc: Move sparc_restore_state_to_opc() to cpu.c target/sparc: Uninline cpu_get_tb_cpu_state() target/loongarch: Declare loongarch_cpu_dump_state() locally user: Move various declarations out of 'exec/exec-all.h' ... Conflicts: hw/char/riscv_htif.c hw/intc/riscv_aplic.c target/s390x/cpu.c Apply sysemu header path changes to not in the pull request. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-12-20include: Rename sysemu/ -> system/Philippe Mathieu-Daudé11-18/+18
Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-12-19Constify all opaque Property pointersRichard Henderson1-2/+2
Via sed "s/ Property [*]/ const Property */". The opaque pointers passed to ObjectProperty callbacks are the last instances of non-const Property pointers in the tree. For the most part, these callbacks only use object_field_prop_ptr, which now takes a const pointer itself. This logically should have accompanied d36f165d952 which allowed const Property to be registered. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-25-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LISTRichard Henderson4-5/+0
Now that all of the Property arrays are counted, we can remove the terminator object from each array. Update the assertions in device_class_set_props to match. With struct Property being 88 bytes, this was a rather large form of terminator. Saves 30k from qemu-system-aarch64. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-15hw/vfio: Constify all PropertyRichard Henderson4-5/+5
Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-11-18Merge tag 'pull-request-2024-11-18' of https://gitlab.com/thuth/qemu into ↵Peter Maydell1-0/+1
staging * Fixes & doc updates for the new "boot order" s390x bios feature * Provide a "loadparm" property for scsi-hd & scsi-cd devices on s390x (required for the "boot order" feature) * Fix the floating-point multiply-and-add NaN rules on s390x * Raise timeout on cross-accel build jobs to 60m # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmc7ercRHHRodXRoQHJl # ZGhhdC5jb20ACgkQLtnXdP5wLbVjyg//ZuhSDCj+oBSU6vwM7Lwh3CS6GwZvGECU # h60V3tizKypiRNtTJRXHoWcx95brXmoZgI+QQhDEXe3fFLkOEKT6AIlDhrKZRUsd # rpLPr6O8TVKO+rSE7JVJAP3X1tpOOQDxnq83uWBv53b0S+Da0VwDRtI9gcugRMmh # d58P8Q1bV344fQdcrebejstpSUG7RxSA4Plj2uSQx4mSHT7cy/hN+vA34Ha7reE3 # tcN9yfQq3Rmfvt0MV5I9Umd6JXEoDlEAwjSNsWRsCzo69jBZwiMtXSH8LyLtwRTp # C919G/MIRuhvImF74dStLVCr82sNq54YR1NP6CGcmqPH76FOH8Mx3vmx9Cxj9ckA # 6NI6SvIg++bW2O1efG2apz8p5fjbDzYXSAbHnaWTcEu3gPgH4PQ5QXoyKaDymvWV # JIh5/gXEy+twEXgIBsdWQ44A9E06lL/tNfKnqGdXK4ZYF2JIrI+Lq7AKBee7tebP # +72I4PljHLSHQ3GxdkoOeJ8ahu70IBdSz2/VEIwOWK1wIf5C5WFNBerLJyDmkyx8 # xIvIm0vlRLwPcuOC711nlaMaKqTNT+8W4DIqIY6fHs2Jy0psMdgey1uHQxYEj9Kh # fg7CvalK8n3MkGAwTqAvRJIwMFe0a4Ss6c6CaemSaYa38ud/pCNnv+IT+Eqr+mjq # 6y5PZWNrZi0= # =UaDH # -----END PGP SIGNATURE----- # gpg: Signature made Mon 18 Nov 2024 17:34:47 GMT # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * tag 'pull-request-2024-11-18' of https://gitlab.com/thuth/qemu: .gitlab-ci.d: Raise timeout on cross-accel build jobs to 60m pc-bios: Update the s390 bios images with the recent fixes pc-bios/s390-ccw: Re-initialize receive queue index before each boot attempt pc-bios/s390x: Initialize machine loadparm before probing IPL devices pc-bios/s390x: Initialize cdrom type to false for each IPL device hw: Add "loadparm" property to scsi disk devices for booting on s390x hw/s390x: Restrict "loadparm" property to devices that can be used for booting docs/system/bootindex: Make it clear that s390x can also boot from virtio-net docs/system/s390x/bootdevices: Update loadparm documentation tests/tcg/s390x: Add the floating-point multiply-and-add test target/s390x: Fix the floating-point multiply-and-add NaN rules hw/usb: Use __attribute__((packed)) vs __packed Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-11-18hw/s390x: Restrict "loadparm" property to devices that can be used for bootingThomas Huth1-0/+1
Commit bb185de423 ("s390x: Add individual loadparm assignment to CCW device") added a "loadparm" property to all CCW devices. This was a little bit unfortunate, since this property is only useful for devices that can be used for booting, but certainly it is not useful for devices like virtio-gpu or virtio-tablet. Thus let's restrict the property to CCW devices that we can boot from (i.e. virtio-block, virtio-net and vfio-ccw devices). Message-ID: <20241113114741.681096-1-thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Jared Rossi <jrossi@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-11-18vfio/container: Fix container object destructionCédric Le Goater1-1/+1
When commit 96b7af4388b3 intoduced a .instance_finalize() handler, it did not take into account that the container was not necessarily inserted into the container list of the address space. Hence, if the container object is destroyed, by calling object_unref() for example, before vfio_address_space_insert() is called, QEMU may crash when removing the container from the list as done in vfio_container_instance_finalize(). This was seen with an SEV-SNP guest for which discarding of RAM fails. To resolve this issue, use the safe version of QLIST_REMOVE(). Cc: Zhenzhong Duan <zhenzhong.duan@intel.com> Cc: Eric Auger <eric.auger@redhat.com> Fixes: 96b7af4388b3 ("vfio/container: Move vfio_container_destroy() to an instance_finalize() handler") Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-11-18vfio/igd: fix calculation of graphics stolen memoryCorvin Köhne1-1/+1
When copying the calculation of the stolen memory size for Intels integrated graphics device of gen 9 and later from the Linux kernel [1], we missed subtracting 0xf0 from the graphics mode select value for values above 0xf0. This leads to QEMU reporting a very large size of the graphics stolen memory area. That's just a waste of memory. Additionally the guest firmware might be unable to allocate such a large buffer. [1] https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460 Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Fixes: 871922416683 ("vfio/igd: correctly calculate stolen memory size for gen 9 and later") Reviewed-by: Alex Williamson <alex.williamson@redhat.com> [ clg: Changed commit subject ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-11-18vfio/igd: add pci id for Coffee LakeCorvin Köhne1-0/+3
I've tested and verified that Coffee Lake devices are working properly. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-11-05vfio/migration: Add vfio_save_block_precopy_empty_hit trace eventMaciej S. Szmigiero2-0/+9
This way it is clearly known when there's no more data to send for that device. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2024-11-05vfio/migration: Add save_{iterate, complete_precopy}_start trace eventsMaciej S. Szmigiero2-0/+11
This way both the start and end points of migrating a particular VFIO device are known. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
2024-10-31migration: Drop migration_is_setup_or_active()Peter Xu1-1/+1
This helper is mostly the same as migration_is_running(), except that one has COLO reported as true, the other has CANCELLING reported as true. Per my past years experience on the state changes, none of them should matter. To make it slightly safer, report both COLO || CANCELLING to be true in migration_is_running(), then drop the other one. We kept the 1st only because the name is simpler, and clear enough. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-23vfio/helpers: Align mmapsAlex Williamson1-2/+30
Thanks to work by Peter Xu, support is introduced in Linux v6.12 to allow pfnmap insertions at PMD and PUD levels of the page table. This means that provided a properly aligned mmap, the vfio driver is able to map MMIO at significantly larger intervals than PAGE_SIZE. For example on x86_64 (the only architecture currently supporting huge pfnmaps for PUD), rather than 4KiB mappings, we can map device MMIO using 2MiB and even 1GiB page table entries. Typically mmap will already provide PMD aligned mappings, so devices with moderately sized MMIO ranges, even GPUs with standard 256MiB BARs, will already take advantage of this support. However in order to better support devices exposing multi-GiB MMIO, such as 3D accelerators or GPUs with resizable BARs enabled, we need to manually align the mmap. There doesn't seem to be a way for userspace to easily learn about PMD and PUD mapping level sizes, therefore this takes the simple approach to align the mapping to the power-of-two size of the region, up to 1GiB, which is currently the maximum alignment we care about. Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23vfio/helpers: Refactor vfio_region_mmap() error handlingAlex Williamson1-17/+17
Move error handling code to the end of the function so that it can more easily be shared by new mmap failure conditions. No functional change intended. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23vfio/migration: Change trace formats from hex to decimalAvihai Horon1-5/+5
Data sizes in VFIO migration trace events are printed in hex format while in migration core trace events they are printed in decimal format. This inconsistency makes it less readable when using both trace event types. Hence, change the data sizes print format to decimal in VFIO migration trace events. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-10-23vfio/migration: Report only stop-copy size in vfio_state_pending_exact()Avihai Horon1-3/+0
vfio_state_pending_exact() is used to update migration core how much device data is left for the device migration. Currently, the sum of pre-copy and stop-copy sizes of the VFIO device are reported. The pre-copy size is obtained via the VFIO_MIG_GET_PRECOPY_INFO ioctl, which returns the amount of device data available to be transferred while the device is in the PRE_COPY states. The stop-copy size is obtained via the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE ioctl, which returns the total amount of device data left to be transferred in order to complete the device migration. According to the above, current implementation is wrong -- it reports extra overlapping data because pre-copy size is already contained in stop-copy size. Fix it by reporting only stop-copy size. Fixes: eda7362af959 ("vfio/migration: Add VFIO migration pre-copy support") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-09-17vfio/igd: correctly calculate stolen memory size for gen 9 and laterCorvin Köhne1-4/+11
We have to update the calculation of the stolen memory size because we've seen devices using values of 0xf0 and above for the graphics mode select field. The new calculation was taken from the linux kernel [1]. [1] https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460 Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17vfio/igd: don't set stolen memory size to zeroCorvin Köhne1-17/+18
The stolen memory is required for the GOP (EFI) driver and the Windows driver. While the GOP driver seems to work with any stolen memory size, the Windows driver will crash if the size doesn't match the size allocated by the host BIOS. For that reason, it doesn't make sense to overwrite the stolen memory size. It's true that this wastes some VM memory. In the worst case, the stolen memory can take up more than a GB. However, that's uncommon. Additionally, it's likely that a bunch of RAM is assigned to VMs making use of GPU passthrough. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17vfio/igd: add ID's for ElkhartLake and TigerLakeCorvin Köhne1-0/+6
ElkhartLake and TigerLake devices were tested in legacy mode with Linux and Windows VMs. Both are working properly. It's likely that other Intel GPUs of gen 11 and 12 like IceLake device are working too. However, we're only adding known good devices for now. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17vfio/igd: add new bar0 quirk to emulate BDSM mirrorCorvin Köhne3-0/+100
The BDSM register is mirrored into MMIO space at least for gen 11 and later devices. Unfortunately, the Windows driver reads the register value from MMIO space instead of PCI config space for those devices [1]. Therefore, we either have to keep a 1:1 mapping for the host and guest address or we have to emulate the MMIO register too. Using the igd in legacy mode is already hard due to it's many constraints. Keeping a 1:1 mapping may not work in all cases and makes it even harder to use. An MMIO emulation has to trap the whole MMIO page. This makes accesses to this page slower compared to using second level address translation. Nevertheless, it doesn't have any constraints and I haven't noticed any performance degradation yet making it a better solution. [1] https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653 Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17vfio/igd: use new BDSM register location and size for gen 11 and laterCorvin Köhne1-7/+24
Intel changed the location and size of the BDSM register for gen 11 devices and later. We have to adjust our emulation for these devices to properly support them. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17vfio/igd: support legacy mode for all known generationsCorvin Köhne1-1/+1
We're soon going to add support for legacy mode to ElkhartLake and TigerLake devices. Those are gen 11 and 12 devices. At the moment, all devices identified by our igd_gen function do support legacy mode. This won't change when adding our new devices of gen 11 and 12. Therefore, it makes more sense to accept legacy mode for all known devices instead of maintaining a long list of known good generations. If we add a new generation to igd_gen which doesn't support legacy mode for some reason, it'll be easy to advance the check to reject legacy mode for this specific generation. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17vfio/igd: return an invalid generation for unknown devicesCorvin Köhne1-1/+5
Intel changes it's specification quite often e.g. the location and size of the BDSM register has change for gen 11 devices and later. This causes our emulation to fail on those devices. So, it's impossible for us to use a suitable default value for unknown devices. Instead of returning a random generation value and hoping that everthing works fine, we should verify that different devices are working and add them to our list of known devices. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-09-17hw/vfio/pci.c: Use correct type in trace_vfio_msix_early_setup()Peter Maydell1-1/+1
The tracepoint trace_vfio_msix_early_setup() uses "int" for the type of the table_bar argument, but we use this to print a uint32_t. Coverity warns that this means that we could end up treating it as a negative number. We only use this in printing the value in the tracepoint, so mishandling it as a negative number would be harmless, but it's better to use the right type in the tracepoint. Use uint64_t to match how we print the table_offset in the vfio_msix_relo() tracepoint. Resolves: Coverity CID 1547690 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-09-13hw: Use device_class_set_legacy_reset() instead of opencodingPeter Maydell3-3/+3
Use device_class_set_legacy_reset() instead of opencoding an assignment to DeviceClass::reset. This change was produced with: spatch --macro-file scripts/cocci-macro-file.h \ --sp-file scripts/coccinelle/device-reset.cocci \ --keep-comments --smpl-spacing --in-place --dir hw Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
2024-09-10qapi/vfio: Rename VfioMigrationState to Qapi*, and drop prefixMarkus Armbruster1-1/+1
QAPI's 'prefix' feature can make the connection between enumeration type and its constants less than obvious. It's best used with restraint. VfioMigrationState has a 'prefix' that overrides the generated enumeration constants' prefix to QAPI_VFIO_MIGRATION_STATE. We could simply drop 'prefix', but then the enumeration constants would look as if they came from kernel header linux/vfio.h. Rename the type to QapiVfioMigrationState instead, so that 'prefix' is not needed. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240904111836.3273842-20-armbru@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
2024-09-10qapi/common: Drop temporary 'prefix'Markus Armbruster1-5/+5
Recent commit "qapi: Smarter camel_to_upper() to reduce need for 'prefix'" added a temporary 'prefix' to delay changing the generated code. Revert it. This improves OffAutoPCIBAR's generated enumeration constant prefix from OFF_AUTOPCIBAR to OFF_AUTO_PCIBAR. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240904111836.3273842-5-armbru@redhat.com>
2024-07-24Merge tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu into ↵Richard Henderson8-35/+257
staging vfio queue: * IOMMUFD Dirty Tracking support * Fix for a possible SEGV in IOMMU type1 container * Dropped initialization of host IOMMU device with mdev devices # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7 # 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa # qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh # BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n # LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE # 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH # gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF # YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb # N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE # xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T # UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN # Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo= # =cViP # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 Jul 2024 01:16:37 AM AEST # gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1 # gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1 * tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu: vfio/common: Allow disabling device dirty page tracking vfio/migration: Don't block migration device dirty tracking is unsupported vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support vfio/iommufd: Probe and request hwpt dirty tracking capability vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps vfio/{iommufd,container}: Remove caps::aw_bits vfio/iommufd: Introduce auto domain creation vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev vfio/pci: Extract mdev check into an helper hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize() Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-23vfio/common: Allow disabling device dirty page trackingJoao Martins3-1/+9
The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole tracking of VF pre-copy phase of dirty page tracking, though it means that it will only be used at the start of the switchover phase. Add an option that disables the VF dirty page tracking, and fall back into container-based dirty page tracking. This also allows to use IOMMU dirty tracking even on VFs with their own dirty tracker scheme. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/migration: Don't block migration device dirty tracking is unsupportedJoao Martins1-5/+5
By default VFIO migration is set to auto, which will support live migration if the migration capability is set *and* also dirty page tracking is supported. For testing purposes one can force enable without dirty page tracking via enable-migration=on, but that option is generally left for testing purposes. So starting with IOMMU dirty tracking it can use to accommodate the lack of VF dirty page tracking allowing us to minimize the VF requirements for migration and thus enabling migration by default for those too. While at it change the error messages to mention IOMMU dirty tracking as well. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> [ clg: - spelling in commit log ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-07-23vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap supportJoao Martins1-0/+28
ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI that fetches the bitmap that tells what was dirty in an IOVA range. A single bitmap is allocated and used across all the hwpts sharing an IOAS which is then used in log_sync() to set Qemu global bitmaps. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking supportJoao Martins1-0/+32
ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that enables or disables dirty page tracking. The ioctl is used if the hwpt has been created with dirty tracking supported domain (stored in hwpt::flags) and it is called on the whole list of iommu domains. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Probe and request hwpt dirty tracking capabilityJoao Martins1-0/+26
In preparation to using the dirty tracking UAPI, probe whether the IOMMU supports dirty tracking. This is done via the data stored in hiod::caps::hw_caps initialized from GET_HW_INFO. Qemu doesn't know if VF dirty tracking is supported when allocating hardware pagetable in iommufd_cdev_autodomains_get(). This is because VFIODevice migration state hasn't been initialized *yet* hence it can't pick between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if later on VFIOMigration decides to use VF dirty tracking instead. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> [ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in iommufd_cdev_autodomains_get() - Added warning for heterogeneous dirty page tracking support in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during ↵Joao Martins4-10/+32
attach_device() Move the HostIOMMUDevice::realize() to be invoked during the attach of the device before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU behind the device supports dirty tracking. Note: The HostIOMMUDevice data from legacy backend is static and doesn't need any information from the (type1-iommu) backend to be initialized. In contrast however, the IOMMUFD HostIOMMUDevice data requires the iommufd FD to be connected and having a devid to be able to successfully GET_HW_INFO. This means vfio_device_hiod_realize() is called in different places within the backend .attach_device() implementation. Suggested-by: Cédric Le Goater <clg@redhat.cm> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> [ clg: Fixed error handling in iommufd_cdev_attach() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCapsJoao Martins1-0/+1
Store the value of @caps returned by iommufd_backend_get_device_info() in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING). This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/{iommufd,container}: Remove caps::aw_bitsJoao Martins2-5/+1
Remove caps::aw_bits which requires the bcontainer::iova_ranges being initialized after device is actually attached. Instead defer that to .get_cap() and call vfio_device_get_aw_bits() directly. This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Introduce auto domain creationJoao Martins1-0/+85
There's generally two modes of operation for IOMMUFD: 1) The simple user API which intends to perform relatively simple things with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches to VFIO and mainly performs IOAS_MAP and UNMAP. 2) The native IOMMUFD API where you have fine grained control of the IOMMU domain and model it accordingly. This is where most new feature are being steered to. For dirty tracking 2) is required, as it needs to ensure that the stage-2/parent IOMMU domain will only attach devices that support dirty tracking (so far it is all homogeneous in x86, likely not the case for smmuv3). Such invariant on dirty tracking provides a useful guarantee to VMMs that will refuse incompatible device attachments for IOMMU domains. Dirty tracking insurance is enforced via HWPT_ALLOC, which is responsible for creating an IOMMU domain. This is contrast to the 'simple API' where the IOMMU domain is created by IOMMUFD automatically when it attaches to VFIO (usually referred as autodomains) but it has the needed handling for mdevs. To support dirty tracking with the advanced IOMMUFD API, it needs similar logic, where IOMMU domains are created and devices attached to compatible domains. Essentially mimicking kernel iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain it falls back to IOAS attach. The auto domain logic allows different IOMMU domains to be created when DMA dirty tracking is not desired (and VF can provide it), and others where it is. Here it is not used in this way given how VFIODevice migration state is initialized after the device attachment. But such mixed mode of IOMMU dirty tracking + device dirty tracking is an improvement that can be added on. Keep the 'all of nothing' of type1 approach that we have been using so far between container vs device dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> [ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdevZhenzhong Duan1-0/+3
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by setting vbasedev->mdev true so skipping HostIOMMUDevice initialization in the presence of mdevs. Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdevZhenzhong Duan1-0/+3
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by setting vbasedev->mdev true so skipping HostIOMMUDevice initialization in the presence of mdevs. Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()Joao Martins1-4/+4
In preparation to implement auto domains have the attach function return the errno it got during domain attach instead of a bool. -EINVAL is tracked to track domain incompatibilities, and decide whether to create a new IOMMU domain. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW ↵Joao Martins1-1/+3
capabilities The helper will be able to fetch vendor agnostic IOMMU capabilities supported both by hardware and software. Right now it is only iommu dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdevJoao Martins2-3/+12
mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by skipping HostIOMMUDevice initialization in the presence of mdevs, and skip setting an iommu device when it is known to be an mdev. Cc: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-07-23vfio/pci: Extract mdev check into an helperJoao Martins2-9/+17
In preparation to skip initialization of the HostIOMMUDevice for mdev, extract the checks that validate if a device is an mdev into helpers. A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev to check if it's mdev or not. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
2024-07-23hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize()Eric Auger1-1/+0
In vfio_connect_container's error path, the base container is removed twice form the VFIOAddressSpace QLIST: first on the listener_release_exit label and second, on free_container_exit label, through object_unref(container), which calls vfio_container_instance_finalize(). Let's remove the first instance. Fixes: 938026053f4 ("vfio/container: Switch to QOM") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>