aboutsummaryrefslogtreecommitdiff
path: root/hw/vfio/pci-quirks.c
AgeCommit message (Collapse)AuthorFilesLines
2025-03-11vfio/pci-quirks: Exclude non-ioport BAR from ATI quirkVasilis Liaskovitis1-1/+1
The ATI BAR4 quirk is targeting an ioport BAR. Older devices may have a BAR4 which is not an ioport, causing a segfault here. Test the BAR type to skip these devices. Similar to "8f419c5b: vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirk" Untested, as I don't have the card to test. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2856 Signed-off-by: Vasilis Liaskovitis <vliaskovitis@suse.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250310235833.41026-1-vliaskovitis@suse.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11vfio/igd: Refactor vfio_probe_igd_bar4_quirk into pci config quirkTomita Moeko1-1/+5
The actual IO BAR4 write quirk in vfio_probe_igd_bar4_quirk was removed in previous change, leaving the function not matching its name, so move it into the newly introduced vfio_config_quirk_setup. There is no functional change in this commit. For now, to align with current legacy mode behavior, it returns and proceeds on error. Later it will fail on error after decoupling the quirks from legacy mode. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-7-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11vfio/pci: Add placeholder for device-specific config space quirksTomita Moeko1-0/+5
IGD devices require device-specific quirk to be applied to their PCI config space. Currently, it is put in the BAR4 quirk that does nothing to BAR4 itself. Add a placeholder for PCI config space quirks to hold that quirk later. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-6-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-11vfio/igd: Consolidate OpRegion initialization into a single functionTomita Moeko1-50/+0
Both x-igd-opregion option and legacy mode require identical steps to set up OpRegion for IGD devices. Consolidate these steps into a single vfio_pci_igd_setup_opregion function. The function call in pci.c is wrapped with ifdef temporarily to prevent build error for non-x86 archs, it will be removed after we decouple it from legacy mode. Additionally, move vfio_pci_igd_opregion_init to igd.c to prevent it from being compiled in non-x86 builds. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Corvin Köhne <c.koehne@beckhoff.com> Link: https://lore.kernel.org/qemu-devel/20250306180131.32970-4-tomitamoeko@gmail.com [ clg: Fixed spelling in vfio_pci_igd_setup_opregion() ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-03-06qdev: Change values of PropertyInfo member @type to be QAPI typesMarkus Armbruster1-1/+1
PropertyInfo member @type is externally visible via QMP device-list-properties and qom-list-properies. Its meaning is not documented at its definition. It gets passed as @type argument to object_property_add() and object_class_property_add(). This argument's documentation isn't of much help, either: * @type: the type name of the property. This namespace is pretty loosely * defined. Sub namespaces are constructed by using a prefix and then * to angle brackets. For instance, the type 'virtio-net-pci' in the * 'link' namespace would be 'link<virtio-net-pci>'. The two QMP commands document it as # @type: the type of the property. This will typically come in one of # four forms: # # 1) A primitive type such as 'u8', 'u16', 'bool', 'str', or # 'double'. These types are mapped to the appropriate JSON # type. # # 2) A child type in the form 'child<subtype>' where subtype is a # qdev device type name. Child properties create the # composition tree. # # 3) A link type in the form 'link<subtype>' where subtype is a # qdev device type name. Link properties form the device model # graph. "Typically come in one of four forms" followed by three items inspires the level of trust that is appropriate here. Clean up a bunch of funnies: * qdev_prop_fdc_drive_type.type is "FdcDriveType". Its .enum_table refers to QAPI type "FloppyDriveType". So use that. * qdev_prop_reserved_region is "reserved_region". Its only user is an array property called "reserved-regions". Its .set() visits str. So change @type to "str". * trng_prop_fault_event_set.type is "uint32:bits". Its .set() visits uint32, so change @type to "uint32". If we believe mentioning it's actually bits is useful, the proper place would be .description. * ccw_loadparm.type is "ccw_loadparm". It's users are properties called "loadparm". Its .set() visits str. So change @type to "str". * qdev_prop_nv_gpudirect_clique.type is "uint4". Its set() visits uint8, so change @type to "uint8". If we believe mentioning the range is useful, the proper place would be .description. * s390_pci_fid_propinfo.type is "zpci_fid". Its .set() visits uint32. So change type to that, and move the "zpci_fid" to .description. This is admittedly a lousy description, but it's still an improvement; for instance, output of -device zpci,help changes from fid=<zpci_fid> to fid=<uint32> - zpci_fid * Similarly for a raft of PropertyInfo in target/riscv/cpu.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250227085601.4140852-5-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [Commit message typo fixed]
2025-03-06qdev: Rename PropertyInfo member @name to @typeMarkus Armbruster1-1/+1
PropertyInfo member @name becomes ObjectProperty member @type, while Property member @name becomes ObjectProperty member @name. Rename the former. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250227085601.4140852-4-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> [One missed instance of @type fixed]
2025-02-11vfio/pci: introduce config_offset field in VFIOConfigMirrorQuirkTomita Moeko1-0/+2
Device may only expose a specific portion of PCI config space through a region in a BAR, such behavior is seen in igd GGC and BDSM mirrors in BAR0. To handle these, config_offset is introduced to allow mirroring arbitrary region in PCI config space. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250104154219.7209-3-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2025-02-11vfio/pci: declare generic quirks in a new header fileTomita Moeko1-51/+4
Declare generic vfio_generic_{window_address,window_data,mirror}_quirk in newly created pci_quirks.h so that they can be used elsewhere, like igd.c. Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/r/20250104154219.7209-2-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-19Constify all opaque Property pointersRichard Henderson1-2/+2
Via sed "s/ Property [*]/ const Property */". The opaque pointers passed to ObjectProperty callbacks are the last instances of non-const Property pointers in the tree. For the most part, these callbacks only use object_field_prop_ptr, which now takes a const pointer itself. This logically should have accompanied d36f165d952 which allowed const Property to be registered. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-25-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-09-17vfio/igd: add new bar0 quirk to emulate BDSM mirrorCorvin Köhne1-0/+1
The BDSM register is mirrored into MMIO space at least for gen 11 and later devices. Unfortunately, the Windows driver reads the register value from MMIO space instead of PCI config space for those devices [1]. Therefore, we either have to keep a 1:1 mapping for the host and guest address or we have to emulate the MMIO register too. Using the igd in legacy mode is already hard due to it's many constraints. Keeping a 1:1 mapping may not work in all cases and makes it even harder to use. An MMIO emulation has to trap the whole MMIO page. This makes accesses to this page slower compared to using second level address translation. Nevertheless, it doesn't have any constraints and I haven't noticed any performance degradation yet making it a better solution. [1] https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653 Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
2024-05-22vfio/pci-quirks: Make vfio_add_*_cap() return boolZhenzhong Duan1-23/+19
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Include below functions: vfio_add_virt_caps() vfio_add_nv_gpudirect_cap() vfio_add_vmd_shadow_cap() Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-05-22vfio/pci-quirks: Make vfio_pci_igd_opregion_init() return boolZhenzhong Duan1-4/+4
This is to follow the coding standand in qapi/error.h to return bool for bool-valued functions. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-03-12hw/vfio/pci-quirks: Fix missing ERRP_GUARD() for error_prepend()Zhao Liu1-0/+2
As the comment in qapi/error, passing @errp to error_prepend() requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: ... * - It should not be passed to error_prepend(), error_vprepend() or * error_append_hint(), because that doesn't work with &error_fatal. * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. ERRP_GUARD() could avoid the case when @errp is &error_fatal, the user can't see this additional information, because exit() happens in error_setg earlier than information is added [1]. In hw/vfio/pci-quirks.c, there are 2 functions passing @errp to error_prepend() without ERRP_GUARD(): - vfio_add_nv_gpudirect_cap() - vfio_add_vmd_shadow_cap() There are too many possible callers to check the impact of this defect; it may or may not be harmless. Thus it is necessary to protect their @errp with ERRP_GUARD(). To avoid the issue like [1] said, add missing ERRP_GUARD() at the beginning of this function. [1]: Issue description in the commit message of commit ae7c80a7bd73 ("error: New macro ERRP_GUARD()"). Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240311033822.3142585-23-zhao1.liu@linux.intel.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-09-18spapr: Remove support for NVIDIA V100 GPU with NVLink2Cédric Le Goater1-115/+0
NVLink2 support was removed from the PPC PowerNV platform and VFIO in Linux 5.13 with commits : 562d1e207d32 ("powerpc/powernv: remove the nvlink support") b392a1989170 ("vfio/pci: remove vfio_pci_nvlink2") This was 2.5 years ago. Do the same in QEMU with a revert of commit ec132efaa81f ("spapr: Support NVIDIA V100 GPU with NVLink2"). Some adjustements are required on the NUMA part. Cc: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20230918091717.149950-1-clg@kaod.org> Signed-off-by: Daniel Henrique Barboza <danielhb413@gmail.com>
2023-07-10hw/vfio/pci-quirks: Sanitize capability pointerAlex Williamson1-2/+8
Coverity reports a tained scalar when traversing the capabilities chain (CID 1516589). In practice I've never seen a device with a chain so broken as to cause an issue, but it's also pretty easy to sanitize. Fixes: f6b30c1984f7 ("hw/vfio/pci-quirks: Support alternate offset for GPUDirect Cliques") Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2023-06-30hw/vfio/pci-quirks: Support alternate offset for GPUDirect CliquesAlex Williamson1-1/+40
NVIDIA Turing and newer GPUs implement the MSI-X capability at the offset previously reserved for use by hypervisors to implement the GPUDirect Cliques capability. A revised specification provides an alternate location. Add a config space walk to the quirk to check for conflicts, allowing us to fall back to the new location or generate an error at the quirk setup rather than when the real conflicting capability is added should there be no available location. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
2022-05-24hw/vfio/pci-quirks: Resolve redundant property gettersBernhard Beschow1-25/+9
The QOM API already provides getters for uint64 and uint32 values, so reuse them. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220301225220.239065-2-shentey@gmail.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2021-09-16hw/vfio: Fix typo in commentsCai Huoqing1-1/+1
Fix typo in comments: *programatically ==> programmatically *disconecting ==> disconnecting *mulitple ==> multiple *timout ==> timeout *regsiter ==> register *forumula ==> formula Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210730012613.2198-1-caihuoqing@baidu.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-05-02hw: Remove superfluous includes of hw/hw.hThomas Huth1-1/+0
The include/hw/hw.h header only has a prototype for hw_error(), so it does not make sense to include this in files that do not use this function. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20210326151848.2217216-1-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-03-16hw/vfio/pci-quirks: Replace the word 'blacklist'Philippe Mathieu-Daudé1-7/+7
Follow the inclusive terminology from the "Conscious Language in your Open Source Projects" guidelines [*] and replace the word "blacklist" appropriately. [*] https://github.com/conscious-lang/conscious-lang-docs/blob/main/faq.md Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210205171817.2108907-9-philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2021-02-08vfio: add quirk device write methodPrasad J Pandit1-0/+8
Add vfio quirk device mmio write method to avoid NULL pointer dereference issue. Reported-by: Lei Sun <slei.casper@gmail.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Prasad J Pandit <pjp@fedoraproject.org> Message-Id: <20200811114133.672647-4-ppandit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-18qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr()Eduardo Habkost1-2/+2
The function will be moved to common QOM code, as it is not specific to TYPE_DEVICE anymore. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20201211220529.2290218-31-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18qdev: Move dev->realized check to qdev_property_set()Eduardo Habkost1-6/+0
Every single qdev property setter function manually checks dev->realized. We can just check dev->realized inside qdev_property_set() instead. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20201211220529.2290218-24-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-15qdev: Make qdev_get_prop_ptr() get Object* argEduardo Habkost1-3/+2
Make the code more generic and not specific to TYPE_DEVICE. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> #s390 parts Acked-by: Paul Durrant <paul@xen.org> Message-Id: <20201211220529.2290218-10-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-08-21meson: infrastructure for building emulatorsPaolo Bonzini1-1/+1
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-07-10error: Eliminate error_propagate() with Coccinelle, part 1Markus Armbruster1-3/+1
When all we do with an Error we receive into a local variable is propagating to somewhere else, we can just as well receive it there right away. Convert if (!foo(..., &err)) { ... error_propagate(errp, err); ... return ... } to if (!foo(..., errp)) { ... ... return ... } where nothing else needs @err. Coccinelle script: @rule1 forall@ identifier fun, err, errp, lbl; expression list args, args2; binary operator op; constant c1, c2; symbol false; @@ if ( ( - fun(args, &err, args2) + fun(args, errp, args2) | - !fun(args, &err, args2) + !fun(args, errp, args2) | - fun(args, &err, args2) op c1 + fun(args, errp, args2) op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; ) } @rule2 forall@ identifier fun, err, errp, lbl; expression list args, args2; expression var; binary operator op; constant c1, c2; symbol false; @@ - var = fun(args, &err, args2); + var = fun(args, errp, args2); ... when != err if ( ( var | !var | var op c1 ) ) { ... when != err when != lbl: when strict - error_propagate(errp, err); ... when != err ( return; | return c2; | return false; | return var; ) } @depends on rule1 || rule2@ identifier err; @@ - Error *err = NULL; ... when != err Not exactly elegant, I'm afraid. The "when != lbl:" is necessary to avoid transforming if (fun(args, &err)) { goto out } ... out: error_propagate(errp, err); even though other paths to label out still need the error_propagate(). For an actual example, see sclp_realize(). Without the "when strict", Coccinelle transforms vfio_msix_setup(), incorrectly. I don't know what exactly "when strict" does, only that it helps here. The match of return is narrower than what I want, but I can't figure out how to express "return where the operand doesn't use @err". For an example where it's too narrow, see vfio_intx_enable(). Silently fails to convert hw/arm/armsse.c, because Coccinelle gets confused by ARMSSE being used both as typedef and function-like macro there. Converted manually. Line breaks tidied up manually. One nested declaration of @local_err deleted manually. Preexisting unwanted blank line dropped in hw/riscv/sifive_e.c. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-35-armbru@redhat.com>
2020-07-10qapi: Use returned bool to check for failure, Coccinelle partMarkus Armbruster1-2/+1
The previous commit enables conversion of visit_foo(..., &err); if (err) { ... } to if (!visit_foo(..., errp)) { ... } for visitor functions that now return true / false on success / error. Coccinelle script: @@ identifier fun =~ "check_list|input_type_enum|lv_start_struct|lv_type_bool|lv_type_int64|lv_type_str|lv_type_uint64|output_type_enum|parse_type_bool|parse_type_int64|parse_type_null|parse_type_number|parse_type_size|parse_type_str|parse_type_uint64|print_type_bool|print_type_int64|print_type_null|print_type_number|print_type_size|print_type_str|print_type_uint64|qapi_clone_start_alternate|qapi_clone_start_list|qapi_clone_start_struct|qapi_clone_type_bool|qapi_clone_type_int64|qapi_clone_type_null|qapi_clone_type_number|qapi_clone_type_str|qapi_clone_type_uint64|qapi_dealloc_start_list|qapi_dealloc_start_struct|qapi_dealloc_type_anything|qapi_dealloc_type_bool|qapi_dealloc_type_int64|qapi_dealloc_type_null|qapi_dealloc_type_number|qapi_dealloc_type_str|qapi_dealloc_type_uint64|qobject_input_check_list|qobject_input_check_struct|qobject_input_start_alternate|qobject_input_start_list|qobject_input_start_struct|qobject_input_type_any|qobject_input_type_bool|qobject_input_type_bool_keyval|qobject_input_type_int64|qobject_input_type_int64_keyval|qobject_input_type_null|qobject_input_type_number|qobject_input_type_number_keyval|qobject_input_type_size_keyval|qobject_input_type_str|qobject_input_type_str_keyval|qobject_input_type_uint64|qobject_input_type_uint64_keyval|qobject_output_start_list|qobject_output_start_struct|qobject_output_type_any|qobject_output_type_bool|qobject_output_type_int64|qobject_output_type_null|qobject_output_type_number|qobject_output_type_str|qobject_output_type_uint64|start_list|visit_check_list|visit_check_struct|visit_start_alternate|visit_start_list|visit_start_struct|visit_type_.*"; expression list args; typedef Error; Error *err; @@ - fun(args, &err); - if (err) + if (!fun(args, &err)) { ... } A few line breaks tidied up manually. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200707160613.848843-19-armbru@redhat.com>
2020-06-11hw/vfio/pci-quirks: Fix broken legacy IGD passthroughThomas Huth1-0/+1
The #ifdef CONFIG_VFIO_IGD in pci-quirks.c is not working since the required header config-devices.h is not included, so that the legacy IGD passthrough is currently broken. Let's include the right header to fix this issue. Buglink: https://bugs.launchpad.net/qemu/+bug/1882784 Fixes: 29d62771c81d ("hw/vfio: Move the IGD quirk code to a separate file") Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-06-11hw/vfio: Add VMD Passthrough QuirkJon Derrick1-12/+72
The VMD endpoint provides a real PCIe domain to the guest, including bridges and endpoints. Because the VMD domain is enumerated by the guest kernel, the guest kernel will assign Guest Physical Addresses to the downstream endpoint BARs and bridge windows. When the guest kernel performs MMIO to VMD sub-devices, MMU will translate from the guest address space to the physical address space. Because the bridges have been programmed with guest addresses, the bridges will reject the transaction containing physical addresses. VMD device 28C0 natively assists passthrough by providing the Host Physical Address in shadow registers accessible to the guest for bridge window assignment. The shadow registers are valid if bit 1 is set in VMD VMLOCK config register 0x70. In order to support existing VMDs, this quirk provides the shadow registers in a vendor-specific PCI capability to the vfio-passthrough device for all VMD device ids which don't natively assist with passthrough. The Linux VMD driver is updated to check for this new vendor-specific capability. Signed-off-by: Jon Derrick <jonathan.derrick@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-05-27vfio/nvlink: Remove exec permission to avoid SELinux AVCsLeonardo Bras1-2/+2
If SELinux is setup without 'execmem' permission for qemu, all mmap with (PROT_WRITE | PROT_EXEC) will fail and print a warning in SELinux log. If "nvlink2-mr" memory allocation fails (fist diff), it will cause guest NUMA nodes to not be correctly configured (V100 memory will not be visible for guest, nor its NUMA nodes). Not having 'execmem' permission is intesting for virtual machines to avoid buffer-overflow based attacks, and it's adopted in distros like RHEL. So, removing the PROT_EXEC flag seems the right thing to do. Browsing some other code that mmaps memory for usage with memory_region_init_ram_device_ptr, I could notice it's usual to not have PROT_EXEC (only PROT_READ | PROT_WRITE), so it should be no problem around this. Signed-off-by: Leonardo Bras <leobras.c@gmail.com> Message-Id: <20200501055448.286518-1-leobras.c@gmail.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2020-05-15qom: Drop parameter @errp of object_property_add() & friendsMarkus Armbruster1-3/+3
The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]
2020-02-06hw/vfio: Move the IGD quirk code to a separate fileThomas Huth1-611/+3
The IGD quirk code defines a separate device, the so-called "vfio-pci-igd-lpc-bridge" which shows up as a user-creatable device in all QEMU binaries that include the vfio code. This is a little bit unfortunate for two reasons: First, this device is completely useless in binaries like qemu-system-s390x. Second we also would like to disable it in downstream RHEL which currently requires some extra patches there since the device does not have a proper Kconfig-style switch yet. So it would be good if the device could be disabled more easily, thus let's move the code to a separate file instead and introduce a proper Kconfig switch for it which gets only enabled by default if we also have CONFIG_PC_PCI enabled. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2019-09-03memory: Access MemoryRegion with endiannessTony Nguyen1-2/+3
Preparation for collapsing the two byte swaps adjust_endianness and handle_bswap into the former. Call memory_region_dispatch_{read|write} with endianness encoded into the "MemOp op" operand. This patch does not change any behaviour as memory_region_dispatch_{read|write} is yet to handle the endianness. Once it does handle endianness, callers with byte swaps can collapse them into adjust_endianness. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Message-Id: <8066ab3eb037c0388dfadfe53c5118429dd1de3a.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-09-03hw/vfio: Access MemoryRegion with MemOpTony Nguyen1-2/+4
The memory_region_dispatch_{read|write} operand "unsigned size" is being converted into a "MemOp op". Convert interfaces by using no-op size_memop. After all interfaces are converted, size_memop will be implemented and the memory_region_dispatch_{read|write} operand "unsigned size" will be converted into a "MemOp op". As size_memop is a no-op, this patch does not change any behaviour. Signed-off-by: Tony Nguyen <tony.nguyen@bt.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Message-Id: <e70ff5814ac3656974180db6375397c43b0bc8b8.1566466906.git.tony.nguyen@bt.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-08-16Include hw/qdev-properties.h lessMarkus Armbruster1-0/+1
In my "build everything" tree, changing hw/qdev-properties.h triggers a recompile of some 2700 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). Many places including hw/qdev-properties.h (directly or via hw/qdev.h) actually need only hw/qdev-core.h. Include hw/qdev-core.h there instead. hw/qdev.h is actually pointless: all it does is include hw/qdev-core.h and hw/qdev-properties.h, which in turn includes hw/qdev-core.h. Replace the remaining uses of hw/qdev.h by hw/qdev-properties.h. While there, delete a few superfluous inclusions of hw/qdev-core.h. Touching hw/qdev-properties.h now recompiles some 1200 objects. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Daniel P. Berrangé" <berrange@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20190812052359.30071-22-armbru@redhat.com>
2019-08-16Include hw/hw.h exactly where neededMarkus Armbruster1-0/+1
In my "build everything" tree, changing hw/hw.h triggers a recompile of some 2600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). The previous commits have left only the declaration of hw_error() in hw/hw.h. This permits dropping most of its inclusions. Touching it now recompiles less than 200 objects. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190812052359.30071-19-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-06-12Include qemu/module.h where needed, drop it from qemu-common.hMarkus Armbruster1-0/+1
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
2019-04-26spapr: Support NVIDIA V100 GPU with NVLink2Alexey Kardashevskiy1-0/+131
NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver implements special regions for such GPUs and emulates an NVLink bridge. NVLink2-enabled POWER9 CPUs also provide address translation services which includes an ATS shootdown (ATSD) register exported via the NVLink bridge device. This adds a quirk to VFIO to map the GPU memory and create an MR; the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses this to get the MR and map it to the system address space. Another quirk does the same for ATSD. This adds additional steps to sPAPR PHB setup: 1. Search for specific GPUs and NPUs, collect findings in sPAPRPHBState::nvgpus, manage system address space mappings; 2. Add device-specific properties such as "ibm,npu", "ibm,gpu", "memory-block", "link-speed" to advertise the NVLink2 function to the guest; 3. Add "mmio-atsd" to vPHB to advertise the ATSD capability; 4. Add new memory blocks (with extra "linux,memory-usable" to prevent the guest OS from accessing the new memory until it is onlined) and npuphb# nodes representing an NPU unit for every vPHB as the GPU driver uses it for link discovery. This allocates space for GPU RAM and ATSD like we do for MMIOs by adding 2 new parameters to the phb_placement() hook. Older machine types set these to zero. This puts new memory nodes in a separate NUMA node to as the GPU RAM needs to be configured equally distant from any other node in the system. Unlike the host setup which assigns numa ids from 255 downwards, this adds new NUMA nodes after the user configures nodes or from 1 if none were configured. This adds requirement similar to EEH - one IOMMU group per vPHB. The reason for this is that ATSD registers belong to a physical NPU so they cannot invalidate translations on GPUs attached to another NPU. It is guaranteed by the host platform as it does not mix NVLink bridges or GPUs from different NPU in the same IOMMU group. If more than one IOMMU group is detected on a vPHB, this disables ATSD support for that vPHB and prints a warning. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [aw: for vfio portions] Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20190312082103.130561-1-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-02-22pci: Move NVIDIA vendor id to the rest of idsAlexey Kardashevskiy1-2/+0
sPAPR code will use it too so move it from VFIO to the common code. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-Id: <20190214051440.59167-1-aik@ozlabs.ru> Acked-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-10-19vfio: Clean up error reporting after previous commitMarkus Armbruster1-2/+2
The previous commit changed vfio's warning messages from vfio warning: DEV-NAME: Could not frobnicate to warning: vfio DEV-NAME: Could not frobnicate To match this change, change error messages from vfio error: DEV-NAME: On fire to vfio DEV-NAME: On fire Note the loss of "error". If we think marking error messages that way is a good idea, we should mark *all* error messages, i.e. make error_report() print it. Cc: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Alex Williamson <alex.williamson@redhat.com> Message-Id: <20181017082702.5581-7-armbru@redhat.com>
2018-07-02hw/vfio: Use the IEC binary prefix definitionsPhilippe Mathieu-Daudé1-4/+5
It eases code review, unit is explicit. Patch generated using: $ git grep -E '(1024|2048|4096|8192|(<<|>>).?(10|20|30))' hw/ include/hw/ and modified manually. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20180625124238.25339-38-f4bug@amsat.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-05vfio/quirks: Enable ioeventfd quirks to be handled by vfio directlyAlex Williamson1-7/+46
With vfio ioeventfd support, we can program vfio-pci to perform a specified BAR write when an eventfd is triggered. This allows the KVM ioeventfd to be wired directly to vfio-pci, entirely avoiding userspace handling for these events. On the same micro-benchmark where the ioeventfd got us to almost 90% of performance versus disabling the GeForce quirks, this gets us to within 95%. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: ioeventfd quirk accelerationAlex Williamson1-2/+164
The NVIDIA BAR0 quirks virtualize the PCI config space mirrors found in device MMIO space. Normally PCI config space is considered a slow path and further optimization is unnecessary, however NVIDIA uses a register here to enable the MSI interrupt to re-trigger. Exiting to QEMU for this MSI-ACK handling can therefore rate limit our interrupt handling. Fortunately the MSI-ACK write is easily detected since the quirk MemoryRegion otherwise has very few accesses, so simply looking for consecutive writes with the same data is sufficient, in this case 10 consecutive writes with the same data and size is arbitrarily chosen. We configure the KVM ioeventfd with data match, so there's no risk of triggering for the wrong data or size, but we do risk that pathological driver behavior might consume all of QEMU's file descriptors, so we cap ourselves to 10 ioeventfds for this purpose. In support of the above, generic ioeventfd infrastructure is added for vfio quirks. This automatically initializes an ioeventfd list per quirk, disables and frees ioeventfds on exit, and allows ioeventfds marked as dynamic to be dropped on device reset. The rationale for this latter feature is that useful ioeventfds may depend on specific driver behavior and since we necessarily place a cap on our use of ioeventfds, a machine reset is a reasonable point at which to assume a new driver and re-profile. Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Add quirk reset callbackAlex Williamson1-0/+15
Quirks can be self modifying, provide a hook to allow them to cleanup on device reset if desired. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-06-05vfio/quirks: Add common quirk alloc helperAlex Williamson1-27/+21
This will later be used to include list initialization. Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2018-02-06vfio/pci: Add option to disable GeForce quirksAlex Williamson1-3/+6
These quirks are necessary for GeForce, but not for Quadro/GRID/Tesla assignment. Leaving them enabled is fully functional and provides the most compatibility, but due to the unique NVIDIA MSI ACK behavior[1], it also introduces latency in re-triggering the MSI interrupt. This overhead is typically negligible, but has been shown to adversely affect some (very) high interrupt rate applications. This adds the vfio-pci device option "x-no-geforce-quirks=" which can be set to "on" to disable this additional overhead. A follow-on optimization for GeForce might be to make use of an ioeventfd to allow KVM to trigger an irqfd in the kernel vfio-pci driver, avoiding the bounce through userspace to handle this device write. [1] Background: the NVIDIA driver has been observed to issue a write to the MMIO mirror of PCI config space in BAR0 in order to allow the MSI interrupt for the device to retrigger. Older reports indicated a write of 0xff to the (read-only) MSI capability ID register, while more recently a write of 0x0 is observed at config space offset 0x704, non-architected, extended config space of the device (BAR0 offset 0x88704). Virtualization of this range is only required for GeForce. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-15pci: Add INTERFACE_CONVENTIONAL_PCI_DEVICE to Conventional PCI devicesEduardo Habkost1-0/+4
Add INTERFACE_CONVENTIONAL_PCI_DEVICE to all direct subtypes of TYPE_PCI_DEVICE, except: 1) The ones that already have INTERFACE_PCIE_DEVICE set: * base-xhci * e1000e * nvme * pvscsi * vfio-pci * virtio-pci * vmxnet3 2) base-pci-bridge Not all PCI bridges are Conventional PCI devices, so INTERFACE_CONVENTIONAL_PCI_DEVICE is added only to the subtypes that are actually Conventional PCI: * dec-21154-p2p-bridge * i82801b11-bridge * pbm-bridge * pci-bridge The direct subtypes of base-pci-bridge not touched by this patch are: * xilinx-pcie-root: Already marked as PCIe-only. * pcie-pci-bridge: Already marked as PCIe-only. * pcie-port: all non-abstract subtypes of pcie-port are already marked as PCIe-only devices. 3) megasas-base Not all megasas devices are Conventional PCI devices, so the interface names are added to the subclasses registered by megasas_register_types(), according to information in the megasas_devices[] array. "megasas-gen2" already implements INTERFACE_PCIE_DEVICE, so add INTERFACE_CONVENTIONAL_PCI_DEVICE only to "megasas". Acked-by: Alberto Garcia <berto@igalia.com> Acked-by: John Snow <jsnow@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Acked-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-10-03vfio/pci: Add NVIDIA GPUDirect Cliques supportAlex Williamson1-0/+110
NVIDIA has defined a specification for creating GPUDirect "cliques", where devices with the same clique ID support direct peer-to-peer DMA. When running on bare-metal, tools like NVIDIA's p2pBandwidthLatencyTest (part of cuda-samples) determine which GPUs can support peer-to-peer based on chipset and topology. When running in a VM, these tools have no visibility to the physical hardware support or topology. This option allows the user to specify hints via a vendor defined capability. For instance: <qemu:commandline> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev0.x-nv-gpudirect-clique=0'/> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev1.x-nv-gpudirect-clique=1'/> <qemu:arg value='-set'/> <qemu:arg value='device.hostdev2.x-nv-gpudirect-clique=1'/> </qemu:commandline> This enables two cliques. The first is a singleton clique with ID 0, for the first hostdev defined in the XML (note that since cliques define peer-to-peer sets, singleton clique offer no benefit). The subsequent two hostdevs are both added to clique ID 1, indicating peer-to-peer is possible between these devices. QEMU only provides validation that the clique ID is valid and applied to an NVIDIA graphics device, any validation that the resulting cliques are functional and valid is the user's responsibility. The NVIDIA specification allows a 4-bit clique ID, thus valid values are 0-15. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-10-03vfio/pci: Add virtual capabilities quirk infrastructureAlex Williamson1-0/+4
If the hypervisor needs to add purely virtual capabilties, give us a hook through quirks to do that. Note that we determine the maximum size for a capability based on the physical device, if we insert a virtual capability, that can change. Therefore if maximum size is smaller after added virt capabilities, use that. Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2017-04-06vfio/pci-quirks: Exclude non-ioport BAR from NVIDIA quirkAlex Williamson1-1/+1
The NVIDIA BAR5 quirk is targeting an ioport BAR. Some older devices have a BAR5 which is not ioport and can induce a segfault here. Test the BAR type to skip these devices. Link: https://bugs.launchpad.net/qemu/+bug/1678466 Signed-off-by: Alex Williamson <alex.williamson@redhat.com>