aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-13Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Peter Maydell56-381/+1425
into staging virtio,pc,pci: features, cleanups, fixes more memslots support in libvhost-user support PCIe Gen5/Gen6 link speeds in pcie more traces in vdpa network simulation devices support in vdpa SMBIOS type 9 descriptor implementation Bump max_cpus to 4096 vcpus in q35 aw-bits and granule options in VIRTIO-IOMMU Support report NUMA nodes for device memory using GI in acpi Beginning of shutdown event support in pvpanic fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmXw0TMPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRp8x4H+gLMoGwaGAX7gDGPgn2Ix4j/3kO77ZJ9X9k/ # 1KqZu/9eMS1j2Ei+vZqf05w7qRjxxhwDq3ilEXF/+UFqgAehLqpRRB8j5inqvzYt # +jv0DbL11PBp/oFjWcytm5CbiVsvq8KlqCF29VNzc162XdtcduUOWagL96y8lJfZ # uPrOoyeR7SMH9lp3LLLHWgu+9W4nOS03RroZ6Umj40y5B7yR0Rrppz8lMw5AoQtr # 0gMRnFhYXeiW6CXdz+Tzcr7XfvkkYDi/j7ibiNSURLBfOpZa6Y8+kJGKxz5H1K1G # 6ZY4PBcOpQzl+NMrktPHogczgJgOK10t+1i/R3bGZYw2Qn/93Eg= # =C0UU # -----END PGP SIGNATURE----- # gpg: Signature made Tue 12 Mar 2024 22:03:31 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (68 commits) docs/specs/pvpanic: document shutdown event hw/cxl: Fix missing reserved data in CXL Device DVSEC hmat acpi: Fix out of bounds access due to missing use of indirection hmat acpi: Do not add Memory Proximity Domain Attributes Structure targetting non existent memory. qemu-options.hx: Document the virtio-iommu-pci aw-bits option hw/arm/virt: Set virtio-iommu aw-bits default value to 48 hw/i386/q35: Set virtio-iommu aw-bits default value to 39 virtio-iommu: Add an option to define the input range width virtio-iommu: Trace domain range limits as unsigned int qemu-options.hx: Document the virtio-iommu-pci granule option virtio-iommu: Change the default granule to the host page size virtio-iommu: Add a granule property hw/i386/acpi-build: Add support for SRAT Generic Initiator structures hw/acpi: Implement the SRAT GI affinity structure qom: new object to associate device to NUMA node hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove it hw/i386/pc: Set "normal" boot device order in pc_basic_device_init() hw/i386/pc: Avoid one use of the current_machine global hw/i386/pc: Remove "rtc_state" link again Revert "hw/i386/pc: Confine system flash handling to pc_sysfw" ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org> # Conflicts: # hw/core/machine.c
2024-03-13Merge tag 'pull-ppc-for-9.0-2-20240313' of https://gitlab.com/npiggin/qemu ↵Peter Maydell28-563/+2883
into staging * PAPR nested hypervisor host implementation for spapr TCG * excp_helper.c code cleanups and improvements * Move more ops to decodetree * Deprecate pseries-2.12 machines and P9 and P10 DD1.0 CPUs * Document running Linux on AmigaNG * Update dt feature advertising POWER CPUs. * Add P10 PMU SPRs * Improve pnv topology calculation for SMT8 CPUs. * Various bug fixes. # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCgAdFiEETkN92lZhb0MpsKeVZ7MCdqhiHK4FAmXwiT8ACgkQZ7MCdqhi # HK7C/w//XxEO2bQTFPLFDTrP/voq7pcX8XeQNVyXCkXYjvsbu05oQow50k+Y5UAE # US4MFjt8jFz0vuIKuKyoA3kG41zDSOzoX4TQXMM+tyTWbuFF3KAyfizb1xE6SYAN # xJEGvmiXv/EgoSBD7BTKQp1tMPdIGZLwSdYiA0lmOo7YaMCgYAXaujW5hnNjQecT # 873sN+10pHtQY++mINtD9Nfb6AcDGMWw0b+bykqIXhNRkI8IGOS4WF4vAuMBrwfe # UM00wDnNRb86Dk14bv2XVNDr6/i0VRtUMwM4yiptrQ1TQx18LZaPSQFYjQfPaan7 # LwN4QkMFnBX54yJ7Npvjvu8BCBF47kwOVu4CIAFJ4sIm0WfTmozDpPttwcZ5w7Ve # iXDOB9ECAB4pQ2rCgbSNG8MYUZgoHHOuThqolOP0Vh9NHRRJxpdw6CyAbmCGftc0 # lvRDPFiKp8xmCNJ/j3XzoUdHoG7NMwpUmHv9ruGU18SdQ8hyJN9AcQGWYrB4v0RV # /hs2RAbwntG7ahkcwd8uy5aFw88Wph/uGXPXc49EWj7i49vHeIV2y5+gtthMywje # qqjFXkistXuF+JHVnyoYmqqCyXaHX5CEwtawMv4EQeaJs76bLhMeMTKKl9rRp8qB # DtbIZphO8iMsocrBnje48sA5HR0PM+H4HTjw10i8R0fLlWitaIY= # =XnY5 # -----END PGP SIGNATURE----- # gpg: Signature made Tue 12 Mar 2024 16:56:31 GMT # gpg: using RSA key 4E437DDA56616F4329B0A79567B30276A8621CAE # gpg: Good signature from "Nicholas Piggin <npiggin@gmail.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 4E43 7DDA 5661 6F43 29B0 A795 67B3 0276 A862 1CAE * tag 'pull-ppc-for-9.0-2-20240313' of https://gitlab.com/npiggin/qemu: (38 commits) spapr: nested: Introduce cap-nested-papr for Nested PAPR API spapr: nested: Introduce H_GUEST_RUN_VCPU hcall. spapr: nested: Use correct source for parttbl info for nested PAPR API. spapr: nested: Introduce H_GUEST_[GET|SET]_STATE hcalls. spapr: nested: Initialize the GSB elements lookup table. spapr: nested: Extend nested_ppc_state for nested PAPR API spapr: nested: Introduce H_GUEST_CREATE_VCPU hcall. spapr: nested: Introduce H_GUEST_[CREATE|DELETE] hcalls. spapr: nested: Introduce H_GUEST_[GET|SET]_CAPABILITIES hcalls. spapr: nested: Document Nested PAPR API spapr: nested: keep nested-hv related code restricted to its API. spapr: nested: Introduce SpaprMachineStateNested to store related info. spapr: nested: move nested part of spapr_get_pate into spapr_nested.c spapr: nested: register nested-hv api hcalls only for cap-nested-hv target/ppc: Remove interrupt handler wrapper functions target/ppc: Clean up ifdefs in excp_helper.c, part 3 target/ppc: Clean up ifdefs in excp_helper.c, part 2 target/ppc: Clean up ifdefs in excp_helper.c, part 1 target/ppc: Add gen_exception_err_nip() function target/ppc: Readability improvements in exception handlers ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-13Merge tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu into ↵Peter Maydell3-44/+46
staging Pull request # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmXwpoYACgkQnKSrs4Gr # c8gE0wf/c0hNDKoV01N8IwfJdmIBySNeCYRQiwcR84iiPoGGAwYdKuLa7wHaQKiO # iM0EV/ltJiiOGCHxlffVqLBzJurJHsHG6m429KBLRBXWc6gVzhCN9TjD8DwHxiTU # qzczoev8NJ2y5mrxzPPPjMxSSJEe3Ynas6ngeHeYBUtu0PRNp79zceWdtS0sPzia # sCI8EH/oCZQgVcwI/UkIOXjzbKK1lZWa2805//KIqvG27i9zHzLJ0l5eeLtbpZpy # LnFGRyQGGf+jEKAJuT6598q6T+jCkLCMN6zpyKWGvcYleNvBnlw6+N8Il8zV7KSc # TE5BNk+C7I9aimrRyaz3WrFCZW5DbQ== # =q9Im # -----END PGP SIGNATURE----- # gpg: Signature made Tue 12 Mar 2024 19:01:26 GMT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'tracing-pull-request' of https://gitlab.com/stefanha/qemu: meson: generate .stp files for tools too tracetool: remove redundant --target-type / --target-name args Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-03-12docs/specs/pvpanic: document shutdown eventThomas Weißschuh1-0/+2
Shutdown requests are normally hardware dependent. By extending pvpanic to also handle shutdown requests, guests can submit such requests with an easily implementable and cross-platform mechanism. Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de> Message-Id: <20240310-pvpanic-shutdown-spec-v1-1-b258e182ce55@t-8ch.de> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/cxl: Fix missing reserved data in CXL Device DVSECJonathan Cameron1-1/+2
The r3.1 specification introduced a new 2 byte field, but to maintain DWORD alignment, a additional 2 reserved bytes were added. Forgot those in updating the structure definition but did include them in the size define leading to a buffer overrun. Also use the define so that we don't duplicate the value. Fixes: Coverity ID 1534095 buffer overrun Fixes: 8700ee15de ("hw/cxl: Standardize all references on CXL r3.1 and minor updates") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240308143831.6256-1-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hmat acpi: Fix out of bounds access due to missing use of indirectionJonathan Cameron1-1/+5
With a numa set up such as -numa nodeid=0,cpus=0 \ -numa nodeid=1,memdev=mem \ -numa nodeid=2,cpus=1 and appropriate hmat_lb entries the initiator list is correctly computed and writen to HMAT as 0,2 but then the LB data is accessed using the node id (here 2), landing outside the entry_list array. Stash the reverse lookup when writing the initiator list and use it to get the correct array index index. Fixes: 4586a2cb83 ("hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)") Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240307160326.31570-3-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hmat acpi: Do not add Memory Proximity Domain Attributes Structure ↵Jonathan Cameron1-0/+7
targetting non existent memory. If qemu is started with a proximity node containing CPUs alone, it will provide one of these structures to say memory in this node is directly connected to itself. This description is arguably pointless even if there is memory in the node. If there is no memory present, and hence no SRAT entry it breaks Linux HMAT passing and the table is rejected. https://elixir.bootlin.com/linux/v6.7/source/drivers/acpi/numa/hmat.c#L444 Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240307160326.31570-2-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12qemu-options.hx: Document the virtio-iommu-pci aw-bits optionEric Auger1-0/+3
Document the new aw-bits option. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240307134445.92296-10-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12hw/arm/virt: Set virtio-iommu aw-bits default value to 48Eric Auger1-0/+17
On ARM we set 48b as a default (matching SMMUv3 SMMU_IDR5.VAX == 0). hw_compat_8_2 is used to handle the compatibility for machine types before 9.0 (default was 64 bits). Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <Zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-9-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/i386/q35: Set virtio-iommu aw-bits default value to 39Eric Auger3-1/+11
Currently the default input range can extend to 64 bits. On x86, when the virtio-iommu protects vfio devices, the physical iommu may support only 39 bits. Let's set the default to 39, as done for the intel-iommu. We use hw_compat_8_2 to handle the compatibility for machines before 9.0 which used to have a virtio-iommu default input range of 64 bits. Of course if aw-bits is set from the command line, the default is overriden. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240307134445.92296-8-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
2024-03-12virtio-iommu: Add an option to define the input range widthEric Auger2-1/+8
aw-bits is a new option that allows to set the bit width of the input address range. This value will be used as a default for the device config input_range.end. By default it is set to 64 bits which is the current value. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20240307134445.92296-7-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12virtio-iommu: Trace domain range limits as unsigned intEric Auger1-1/+1
Use %u format to trace domain_range limits. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-Id: <20240307134445.92296-6-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12qemu-options.hx: Document the virtio-iommu-pci granule optionEric Auger1-0/+8
We are missing an entry for the virtio-iommu-pci device. Add the information on which machine it is currently supported and document the new granule option. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240307134445.92296-5-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12virtio-iommu: Change the default granule to the host page sizeEric Auger2-2/+5
We used to set the default granule to 4KB but with VFIO assignment it makes more sense to use the actual host page size. Indeed when hotplugging a VFIO device protected by a virtio-iommu on a 64kB/64kB host/guest config, we current get a qemu crash: "vfio: DMA mapping failed, unable to continue" This is due to the hot-attached VFIO device calling memory_region_iommu_set_page_size_mask() with 64kB granule whereas the virtio-iommu granule was already frozen to 4KB on machine init done. Set the granule property to "host" and introduce a new compat. The page size mask used before 9.0 was qemu_target_page_mask(). Since the virtio-iommu currently only supports x86_64 and aarch64, this matched a 4KB granule. Note that the new default will prevent 4kB guest on 64kB host because the granule will be set to 64kB which would be larger than the guest page size. In that situation, the virtio-iommu driver fails on viommu_domain_finalise() with "granule 0x10000 larger than system page size 0x1000". In that case the workaround is to request 4K granule. The current limitation of global granule in the virtio-iommu should be removed and turned into per domain granule. But until we get this upgraded, this new default is probably better because I don't think anyone is currently interested in running a 4KB page size guest with virtio-iommu on a 64KB host. However supporting 64kB guest on 64kB host with virtio-iommu and VFIO looks a more important feature. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-4-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12virtio-iommu: Add a granule propertyEric Auger2-3/+27
This allows to choose which granule will be used by default by the virtio-iommu. Current page size mask default is qemu_target_page_mask so this translates into a 4k granule on ARM and x86_64 where virtio-iommu is supported. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20240307134445.92296-3-eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/i386/acpi-build: Add support for SRAT Generic Initiator structuresAnkit Agrawal1-0/+3
The acpi-generic-initiator object is added to allow a host device to be linked with a NUMA node. Qemu use it to build the SRAT Generic Initiator Affinity structure [1]. Add support for i386. [1] ACPI Spec 6.3, Section 5.2.16.6 Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-4-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-03-12hw/acpi: Implement the SRAT GI affinity structureAnkit Agrawal6-2/+109
ACPI spec provides a scheme to associate "Generic Initiators" [1] (e.g. heterogeneous processors and accelerators, GPUs, and I/O devices with integrated compute or DMA engines GPUs) with Proximity Domains. This is achieved using Generic Initiator Affinity Structure in SRAT. During bootup, Linux kernel parse the ACPI SRAT to determine the PXM ids and create a NUMA node for each unique PXM ID encountered. Qemu currently do not implement these structures while building SRAT. Add GI structures while building VM ACPI SRAT. The association between device and node are stored using acpi-generic-initiator object. Lookup presence of all such objects and use them to build these structures. The structure needs a PCI device handle [2] that consists of the device BDF. The vfio-pci device corresponding to the acpi-generic-initiator object is located to determine the BDF. [1] ACPI Spec 6.3, Section 5.2.16.6 [2] ACPI Spec 6.3, Table 5.80 Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cedric Le Goater <clg@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-3-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12qom: new object to associate device to NUMA nodeAnkit Agrawal4-0/+111
NVIDIA GPU's support MIG (Mult-Instance GPUs) feature [1], which allows partitioning of the GPU device resources (including device memory) into several (upto 8) isolated instances. Each of the partitioned memory needs a dedicated NUMA node to operate. The partitions are not fixed and they can be created/deleted at runtime. Unfortunately Linux OS does not provide a means to dynamically create/destroy NUMA nodes and such feature implementation is not expected to be trivial. The nodes that OS discovers at the boot time while parsing SRAT remains fixed. So we utilize the Generic Initiator (GI) Affinity structures that allows association between nodes and devices. Multiple GI structures per BDF is possible, allowing creation of multiple nodes by exposing unique PXM in each of these structures. Implement the mechanism to build the GI affinity structures as Qemu currently does not. Introduce a new acpi-generic-initiator object to allow host admin link a device with an associated NUMA node. Qemu maintains this association and use this object to build the requisite GI Affinity Structure. When multiple NUMA nodes are associated with a device, it is required to create those many number of acpi-generic-initiator objects, each representing a unique device:node association. Following is one of a decoded GI affinity structure in VM ACPI SRAT. [0C8h 0200 1] Subtable Type : 05 [Generic Initiator Affinity] [0C9h 0201 1] Length : 20 [0CAh 0202 1] Reserved1 : 00 [0CBh 0203 1] Device Handle Type : 01 [0CCh 0204 4] Proximity Domain : 00000007 [0D0h 0208 16] Device Handle : 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 [0E0h 0224 4] Flags (decoded below) : 00000001 Enabled : 1 [0E4h 0228 4] Reserved2 : 00000000 [0E8h 0232 1] Subtable Type : 05 [Generic Initiator Affinity] [0E9h 0233 1] Length : 20 An admin can provide a range of acpi-generic-initiator objects, each associating a device (by providing the id through pci-dev argument) to the desired NUMA node (using the node argument). Currently, only PCI device is supported. For the grace hopper system, create a range of 8 nodes and associate that with the device using the acpi-generic-initiator object. While a configuration of less than 8 nodes per device is allowed, such configuration will prevent utilization of the feature to the fullest. The following sample creates 8 nodes per PCI device for a VM with 2 PCI devices and link them to the respecitve PCI device using acpi-generic-initiator objects: -numa node,nodeid=2 -numa node,nodeid=3 -numa node,nodeid=4 \ -numa node,nodeid=5 -numa node,nodeid=6 -numa node,nodeid=7 \ -numa node,nodeid=8 -numa node,nodeid=9 \ -device vfio-pci-nohotplug,host=0009:01:00.0,bus=pcie.0,addr=04.0,rombar=0,id=dev0 \ -object acpi-generic-initiator,id=gi0,pci-dev=dev0,node=2 \ -object acpi-generic-initiator,id=gi1,pci-dev=dev0,node=3 \ -object acpi-generic-initiator,id=gi2,pci-dev=dev0,node=4 \ -object acpi-generic-initiator,id=gi3,pci-dev=dev0,node=5 \ -object acpi-generic-initiator,id=gi4,pci-dev=dev0,node=6 \ -object acpi-generic-initiator,id=gi5,pci-dev=dev0,node=7 \ -object acpi-generic-initiator,id=gi6,pci-dev=dev0,node=8 \ -object acpi-generic-initiator,id=gi7,pci-dev=dev0,node=9 \ -numa node,nodeid=10 -numa node,nodeid=11 -numa node,nodeid=12 \ -numa node,nodeid=13 -numa node,nodeid=14 -numa node,nodeid=15 \ -numa node,nodeid=16 -numa node,nodeid=17 \ -device vfio-pci-nohotplug,host=0009:01:01.0,bus=pcie.0,addr=05.0,rombar=0,id=dev1 \ -object acpi-generic-initiator,id=gi8,pci-dev=dev1,node=10 \ -object acpi-generic-initiator,id=gi9,pci-dev=dev1,node=11 \ -object acpi-generic-initiator,id=gi10,pci-dev=dev1,node=12 \ -object acpi-generic-initiator,id=gi11,pci-dev=dev1,node=13 \ -object acpi-generic-initiator,id=gi12,pci-dev=dev1,node=14 \ -object acpi-generic-initiator,id=gi13,pci-dev=dev1,node=15 \ -object acpi-generic-initiator,id=gi14,pci-dev=dev1,node=16 \ -object acpi-generic-initiator,id=gi15,pci-dev=dev1,node=17 \ Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu [1] Cc: Jonathan Cameron <qemu-devel@nongnu.org> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Message-Id: <20240308145525.10886-2-ankita@nvidia.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/i386/pc: Inline pc_cmos_init() into pc_cmos_init_late() and remove itBernhard Beschow4-16/+0
Now that pc_cmos_init() doesn't populate the X86MachineState::rtc attribute any longer, its duties can be merged into pc_cmos_init_late() which is called within machine_done notifier. This frees pc_piix and pc_q35 from explicit CMOS initialization. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-5-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/i386/pc: Set "normal" boot device order in pc_basic_device_init()Bernhard Beschow1-2/+2
The boot device order may change during the lifetime of a VM. Usually, the "normal" order is set once during machine init(). However, if a user specifies `-boot once=...`, the "normal" order is overwritten by the "once" order just before machine_done, and a reset handler is registered which restores the "normal" order during the next reset. In the next patch, pc_cmos_init() will be inlined into pc_cmos_init_late() which runs during machine_done. This means that the "once" boot order would be overwritten again with the "normal" boot order -- which renders the user's choice ineffective. Fix this by setting the "normal" boot order in pc_basic_device_init() which already registers the boot_set() handler. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-4-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/i386/pc: Avoid one use of the current_machine globalBernhard Beschow1-3/+4
The RTC can be accessed through the X86 machine instance, so rather than passing the RTC it's possible to pass the machine state instead. This avoids pc_boot_set() from having to access the current_machine global. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-3-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12hw/i386/pc: Remove "rtc_state" link againBernhard Beschow1-8/+0
Commit 99e1c1137b6f "hw/i386/pc: Populate RTC attribute directly" made linking the "rtc_state" property unnecessary and removed it. Commit 84e945aad2d0 "vl, pc: turn -no-fd-bootchk into a machine property" accidently reintroduced the link. Remove it again since it is not needed. Fixes: 84e945aad2d0 "vl, pc: turn -no-fd-bootchk into a machine property" Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240303185332.1408-2-shentey@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12Revert "hw/i386/pc: Confine system flash handling to pc_sysfw"Bernhard Beschow4-4/+6
Specifying the property `-M pflash0` results in a regression: qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found Revert the change for now until a solution is found. This reverts commit 6f6ad2b24582593d8feb00434ce2396840666227. Reported-by: Volker Rümelin <vr_qemu@t-online.de> Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240226215909.30884-3-shentey@gmail.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12Revert "hw/i386/pc_sysfw: Inline pc_system_flash_create() and remove it"Bernhard Beschow1-2/+13
Commit 6f6ad2b24582 "hw/i386/pc: Confine system flash handling to pc_sysfw" causes a regression when specifying the property `-M pflash0` in the PCI PC machines: qemu-system-x86_64: Property 'pc-q35-9.0-machine.pflash0' not found In order to revert the commit, the commit below must be reverted first. This reverts commit cb05cc16029bb0a61ac5279ab7b3b90dcf2aa69f. Signed-off-by: Bernhard Beschow <shentey@gmail.com> Message-Id: <20240226215909.30884-2-shentey@gmail.com> Tested-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12pc: q35: Bump max_cpus to 4096 vcpusAni Sinha1-1/+2
Since commit f10a570b093e6 ("KVM: x86: Add CONFIG_KVM_MAX_NR_VCPUS to allow up to 4096 vCPUs") Linux kernel can support upto a maximum number of 4096 vcpus when MAXSMP is enabled in the kernel. At present, QEMU has been tested to correctly boot a linux guest with 4096 vcpus using the current edk2 upstream master branch that has the fixes corresponding to the following two PRs: https://github.com/tianocore/edk2/pull/5410 https://github.com/tianocore/edk2/pull/5418 The changes merged into edk2 with the above PRs will be in the upcoming 2024-05 release. With current seabios firmware, it boots fine with 4096 vcpus already. So bump up the value max_cpus to 4096 for q35 machines versions 9 and newer. Q35 machines versions 8.2 and older continue to support 1024 maximum vcpus as before for compatibility reasons. If KVM is not able to support the specified number of vcpus, QEMU would return the following error messages: $ ./qemu-system-x86_64 -cpu host -accel kvm -machine q35 -smp 1728 qemu-system-x86_64: -accel kvm: warning: Number of SMP cpus requested (1728) exceeds the recommended cpus supported by KVM (12) qemu-system-x86_64: -accel kvm: warning: Number of hotpluggable cpus requested (1728) exceeds the recommended cpus supported by KVM (12) Number of SMP cpus requested (1728) exceeds the maximum cpus supported by KVM (1024) Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Julia Suvorova <jusual@redhat.com> Cc: kraxel@redhat.com Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Ani Sinha <anisinha@redhat.com> Message-Id: <20240228143351.3967-1-anisinha@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/pci: Always call pcie_sriov_pf_reset()Akihiko Odaki3-6/+1
Call pcie_sriov_pf_reset() from pci_do_device_reset() just as we do for msi_reset() and msix_reset() to prevent duplicating code for each SR-IOV PF. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-5-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12pcie_sriov: Do not reset NumVFs after disabling VFsAkihiko Odaki1-1/+2
The spec does not NumVFs is reset after disabling VFs except when resetting the PF. Clearing it is guest visible and out of spec, even though Linux doesn't rely on this value being preserved, so we never noticed. Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-4-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12pcie_sriov: Reset SR-IOV extended capabilityAkihiko Odaki4-12/+22
pcie_sriov_pf_disable_vfs() is called when resetting the PF, but it only disables VFs and does not reset SR-IOV extended capability, leaking the state and making the VF Enable register inconsistent with the actual state. Replace pcie_sriov_pf_disable_vfs() with pcie_sriov_pf_reset(), which does not only disable VFs but also resets the capability. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-3-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12pcie_sriov: Validate NumVFsAkihiko Odaki1-0/+3
The guest may write NumVFs greater than TotalVFs and that can lead to buffer overflow in VF implementations. Cc: qemu-stable@nongnu.org Fixes: CVE-2024-26327 Fixes: 7c0fa8dff811 ("pcie: Add support for Single Root I/O Virtualization (SR/IOV)") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-2-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Sriram Yagnaraman <sriram.yagnaraman@ericsson.com>
2024-03-12hw/nvme: Use pcie_sriov_num_vfs()Akihiko Odaki1-18/+8
nvme_sriov_pre_write_ctrl() used to directly inspect SR-IOV configurations to know the number of VFs being disabled due to SR-IOV configuration writes, but the logic was flawed and resulted in out-of-bound memory access. It assumed PCI_SRIOV_NUM_VF always has the number of currently enabled VFs, but it actually doesn't in the following cases: - PCI_SRIOV_NUM_VF has been set but PCI_SRIOV_CTRL_VFE has never been. - PCI_SRIOV_NUM_VF was written after PCI_SRIOV_CTRL_VFE was set. - VFs were only partially enabled because of realization failure. It is a responsibility of pcie_sriov to interpret SR-IOV configurations and pcie_sriov does it correctly, so use pcie_sriov_num_vfs(), which it provides, to get the number of enabled VFs before and after SR-IOV configuration writes. Cc: qemu-stable@nongnu.org Fixes: CVE-2024-26328 Fixes: 11871f53ef8e ("hw/nvme: Add support for the Virtualization Management command") Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20240228-reuse-v8-1-282660281e60@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12Implement SMBIOS type 9 v2.6Felix Wu3-4/+51
Signed-off-by: Felix Wu <flwu@google.com> Signed-off-by: Nabih Estefan <nabihestefan@google.com> Message-Id: <20240221170027.1027325-3-nabihestefan@google.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12Implement base of SMBIOS type 9 descriptor.Felix Wu3-0/+115
Version 2.1+. Signed-off-by: Felix Wu <flwu@google.com> Signed-off-by: Nabih Estefan <nabihestefan@google.com> Message-Id: <20240221170027.1027325-2-nabihestefan@google.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/intc: Check @errp to handle the error of IOAPICCommonClass.realize()Zhao Liu1-0/+4
IOAPICCommonClass implements its own private realize(), and this private realize() allows error. Since IOAPICCommonClass.realize() returns void, to check the error, dereference @errp with ERRP_GUARD(). Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240223085653.1255438-8-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12hw/vfio/iommufd: Fix missing ERRP_GUARD() in iommufd_cdev_getfd()Zhao Liu1-0/+1
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in iommufd_cdev_getfd(), @errp is dereferenced without ERRP_GUARD(): if (*errp) { error_prepend(errp, VFIO_MSG_PREFIX, path); } Currently, since vfio_attach_device() - the caller of iommufd_cdev_getfd() - is always called in DeviceClass.realize() context and doesn't get the NULL @errp parameter, iommufd_cdev_getfd() hasn't triggered the bug that dereferencing the NULL @errp. To follow the requirement of @errp, add missing ERRP_GUARD() in iommufd_cdev_getfd(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-7-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/pci-bridge/cxl_upstream: Fix missing ERRP_GUARD() in cxl_usp_realize()Zhao Liu1-0/+1
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in cxl_usp_realize(), @errp is dereferenced without ERRP_GUARD(): cxl_doe_cdat_init(cxl_cstate, errp); if (*errp) { goto err_cap; } Here we check *errp, because cxl_doe_cdat_init() returns void. And since cxl_usp_realize() - as a PCIDeviceClass.realize() method - doesn't get the NULL @errp parameter, it hasn't triggered the bug that dereferencing the NULL @errp. To follow the requirement of @errp, add missing ERRP_GUARD() in cxl_usp_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-6-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Thomas Huth <thuth@redhat.com>
2024-03-12hw/misc/xlnx-versal-trng: Check returned bool in trng_prop_fault_event_set()Zhao Liu1-2/+1
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in trng_prop_fault_event_set, @errp is dereferenced without ERRP_GUARD(): visit_type_uint32(v, name, events, errp); if (*errp) { return; } Currently, since trng_prop_fault_event_set() doesn't get the NULL @errp parameter as a "set" method of object property, it hasn't triggered the bug that dereferencing the NULL @errp. And since visit_type_uint32() returns bool, check the returned bool directly instead of dereferencing @errp, then we needn't the add missing ERRP_GUARD(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20240223085653.1255438-5-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-03-12hw/mem/cxl_type3: Fix missing ERRP_GUARD() in ct3_realize()Zhao Liu1-0/+1
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in ct3_realize(), @errp is dereferenced without ERRP_GUARD(): cxl_doe_cdat_init(cxl_cstate, errp); if (*errp) { goto err_free_special_ops; } Here we check *errp, because cxl_doe_cdat_init() returns void. And ct3_realize() - as a PCIDeviceClass.realize() method - doesn't get the NULL @errp parameter, it hasn't triggered the bug that dereferencing the NULL @errp. To follow the requirement of @errp, add missing ERRP_GUARD() in ct3_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-4-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-03-12hw/display/macfb: Fix missing ERRP_GUARD() in macfb_nubus_realize()Zhao Liu1-0/+1
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in macfb_nubus_realize(), @errp is dereferenced without ERRP_GUARD(): ndc->parent_realize(dev, errp); if (*errp) { return; } Here we check *errp, because the ndc->parent_realize(), as a DeviceClass.realize() callback, returns void. And since macfb_nubus_realize(), also as a DeviceClass.realize(), doesn't get the NULL @errp parameter, it hasn't triggered the bug that dereferencing the NULL @errp. To follow the requirement of @errp, add missing ERRP_GUARD() in macfb_nubus_realize(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-3-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/cxl/cxl-host: Fix missing ERRP_GUARD() in cxl_fixed_memory_window_config()Zhao Liu1-0/+1
As the comment in qapi/error, dereferencing @errp requires ERRP_GUARD(): * = Why, when and how to use ERRP_GUARD() = * * Without ERRP_GUARD(), use of the @errp parameter is restricted: * - It must not be dereferenced, because it may be null. ... * ERRP_GUARD() lifts these restrictions. * * To use ERRP_GUARD(), add it right at the beginning of the function. * @errp can then be used without worrying about the argument being * NULL or &error_fatal. * * Using it when it's not needed is safe, but please avoid cluttering * the source with useless code. But in cxl_fixed_memory_window_config(), @errp is dereferenced in 2 places without ERRP_GUARD(): fw->enc_int_ways = cxl_interleave_ways_enc(fw->num_targets, errp); if (*errp) { return; } and fw->enc_int_gran = cxl_interleave_granularity_enc(object->interleave_granularity, errp); if (*errp) { return; } For the above 2 places, we check "*errp", because neither function returns a suitable error code. And since machine_set_cfmw() - the caller of cxl_fixed_memory_window_config() - doesn't get the NULL @errp parameter as the "set" method of object property, cxl_fixed_memory_window_config() hasn't triggered the bug that dereferencing the NULL @errp. To follow the requirement of @errp, add missing ERRP_GUARD() in cxl_fixed_memory_window_config(). Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20240223085653.1255438-2-zhao1.liu@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
2024-03-12hw/virtio: Add support for VDPA network simulation devicesHao Chen9-3/+399
This patch adds support for VDPA network simulation devices. The device is developed based on virtio-net and tap backend, and supports hardware live migration function. For more details, please refer to "docs/system/devices/vdpa-net.rst" Signed-off-by: Hao Chen <chenh@yusur.tech> Message-Id: <20240221073802.2888022-1-chenh@yusur.tech> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/virtio: check owner for removing objectsAlbert Esteve2-3/+22
Shared objects lack spoofing protection. For VHOST_USER_BACKEND_SHARED_OBJECT_REMOVE messages received by the vhost-user interface, any backend was allowed to remove entries from the shared table just by knowing the UUID. Only the owner of the entry shall be allowed to removed their resources from the table. To fix that, add a check for all *SHARED_OBJECT_REMOVE messages received. A vhost device can only remove TYPE_VHOST_DEV entries that are owned by them, otherwise skip the removal, and inform the device that the entry has not been removed in the answer. Signed-off-by: Albert Esteve <aesteve@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20240219143423.272012-2-aesteve@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/audio/virtio-sound: return correct command response sizeVolker Rümelin2-2/+6
The payload size returned by command VIRTIO_SND_R_PCM_INFO is wrong. The code in process_cmd() assumes that all commands return only a virtio_snd_hdr payload, but some commands like VIRTIO_SND_R_PCM_INFO may return an additional payload. Add a zero initialized payload_size variable to struct virtio_snd_ctrl_command to allow for additional payloads. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Volker Rümelin <vr_qemu@t-online.de> Message-Id: <20240218083351.8524-1-vr_qemu@t-online.de> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12hw/pci-bridge/pxb-cxl: Drop RAS capability from host bridge.Jonathan Cameron3-5/+19
This CXL component isn't allowed to have a RAS capability. Whilst this should be harmless as software is not expected to look here, good to clean it up. Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Message-Id: <20240215155206.2736-1-Jonathan.Cameron@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12vdpa: trace skipped memory sectionsEugenio Pérez2-0/+4
Sometimes, certain parts are not being skipped in vhost_vdpa_listener_region_del, but they are skipped in vhost_vdpa_listener_region_add, or vice versa. The vhost-vdpa code expects all parts to maintain their properties, so we're adding a trace to help with debugging when any part is skipped. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20240215103616.330518-3-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12vdpa: stash memory region properties in varsEugenio Pérez1-6/+8
Next changes uses this variables, so avoid call repeatedly to memory region functions. No functional change intended. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20240215103616.330518-2-eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12pcie: Support PCIe Gen5/Gen6 link speedsLukas Stockner4-3/+29
This patch extends the PCIe link speed option so that slots can be configured as supporting 32GT/s (Gen5) or 64GT/s (Gen5) speeds. This is as simple as setting the appropriate bit in LnkCap2 and the appropriate value in LnkCap and LnkCtl2. Signed-off-by: Lukas Stockner <lstockner@genesiscloud.com> Message-Id: <20240215012326.3272366-1-lstockner@genesiscloud.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12libvhost-user: Mark mmap'ed region memory as MADV_DONTDUMPDavid Hildenbrand1-0/+6
We already use MADV_NORESERVE to deal with sparse memory regions. Let's also set madvise(MADV_DONTDUMP), otherwise a crash of the process can result in us allocating all memory in the mmap'ed region for dumping purposes. This change implies that the mmap'ed rings won't be included in a coredump. If ever required for debugging purposes, we could mark only the mapped rings MADV_DODUMP. Ignore errors during madvise() for now. Reviewed-by: Raphael Norwitz <raphael@enfabrica.net> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240214151701.29906-15-david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12libvhost-user: Dynamically remap rings after (temporarily?) removing memory ↵David Hildenbrand1-29/+78
regions Currently, we try to remap all rings whenever we add a single new memory region. That doesn't quite make sense, because we already map rings when setting the ring address, and panic if that goes wrong. Likely, that handling was simply copied from set_mem_table code, where we actually have to remap all rings. Remapping all rings might require us to walk quite a lot of memory regions to perform the address translations. Ideally, we'd simply remove that remapping. However, let's be a bit careful. There might be some weird corner cases where we might temporarily remove a single memory region (e.g., resize it), that would have worked for now. Further, a ring might be located on hotplugged memory, and as the VM reboots, we might unplug that memory, to hotplug memory before resetting the ring addresses. So let's unmap affected rings as we remove a memory region, and try dynamically mapping the ring again when required. Acked-by: Raphael Norwitz <raphael@enfabrica.net> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240214151701.29906-14-david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12libvhost-user: Factor out vq usability checkDavid Hildenbrand1-12/+12
Let's factor it out to prepare for further changes. Reviewed-by: Raphael Norwitz <raphael@enfabrica.net> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240214151701.29906-13-david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-03-12libvhost-user: Use most of mmap_offset as fd_offsetDavid Hildenbrand1-6/+48
In the past, QEMU would create memory regions that could partially cover hugetlb pages, making mmap() fail if we would use the mmap_offset as an fd_offset. For that reason, we never used the mmap_offset as an offset into the fd and instead always mapped the fd from the very start. However, that can easily result in us mmap'ing a lot of unnecessary parts of an fd, possibly repeatedly. QEMU nowadays does not create memory regions that partially cover huge pages -- it never really worked with postcopy. QEMU handles merging of regions that partially cover huge pages (due to holes in boot memory) since 2018 in c1ece84e7c93 ("vhost: Huge page align and merge"). Let's be a bit careful and not unconditionally convert the mmap_offset into an fd_offset. Instead, let's simply detect the hugetlb size and pass as much as we can as fd_offset, making sure that we call mmap() with a properly aligned offset. With QEMU and a virtio-mem device that is fully plugged (50GiB using 50 memslots) the qemu-storage daemon process consumes in the VA space 1281GiB before this change and 58GiB after this change. ================ Vhost user message ================ Request: VHOST_USER_ADD_MEM_REG (37) Flags: 0x9 Size: 40 Fds: 59 Adding region 4 guest_phys_addr: 0x0000000200000000 memory_size: 0x0000000040000000 userspace_addr: 0x00007fb73bffe000 old mmap_offset: 0x0000000080000000 fd_offset: 0x0000000080000000 new mmap_offset: 0x0000000000000000 mmap_addr: 0x00007f02f1bdc000 Successfully added new region ================ Vhost user message ================ Request: VHOST_USER_ADD_MEM_REG (37) Flags: 0x9 Size: 40 Fds: 59 Adding region 5 guest_phys_addr: 0x0000000240000000 memory_size: 0x0000000040000000 userspace_addr: 0x00007fb77bffe000 old mmap_offset: 0x00000000c0000000 fd_offset: 0x00000000c0000000 new mmap_offset: 0x0000000000000000 mmap_addr: 0x00007f0284000000 Successfully added new region Reviewed-by: Raphael Norwitz <raphael@enfabrica.net> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20240214151701.29906-12-david@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>