aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2023-12-26vdpa: use VhostVDPAShared in vdpa_dma_map and unmapEugenio Pérez1-2/+3
The callers only have the shared information by the end of this series. Start converting this functions. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231221174322.3130442-12-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26vdpa: move file descriptor to vhost_vdpa_sharedEugenio Pérez1-7/+4
Next patches will register the vhost_vdpa memory listener while the VM is migrating at the destination, so we can map the memory to the device before stopping the VM at the source. The main goal is to reduce the downtime. However, the destination QEMU is unaware of which vhost_vdpa device will register its memory_listener. If the source guest has CVQ enabled, it will be the CVQ device. Otherwise, it will be the first one. Move the file descriptor to VhostVDPAShared so all vhost_vdpa can use it, rather than always in the first / last vhost_vdpa. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231221174322.3130442-7-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26vdpa: move shadow_data to vhost_vdpa_sharedEugenio Pérez1-17/+5
Next patches will register the vhost_vdpa memory listener while the VM is migrating at the destination, so we can map the memory to the device before stopping the VM at the source. The main goal is to reduce the downtime. However, the destination QEMU is unaware of which vhost_vdpa device will register its memory_listener. If the source guest has CVQ enabled, it will be the CVQ device. Otherwise, it will be the first one. Move the shadow_data member to VhostVDPAShared so all vhost_vdpa can use it, rather than always in the first or last vhost_vdpa. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231221174322.3130442-5-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26vdpa: move iova_range to vhost_vdpa_sharedEugenio Pérez1-5/+5
Next patches will register the vhost_vdpa memory listener while the VM is migrating at the destination, so we can map the memory to the device before stopping the VM at the source. The main goal is to reduce the downtime. However, the destination QEMU is unaware of which vhost_vdpa device will register its memory_listener. If the source guest has CVQ enabled, it will be the CVQ device. Otherwise, it will be the first one. Move the iova range to VhostVDPAShared so all vhost_vdpa can use it, rather than always in the first or last vhost_vdpa. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231221174322.3130442-4-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26vdpa: move iova tree to the shared structEugenio Pérez1-31/+23
Next patches will register the vhost_vdpa memory listener while the VM is migrating at the destination, so we can map the memory to the device before stopping the VM at the source. The main goal is to reduce the downtime. However, the destination QEMU is unaware of which vhost_vdpa device will register its memory_listener. If the source guest has CVQ enabled, it will be the CVQ device. Otherwise, it will be the first one. Move the iova tree to VhostVDPAShared so all vhost_vdpa can use it, rather than always in the first or last vhost_vdpa. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231221174322.3130442-3-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-26vdpa: add VhostVDPASharedEugenio Pérez1-2/+22
It will hold properties shared among all vhost_vdpa instances associated with of the same device. For example, we just need one iova_tree or one memory listener for the entire device. Next patches will register the vhost_vdpa memory listener at the beginning of the VM migration at the destination. This enables QEMU to map the memory to the device before stopping the VM at the source, instead of doing while both source and destination are stopped, thus minimizing the downtime. However, the destination QEMU is unaware of which vhost_vdpa struct will register its memory_listener. If the source guest has CVQ enabled, it will be the one associated with the CVQ. Otherwise, it will be the first one. Save the memory operations related members in a common place rather than always in the first / last vhost_vdpa. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20231221174322.3130442-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-12-21block: remove AioContext lockingStefan Hajnoczi1-2/+0
This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users. There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-ID: <20231205182011.1976568-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-11-21net: do not delete nics in net_cleanup()David Woodhouse1-6/+22
In net_cleanup() we only need to delete the netdevs, as those may have state which outlives Qemu when it exits, and thus may actually need to be cleaned up on exit. The nics, on the other hand, are owned by the device which created them. Most devices don't bother to clean up on exit because they don't have any state which will outlive Qemu... but XenBus devices do need to clean up their nodes in XenStore, and do have an exit handler to delete them. When the XenBus exit handler destroys the xen-net-device, it attempts to delete its nic after net_cleanup() had already done so. And crashes. Fix this by only deleting netdevs as we walk the list. As the comment notes, we can't use QTAILQ_FOREACH_SAFE() as each deletion may remove *multiple* entries, including the "safely" saved 'next' pointer. But we can store the *previous* entry, since nics are safe. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-11-21net: Update MemReentrancyGuard for NICAkihiko Odaki1-0/+14
Recently MemReentrancyGuard was added to DeviceState to record that the device is engaging in I/O. The network device backend needs to update it when delivering a packet to a device. This implementation follows what bottom half does, but it does not add a tracepoint for the case that the network device backend started delivering a packet to a device which is already engaging in I/O. This is because such reentrancy frequently happens for qemu_flush_queued_packets() and is insignificant. Fixes: CVE-2023-3019 Reported-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-11-21net: Provide MemReentrancyGuard * to qemu_new_nic()Akihiko Odaki1-0/+1
Recently MemReentrancyGuard was added to DeviceState to record that the device is engaging in I/O. The network device backend needs to update it when delivering a packet to a device. In preparation for such a change, add MemReentrancyGuard * as a parameter of qemu_new_nic(). Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Alexander Bulekov <alxndr@bu.edu> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-11-17net: Fix a misleading error messageMarkus Armbruster1-3/+3
The error message $ qemu-system-x86_64 -netdev user,id=net0,ipv6-net=fec0::0/ qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: Parameter 'ipv6-prefixlen' expects a number points to ipv6-prefixlen instead of ipv6-net. Fix: qemu-system-x86_64: -netdev user,id=net0,ipv6-net=fec0::0/: parameter 'ipv6-net' expects a number after '/' Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20231031111059.3407803-6-armbru@redhat.com>
2023-11-07vdpa: Allow VIRTIO_NET_F_RSS in SVQHawkins Jiawei1-0/+1
Enable SVQ with VIRTIO_NET_F_RSS feature. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <626449eb303207de408126b3dc7c155cd72b028b.1698195059.git.yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07vdpa: Restore receive-side scaling stateHawkins Jiawei1-23/+44
This patch reuses vhost_vdpa_net_load_rss() with some refactorings to restore the receive-side scaling state at device's startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <cf5b78a16ed0318982ceffb195f2227f6aad4ac1.1698195059.git.yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07vdpa: Add SetSteeringEBPF method for NetClientStateHawkins Jiawei1-0/+8
At present, to enable the VIRTIO_NET_F_RSS feature, eBPF must be loaded for the vhost backend. Given that vhost-vdpa is one of the vhost backend, we need to implement the SetSteeringEBPF method to support RSS for vhost-vdpa, even if vhost-vdpa calculates the rss hash in the hardware device instead of in the kernel by eBPF. Although this requires QEMU to be compiled with `--enable-bpf` configuration even if the vdpa device does not use eBPF to calculate the rss hash, this can avoid adding the specific conditional statements for vDPA case to enable the VIRTIO_NET_F_RSS feature, which reduces code maintainbility. Suggested-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <280e20ddce55b6de60f1552ba0865bffffe909b2.1698195059.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07vdpa: Allow VIRTIO_NET_F_HASH_REPORT in SVQHawkins Jiawei1-0/+1
Enable SVQ with VIRTIO_NET_F_HASH_REPORT feature. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <d66b0aee501cdad7954231900c35a11cad1e13db.1698194366.git.yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-07vdpa: Restore hash calculation stateHawkins Jiawei1-0/+91
This patch introduces vhost_vdpa_net_load_rss() to restore the hash calculation state at device's startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <dbf699acff8c226596136a55a6abe35ebfeac8b0.1698194366.git.yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-11-01migration: Use VMSTATE_INSTANCE_ID_ANY for slirpJuan Quintela1-2/+3
Each user network conection create a new slirp instance. We register more than one slirp instance for number 0. qemu-system-x86_64: -netdev user,id=hs1: savevm_state_handler_insert: Detected duplicate SaveStateEntry: id=slirp, instance_id=0x0 Broken pipe ../../../../../mnt/code/qemu/full/tests/qtest/libqtest.c:195: kill_qemu() tried to terminate QEMU process but encountered exit status 1 (expected 0) Aborted (core dumped) Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <20231020090731.28701-6-quintela@redhat.com>
2023-10-23Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi1-132/+242
into staging virtio,pc,pci: features, cleanups infrastructure for vhost-vdpa shadow work piix south bridge rework reconnect for vhost-user-scsi dummy ACPI QTG DSM for cxl tests, cleanups, fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU # pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd # 7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT # ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV # 9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa # zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg= # =ILek # -----END PGP SIGNATURE----- # gpg: Signature made Sun 22 Oct 2023 02:18:43 PDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (62 commits) intel-iommu: Report interrupt remapping faults, fix return value MAINTAINERS: Add include/hw/intc/i8259.h to the PC chip section vhost-user: Fix protocol feature bit conflict tests/acpi: Update DSDT.cxl with QTG DSM hw/cxl: Add QTG _DSM support for ACPI0017 device tests/acpi: Allow update of DSDT.cxl hw/i386/cxl: ensure maxram is greater than ram size for calculating cxl range vhost-user: fix lost reconnect vhost-user-scsi: start vhost when guest kicks vhost-user-scsi: support reconnect to backend vhost: move and rename the conn retry times vhost-user-common: send get_inflight_fd once hw/i386/pc_piix: Make PIIX4 south bridge usable in PC machine hw/isa/piix: Implement multi-process QEMU support also for PIIX4 hw/isa/piix: Resolve duplicate code regarding PCI interrupt wiring hw/isa/piix: Reuse PIIX3's PCI interrupt triggering in PIIX4 hw/isa/piix: Rename functions to be shared for PCI interrupt triggering hw/isa/piix: Reuse PIIX3 base class' realize method in PIIX4 hw/isa/piix: Share PIIX3's base class with PIIX4 hw/isa/piix: Harmonize names of reset control memory regions ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-10-20migration: simplify notifiersSteve Sistare1-3/+4
Pass the callback function to add_migration_state_change_notifier so that migration can initialize the notifier on add and clear it on delete, which simplifies the call sites. Shorten the function names so the extra arg can be added more legibly. Hide the global notifier list in a new function migration_call_notifiers, and make it externally visible so future live update code can call it. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Tested-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Michael Galaxy <mgalaxy@akamai.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Message-ID: <1686148954-250144-1-git-send-email-steven.sistare@oracle.com>
2023-10-18vdpa: Send cvq state load commands in parallelHawkins Jiawei1-63/+102
This patch enables sending CVQ state load commands in parallel at device startup by following steps: * Refactor vhost_vdpa_net_load_cmd() to iterate through the control commands shadow buffers. This allows different CVQ state load commands to use their own unique buffers. * Delay the polling and checking of buffers until either the SVQ is full or control commands shadow buffers are full. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1578 Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <9350f32278e39f7bce297b8f2d82dac27c6f8c9a.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vdpa: Introduce cursors to vhost_vdpa_net_loadx()Hawkins Jiawei1-36/+75
This patch introduces two new arugments, `out_cursor` and `in_cursor`, to vhost_vdpa_net_loadx(). Addtionally, it includes a helper function vhost_vdpa_net_load_cursor_reset() for resetting these cursors. Furthermore, this patch refactors vhost_vdpa_net_load_cmd() so that vhost_vdpa_net_load_cmd() prepares buffers for the device using the cursors arguments, instead of directly accesses `s->cvq_cmd_out_buffer` and `s->status` fields. By making these change, next patches in this series can refactor vhost_vdpa_net_load_cmd() directly to iterate through the control commands shadow buffers, allowing QEMU to send CVQ state load commands in parallel at device startup. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <1c6516e233a14cc222f0884e148e4e1adceda78d.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vdpa: Move vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add()Hawkins Jiawei1-10/+43
This patch moves vhost_svq_poll() to the caller of vhost_vdpa_net_cvq_add() and introduces a helper funtion. By making this change, next patches in this series is able to refactor vhost_vdpa_net_load_x() only to delay the polling and checking process until either the SVQ is full or control commands shadow buffers are full. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <196cadb55175a75275660c6634a538289f027ae3.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vdpa: Check device ack in vhost_vdpa_net_load_rx_mode()Hawkins Jiawei1-45/+31
Considering that vhost_vdpa_net_load_rx_mode() is only called within vhost_vdpa_net_load_rx() now, this patch refactors vhost_vdpa_net_load_rx_mode() to include a check for the device's ack, simplifying the code and improving its maintainability. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <68811d52f96ae12d68f0d67d996ac1642a623943.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vdpa: Avoid using vhost_vdpa_net_load_*() outside vhost_vdpa_net_load()Hawkins Jiawei1-3/+11
Next patches in this series will refactor vhost_vdpa_net_load_cmd() to iterate through the control commands shadow buffers, allowing QEMU to send CVQ state load commands in parallel at device startup. Considering that QEMU always forwards the CVQ command serialized outside of vhost_vdpa_net_load(), it is more elegant to send the CVQ commands directly without invoking vhost_vdpa_net_load_*() helpers. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <254f0618efde7af7229ba4fdada667bb9d318991.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-18vdpa: Use iovec for vhost_vdpa_net_cvq_add()Hawkins Jiawei1-17/+22
Next patches in this series will no longer perform an immediate poll and check of the device's used buffers for each CVQ state load command. Consequently, there will be multiple pending buffers in the shadow VirtQueue, making it a must for every control command to have its own buffer. To achieve this, this patch refactor vhost_vdpa_net_cvq_add() to accept `struct iovec`, which eliminates the coupling of control commands to `s->cvq_cmd_out_buffer` and `s->status`, allowing them to use their own buffer. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <8a328f146fb043f34edb75ba6d043d2d6de88f99.1697165821.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-06net/net: Clean up global variable shadowingPhilippe Mathieu-Daudé1-7/+7
Fix: net/net.c:1680:35: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] bool netdev_is_modern(const char *optarg) ^ net/net.c:1714:38: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] void netdev_parse_modern(const char *optarg) ^ net/net.c:1728:60: error: declaration shadows a variable in the global scope [-Werror,-Wshadow] void net_client_parse(QemuOptsList *opts_list, const char *optarg) ^ /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/getopt.h:77:14: note: previous declaration is here extern char *optarg; /* getopt(3) external variables */ ^ Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20231004120019.93101-4-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-10-05Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi1-39/+106
into staging virtio,pci: features, cleanups vdpa: shadow vq vlan support net migration with cvq cxl: support emulating 4 HDM decoders serial number extended capability virtio: hared dma-buf Fixes, cleanups all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (53 commits) libvhost-user: handle shared_object msg vhost-user: add shared_object msg hw/display: introduce virtio-dmabuf util/uuid: add a hash function virtio: remove unused next argument from virtqueue_split_read_next_desc() virtio: remove unnecessary thread fence while reading next descriptor virtio: use shadow_avail_idx while checking number of heads libvhost-user.c: add assertion to vu_message_read_default pcie_sriov: unregister_vfs(): fix error path hw/i386/pc: improve physical address space bound check for 32-bit x86 systems amd_iommu: Fix APIC address check vdpa net: follow VirtIO initialization properly at cvq isolation probing vdpa net: stop probing if cannot set features vdpa net: fix error message setting virtio status hw/pci-bridge/cxl-upstream: Add serial number extended capability support hw/cxl: Support 4 HDM decoders at all levels of topology hw/cxl: Fix and use same calculation for HDM decoder block size everywhere hw/cxl: Add utility functions decoder interleave ways and target count. hw/cxl: Push cxl_decoder_count_enc() and cxl_decode_ig() into .c vdpa net: zero vhost_vdpa iova_tree pointer at cleanup ... Conflicts: hw/core/machine.c Context conflict with commit 314e0a84cd5d ("hw/core: remove needless includes") because it removed an adjacent #include.
2023-10-04vdpa net: follow VirtIO initialization properly at cvq isolation probingEugenio Pérez1-4/+10
This patch solves a few issues. The most obvious is that the feature set was done previous to ACKNOWLEDGE | DRIVER status bit set. Current vdpa devices are permissive with this, but it is better to follow the standard. Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa") Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230915170836.3078172-4-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa net: stop probing if cannot set featuresEugenio Pérez1-0/+1
Otherwise it continues the CVQ isolation probing. Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230915170836.3078172-3-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-10-04vdpa net: fix error message setting virtio statusEugenio Pérez1-1/+1
It incorrectly prints "error setting features", probably because a copy paste miss. Fixes: 152128d646 ("vdpa: move CVQ isolation check to net_init_vhost_vdpa") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230915170836.3078172-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-10-04vdpa net: zero vhost_vdpa iova_tree pointer at cleanupEugenio Pérez1-0/+2
Not zeroing it causes a SIGSEGV if the live migration is cancelled, at net device restart. This is caused because CVQ tries to reuse the iova_tree that is present in the first vhost_vdpa device at the end of vhost_vdpa_net_cvq_start. As a consequence, it tries to access an iova_tree that has been already free. Fixes: 00ef422e9fbf ("vdpa net: move iova tree creation from init to start") Reported-by: Yanhui Ma <yama@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230913123408.2819185-1-eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: fix gcc cvq_isolated uninitialized variable warningStefan Hajnoczi1-1/+1
gcc 13.2.1 emits the following warning: net/vhost-vdpa.c: In function ‘net_vhost_vdpa_init.constprop’: net/vhost-vdpa.c:1394:25: error: ‘cvq_isolated’ may be used uninitialized [-Werror=maybe-uninitialized] 1394 | s->cvq_isolated = cvq_isolated; | ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ net/vhost-vdpa.c:1355:9: note: ‘cvq_isolated’ was declared here 1355 | int cvq_isolated; | ^~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230911215435.4156314-1-stefanha@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vhost: Add count argument to vhost_svq_poll()Hawkins Jiawei1-1/+1
Next patches in this series will no longer perform an immediate poll and check of the device's used buffers for each CVQ state load command. Instead, they will send CVQ state load commands in parallel by polling multiple pending buffers at once. To achieve this, this patch refactoring vhost_svq_poll() to accept a new argument `num`, which allows vhost_svq_poll() to wait for the device to use multiple elements, rather than polling for a single element. Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <950b3bfcfc5d446168b9d6a249d554a013a691d4.1693287885.git.yin31149@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: remove net cvq migration blockerEugenio Pérez1-12/+0
Now that we have add migration blockers if the device does not support all the needed features, remove the general blocker applied to all net devices with CVQ. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230822085330.3978829-6-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: move vhost_vdpa_set_vring_ready to the callerEugenio Pérez1-22/+43
Doing that way allows CVQ to be enabled before the dataplane vqs, restoring the state as MQ or MAC addresses properly in the case of a migration. The patch does it by defining a ->load NetClientInfo callback also for dataplane. Ideally, this should be done by an independent patch, but the function is already static so it would only add an empty vhost_vdpa_net_data_load stub. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Message-Id: <20230822085330.3978829-5-eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: rename vhost_vdpa_net_load to vhost_vdpa_net_cvq_loadEugenio Pérez1-2/+2
Next patches will add the corresponding data load. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230822085330.3978829-4-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: use first queue SVQ state for CVQ defaultEugenio Pérez1-1/+1
Previous to this patch the only way CVQ would be shadowed is if it does support to isolate CVQ group or if all vqs were shadowed from the beginning. The second condition was checked at the beginning, and no more configuration was done. After this series we need to check if data queues are shadowed because they are in the middle of the migration. As checking if they are shadowed already covers the previous case, let's just mimic it. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230822085330.3978829-2-eperezma@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQHawkins Jiawei1-0/+1
Enable SVQ with VIRTIO_NET_F_CTRL_VLAN feature. Co-developed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <38dc63102a42c31c72fd293d0e6e2828fd54c86e.1690106284.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-10-04vdpa: Restore vlan filtering stateHawkins Jiawei1-0/+48
This patch introduces vhost_vdpa_net_load_single_vlan() and vhost_vdpa_net_load_vlan() to restore the vlan filtering state at device's startup. Co-developed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Hawkins Jiawei <yin31149@gmail.com> Message-Id: <e76a29f77bb3f386e4a643c8af94b77b775d1752.1690106284.git.yin31149@gmail.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2023-09-29net/eth: Clean up local variable shadowingPhilippe Mathieu-Daudé1-2/+0
Fix: net/eth.c:435:20: error: declaration shadows a local variable [-Werror,-Wshadow] size_t input_size = iov_size(pkt, pkt_frags); ^ net/eth.c:413:16: note: previous declaration is here size_t input_size = iov_size(pkt, pkt_frags); ^ Suggested-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230904161235.84651-16-philmd@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2023-09-18net/tap: Avoid variable-length arrayPeter Maydell1-1/+2
Use a heap allocation instead of a variable length array in tap_receive_iov(). The codebase has very few VLAs, and if we can get rid of them all we can make the compiler error on new additions. This is a defensive measure against security bugs where an on-stack dynamic allocation isn't correctly size-checked (e.g. CVE-2021-3527). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18net/dump: Avoid variable length arrayPeter Maydell1-1/+1
Use a g_autofree heap allocation instead of a variable length array in dump_receive_iov(). The codebase has very few VLAs, and if we can get rid of them all we can make the compiler error on new additions. This is a defensive measure against security bugs where an on-stack dynamic allocation isn't correctly size-checked (e.g. CVE-2021-3527). Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Francisco Iglesias <frasse.iglesias@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18net: add initial support for AF_XDP network backendIlya Maximets4-0/+540
AF_XDP is a network socket family that allows communication directly with the network device driver in the kernel, bypassing most or all of the kernel networking stack. In the essence, the technology is pretty similar to netmap. But, unlike netmap, AF_XDP is Linux-native and works with any network interfaces without driver modifications. Unlike vhost-based backends (kernel, user, vdpa), AF_XDP doesn't require access to character devices or unix sockets. Only access to the network interface itself is necessary. This patch implements a network backend that communicates with the kernel by creating an AF_XDP socket. A chunk of userspace memory is shared between QEMU and the host kernel. 4 ring buffers (Tx, Rx, Fill and Completion) are placed in that memory along with a pool of memory buffers for the packet data. Data transmission is done by allocating one of the buffers, copying packet data into it and placing the pointer into Tx ring. After transmission, device will return the buffer via Completion ring. On Rx, device will take a buffer form a pre-populated Fill ring, write the packet data into it and place the buffer into Rx ring. AF_XDP network backend takes on the communication with the host kernel and the network interface and forwards packets to/from the peer device in QEMU. Usage example: -device virtio-net-pci,netdev=guest1,mac=00:16:35:AF:AA:5C -netdev af-xdp,ifname=ens6f1np1,id=guest1,mode=native,queues=1 XDP program bridges the socket with a network interface. It can be attached to the interface in 2 different modes: 1. skb - this mode should work for any interface and doesn't require driver support. With a caveat of lower performance. 2. native - this does require support from the driver and allows to bypass skb allocation in the kernel and potentially use zero-copy while getting packets in/out userspace. By default, QEMU will try to use native mode and fall back to skb. Mode can be forced via 'mode' option. To force 'copy' even in native mode, use 'force-copy=on' option. This might be useful if there is some issue with the driver. Option 'queues=N' allows to specify how many device queues should be open. Note that all the queues that are not open are still functional and can receive traffic, but it will not be delivered to QEMU. So, the number of device queues should generally match the QEMU configuration, unless the device is shared with something else and the traffic re-direction to appropriate queues is correctly configured on a device level (e.g. with ethtool -N). 'start-queue=M' option can be used to specify from which queue id QEMU should start configuring 'N' queues. It might also be necessary to use this option with certain NICs, e.g. MLX5 NICs. See the docs for examples. In a general case QEMU will need CAP_NET_ADMIN and CAP_SYS_ADMIN or CAP_BPF capabilities in order to load default XSK/XDP programs to the network interface and configure BPF maps. It is possible, however, to run with no capabilities. For that to work, an external process with enough capabilities will need to pre-load default XSK program, create AF_XDP sockets and pass their file descriptors to QEMU process on startup via 'sock-fds' option. Network backend will need to be configured with 'inhibit=on' to avoid loading of the program. QEMU will need 32 MB of locked memory (RLIMIT_MEMLOCK) per queue or CAP_IPC_LOCK. There are few performance challenges with the current network backends. First is that they do not support IO threads. This means that data path is handled by the main thread in QEMU and may slow down other work or may be slowed down by some other work. This also means that taking advantage of multi-queue is generally not possible today. Another thing is that data path is going through the device emulation code, which is not really optimized for performance. The fastest "frontend" device is virtio-net. But it's not optimized for heavy traffic either, because it expects such use-cases to be handled via some implementation of vhost (user, kernel, vdpa). In practice, we have virtio notifications and rcu lock/unlock on a per-packet basis and not very efficient accesses to the guest memory. Communication channels between backend and frontend devices do not allow passing more than one packet at a time as well. Some of these challenges can be avoided in the future by adding better batching into device emulation or by implementing vhost-af-xdp variant. There are also a few kernel limitations. AF_XDP sockets do not support any kinds of checksum or segmentation offloading. Buffers are limited to a page size (4K), i.e. MTU is limited. Multi-buffer support implementation for AF_XDP is in progress, but not ready yet. Also, transmission in all non-zero-copy modes is synchronous, i.e. done in a syscall. That doesn't allow high packet rates on virtual interfaces. However, keeping in mind all of these challenges, current implementation of the AF_XDP backend shows a decent performance while running on top of a physical NIC with zero-copy support. Test setup: 2 VMs running on 2 physical hosts connected via ConnectX6-Dx card. Network backend is configured to open the NIC directly in native mode. The driver supports zero-copy. NIC is configured to use 1 queue. Inside a VM - iperf3 for basic TCP performance testing and dpdk-testpmd for PPS testing. iperf3 result: TCP stream : 19.1 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 3.4 Mpps Rx only : 2.0 Mpps L2 FWD Loopback : 1.5 Mpps In skb mode the same setup shows much lower performance, similar to the setup where pair of physical NICs is replaced with veth pair: iperf3 result: TCP stream : 9 Gbps dpdk-testpmd (single queue, single CPU core, 64 B packets) results: Tx only : 1.2 Mpps Rx only : 1.0 Mpps L2 FWD Loopback : 0.7 Mpps Results in skb mode or over the veth are close to results of a tap backend with vhost=on and disabled segmentation offloading bridged with a NIC. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> (docker/lcitool) Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18virtio-net: Add USO flags to vhost support.Andrew Melnychenko1-0/+3
New features are subject to check with vhost-user and vdpa. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18tap: Add check for USO featuresYuri Benditovich7-0/+49
Tap indicates support for USO features according to capabilities of current kernel module. Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychecnko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-18tap: Add USO support to tap device.Andrew Melnychenko10-14/+26
Passing additional parameters (USOv4 and USOv6 offloads) when setting TAP offloads Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Signed-off-by: Andrew Melnychenko <andrew@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2023-09-13meson: Fix targetos match for illumos and Solaris.Jonathan Perkin1-1/+1
qemu 8.1.0 breaks on illumos platforms due to _XOPEN_SOURCE and others no longer being set correctly, leading to breakage such as: https://us-central.manta.mnx.io/pkgsrc/public/reports/trunk/tools/20230908.1404/qemu-8.1.0/build.log This is a result of meson conversion which incorrectly matches against 'solaris' instead of 'sunos' for uname. First time submitting a patch here, hope I did it correctly. Thanks. Signed-off-by: Jonathan Perkin <jonathan@perkin.org.uk> Message-ID: <ZPtdxtum9UVPy58J@perkin.org.uk> Cc: qemu-stable@nongnu.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-08misc/other: spelling fixesMichael Tokarev3-9/+9
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Eric Blake <eblake@redhat.com>
2023-09-07Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingStefan Hajnoczi1-8/+10
* only build util/async-teardown.c when system build is requested * target/i386: fix BQL handling of the legacy FERR interrupts * target/i386: fix memory operand size for CVTPS2PD * target/i386: Add support for AMX-COMPLEX in CPUID enumeration * compile plugins on Darwin * configure and meson cleanups * drop mkvenv support for Python 3.7 and Debian10 * add wrap file for libblkio * tweak KVM stubs # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmT5t6UUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroMmjwf+MpvVuq+nn+3PqGUXgnzJx5ccA5ne # O9Xy8+1GdlQPzBw/tPovxXDSKn3HQtBfxObn2CCE1tu/4uHWpBA1Vksn++NHdUf2 # P0yoHxGskJu5iYYTtIcNw5cH2i+AizdiXuEjhfNjqD5Y234cFoHnUApt9e3zBvVO # cwGD7WpPuSb4g38hHkV6nKcx72o7b4ejDToqUVZJ2N+RkddSqB03fSdrOru0hR7x # V+lay0DYdFszNDFm05LJzfDbcrHuSryGA91wtty7Fzj6QhR/HBHQCUZJxMB5PI7F # Zy4Zdpu60zxtSxUqeKgIi7UhNFgMcax2Hf9QEqdc/B4ARoBbboh4q4u8kQ== # =dH7/ # -----END PGP SIGNATURE----- # gpg: Signature made Thu 07 Sep 2023 07:44:37 EDT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (51 commits) docs/system/replay: do not show removed command line option subprojects: add wrap file for libblkio sysemu/kvm: Restrict kvm_pc_setup_irq_routing() to x86 targets sysemu/kvm: Restrict kvm_has_pit_state2() to x86 targets sysemu/kvm: Restrict kvm_get_apic_state() to x86 targets sysemu/kvm: Restrict kvm_arch_get_supported_cpuid/msr() to x86 targets target/i386: Restrict declarations specific to CONFIG_KVM target/i386: Allow elision of kvm_hv_vpindex_settable() target/i386: Allow elision of kvm_enable_x2apic() target/i386: Remove unused KVM stubs target/i386/cpu-sysemu: Inline kvm_apic_in_kernel() target/i386/helper: Restrict KVM declarations to system emulation hw/i386/fw_cfg: Include missing 'cpu.h' header hw/i386/pc: Include missing 'cpu.h' header hw/i386/pc: Include missing 'sysemu/tcg.h' header Revert "mkvenv: work around broken pip installations on Debian 10" mkvenv: assume presence of importlib.metadata Python: Drop support for Python 3.7 configure: remove dead code meson: list leftover CONFIG_* symbols ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-07configure, meson: remove CONFIG_SOLARIS from config-host.makPaolo Bonzini1-8/+10
CONFIG_SOLARIS is only used to pick tap implementations. But the target OS is invariant and does not depend on the configuration, so move away from config_host and just use unconditional rules in softmmu_ss. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>