aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-11-02Merge remote-tracking branch ↵Peter Maydell16-24/+832
'remotes/dgilbert/tags/pull-migration-20201102a' into staging Migration and virtiofs fixes 2020-11-02 Fixes for postcopy migration test hang A seccomp crash for virtiofsd on some !x86 Help message and minor CID fix And another crack at Max's set. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Mon 02 Nov 2020 19:54:59 GMT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20201102a: tests/acceptance: Add virtiofs_submounts.py tests/acceptance/boot_linux: Accept SSH pubkey virtiofsd: Announce sub-mount points virtiofsd: Add mount ID to the lo_inode key meson.build: Check for statx() virtiofsd: Add attr_flags to fuse_entry_param virtiofsd: Check FUSE_SUBMOUNTS virtiofsd: Fix the help message of posix lock tools/virtiofsd: Check vu_init() return value (CID 1435958) virtiofsd: Seccomp: Add 'send' for syslog migration: Postpone the kick of the fault thread after recover migration: Unify reset of last_rb on destination node when recover Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-02tests/acceptance: Add virtiofs_submounts.pyMax Reitz5-0/+662
This test invokes several shell scripts to create a random directory tree full of submounts, and then check in the VM whether every submount has its own ID and the structure looks as expected. (Note that the test scripts must be non-executable, so Avocado will not try to execute them as if they were tests on their own, too.) Because at this commit's date it is unlikely that the Linux kernel on the image provided by boot_linux.py supports submounts in virtio-fs, the test will be cancelled if no custom Linux binary is provided through the vmlinuz parameter. (The on-image kernel can be used by providing an empty string via vmlinuz=.) So, invoking the test can be done as follows: $ avocado run \ tests/acceptance/virtiofs_submounts.py \ -p vmlinuz=/path/to/linux/build/arch/x86/boot/bzImage This test requires root privileges (through passwordless sudo -n), because at this point, virtiofsd requires them. (If you have a timestamp_timeout period for sudoers (e.g. the default of 5 min), you can provide this by executing something like "sudo true" before invoking Avocado.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201102161859.156603-8-mreitz@redhat.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02tests/acceptance/boot_linux: Accept SSH pubkeyMax Reitz1-6/+7
Let download_cloudinit() take an optional pubkey, which subclasses of BootLinux can pass through setUp(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Willian Rampazzo <willianr@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-7-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02virtiofsd: Announce sub-mount pointsMax Reitz2-0/+23
Whenever we encounter a directory with an st_dev or mount ID that differs from that of its parent, we set the FUSE_ATTR_SUBMOUNT flag so the guest can create a submount for it. We only need to do so in lo_do_lookup(). The following functions return a fuse_attr object: - lo_create(), though fuse_reply_create(): Calls lo_do_lookup(). - lo_lookup(), though fuse_reply_entry(): Calls lo_do_lookup(). - lo_mknod_symlink(), through fuse_reply_entry(): Calls lo_do_lookup(). - lo_link(), through fuse_reply_entry(): Creating a link cannot create a submount, so there is no need to check for it. - lo_getattr(), through fuse_reply_attr(): Announcing submounts when the node is first detected (at lookup) is sufficient. We do not need to return the submount attribute later. - lo_do_readdir(), through fuse_add_direntry_plus(): Calls lo_do_lookup(). Make announcing submounts optional, so submounts are only announced to the guest with the announce_submounts option. Some users may prefer the current behavior, so that the guest learns nothing about the host mount structure. (announce_submounts is force-disabled when the guest does not present the FUSE_SUBMOUNTS capability, or when there is no statx().) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-6-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02virtiofsd: Add mount ID to the lo_inode keyMax Reitz2-10/+86
Using st_dev is not sufficient to uniquely identify a mount: You can mount the same device twice, but those are still separate trees, and e.g. by mounting something else inside one of them, they may differ. Using statx(), we can get a mount ID that uniquely identifies a mount. If that is available, add it to the lo_inode key. Most of this patch is taken from Miklos's mail here: https://marc.info/?l=fuse-devel&m=160062521827983 (virtiofsd-use-mount-id.patch attachment) Suggested-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-5-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02meson.build: Check for statx()Max Reitz1-0/+16
Check whether the glibc provides statx() and if so, define CONFIG_STATX. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-4-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02virtiofsd: Add attr_flags to fuse_entry_paramMax Reitz2-0/+7
fuse_entry_param is converted to fuse_attr on the line (by fill_entry()), so it should have a member that mirrors fuse_attr.flags. fill_entry() should then copy this fuse_entry_param.attr_flags to fuse_attr.flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-3-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02virtiofsd: Check FUSE_SUBMOUNTSMax Reitz2-0/+10
FUSE_SUBMOUNTS is a pure indicator by the kernel to signal that it supports submounts. It does not check its state in the init reply, so there is nothing for fuse_lowlevel.c to do but to check its existence and copy it into fuse_conn_info.capable. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201102161859.156603-2-mreitz@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02virtiofsd: Fix the help message of posix lockJiachen Zhang1-1/+1
The commit 88fc107956a5812649e5918e0c092d3f78bb28ad disabled remote posix locks by default. But the --help message still says it is enabled by default. So fix it to output no_posix_lock. Signed-off-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com> Message-Id: <20201027081558.29904-1-zhangjiachen.jaycee@bytedance.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02tools/virtiofsd: Check vu_init() return value (CID 1435958)Philippe Mathieu-Daudé1-2/+5
Since commit 6f5fd837889, vu_init() can fail if malloc() returns NULL. This fixes the following Coverity warning: CID 1435958 (#1 of 1): Unchecked return value (CHECKED_RETURN) Fixes: 6f5fd837889 ("libvhost-user: support many virtqueues") Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201102092339.2034297-1-philmd@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02virtiofsd: Seccomp: Add 'send' for syslogDr. David Alan Gilbert1-0/+1
On ppc, and some other archs, it looks like syslog ends up using 'send' rather than 'sendto'. Reference: https://github.com/kata-containers/kata-containers/issues/1050 Reported-by: amulmek1@in.ibm.com Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201102150750.34565-1-dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02migration: Postpone the kick of the fault thread after recoverPeter Xu1-3/+8
The new migrate_send_rp_req_pages_pending() call should greatly improve destination responsiveness because it will resync faulted address after postcopy recovery. However it is also the 1st place to initiate the page request from the main thread. One thing is overlooked on that migrate_send_rp_message_req_pages() is not designed to be thread-safe. So if we wake the fault thread before syncing all the faulted pages in the main thread, it means they can race. Postpone the wake up operation after the sync of faulted addresses. Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201102153010.11979-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02migration: Unify reset of last_rb on destination node when recoverPeter Xu2-2/+6
When postcopy recover happens, we need to reset last_rb after each return of postcopy_pause_fault_thread() because that means we just got the postcopy migration continued. Unify this reset to the place right before we want to kick the fault thread again, when we get the command MIG_CMD_POSTCOPY_RESUME from source. This is actually more than that - because the main thread on destination will now be able to call migrate_send_rp_req_pages_pending() too, so the fault thread is not the only user of last_rb now. Move the reset earlier will allow the first call to migrate_send_rp_req_pages_pending() to use the reset value even if called from the main thread. (NOTE: this is not a real fix to 0c26781c09 mentioned below, however it is just a mark that when picking up 0c26781c09 we'd better have this one too; the real fix will come later) Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery") Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20201102153010.11979-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-02Merge remote-tracking branch 'remotes/nvme/tags/pull-nvme-20201102' into stagingPeter Maydell12-300/+1022
nvme pull 2 Nov 2020 # gpg: Signature made Mon 02 Nov 2020 15:20:30 GMT # gpg: using RSA key DBC11D2D373B4A3755F502EC625156610A4F6CC0 # gpg: Good signature from "Keith Busch <kbusch@kernel.org>" [unknown] # gpg: aka "Keith Busch <keith.busch@gmail.com>" [unknown] # gpg: aka "Keith Busch <keith.busch@intel.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: DBC1 1D2D 373B 4A37 55F5 02EC 6251 5661 0A4F 6CC0 * remotes/nvme/tags/pull-nvme-20201102: (30 commits) hw/block/nvme: fix queue identifer validation hw/block/nvme: fix create IO SQ/CQ status codes hw/block/nvme: fix prp mapping status codes hw/block/nvme: report actual LBA data shift in LBAF hw/block/nvme: add trace event for requests with non-zero status code hw/block/nvme: add nsid to get/setfeat trace events hw/block/nvme: reject io commands if only admin command set selected hw/block/nvme: support for admin-only command set hw/block/nvme: validate command set selected hw/block/nvme: support per-namespace smart log hw/block/nvme: fix log page offset check hw/block/nvme: remove pointless rw indirection hw/block/nvme: update nsid when registered hw/block/nvme: change controller pci id pci: allocate pci id for nvme hw/block/nvme: support multiple namespaces hw/block/nvme: refactor identify active namespace id list hw/block/nvme: add support for sgl bit bucket descriptor hw/block/nvme: add support for scatter gather lists hw/block/nvme: harden cmb access ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-02Merge remote-tracking branch 'remotes/aperard/tags/pull-xen-20201102' into ↵Peter Maydell1-14/+13
staging xen patch - Rework Xen disk unplug to work with newer command line options. # gpg: Signature made Mon 02 Nov 2020 14:42:37 GMT # gpg: using RSA key F80C006308E22CFD8A92E7980CF5572FD7FB55AF # gpg: issuer "anthony.perard@citrix.com" # gpg: Good signature from "Anthony PERARD <anthony.perard@gmail.com>" [marginal] # gpg: aka "Anthony PERARD <anthony.perard@citrix.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: 5379 2F71 024C 600F 778A 7161 D8D5 7199 DF83 42C8 # Subkey fingerprint: F80C 0063 08E2 2CFD 8A92 E798 0CF5 572F D7FB 55AF * remotes/aperard/tags/pull-xen-20201102: xen: rework pci_piix3_xen_ide_unplug Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-02xen: rework pci_piix3_xen_ide_unplugAnthony PERARD1-14/+13
This is to allow IDE disks to be unplugged when adding to QEMU via: -drive file=/root/disk_file,if=none,id=ide-disk0,format=raw -device ide-hd,drive=ide-disk0,bus=ide.0,unit=0 as the current code only works for disk added with: -drive file=/root/disk_file,if=ide,index=0,media=disk,format=raw Since the code already have the IDE controller as `dev`, we don't need to use the legacy DriveInfo to find all the drive we want to unplug. We can simply use `blk` from the controller, as it kind of was already assume to be the same, by setting it to NULL. Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: John Snow <jsnow@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-Id: <20201027154058.495112-1-anthony.perard@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
2020-11-02Merge remote-tracking branch 'remotes/cschoenebeck/tags/pull-9p-20201102' ↵Peter Maydell3-48/+470
into staging 9pfs: only test case changes this time * Fix occasional test failures with parallel tests. * Fix coverity error in test code. * Avoid error when auto removing test directory if it disappeared for some reason. * Refactor: Rename functions to make top-level test functions fs_*() easily distinguishable from utility test functions do_*(). * Refactor: Drop unnecessary function arguments in utility test functions. * More test cases using the 9pfs 'local' filesystem driver backend, namely for the following 9p requests: Tunlinkat, Tlcreate, Tsymlink and Tlink. # gpg: Signature made Mon 02 Nov 2020 09:31:35 GMT # gpg: using RSA key 96D8D110CF7AF8084F88590134C2B58765A47395 # gpg: issuer "qemu_oss@crudebyte.com" # gpg: Good signature from "Christian Schoenebeck <qemu_oss@crudebyte.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: ECAB 1A45 4014 1413 BA38 4926 30DB 47C3 A012 D5F4 # Subkey fingerprint: 96D8 D110 CF7A F808 4F88 5901 34C2 B587 65A4 7395 * remotes/cschoenebeck/tags/pull-9p-20201102: tests/9pfs: add local Tunlinkat hard link test tests/9pfs: add local Tlink test tests/9pfs: add local Tunlinkat symlink test tests/9pfs: add local Tsymlink test tests/9pfs: add local Tunlinkat file test tests/9pfs: add local Tlcreate test tests/9pfs: add local Tunlinkat directory test tests/9pfs: simplify do_mkdir() tests/9pfs: Turn fs_mkdir() into a helper tests/9pfs: Turn fs_readdir_split() into a helper tests/9pfs: Factor out do_attach() helper tests/9pfs: Set alloc in fs_create_dir() tests/9pfs: Factor out do_version() helper tests/9pfs: Force removing of local 9pfs test directory tests/9pfs: fix coverity error in create_local_test_dir() tests/9pfs: fix test dir for parallel tests tests/9pfs: make create/remove test dir public Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-02Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20201101.0' ↵Peter Maydell52-217/+2481
into staging VFIO update 2020-11-01 * Migration support (Kirti Wankhede) * s390 DMA limiting (Matthew Rosato) * zPCI hardware info (Matthew Rosato) * Lock guard (Amey Narkhede) * Print fixes (Zhengui li) * Warning/build fixes # gpg: Signature made Sun 01 Nov 2020 20:38:10 GMT # gpg: using RSA key 239B9B6E3BB08B22 # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>" [full] # gpg: aka "Alex Williamson <alex@shazbot.org>" [full] # gpg: aka "Alex Williamson <alwillia@redhat.com>" [full] # gpg: aka "Alex Williamson <alex.l.williamson@gmail.com>" [full] # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B 8A90 239B 9B6E 3BB0 8B22 * remotes/awilliam/tags/vfio-update-20201101.0: (32 commits) vfio: fix incorrect print type hw/vfio: Use lock guard macros s390x/pci: get zPCI function info from host vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilities s390x/pci: use a PCI Function structure s390x/pci: clean up s390 PCI groups s390x/pci: use a PCI Group structure s390x/pci: create a header dedicated to PCI CLP s390x/pci: Honor DMA limits set by vfio s390x/pci: Add routine to get the vfio dma available count vfio: Find DMA available capability vfio: Create shared routine for scanning info capabilities s390x/pci: Move header files to include/hw/s390x linux-headers: update against 5.10-rc1 update-linux-headers: Add vfio_zdev.h qapi: Add VFIO devices migration stats in Migration stats vfio: Make vfio-pci device migration capable vfio: Add ioctl to get dirty pages bitmap during dma unmap vfio: Dirty page tracking when vIOMMU is enabled vfio: Add vfio_listener_log_sync to mark dirty pages ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-11-01vfio: fix incorrect print typeZhengui li1-2/+2
The type of input variable is unsigned int while the printer type is int. So fix incorrect print type. Signed-off-by: Zhengui li <lizhengui@huawei.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01hw/vfio: Use lock guard macrosAmey Narkhede1-5/+2
Use qemu LOCK_GUARD macros in hw/vfio. Saves manual unlock calls Signed-off-by: Amey Narkhede <ameynarkhede03@gmail.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: get zPCI function info from hostMatthew Rosato6-7/+202
We use the capability chains of the VFIO_DEVICE_GET_INFO ioctl to retrieve the CLP information that the kernel exports. To be compatible with previous kernel versions we fall back on previous predefined values, same as the emulation values, when the ioctl is found to not support capability chains. If individual CLP capabilities are not found, we fall back on default values for only those capabilities missing from the chain. This patch is based on work previously done by Pierre Morel. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: non-Linux build fixes] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add routine for finding VFIO_DEVICE_GET_INFO capabilitiesMatthew Rosato2-0/+12
Now that VFIO_DEVICE_GET_INFO supports capability chains, add a helper function to find specific capabilities in the chain. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: use a PCI Function structurePierre Morel3-6/+15
We use a ClpRspQueryPci structure to hold the information related to a zPCI Function. This allows us to be ready to support different zPCI functions and to retrieve the zPCI function information from the host. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: clean up s390 PCI groupsMatthew Rosato1-0/+12
Add a step to remove all stashed PCI groups to avoid stale data between machine resets. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: use a PCI Group structurePierre Morel3-9/+66
We use a S390PCIGroup structure to hold the information related to a zPCI Function group. This allows us to be ready to support multiple groups and to retrieve the group information from the host. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: create a header dedicated to PCI CLPPierre Morel3-196/+212
To have a clean separation between s390-pci-bus.h and s390-pci-inst.h headers we export the PCI CLP instructions in a dedicated header. Signed-off-by: Pierre Morel <pmorel@linux.ibm.com> Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: Honor DMA limits set by vfioMatthew Rosato6-11/+116
When an s390 guest is using lazy unmapping, it can result in a very large number of oustanding DMA requests, far beyond the default limit configured for vfio. Let's track DMA usage similar to vfio in the host, and trigger the guest to flush their DMA mappings before vfio runs out. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: non-Linux build fixes] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: Add routine to get the vfio dma available countMatthew Rosato3-0/+79
Create new files for separating out vfio-specific work for s390 pci. Add the first such routine, which issues VFIO_IOMMU_GET_INFO ioctl to collect the current dma available count. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: Fix non-Linux build with CONFIG_LINUX] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Find DMA available capabilityMatthew Rosato2-0/+33
The underlying host may be limiting the number of outstanding DMA requests for type 1 IOMMU. Add helper functions to check for the DMA available capability and retrieve the current number of DMA mappings allowed. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: vfio_get_info_dma_avail moved inside CONFIG_LINUX] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Create shared routine for scanning info capabilitiesMatthew Rosato1-8/+13
Rather than duplicating the same loop in multiple locations, create a static function to do the work. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01s390x/pci: Move header files to include/hw/s390xMatthew Rosato6-5/+6
Seems a more appropriate location for them. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01linux-headers: update against 5.10-rc1Matthew Rosato28-16/+294
commit 3650b228f83adda7e5ee532e2b90429c03f7b9ec Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> [aw: drop pvrdma_ring.h changes to avoid revert of d73415a31547 ("qemu/atomic.h: rename atomic_ to qatomic_")] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01update-linux-headers: Add vfio_zdev.hMatthew Rosato1-1/+1
vfio_zdev.h is used by s390x zPCI support to pass device-specific CLP information between host and userspace. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01qapi: Add VFIO devices migration stats in Migration statsKirti Wankhede6-0/+71
Added amount of bytes transferred to the VM at destination by all VFIO devices Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Make vfio-pci device migration capableKirti Wankhede2-21/+8
If the device is not a failover primary device, call vfio_migration_probe() and vfio_migration_finalize() to enable migration support for those devices that support it respectively to tear it down again. Removed migration blocker from VFIO PCI device specific structure and use migration blocker from generic structure of VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add ioctl to get dirty pages bitmap during dma unmapKirti Wankhede1-4/+93
With vIOMMU, IO virtual address range can get unmapped while in pre-copy phase of migration. In that case, unmap ioctl should return pages pinned in that range and QEMU should find its correcponding guest physical addresses and report those dirty. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> [aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Dirty page tracking when vIOMMU is enabledKirti Wankhede2-6/+83
When vIOMMU is enabled, register MAP notifier from log_sync when all devices in container are in stop and copy phase of migration. Call replay and get dirty pages from notifier callback. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add vfio_listener_log_sync to mark dirty pagesKirti Wankhede2-0/+117
vfio_listener_log_sync gets list of dirty pages from container using VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all devices are stopped and saving state. Return early for the RAM block section of mapped MMIO region. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> [aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add function to start and stop dirty pages trackingKirti Wankhede1-0/+36
Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking for VFIO devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Get migration capability flags for containerKirti Wankhede3-9/+91
Added helper functions to get IOMMU info capability chain. Added function to get migration capability information from that capability chain for IOMMU container. Similar change was proposed earlier: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html Disable migration for devices if IOMMU module doesn't support migration capability. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabledKirti Wankhede1-1/+1
mr->ram_block is NULL when mr->is_iommu is true, then fr.dirty_log_mask wasn't set correctly due to which memory listener's log_sync doesn't get called. This patch returns log_mask with DIRTY_MEMORY_MIGRATION set when IOMMU is enabled. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add load state functions to SaveVMHandlersKirti Wankhede2-0/+199
Sequence during _RESUMING device state: While data for this device is available, repeat below steps: a. read data_offset from where user application should write data. b. write data of data_size to migration region from data_offset. c. write data_size which indicates vendor driver that data is written in staging buffer. For user, data is opaque. User should write data in the same order as received. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add save state functions to SaveVMHandlersKirti Wankhede3-0/+283
Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy functions. These functions handles pre-copy and stop-and-copy phase. In _SAVING|_RUNNING device state or pre-copy phase: - read pending_bytes. If pending_bytes > 0, go through below steps. - read data_offset - indicates kernel driver to write data to staging buffer. - read data_size - amount of data in bytes written by vendor driver in migration region. - read data_size bytes of data from data_offset in the migration region. - Write data packet to file stream as below: {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data, VFIO_MIG_FLAG_END_OF_STATE } In _SAVING device state or stop-and-copy phase a. read config space of device and save to migration file stream. This doesn't need to be from vendor driver. Any other special config state from driver can be saved as data in following iteration. b. read pending_bytes. If pending_bytes > 0, go through below steps. c. read data_offset - indicates kernel driver to write data to staging buffer. d. read data_size - amount of data in bytes written by vendor driver in migration region. e. read data_size bytes of data from data_offset in the migration region. f. Write data packet as below: {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data} g. iterate through steps b to f while (pending_bytes > 0) h. Write {VFIO_MIG_FLAG_END_OF_STATE} When data region is mapped, its user's responsibility to read data from data_offset of data_size before moving to next steps. Added fix suggested by Artem Polyakov to reset pending_bytes in vfio_save_iterate(). Added fix suggested by Zhi Wang to add 0 as data size in migration stream and add END_OF_STATE delimiter to indicate phase complete. Suggested-by: Artem Polyakov <artemp@nvidia.com> Suggested-by: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Register SaveVMHandlers for VFIO deviceKirti Wankhede2-0/+104
Define flags to be used as delimiter in migration stream for VFIO devices. Added .save_setup and .save_cleanup functions. Map & unmap migration region from these functions at source during saving or pre-copy phase. Set VFIO device state depending on VM's state. During live migration, VM is running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO device. During save-restore, VM is paused, _SAVING state is set for VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add migration state change notifierKirti Wankhede3-0/+31
Added migration state change notifier to get notification on migration state change. These states are translated to VFIO device state and conveyed to vendor driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add VM state change handler to know state of VMKirti Wankhede3-0/+166
VM state change handler is called on change in VM's state. Based on VM state, VFIO device state should be changed. Added read/write helper functions for migration region. Added function to set device_state. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: lx -> HWADDR_PRIx, remove redundant parens] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add migration region initialization and finalize functionKirti Wankhede4-0/+135
Whether the VFIO device supports migration or not is decided based of migration region query. If migration region query is successful and migration region initialization is successful then migration is supported else migration is blocked. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add save and load functions for VFIO PCI devicesKirti Wankhede2-0/+53
Added functions to save and restore PCI device specific data, specifically config space of PCI device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add vfio_get_object callback to VFIODeviceOpsKirti Wankhede2-0/+9
Hook vfio_get_object callback for PCI devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-11-01vfio: Add function to unmap VFIO regionKirti Wankhede3-4/+30
This function will be used for migration region. Migration region is mmaped when migration starts and will be unmapped when migration is complete. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>