aboutsummaryrefslogtreecommitdiff
path: root/migration
AgeCommit message (Collapse)AuthorFilesLines
2024-12-19migration: Constify migration_propertiesRichard Henderson2-2/+2
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-2-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-02migration: correct multifd receive thread namePrasad Pandit1-1/+1
Multifd receive threads run on the destination side. Correct the thread name marco to indicate the same. Fixes: e620b1e4770b ("migration: Put thread names together with macros") Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241127111528.167330-1-ppandit@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-25migration: Fix extra cleanup at postcopy listenFabiano Rosas1-1/+0
After fixing the loadvm cleanup race the qemu_loadvm_state_cleanup() is now being called twice in the postcopy listen thread. Fixes: 4ce5622908 ("migration/multifd: Fix rb->receivedmap cleanup race") Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241125191128.9120-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-25migration: Allow pipes to keep working for fd migrationsPeter Xu1-2/+25
Libvirt may still use pipes for old file migrations in fd: URI form, especially when loading old images dumped from Libvirt's compression algorithms. In that case, Libvirt needs to compress / uncompress the images on its own over the migration binary stream, and pipes are passed over to QEMU for outgoing / incoming migrations in "fd:" URIs. For future such use case, it should be suggested to use mapped-ram when saving such VM image. However there can still be old images that was compressed in such way, so libvirt needs to be able to load those images, uncompress them and use the same pipe mechanism to pass that over to QEMU. It means, even if new file migrations can be gradually moved over to mapped-ram (after Libvirt start supporting it), Libvirt still needs the uncompressor for the old images to be able to load like before. Meanwhile since Libvirt currently exposes the compression capability to guest images, it may needs its own lifecycle management to move that over to mapped-ram, maybe can be done after mapped-ram saved the image, however Dan and PeterK raised concern on temporary double disk space consumption. I suppose for now the easiest is to enable pipes for both sides of "fd:" migrations, until all things figured out from Libvirt side on how to move on. And for "channels" QMP interface support on "migrate" / "migrate-incoming" commands, we'll also need to move away from pipe. But let's leave that for later too. So far, still allow pipes to happen like before on both save/load sides, just like we would allow sockets to pass. Cc: qemu-stable <qemu-stable@nongnu.org> Cc: Fabiano Rosas <farosas@suse.de> Cc: Peter Krempa <pkrempa@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Fixes: c55deb860c ("migration: Deprecate fd: for file migration") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241120160132.3659735-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-13migration: fix-possible-int-overflowDmitry Frolov1-1/+1
stat64_add() takes uint64_t as 2nd argument, but both "p->next_packet_size" and "p->packet_len" are uint32_t. Thus, theyr sum may overflow uint32_t. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Dmitry Frolov <frolov@swemel.ru> Link: https://lore.kernel.org/r/20241113140509.325732-2-frolov@swemel.ru Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-13migration: Check current_migration in migration_is_running()Peter Xu1-0/+4
Report shows that commit 34a8892dec broke iotest 055: https://lore.kernel.org/r/b8806360-a2b6-4608-83a3-db67e264c733@linaro.org Denis Rastyogin reported more such issue: https://lore.kernel.org/r/20241107114256.106831-1-gerben@altlinux.org In this merge, the migration_is_idle() function was replaced with migrate_is_running(). However, the null pointer check for `s` was removed, leading to a dereference of `s` when using qemu-system-x86_64 -hda *.vdi. When replacing migration_is_idle() with "!migration_is_running()", it was overlooked that the idle helper also checks for current_migration being available first. Sample stack dump: migration_is_running is_busy migrate_add_blocker_modes migrate_add_blocker_normal vmdk_open bdrv_open_driver bdrv_open_common bdrv_open_inherit bdrv_open blk_new_open blockdev_init drive_new drive_init_func qemu_opts_foreach configure_blockdev qemu_create_early_backends qemu_init main The check would be there if the whole series was applied, but since the last patches in the previous series rely on some other patches to land first, we need to recover the behavior of migration_is_idle() first before that whole set will be merged. I left migration_is_active / migration_is_device alone, as I don't think it's possible for them to hit uninitialized current_migration. Also they're prone to removal soon from VFIO side. Cc: Peter Maydell <peter.maydell@linaro.org> Fixes: 34a8892dec ("migration: Drop migration_is_idle()") Reported-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reported-by: Denis Rastyogin <gerben@altlinux.org> Tested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241105182725.2393425-1-peterx@redhat.com [peterx: enhance commit msg] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-05Merge tag 'pull-ppc-for-9.2-1-20241104' of https://gitlab.com/npiggin/qemu ↵Peter Maydell1-19/+0
into staging * Various bug fixes * Big cleanup of deprecated machines * Power11 support for spapr * XIVE improvements * Goodbye to Cedric and David as ppc reviewers, thank you both o7 # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCgAdFiEETkN92lZhb0MpsKeVZ7MCdqhiHK4FAmcoEicACgkQZ7MCdqhi # HK5M8Q//fz+ZkJndXkBjb1Oinx+q+eVtNm2JrvcWIsXyhG3K+6VxYPp69H+SRv/Z # TWuUqMQPxq8mhQvBJlDAttp/oaUEiOcCRvs/iUoBN12L4mVxXfdoT88TZ4frN3eP # 8bePq+DW2N/7gpmsJm5CyEZPpcf9AjVHgLRp3KYFkOJ/14uzvuwnocU39gl+2IUh # MXHTedQgMNXaKorJXk1NVdM6NxMuVhOvwxAs6ya2gwhxyA5tteo5PiQOnDJWkejf # xg3RRsNzGYcs1Qg/3kFIf3RfEB0aYbPxROM8IfPaJWKN5KnMggj/JAkHyK1x/V3J # wml7+cB0doMt/yRiuYJhXpyrtOqpvjRWPA6RhxECWW2kwrovv8NAF8IrFnw9NvOQ # QC66ZaaFcbAcFrVT1e/iggU76d01II6m4OAgKcXw+FRHgps4VU9y83j7ApNnNUWN # IXp9hkzoHi5VwX0FrG4ELUr2iEf1HASMvM8EZ/0AxzWj5iNtQB8lFsrEdaGVXyIS # M5JaJeNjCn4koCyYaFSctH5eKtbzIwnGWnDcdTwaOuQ+9itBvY8O+HZalE6sAc5S # kLFZ7i/Ut/qxbY5pMumt8LKD4pR1SsOxFB8dJCmn/f/tvRGtIVsoY6btNe4M0+24 # 42MxZbWO6W379C32bwbtsPiGA+aLSgShjP4cWm9cgRjz4RJFnwg= # =vmIG # -----END PGP SIGNATURE----- # gpg: Signature made Mon 04 Nov 2024 00:15:35 GMT # gpg: using RSA key 4E437DDA56616F4329B0A79567B30276A8621CAE # gpg: Good signature from "Nicholas Piggin <npiggin@gmail.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 4E43 7DDA 5661 6F43 29B0 A795 67B3 0276 A862 1CAE * tag 'pull-ppc-for-9.2-1-20241104' of https://gitlab.com/npiggin/qemu: (67 commits) MAINTAINERS: Remove myself as reviewer MAINTAINERS: Remove myself from XIVE MAINTAINERS: Remove myself from the PowerNV machines hw/ppc: Consolidate ppc440 initial mapping creation functions hw/ppc: Consolidate e500 initial mapping creation functions tests/qtest: Add XIVE tests for the powernv10 machine pnv/xive2: TIMA CI ops using alternative offsets or byte lengths pnv/xive2: TIMA support for 8-byte OS context push for PHYP pnv/xive: Update PIPR when updating CPPR pnv/xive: Add special handling for pool targets ppc/xive2: Support "Pull Thread Context to Odd Thread Reporting Line" ppc/xive2: Change context/ring specific functions to be generic ppc/xive2: Support "Pull Thread Context to Register" operation ppc/xive2: Allow 1-byte write of Target field in TIMA ppc/xive2: Dump the VP-group and crowd tables with 'info pic' ppc/xive2: Dump more NVP state with 'info pic' pnv/xive2: Support for "OS LGS Push" TIMA operation ppc/xive2: Support TIMA "Pull OS Context to Odd Thread Reporting Line" pnv/xive2: Define OGEN field in the TIMA pnv/xive: TIMA patch sets pre-req alignment and formatting changes ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-11-04ppc/spapr: remove deprecated machine pseries-2.9Harsh Prateek Bora1-19/+0
Commit 1392617d3576 intended to tag pseries-2.1 - 2.11 machines as deprecated with reasons mentioned in its commit log. Removing pseries-2.9 specific code with this patch for now. While at it, also remove the pre-2.10 migration hacks which now become obsolete. Suggested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-10-31migration/multifd: Zero p->flags before starting filling a packetMaciej S. Szmigiero1-1/+1
This way there aren't stale flags there. p->flags can't contain SYNC to be sent at the next RAM packet since syncs are now handled separately in multifd_send_thread. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/1c96b6cdb797e6f035eb1a4ad9bfc24f4c7f5df8.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration/ram: Add load start trace eventMaciej S. Szmigiero2-0/+2
There's a RAM load complete trace event but there wasn't its start equivalent. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/94ddfa7ecb83a78f73b82867dd30c8767592d257.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Drop migration_is_idle()Peter Xu2-21/+2
Now with the current migration_is_running(), it will report exactly the opposite of what will be reported by migration_is_idle(). Drop migration_is_idle(), instead use "!migration_is_running()" which should be identical on functionality. In reality, most of the idle check is inverted, so it's even easier to write with "migrate_is_running()" check. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Drop migration_is_setup_or_active()Peter Xu2-35/+5
This helper is mostly the same as migration_is_running(), except that one has COLO reported as true, the other has CANCELLING reported as true. Per my past years experience on the state changes, none of them should matter. To make it slightly safer, report both COLO || CANCELLING to be true in migration_is_running(), then drop the other one. We kept the 1st only because the name is simpler, and clear enough. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Unexport ram_mig_init()Peter Xu1-0/+1
It's only used within migration/. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Unexport dirty_bitmap_mig_init()Peter Xu1-0/+4
It's only used within migration/, so it shouldn't be exported. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Take migration object refcount earlier for threadsPeter Xu1-2/+8
Both migration thread or background snapshot thread will take a refcount of the migration object at the entrace of the thread function. That makes sense, because it protects the object from being freed by the main thread in migration_shutdown() later, but it might still race with it if the thread is scheduled too late. Consider the case right after pthread_create() happened, VM shuts down with the object released, but right after that the migration thread finally got created, referencing MigrationState* in the opaque pointer which is already freed. The only 100% safe way to make sure it won't get freed is taking the refcount right before the thread is created, meanwhile when BQL is held. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration/dirtyrate: Silence warning about strcpy() on OpenBSDThomas Huth1-1/+4
The linker on OpenBSD complains: ld: warning: dirtyrate.c:447 (../src/migration/dirtyrate.c:447)(...): warning: strcpy() is almost always misused, please use strlcpy() It's currently not a real problem in this case since both arrays have the same size (256 bytes). But just in case somebody changes the size of the source array in the future, let's better play safe and use g_strlcpy() here instead, with an additional check that the string has been copied as a whole. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com> Link: https://lore.kernel.org/r/20241022063402.184213-1-thuth@redhat.com [peterx: Fix over-80 chars] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Support periodic RAMBlock dirty bitmap syncHyman Huang5-4/+84
When VM is configured with huge memory, the current throttle logic doesn't look like to scale, because migration_trigger_throttle() is only called for each iteration, so it won't be invoked for a long time if one iteration can take a long time. The periodic dirty sync aims to fix the above issue by synchronizing the ramblock from remote dirty bitmap and, when necessary, triggering the CPU throttle multiple times during a long iteration. This is a trade-off between synchronization overhead and CPU throttle impact. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com [peterx: make prev_cnt global, and reset for each migration] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Remove "rs" parameter in migration_bitmap_sync_precopyHyman Huang1-5/+6
The global static variable ram_state in fact is referred to by the "rs" parameter in migration_bitmap_sync_precopy. For ease of calling by the callees, use the global variable directly in migration_bitmap_sync_precopy and remove "rs" parameter. The migration_bitmap_sync_precopy will be exported in the next commit. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/283c335d61463bf477160da91b24da45cdaf3e43.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Move cpu-throttle.c from system to migrationHyman Huang4-0/+138
Move cpu-throttle.c from system to migration since it's only used for migration; this makes us avoid exporting the util functions and variables in misc.h but export them in migration.h when implementing the periodic ramblock dirty sync feature in the upcoming commits. Since CPU throttle timers are only used in migration, move their registry to migration_object_init. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com [peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet] [peterx: Fix subject spelling, per pm215] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Stop CPU throttling conditionallyHyman Huang1-1/+3
Since CPU throttling only occurs when auto-converge is on, stop it conditionally. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f0c787080bb9ab0c37952f0ca5bfaa525d5ddd14.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Ensure vmstate_save() sets errpHanna Czenczek1-5/+8
migration/savevm.c contains some calls to vmstate_save() that are followed by migrate_set_error() if the integer return value indicates an error. migrate_set_error() requires that the `Error *` object passed to it is set. Therefore, vmstate_save() is assumed to always set *errp on error. Right now, that assumption is not met: vmstate_save_state_v() (called internally by vmstate_save()) will not set *errp if vmstate_subsection_save() or vmsd->post_save() fail. Fix that by adding an *errp parameter to vmstate_subsection_save(), and by generating a generic error in case post_save() fails (as is already done for pre_save()). Without this patch, qemu will crash after vmstate_subsection_save() or post_save() have failed inside of a vmstate_save() call (unless migrate_set_error() then happen to discard the new error because s->error is already set). This happens e.g. when receiving the state from a virtio-fs back-end (virtiofsd) fails. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Link: https://lore.kernel.org/r/20241015170437.310358-1-hreitz@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Put thread names together with macrosPeter Xu7-13/+34
Keep migration thread names together, so it's easier to see a list of all possible migration threads. Still two functional changes below besides the macro defintions: - There's one dirty rate thread that we overlooked before, now we add that too and name it as "mig/dirtyrate" following the old rules. - The old name "mig/src/rp-thr" has "-thr" but it may not be useful if it's a thread name anyway, while "rp" can be slightly hard to read. Taking this chance to rename it to "mig/src/return", hopefully a better name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Cleanup migrate_fd_cleanup() on accessing to_dst_filePeter Xu1-13/+19
The cleanup function can in many cases needs cleanup on its own. The major thing we want to do here is not referencing to_dst_file when without the file mutex. When at it, touch things elsewhere too to make it look slightly better in general. One thing to mention is, migration_thread has its own "running" boolean, so it doesn't need to rely on to_dst_file being non-NULL. Multifd has a dependency so it needs to be skipped if to_dst_file is not yet set; add a richer comment for such reason. Resolves: Coverity CID 1527402 Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919163042.116767-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-09migration/multifd: fix build error when qpl compression is enabledYuan Liu1-5/+5
The page_size member has been removed from the MultiFDSendParams and MultiFDRecvParams. The function multifd_ram_page_size is used to provide the page size in the multifd compressor. Fixes: 90fa121c6c ("migration/multifd: Inline page_size and page_count") Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Link: https://lore.kernel.org/r/20241008104527.3516755-1-yuan1.liu@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration/postcopy: Use uffd helpersDr. David Alan Gilbert1-34/+14
Use the uffd_copy_page, uffd_zero_page and uffd_wakeup helpers rather than calling ioctl ourselves. They return -errno on error, and print an error_report themselves. I think this actually makes postcopy_place_page actually more consistent in it's callers. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-7-dave@treblig.org [peterx: fix i386 build] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Remove unused socket_send_channel_create_syncDr. David Alan Gilbert2-19/+0
socket_send_channel_create_sync only use was removed by d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919134626.166183-5-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Deprecate zero-blocks capabilityFabiano Rosas1-0/+4
The zero-blocks capability was meant to be used along with the block migration, which has been removed already in commit eef0bae3a7 ("migration: Remove block migration"). Setting zero-blocks is currently a noop, but the outright removal of the capability would cause and error in case some users are still setting it. Put the capability through the deprecation process. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-4-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Remove unused migrate_zero_blocksDr. David Alan Gilbert2-8/+0
migrate_zero_blocks is unused since eef0bae3a7 ("migration: Remove block migration") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-3-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Remove migrate_cap_setDr. David Alan Gilbert2-21/+0
migrate_cap_set has been unused since 18d154f575 ("migration: Remove 'blk/-b' option from migrate commands") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919134626.166183-2-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration/multifd: Ensure packet->ramblock is null-terminatedFabiano Rosas1-1/+3
Coverity points out that the current usage of strncpy to write the ramblock name allows the field to not have an ending '\0' in case idstr is already not null-terminated (e.g. if it's larger than 256 bytes). This is currently harmless because the packet->ramblock field is never touched again on the source side. The destination side reads only up to the field's size from the stream and forces the last byte to be 0. We're still open to a programming error in the future in case this field is ever passed into a function that expects a null-terminated string. Change from strncpy to QEMU's pstrcpy, which puts a '\0' at the end of the string and doesn't fill the extra space with zeros. (there's no spillage between iterations of fill_packet because after commit 87bb9e953e ("migration/multifd: Isolate ram pages packet data") the packet is always zeroed before filling) Resolves: Coverity CID 1560071 Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919150611.17074-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-02migration: fix -Werror=maybe-uninitialized false-positiveMarc-André Lureau1-1/+1
../migration/ram.c:1873:23: error: ‘dirty’ may be used uninitialized [-Werror=maybe-uninitialized] When 'block' != NULL, 'dirty' is initialized. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Peter Xu <peterx@redhat.com>
2024-10-02migration: fix -Werror=maybe-uninitialized false-positivesMarc-André Lureau2-3/+3
../migration/dirtyrate.c:186:5: error: ‘records’ may be used uninitialized [-Werror=maybe-uninitialized] ../migration/dirtyrate.c:168:12: error: ‘gen_id’ may be used uninitialized [-Werror=maybe-uninitialized] ../migration/migration.c:2273:5: error: ‘file’ may be used uninitialized [-Werror=maybe-uninitialized] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com>
2024-09-24migration: remove return after g_assert_not_reached()Pierrick Bouvier3-10/+0
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240919044641.386068-31-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-24migration: replace assert(false) with g_assert_not_reached()Pierrick Bouvier1-1/+1
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-ID: <20240919044641.386068-14-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-24migration: replace assert(0) with g_assert_not_reached()Pierrick Bouvier3-11/+11
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240919044641.386068-5-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-18migration/multifd: Fix rb->receivedmap cleanup raceFabiano Rosas2-2/+9
Fix a segmentation fault in multifd when rb->receivedmap is cleared too early. After commit 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults"), multifd started using the rb->receivedmap bitmap, which belongs to ram.c and is initialized and *freed* from the ram SaveVMHandlers. Multifd threads are live until migration_incoming_state_destroy(), which is called after qemu_loadvm_state_cleanup(), leading to a crash when accessing rb->receivedmap. process_incoming_migration_co() ... qemu_loadvm_state() multifd_nocomp_recv() qemu_loadvm_state_cleanup() ramblock_recv_bitmap_set_offset() rb->receivedmap = NULL set_bit_atomic(..., rb->receivedmap) ... migration_incoming_state_destroy() multifd_recv_cleanup() multifd_recv_terminate_threads(NULL) Move the loadvm cleanup into migration_incoming_state_destroy(), after multifd_recv_cleanup() to ensure multifd threads have already exited when rb->receivedmap is cleared. Adjust the postcopy listen thread comment to indicate that we still want to skip the cpu synchronization. CC: qemu-stable@nongnu.org Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults") Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240917185802.15619-3-farosas@suse.de [peterx: added comment in migration_incoming_state_destroy()] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-18migration/savevm: Remove extra load cleanup callsFabiano Rosas1-2/+0
There are two qemu_loadvm_state_cleanup() calls that were introduced when qemu_loadvm_state_setup() was still called before loading the configuration section, so there was state to be cleaned up if the header checks failed. However, commit 9e14b84908 ("migration/savevm: load_header before load_setup") has moved that configuration section part to qemu_loadvm_state_header() which now happens before qemu_loadvm_state_setup(). Remove the cleanup calls that are now misplaced. Note that we didn't use Fixes because it's benign to cleanup() even if setup() is not invoked. So this patch is not needed for stable, as it falls into cleanup category. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240917185802.15619-2-farosas@suse.de [peterx: added last paragraph of commit message] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-18migration/multifd: Fix loop conditions in multifd_zstd_send_prepare and ↵Stefan Weil1-4/+4
multifd_zstd_recv GitHub's CodeQL reports four critical errors which are fixed by this commit: Unsigned difference expression compared to zero An expression (u - v > 0) with unsigned values u, v is only false if u == v, so all changed expressions did not work as expected. Signed-off-by: Stefan Weil <sw@weilnetz.de> Link: https://lore.kernel.org/r/20240910054138.1458555-1-sw@weilnetz.de [peterx: Fix mangled email for author] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-17migration/multifd: Fix build for qatzipPeter Xu1-9/+9
The qatzip series was based on an older commit, it applied cleanly even though it has conflicts. Neither CI nor myself found the build will break as it's skipped by default when qatzip library was missing. Fix the build issues. No need to copy stable as it just landed 9.2. Cc: Yichen Wang <yichen.wang@bytedance.com> Cc: Bryan Zhang <bryan.zhang@bytedance.com> Cc: Hao Xiang <hao.xiang@linux.dev> Cc: Yuan Liu <yuan1.liu@intel.com> Fixes: 80484f9459 ("migration: Introduce 'qatzip' compression method") Link: https://lore.kernel.org/r/20240910210450.3835123-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-09migration: Introduce 'qatzip' compression methodBryan Zhang3-2/+398
Adds support for 'qatzip' as an option for the multifd compression method parameter, and implements using QAT for 'qatzip' compression and decompression. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Signed-off-by: Hao Xiang <hao.xiang@linux.dev> Signed-off-by: Yichen Wang <yichen.wang@bytedance.com> Link: https://lore.kernel.org/r/20240830232722.58272-5-yichen.wang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-09migration: Add migration parameters for QATzipBryan Zhang3-0/+39
Adds support for migration parameters to control QATzip compression level. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Signed-off-by: Hao Xiang <hao.xiang@linux.dev> Signed-off-by: Yichen Wang <yichen.wang@bytedance.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Link: https://lore.kernel.org/r/20240830232722.58272-4-yichen.wang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-03migration/multifd: Add documentation for multifd methodsFabiano Rosas1-6/+70
Add documentation clarifying the usage of the multifd methods. The general idea is that the client code calls into multifd to trigger send/recv of data and multifd then calls these hooks back from the worker threads at opportune moments so the client can process a portion of the data. Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Add a couple of asserts for p->iovFabiano Rosas1-0/+2
Check that p->iov is indeed always allocated and freed by the MultiFDMethods hooks. Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Fix p->iov leak in multifd-uadk.cFabiano Rosas1-0/+2
The send_cleanup() hook should free the p->iov that was allocated at send_setup(). This was missed because the UADK code is conditional on the presence of the accelerator, so it's not tested by default. Fixes: 819dd20636 ("migration/multifd: Add UADK initialization") Reported-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Stop changing the packet on recv sideFabiano Rosas2-28/+21
As observed by Philippe, the multifd_ram_unfill_packet() function currently leaves the MultiFDPacket structure with mixed endianness. This is harmless, but ultimately not very clean. Stop touching the received packet and do the necessary work using stack variables instead. While here tweak the error strings and fix the space before semicolons. Also remove the "100 times bigger" comment because it's just one possible explanation for a size mismatch and it doesn't even match the code. CC: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Make MultiFDMethods constFabiano Rosas7-10/+10
The methods are defined at module_init time and don't ever change. Make them const. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Move nocomp code into multifd-nocomp.cFabiano Rosas4-375/+402
In preparation for adding new payload types to multifd, move most of the no-compression code into multifd-nocomp.c. Let's try to keep a semblance of layering by not mixing general multifd control flow with the details of transmitting pages of ram. There are still some pieces leftover, namely the p->normal, p->zero, etc variables that we use for zero page tracking and the packet allocation which is heavily dependent on the ram code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Register nocomp ops dynamicallyFabiano Rosas1-13/+19
Prior to moving the ram code into multifd-nocomp.c, change the code to register the nocomp ops dynamically so we don't need to have the ops structure defined in multifd.c. While here, move the ops struct initialization to the end of the file to make the next diff cleaner. Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Standardize on multifd ops namesFabiano Rosas5-316/+36
Add the multifd_ prefix to all functions and remove the useless docstrings. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-09-03migration/multifd: Allow multifd sync without flushFabiano Rosas3-8/+14
Separate the multifd sync from flushing the client data to the channels. These two operations are closely related but not strictly necessary to be executed together. The multifd sync is intrinsic to how multifd works. The multiple channels operate independently and may finish IO out of order in relation to each other. This applies also between the source and destination QEMU. Flushing the data that is left in the client-owned data structures (e.g. MultiFDPages_t) prior to sync is usually the right thing to do, but that is particular to how the ram migration is implemented with several passes over dirty data. Make these two routines separate, allowing future code to call the sync by itself if needed. This also allows the usage of multifd_ram_send to be isolated to ram code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>