aboutsummaryrefslogtreecommitdiff
path: root/migration
AgeCommit message (Collapse)AuthorFilesLines
2025-01-09migration/multifd: Further remove the SYNC on completePeter Xu1-3/+10
Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at ram_save_complete(), because the sync on the destination side is not needed due to the last iteration of find_dirty_block() having already done it. However, that commit overlooked that multifd_ram_flush_and_sync() on the source side is also not needed at ram_save_complete(), for the same reason. Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the multifd_ram_flush_and_sync() means that currently the recv threads will hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is received. Luckily, multifd is still all working fine because recv side cleanup code (mostly multifd_recv_sync_main()) is smart enough to make sure even if recv threads are stuck at SYNC it'll get kicked out. And since this is the completion phase of migration, nothing else will be sent after the SYNCs. This needs to be fixed because in the future VFIO will have data to push after ram_save_complete() and we don't want the recv thread to be stuck in the MULTIFD_FLAG_SYNC message. Remove the unnecessary (and buggy) invocation of multifd_ram_flush_and_sync(). For very old binaries (multifd_flush_after_each_section==true), the flush_and_sync is still needed because each EOS received on destination will enforce all-channel sync once. Stable branches do not need this patch, as no real bug I can think of that will go wrong there.. so not attaching Fixes to be clear on the backport not needed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20241206224755.1108686-2-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Fix compile error caused by page_size usageShameer Kolothum1-1/+1
>From Commit 90fa121c6c07 ("migration/multifd: Inline page_size and page_count") onwards page_size is not part of MutiFD*Params but uses an inline constant instead. However, it missed updating an old usage, causing a compile error. Fixes: 90fa121c6c07 ("migration/multifd: Inline page_size and page_count") Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241203124943.52572-1-shameerali.kolothum.thodi@huawei.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2024-12-26migration: Unexport migration_is_active()Avihai Horon1-8/+8
After being removed from VFIO and dirty limit, migration_is_active() no longer has any users outside the migration subsystem, and in fact, it's only used in migration.c. Unexport it and also relocate it so it can be made static. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-8-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-26migration: Drop migration_is_device()Avihai Horon1-7/+0
After being removed from VFIO, migration_is_device() no longer has any users. Drop it. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-7-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
2024-12-21Merge tag 'exec-20241220' of https://github.com/philmd/qemu into stagingStefan Hajnoczi14-35/+35
Accel & Exec patch queue - Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander) - Add '-d invalid_mem' logging option (Zoltan) - Create QOM containers explicitly (Peter) - Rename sysemu/ -> system/ (Philippe) - Re-orderning of include/exec/ headers (Philippe) Move a lot of declarations from these legacy mixed bag headers: . "exec/cpu-all.h" . "exec/cpu-common.h" . "exec/cpu-defs.h" . "exec/exec-all.h" . "exec/translate-all" to these more specific ones: . "exec/page-protection.h" . "exec/translation-block.h" . "user/cpu_loop.h" . "user/guest-host.h" . "user/page-protection.h" # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe/// # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo= # =cjz8 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits) util/qemu-timer: fix indentation meson: Do not define CONFIG_DEVICES on user emulation system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header system/numa: Remove unnecessary 'exec/cpu-common.h' header hw/xen: Remove unnecessary 'exec/cpu-common.h' header target/mips: Drop left-over comment about Jazz machine target/mips: Remove tswap() calls in semihosting uhi_fstat_cb() target/xtensa: Remove tswap() calls in semihosting simcall() helper accel/tcg: Un-inline translator_is_same_page() accel/tcg: Include missing 'exec/translation-block.h' header accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h' accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h' qemu/coroutine: Include missing 'qemu/atomic.h' header exec/translation-block: Include missing 'qemu/atomic.h' header accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h' exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined target/sparc: Move sparc_restore_state_to_opc() to cpu.c target/sparc: Uninline cpu_get_tb_cpu_state() target/loongarch: Declare loongarch_cpu_dump_state() locally user: Move various declarations out of 'exec/exec-all.h' ... Conflicts: hw/char/riscv_htif.c hw/intc/riscv_aplic.c target/s390x/cpu.c Apply sysemu header path changes to not in the pull request. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2024-12-20include: Rename sysemu/ -> system/Philippe Mathieu-Daudé14-35/+35
Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-12-19migration: Use device_class_set_props_nRichard Henderson3-2/+4
Export the migration_properties array size from options.c; use that to feed device_class_set_props_n. We must remove DEFINE_PROP_END_OF_LIST so the count is correct. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-15-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19include/hw/qdev-core: Detect most empty Property lists at compile timeRichard Henderson1-1/+1
Add a macro expansion of device_class_set_props which can check on the type and size of PROPS before calling the function. Avoid the macro in migration.c because migration_properties is defined externally with indeterminate size. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-13-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-19migration: Constify migration_propertiesRichard Henderson2-2/+2
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-2-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-12-02migration: correct multifd receive thread namePrasad Pandit1-1/+1
Multifd receive threads run on the destination side. Correct the thread name marco to indicate the same. Fixes: e620b1e4770b ("migration: Put thread names together with macros") Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241127111528.167330-1-ppandit@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-25migration: Fix extra cleanup at postcopy listenFabiano Rosas1-1/+0
After fixing the loadvm cleanup race the qemu_loadvm_state_cleanup() is now being called twice in the postcopy listen thread. Fixes: 4ce5622908 ("migration/multifd: Fix rb->receivedmap cleanup race") Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241125191128.9120-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-25migration: Allow pipes to keep working for fd migrationsPeter Xu1-2/+25
Libvirt may still use pipes for old file migrations in fd: URI form, especially when loading old images dumped from Libvirt's compression algorithms. In that case, Libvirt needs to compress / uncompress the images on its own over the migration binary stream, and pipes are passed over to QEMU for outgoing / incoming migrations in "fd:" URIs. For future such use case, it should be suggested to use mapped-ram when saving such VM image. However there can still be old images that was compressed in such way, so libvirt needs to be able to load those images, uncompress them and use the same pipe mechanism to pass that over to QEMU. It means, even if new file migrations can be gradually moved over to mapped-ram (after Libvirt start supporting it), Libvirt still needs the uncompressor for the old images to be able to load like before. Meanwhile since Libvirt currently exposes the compression capability to guest images, it may needs its own lifecycle management to move that over to mapped-ram, maybe can be done after mapped-ram saved the image, however Dan and PeterK raised concern on temporary double disk space consumption. I suppose for now the easiest is to enable pipes for both sides of "fd:" migrations, until all things figured out from Libvirt side on how to move on. And for "channels" QMP interface support on "migrate" / "migrate-incoming" commands, we'll also need to move away from pipe. But let's leave that for later too. So far, still allow pipes to happen like before on both save/load sides, just like we would allow sockets to pass. Cc: qemu-stable <qemu-stable@nongnu.org> Cc: Fabiano Rosas <farosas@suse.de> Cc: Peter Krempa <pkrempa@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Fixes: c55deb860c ("migration: Deprecate fd: for file migration") Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241120160132.3659735-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-13migration: fix-possible-int-overflowDmitry Frolov1-1/+1
stat64_add() takes uint64_t as 2nd argument, but both "p->next_packet_size" and "p->packet_len" are uint32_t. Thus, theyr sum may overflow uint32_t. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Dmitry Frolov <frolov@swemel.ru> Link: https://lore.kernel.org/r/20241113140509.325732-2-frolov@swemel.ru Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-13migration: Check current_migration in migration_is_running()Peter Xu1-0/+4
Report shows that commit 34a8892dec broke iotest 055: https://lore.kernel.org/r/b8806360-a2b6-4608-83a3-db67e264c733@linaro.org Denis Rastyogin reported more such issue: https://lore.kernel.org/r/20241107114256.106831-1-gerben@altlinux.org In this merge, the migration_is_idle() function was replaced with migrate_is_running(). However, the null pointer check for `s` was removed, leading to a dereference of `s` when using qemu-system-x86_64 -hda *.vdi. When replacing migration_is_idle() with "!migration_is_running()", it was overlooked that the idle helper also checks for current_migration being available first. Sample stack dump: migration_is_running is_busy migrate_add_blocker_modes migrate_add_blocker_normal vmdk_open bdrv_open_driver bdrv_open_common bdrv_open_inherit bdrv_open blk_new_open blockdev_init drive_new drive_init_func qemu_opts_foreach configure_blockdev qemu_create_early_backends qemu_init main The check would be there if the whole series was applied, but since the last patches in the previous series rely on some other patches to land first, we need to recover the behavior of migration_is_idle() first before that whole set will be merged. I left migration_is_active / migration_is_device alone, as I don't think it's possible for them to hit uninitialized current_migration. Also they're prone to removal soon from VFIO side. Cc: Peter Maydell <peter.maydell@linaro.org> Fixes: 34a8892dec ("migration: Drop migration_is_idle()") Reported-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reported-by: Denis Rastyogin <gerben@altlinux.org> Tested-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Tested-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241105182725.2393425-1-peterx@redhat.com [peterx: enhance commit msg] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-11-05Merge tag 'pull-ppc-for-9.2-1-20241104' of https://gitlab.com/npiggin/qemu ↵Peter Maydell1-19/+0
into staging * Various bug fixes * Big cleanup of deprecated machines * Power11 support for spapr * XIVE improvements * Goodbye to Cedric and David as ppc reviewers, thank you both o7 # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCgAdFiEETkN92lZhb0MpsKeVZ7MCdqhiHK4FAmcoEicACgkQZ7MCdqhi # HK5M8Q//fz+ZkJndXkBjb1Oinx+q+eVtNm2JrvcWIsXyhG3K+6VxYPp69H+SRv/Z # TWuUqMQPxq8mhQvBJlDAttp/oaUEiOcCRvs/iUoBN12L4mVxXfdoT88TZ4frN3eP # 8bePq+DW2N/7gpmsJm5CyEZPpcf9AjVHgLRp3KYFkOJ/14uzvuwnocU39gl+2IUh # MXHTedQgMNXaKorJXk1NVdM6NxMuVhOvwxAs6ya2gwhxyA5tteo5PiQOnDJWkejf # xg3RRsNzGYcs1Qg/3kFIf3RfEB0aYbPxROM8IfPaJWKN5KnMggj/JAkHyK1x/V3J # wml7+cB0doMt/yRiuYJhXpyrtOqpvjRWPA6RhxECWW2kwrovv8NAF8IrFnw9NvOQ # QC66ZaaFcbAcFrVT1e/iggU76d01II6m4OAgKcXw+FRHgps4VU9y83j7ApNnNUWN # IXp9hkzoHi5VwX0FrG4ELUr2iEf1HASMvM8EZ/0AxzWj5iNtQB8lFsrEdaGVXyIS # M5JaJeNjCn4koCyYaFSctH5eKtbzIwnGWnDcdTwaOuQ+9itBvY8O+HZalE6sAc5S # kLFZ7i/Ut/qxbY5pMumt8LKD4pR1SsOxFB8dJCmn/f/tvRGtIVsoY6btNe4M0+24 # 42MxZbWO6W379C32bwbtsPiGA+aLSgShjP4cWm9cgRjz4RJFnwg= # =vmIG # -----END PGP SIGNATURE----- # gpg: Signature made Mon 04 Nov 2024 00:15:35 GMT # gpg: using RSA key 4E437DDA56616F4329B0A79567B30276A8621CAE # gpg: Good signature from "Nicholas Piggin <npiggin@gmail.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 4E43 7DDA 5661 6F43 29B0 A795 67B3 0276 A862 1CAE * tag 'pull-ppc-for-9.2-1-20241104' of https://gitlab.com/npiggin/qemu: (67 commits) MAINTAINERS: Remove myself as reviewer MAINTAINERS: Remove myself from XIVE MAINTAINERS: Remove myself from the PowerNV machines hw/ppc: Consolidate ppc440 initial mapping creation functions hw/ppc: Consolidate e500 initial mapping creation functions tests/qtest: Add XIVE tests for the powernv10 machine pnv/xive2: TIMA CI ops using alternative offsets or byte lengths pnv/xive2: TIMA support for 8-byte OS context push for PHYP pnv/xive: Update PIPR when updating CPPR pnv/xive: Add special handling for pool targets ppc/xive2: Support "Pull Thread Context to Odd Thread Reporting Line" ppc/xive2: Change context/ring specific functions to be generic ppc/xive2: Support "Pull Thread Context to Register" operation ppc/xive2: Allow 1-byte write of Target field in TIMA ppc/xive2: Dump the VP-group and crowd tables with 'info pic' ppc/xive2: Dump more NVP state with 'info pic' pnv/xive2: Support for "OS LGS Push" TIMA operation ppc/xive2: Support TIMA "Pull OS Context to Odd Thread Reporting Line" pnv/xive2: Define OGEN field in the TIMA pnv/xive: TIMA patch sets pre-req alignment and formatting changes ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-11-04ppc/spapr: remove deprecated machine pseries-2.9Harsh Prateek Bora1-19/+0
Commit 1392617d3576 intended to tag pseries-2.1 - 2.11 machines as deprecated with reasons mentioned in its commit log. Removing pseries-2.9 specific code with this patch for now. While at it, also remove the pre-2.10 migration hacks which now become obsolete. Suggested-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-10-31migration/multifd: Zero p->flags before starting filling a packetMaciej S. Szmigiero1-1/+1
This way there aren't stale flags there. p->flags can't contain SYNC to be sent at the next RAM packet since syncs are now handled separately in multifd_send_thread. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/1c96b6cdb797e6f035eb1a4ad9bfc24f4c7f5df8.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration/ram: Add load start trace eventMaciej S. Szmigiero2-0/+2
There's a RAM load complete trace event but there wasn't its start equivalent. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/94ddfa7ecb83a78f73b82867dd30c8767592d257.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Drop migration_is_idle()Peter Xu2-21/+2
Now with the current migration_is_running(), it will report exactly the opposite of what will be reported by migration_is_idle(). Drop migration_is_idle(), instead use "!migration_is_running()" which should be identical on functionality. In reality, most of the idle check is inverted, so it's even easier to write with "migrate_is_running()" check. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Drop migration_is_setup_or_active()Peter Xu2-35/+5
This helper is mostly the same as migration_is_running(), except that one has COLO reported as true, the other has CANCELLING reported as true. Per my past years experience on the state changes, none of them should matter. To make it slightly safer, report both COLO || CANCELLING to be true in migration_is_running(), then drop the other one. We kept the 1st only because the name is simpler, and clear enough. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Unexport ram_mig_init()Peter Xu1-0/+1
It's only used within migration/. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Unexport dirty_bitmap_mig_init()Peter Xu1-0/+4
It's only used within migration/, so it shouldn't be exported. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Take migration object refcount earlier for threadsPeter Xu1-2/+8
Both migration thread or background snapshot thread will take a refcount of the migration object at the entrace of the thread function. That makes sense, because it protects the object from being freed by the main thread in migration_shutdown() later, but it might still race with it if the thread is scheduled too late. Consider the case right after pthread_create() happened, VM shuts down with the object released, but right after that the migration thread finally got created, referencing MigrationState* in the opaque pointer which is already freed. The only 100% safe way to make sure it won't get freed is taking the refcount right before the thread is created, meanwhile when BQL is held. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration/dirtyrate: Silence warning about strcpy() on OpenBSDThomas Huth1-1/+4
The linker on OpenBSD complains: ld: warning: dirtyrate.c:447 (../src/migration/dirtyrate.c:447)(...): warning: strcpy() is almost always misused, please use strlcpy() It's currently not a real problem in this case since both arrays have the same size (256 bytes). But just in case somebody changes the size of the source array in the future, let's better play safe and use g_strlcpy() here instead, with an additional check that the string has been copied as a whole. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com> Link: https://lore.kernel.org/r/20241022063402.184213-1-thuth@redhat.com [peterx: Fix over-80 chars] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Support periodic RAMBlock dirty bitmap syncHyman Huang5-4/+84
When VM is configured with huge memory, the current throttle logic doesn't look like to scale, because migration_trigger_throttle() is only called for each iteration, so it won't be invoked for a long time if one iteration can take a long time. The periodic dirty sync aims to fix the above issue by synchronizing the ramblock from remote dirty bitmap and, when necessary, triggering the CPU throttle multiple times during a long iteration. This is a trade-off between synchronization overhead and CPU throttle impact. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com [peterx: make prev_cnt global, and reset for each migration] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Remove "rs" parameter in migration_bitmap_sync_precopyHyman Huang1-5/+6
The global static variable ram_state in fact is referred to by the "rs" parameter in migration_bitmap_sync_precopy. For ease of calling by the callees, use the global variable directly in migration_bitmap_sync_precopy and remove "rs" parameter. The migration_bitmap_sync_precopy will be exported in the next commit. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/283c335d61463bf477160da91b24da45cdaf3e43.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Move cpu-throttle.c from system to migrationHyman Huang4-0/+138
Move cpu-throttle.c from system to migration since it's only used for migration; this makes us avoid exporting the util functions and variables in misc.h but export them in migration.h when implementing the periodic ramblock dirty sync feature in the upcoming commits. Since CPU throttle timers are only used in migration, move their registry to migration_object_init. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com [peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet] [peterx: Fix subject spelling, per pm215] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Stop CPU throttling conditionallyHyman Huang1-1/+3
Since CPU throttling only occurs when auto-converge is on, stop it conditionally. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f0c787080bb9ab0c37952f0ca5bfaa525d5ddd14.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Ensure vmstate_save() sets errpHanna Czenczek1-5/+8
migration/savevm.c contains some calls to vmstate_save() that are followed by migrate_set_error() if the integer return value indicates an error. migrate_set_error() requires that the `Error *` object passed to it is set. Therefore, vmstate_save() is assumed to always set *errp on error. Right now, that assumption is not met: vmstate_save_state_v() (called internally by vmstate_save()) will not set *errp if vmstate_subsection_save() or vmsd->post_save() fail. Fix that by adding an *errp parameter to vmstate_subsection_save(), and by generating a generic error in case post_save() fails (as is already done for pre_save()). Without this patch, qemu will crash after vmstate_subsection_save() or post_save() have failed inside of a vmstate_save() call (unless migrate_set_error() then happen to discard the new error because s->error is already set). This happens e.g. when receiving the state from a virtio-fs back-end (virtiofsd) fails. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Link: https://lore.kernel.org/r/20241015170437.310358-1-hreitz@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Put thread names together with macrosPeter Xu7-13/+34
Keep migration thread names together, so it's easier to see a list of all possible migration threads. Still two functional changes below besides the macro defintions: - There's one dirty rate thread that we overlooked before, now we add that too and name it as "mig/dirtyrate" following the old rules. - The old name "mig/src/rp-thr" has "-thr" but it may not be useful if it's a thread name anyway, while "rp" can be slightly hard to read. Taking this chance to rename it to "mig/src/return", hopefully a better name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Cleanup migrate_fd_cleanup() on accessing to_dst_filePeter Xu1-13/+19
The cleanup function can in many cases needs cleanup on its own. The major thing we want to do here is not referencing to_dst_file when without the file mutex. When at it, touch things elsewhere too to make it look slightly better in general. One thing to mention is, migration_thread has its own "running" boolean, so it doesn't need to rely on to_dst_file being non-NULL. Multifd has a dependency so it needs to be skipped if to_dst_file is not yet set; add a richer comment for such reason. Resolves: Coverity CID 1527402 Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919163042.116767-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-09migration/multifd: fix build error when qpl compression is enabledYuan Liu1-5/+5
The page_size member has been removed from the MultiFDSendParams and MultiFDRecvParams. The function multifd_ram_page_size is used to provide the page size in the multifd compressor. Fixes: 90fa121c6c ("migration/multifd: Inline page_size and page_count") Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Link: https://lore.kernel.org/r/20241008104527.3516755-1-yuan1.liu@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration/postcopy: Use uffd helpersDr. David Alan Gilbert1-34/+14
Use the uffd_copy_page, uffd_zero_page and uffd_wakeup helpers rather than calling ioctl ourselves. They return -errno on error, and print an error_report themselves. I think this actually makes postcopy_place_page actually more consistent in it's callers. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-7-dave@treblig.org [peterx: fix i386 build] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Remove unused socket_send_channel_create_syncDr. David Alan Gilbert2-19/+0
socket_send_channel_create_sync only use was removed by d0edb8a173 ("migration: Create the postcopy preempt channel asynchronously") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919134626.166183-5-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Deprecate zero-blocks capabilityFabiano Rosas1-0/+4
The zero-blocks capability was meant to be used along with the block migration, which has been removed already in commit eef0bae3a7 ("migration: Remove block migration"). Setting zero-blocks is currently a noop, but the outright removal of the capability would cause and error in case some users are still setting it. Put the capability through the deprecation process. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-4-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Remove unused migrate_zero_blocksDr. David Alan Gilbert2-8/+0
migrate_zero_blocks is unused since eef0bae3a7 ("migration: Remove block migration") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240919134626.166183-3-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration: Remove migrate_cap_setDr. David Alan Gilbert2-21/+0
migrate_cap_set has been unused since 18d154f575 ("migration: Remove 'blk/-b' option from migrate commands") Remove it. Signed-off-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919134626.166183-2-dave@treblig.org Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-08migration/multifd: Ensure packet->ramblock is null-terminatedFabiano Rosas1-1/+3
Coverity points out that the current usage of strncpy to write the ramblock name allows the field to not have an ending '\0' in case idstr is already not null-terminated (e.g. if it's larger than 256 bytes). This is currently harmless because the packet->ramblock field is never touched again on the source side. The destination side reads only up to the field's size from the stream and forces the last byte to be 0. We're still open to a programming error in the future in case this field is ever passed into a function that expects a null-terminated string. Change from strncpy to QEMU's pstrcpy, which puts a '\0' at the end of the string and doesn't fill the extra space with zeros. (there's no spillage between iterations of fill_packet because after commit 87bb9e953e ("migration/multifd: Isolate ram pages packet data") the packet is always zeroed before filling) Resolves: Coverity CID 1560071 Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919150611.17074-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-02migration: fix -Werror=maybe-uninitialized false-positiveMarc-André Lureau1-1/+1
../migration/ram.c:1873:23: error: ‘dirty’ may be used uninitialized [-Werror=maybe-uninitialized] When 'block' != NULL, 'dirty' is initialized. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Peter Xu <peterx@redhat.com>
2024-10-02migration: fix -Werror=maybe-uninitialized false-positivesMarc-André Lureau2-3/+3
../migration/dirtyrate.c:186:5: error: ‘records’ may be used uninitialized [-Werror=maybe-uninitialized] ../migration/dirtyrate.c:168:12: error: ‘gen_id’ may be used uninitialized [-Werror=maybe-uninitialized] ../migration/migration.c:2273:5: error: ‘file’ may be used uninitialized [-Werror=maybe-uninitialized] Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com>
2024-09-24migration: remove return after g_assert_not_reached()Pierrick Bouvier3-10/+0
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240919044641.386068-31-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-24migration: replace assert(false) with g_assert_not_reached()Pierrick Bouvier1-1/+1
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-ID: <20240919044641.386068-14-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-24migration: replace assert(0) with g_assert_not_reached()Pierrick Bouvier3-11/+11
This patch is part of a series that moves towards a consistent use of g_assert_not_reached() rather than an ad hoc mix of different assertion mechanisms. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240919044641.386068-5-pierrick.bouvier@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-09-18migration/multifd: Fix rb->receivedmap cleanup raceFabiano Rosas2-2/+9
Fix a segmentation fault in multifd when rb->receivedmap is cleared too early. After commit 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults"), multifd started using the rb->receivedmap bitmap, which belongs to ram.c and is initialized and *freed* from the ram SaveVMHandlers. Multifd threads are live until migration_incoming_state_destroy(), which is called after qemu_loadvm_state_cleanup(), leading to a crash when accessing rb->receivedmap. process_incoming_migration_co() ... qemu_loadvm_state() multifd_nocomp_recv() qemu_loadvm_state_cleanup() ramblock_recv_bitmap_set_offset() rb->receivedmap = NULL set_bit_atomic(..., rb->receivedmap) ... migration_incoming_state_destroy() multifd_recv_cleanup() multifd_recv_terminate_threads(NULL) Move the loadvm cleanup into migration_incoming_state_destroy(), after multifd_recv_cleanup() to ensure multifd threads have already exited when rb->receivedmap is cleared. Adjust the postcopy listen thread comment to indicate that we still want to skip the cpu synchronization. CC: qemu-stable@nongnu.org Fixes: 5ef7e26bdb ("migration/multifd: solve zero page causing multiple page faults") Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240917185802.15619-3-farosas@suse.de [peterx: added comment in migration_incoming_state_destroy()] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-18migration/savevm: Remove extra load cleanup callsFabiano Rosas1-2/+0
There are two qemu_loadvm_state_cleanup() calls that were introduced when qemu_loadvm_state_setup() was still called before loading the configuration section, so there was state to be cleaned up if the header checks failed. However, commit 9e14b84908 ("migration/savevm: load_header before load_setup") has moved that configuration section part to qemu_loadvm_state_header() which now happens before qemu_loadvm_state_setup(). Remove the cleanup calls that are now misplaced. Note that we didn't use Fixes because it's benign to cleanup() even if setup() is not invoked. So this patch is not needed for stable, as it falls into cleanup category. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240917185802.15619-2-farosas@suse.de [peterx: added last paragraph of commit message] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-18migration/multifd: Fix loop conditions in multifd_zstd_send_prepare and ↵Stefan Weil1-4/+4
multifd_zstd_recv GitHub's CodeQL reports four critical errors which are fixed by this commit: Unsigned difference expression compared to zero An expression (u - v > 0) with unsigned values u, v is only false if u == v, so all changed expressions did not work as expected. Signed-off-by: Stefan Weil <sw@weilnetz.de> Link: https://lore.kernel.org/r/20240910054138.1458555-1-sw@weilnetz.de [peterx: Fix mangled email for author] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-17migration/multifd: Fix build for qatzipPeter Xu1-9/+9
The qatzip series was based on an older commit, it applied cleanly even though it has conflicts. Neither CI nor myself found the build will break as it's skipped by default when qatzip library was missing. Fix the build issues. No need to copy stable as it just landed 9.2. Cc: Yichen Wang <yichen.wang@bytedance.com> Cc: Bryan Zhang <bryan.zhang@bytedance.com> Cc: Hao Xiang <hao.xiang@linux.dev> Cc: Yuan Liu <yuan1.liu@intel.com> Fixes: 80484f9459 ("migration: Introduce 'qatzip' compression method") Link: https://lore.kernel.org/r/20240910210450.3835123-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-09migration: Introduce 'qatzip' compression methodBryan Zhang3-2/+398
Adds support for 'qatzip' as an option for the multifd compression method parameter, and implements using QAT for 'qatzip' compression and decompression. Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Signed-off-by: Hao Xiang <hao.xiang@linux.dev> Signed-off-by: Yichen Wang <yichen.wang@bytedance.com> Link: https://lore.kernel.org/r/20240830232722.58272-5-yichen.wang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-09migration: Add migration parameters for QATzipBryan Zhang3-0/+39
Adds support for migration parameters to control QATzip compression level. Acked-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com> Signed-off-by: Hao Xiang <hao.xiang@linux.dev> Signed-off-by: Yichen Wang <yichen.wang@bytedance.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Link: https://lore.kernel.org/r/20240830232722.58272-4-yichen.wang@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-09-03migration/multifd: Add documentation for multifd methodsFabiano Rosas1-6/+70
Add documentation clarifying the usage of the multifd methods. The general idea is that the client code calls into multifd to trigger send/recv of data and multifd then calls these hooks back from the worker threads at opportune moments so the client can process a portion of the data. Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>