aboutsummaryrefslogtreecommitdiff
path: root/migration/migration.c
AgeCommit message (Collapse)AuthorFilesLines
2021-07-26migration: Move the yank unregister of channel_close outPeter Xu1-1/+13
It's efficient, but hackish to call yank unregister calls in channel_close(), especially it'll be hard to debug when qemu crashed with some yank function leaked. Remove that hack, but instead explicitly unregister yank functions at the places where needed, they are: (on src) - migrate_fd_cleanup - postcopy_pause (on dst) - migration_incoming_state_destroy - postcopy_pause_incoming Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-6-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-26migration: Make from_dst_file accesses thread-safePeter Xu1-10/+29
Accessing from_dst_file is potentially racy in current code base like below: if (s->from_dst_file) do_something(s->from_dst_file); Because from_dst_file can be reset right after the check in another thread (rp_thread). One example is migrate_fd_cancel(). Use the same qemu_file_lock to protect it too, just like to_dst_file. When it's safe to access without lock, comment it. There's one special reference in migration_thread() that can be replaced by the newly introduced rp_thread_created flag. Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210722175841.938739-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> with Peter's fixup
2021-07-26migration: Fix missing join() of rp_threadPeter Xu1-1/+3
It's possible that the migration thread skip the join() of the rp_thread in below race and crash on src right at finishing migration: migration_thread rp_thread ---------------- --------- migration_completion() (before rp_thread quits) from_dst_file=NULL [thread got scheduled out] s->rp_state.from_dst_file==NULL (skip join() of rp_thread) migrate_fd_cleanup() qemu_fclose(s->to_dst_file) yank_unregister_instance() assert(yank_find_entry()) <------- crash It could mostly happen with postcopy, but that shouldn't be required, e.g., I think it could also trigger with MIGRATION_CAPABILITY_RETURN_PATH set. It's suspected that above race could be the root cause of a recent (but rare) migration-test break reported by either Dave or PMM: https://lore.kernel.org/qemu-devel/YPamXAHwan%2FPPXLf@work-vm/ The issue is: from_dst_file is reset in the rp_thread, so if the thread reset it to NULL fast enough then the migration thread will assume there's no rp_thread at all. This could potentially cause more severe issue (e.g. crash) after the yank code. Fix it by using a boolean to keep "whether we've created rp_thread". Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210722175841.938739-2-peterx@redhat.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13migration: Clear error at entry of migrate_fd_connect()Peter Xu1-0/+16
For each "migrate" command, remember to clear the s->error before going on. For one reason, when there's a new error it'll be still remembered; see migrate_set_error() who only sets the error if error==NULL. Meanwhile if a failed migration completes (e.g., postcopy recovered and finished), we shouldn't dump an error when calling migrate_fd_cleanup() at last. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13migration: Don't do migrate cleanup if during postcopy resumePeter Xu1-1/+12
Below process could crash qemu with postcopy recovery: 1. (hmp) migrate -d .. 2. (hmp) migrate_start_postcopy 3. [network down, postcopy paused] 4. (hmp) migrate -r $WRONG_PORT when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared 5. (hmp) migrate -r $RIGHT_PORT [qemu crash on assert(cleanup_bh)] The thing is we shouldn't cleanup if it's postcopy resume; the error is set mostly because the channel is wrong, so we return directly waiting for the user to retry. migrate_fd_cleanup() should only be called when migration is cancelled or completed. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13migration: Release return path early for paused postcopyPeter Xu1-4/+4
When postcopy pause triggered, we rely on the migration thread to cleanup the to_dst_file handle, and the return path thread to cleanup the from_dst_file handle (which is stored in the local variable "rp"). Within the process, from_dst_file cleanup (qemu_fclose) is postponed until it's setup again due to a postcopy recovery. It used to work before yank was born; after yank is introduced we rely on the refcount of IOC to correctly unregister yank function in channel_close(). If without the early and on-time release of from_dst_file handle the yank function will be leftover during paused postcopy. Without this patch, below steps (quoted from Xiaohui) could trigger qemu src crash: 1.Boot vm on src host 2.Boot vm on dst host 3.Enable postcopy on src&dst host 4.Load stressapptest in vm and set postcopy speed to 50M 5.Start migration from src to dst host, change into postcopy mode when migration is active. 6.When postcopy is active, down the network card(do migration via this network) on dst host. 7.Wait untill postcopy is paused on src&dst host. 8.Before up network card, recover migration on dst host, will get error like following. 9.Ignore the error of step 8, go on recovering migration on src host: After step 9, qemu on src host will core dump after some seconds: qemu-kvm: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. 1.sh: line 38: 44662 Aborted (core dumped) Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210708190653.252961-2-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-13migration: failover: emit a warning when the card is not fully unpluggedLaurent Vivier1-0/+4
When the migration fails or is canceled we wait the end of the unplug operation to be able to plug it back. But if the unplug operation is never finished we stop to wait and QEMU emits a warning to inform the user. Based-on: 20210629155007.629086-1-lvivier@redhat.com Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210701131458.112036-1-lvivier@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05migration: failover: continue to wait card unplug on errorLaurent Vivier1-0/+11
If the user cancels the migration in the unplug-wait state, QEMU will try to plug back the card and this fails because the card is partially unplugged. To avoid the problem, continue to wait the card unplug, but to allow the migration to be canceled if the card never finishes to unplug use a timeout. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1976852 Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629155007.629086-3-lvivier@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05migration: move wait-unplug loop to its own functionLaurent Vivier1-28/+26
The loop is used in migration_thread() and bg_migration_thread(), so we can move it to its own function and call it from these both places. Moreover, in migration_thread() we have a wrong state transition from SETUP to ACTIVE while state could be WAIT_UNPLUG. This is correctly managed in bg_migration_thread() so use this code instead. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20210629155007.629086-2-lvivier@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05migration: Allow reset of postcopy_recover_triggered when failedPeter Xu1-0/+13
It's possible qemu_start_incoming_migration() failed at any point, when it happens we should reset postcopy_recover_triggered to false so that the user can still retry with a saner incoming port. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20210629181356.217312-3-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-07-05migration: Move yank outside qemu_start_incoming_migration()Peter Xu1-6/+5
Starting from commit b5eea99ec2f5c, qmp_migrate_recover() calls unregister before calling qemu_start_incoming_migration(). I believe it wanted to mitigate the next call to yank_register_instance(), but I think that's wrong. Firstly, if during recover, we should keep the yank instance there, not "quickly removing and adding it back". Meanwhile, calling qmp_migrate_recover() twice with b5eea99ec2f5c will directly crash the dest qemu (right now it can't; but it'll start to work right after the next patch) because the 1st call of qmp_migrate_recover() will unregister permanently when the channel failed to establish, then the 2nd call of qmp_migrate_recover() crashes at yank_unregister_instance(). This patch fixes it by moving yank ops out of qemu_start_incoming_migration() into qmp_migrate_incoming. For qmp_migrate_recover(), drop the unregister of yank instance too since we keep it there during the recovery phase. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210629181356.217312-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-06-14migration: add trace point when vm_stop_force_state failsDaniel P. Berrangé1-0/+1
This is a critical failure scenario for migration that is hard to diagnose from existing probes. Most likely it is caused by an error from bdrv_flush(), but we're not logging the errno anywhere, hence this new probe. Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-11Remove migrate_set_block_enabled in checkpointRao, Lei1-0/+4
We can detect disk migration in migrate_prepare, if disk migration is enabled in COLO mode, we can directly report an error.and there is no need to disable block migration at every checkpoint. Signed-off-by: Lei Rao <lei.rao@intel.com> Signed-off-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Reviewed-by: Lukas Straub <lukasstraub2@web.de> Tested-by: Lukas Straub <lukasstraub2@web.de> Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-06-08migration: Add cleanup hook for inwards migrationDr. David Alan Gilbert1-0/+3
Add a cleanup hook for incoming migration that gets called at the end as a way for a transport to allow cleanup. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210421112834.107651-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-05-14Merge remote-tracking branch ↵Peter Maydell1-15/+0
'remotes/thuth-gitlab/tags/pull-request-2021-05-14' into staging * Replace YAML anchors by extends in the gitlab-CI yaml files * Many small qtest fixes (e.g. to fix issues discovered by Coverity) * Poison more config switches in common code * Fix the failing Travis-CI and Cirrus-CI tasks # gpg: Signature made Fri 14 May 2021 12:17:39 BST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * remotes/thuth-gitlab/tags/pull-request-2021-05-14: cirrus.yml: Fix the MSYS2 task pc-bios/s390-ccw: Fix inline assembly for older versions of Clang tests/qtest/migration-test: Use g_autofree to avoid leaks on error paths configure: Poison all current target-specific #defines migration: Move populate_vfio_info() into a separate file include/sysemu: Poison all accelerator CONFIG switches in common code tests: Avoid side effects inside g_assert() arguments tests/qtest/rtc-test: Remove pointless NULL check tests/qtest/tpm-util.c: Free memory with correct free function tests/migration-test: Fix "true" vs true tests/qtest/npcm7xx_pwm-test.c: Avoid g_assert_true() for non-test assertions tests/qtest/ahci-test.c: Calculate iso_size with 64-bit arithmetic util/compatfd.c: Replaced a malloc call with g_malloc. libqtest: refuse QTEST_QEMU_BINARY=qemu-kvm docs/devel/qgraph: add troubleshooting information libqos/qgraph: fix "UNAVAILBLE" typo gitlab-ci: Replace YAML anchors by extends (native_test_job) gitlab-ci: Replace YAML anchors by extends (native_build_job) gitlab-ci: Replace YAML anchors by extends (container_job) tests/docker/dockerfiles: Add ccache to containers where it was missing Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-05-14migration: Move populate_vfio_info() into a separate fileThomas Huth1-15/+0
The CONFIG_VFIO switch only works in target specific code. Since migration/migration.c is common code, the #ifdef does not have the intended behavior here. Move the related code to a separate file now which gets compiled via specific_ss instead. Fixes: 3710586caa ("qapi: Add VFIO devices migration stats in Migration stats") Message-Id: <20210414112004.943383-3-thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2021-05-13migration/ram: Handle RAM block resizes during precopyDavid Hildenbrand1-2/+7
Resizing while migrating is dangerous and does not work as expected. The whole migration code works on the usable_length of ram blocks and does not expect this to change at random points in time. In the case of precopy, the ram block size must not change on the source, after syncing the RAM block list in ram_save_setup(), so as long as the guest is still running on the source. Resizing can be trigger *after* (but not during) a reset in ACPI code by the guest - hw/arm/virt-acpi-build.c:acpi_ram_update() - hw/i386/acpi-build.c:acpi_ram_update() Use the ram block notifier to get notified about resizes. Let's simply cancel migration and indicate the reason. We'll continue running on the source. No harm done. Update the documentation. Postcopy will be handled separately. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210429112708.12291-5-david@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Manual merge
2021-05-13migration: Drop redundant query-migrate result @blockedMarkus Armbruster1-16/+13
Result @blocked is redundant. Unfortunately, we realized this too close to the release to risk dropping it, so we deprecated it instead, in commit e11ce6c06. Since it was deprecated from the start, we can delete it without the customary grace period. Do so. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210429140424.2802929-1-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-04-07migration: Pre-fault memory before starting background snasphotAndrey Gruzdev1-0/+8
This commit solves the issue with userfault_fd WP feature that background snapshot is based on. For any never poluated or discarded memory page, the UFFDIO_WRITEPROTECT ioctl() would skip updating PTE for that page, thereby loosing WP setting for it. So we need to pre-fault pages for each RAM block to be protected before making a userfault_fd wr-protect ioctl(). Fixes: 278e2f551a095b234de74dca9c214d5502a1f72c (migration: support UFFD write fault processing in ram_save_iterate()) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-4-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> dgilbert: Bodged ifdef __linux__ on ram_write_tracking_prepare, should really go in a stub
2021-04-06migration: Inhibit virtio-balloon for the duration of background snapshotAndrey Gruzdev1-0/+8
The same thing as for incoming postcopy - we cannot deal with concurrent RAM discards when using background snapshot feature in outgoing migration. Fixes: 8518278a6af589ccc401f06e35f171b1e6fae800 (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reported-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210401092226.102804-3-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-04-06migration: Fix missing qemu_fflush() on buffer file in bg_migration_threadAndrey Gruzdev1-1/+7
Added missing qemu_fflush() on buffer file holding precopy device state. Increased initial QIOChannelBuffer allocation to 512KB to avoid reallocs. Typical configurations often require >200KB for device state and VMDESC. Fixes: 8518278a6af589ccc401f06e35f171b1e6fae800 (migration: implementation of background snapshot thread) Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Message-Id: <20210401092226.102804-2-andrey.gruzdev@virtuozzo.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-03-18migrate: remove QMP/HMP commands for speed, downtime and cache sizeDaniel P. Berrangé1-45/+0
The generic 'migrate_set_parameters' command handle all types of param. Only the QMP commands were documented in the deprecations page, but the rationale for deprecating applies equally to HMP, and the replacements exist. Furthermore the HMP commands are just shims to the QMP commands, so removing the latter breaks the former unless they get re-implemented. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-03-15migration: Replaced qemu_mutex_lock calls with QEMU_LOCK_GUARDMahmoud Mandour1-4/+2
Replaced various qemu_mutex_lock calls and their respective qemu_mutex_unlock calls with QEMU_LOCK_GUARD macro. This simplifies the code by eliminating the respective qemu_mutex_unlock calls. Signed-off-by: Mahmoud Mandour <ma.mandourr@gmail.com> Message-Id: <20210311031538.5325-7-ma.mandourr@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08migration: Add blocker informationDr. David Alan Gilbert1-2/+23
Modify query-migrate so that it has a flag indicating if outbound migration is blocked, and if it is a list of reasons. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202135522.127380-2-dgilbert@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08migration: Fix a few absurdly defective error messagesMarkus Armbruster1-12/+11
migrate_params_check() has a number of error messages of the form Parameter 'NAME' expects is invalid, it should be ... Fix them to something like Parameter 'NAME' expects a ... Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-5-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08migration: Clean up signed vs. unsigned XBZRLE cache-sizeMarkus Armbruster1-2/+2
73af8dd8d7 "migration: Make xbzrle_cache_size a migration parameter" (v2.11.0) made the new parameter unsigned (QAPI type 'size', uint64_t in C). It neglected to update existing code, which continues to use int64_t. migrate_xbzrle_cache_size() returns the new parameter. Adjust its return type. QMP query-migrate-cache-size returns migrate_xbzrle_cache_size(). Adjust its return type. migrate-set-parameters passes the new parameter to xbzrle_cache_resize(). Adjust its parameter type. xbzrle_cache_resize() passes it on to cache_init(). Adjust its parameter type. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210202141734.2488076-3-armbru@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08migration: implementation of background snapshot threadAndrey Gruzdev1-2/+253
Introducing implementation of 'background' snapshot thread which in overall follows the logic of precopy migration while internally utilizes completely different mechanism to 'freeze' vmstate at the start of snapshot creation. This mechanism is based on userfault_fd with wr-protection support and is Linux-specific. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20210129101407.103458-5-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-02-08migration: introduce 'background-snapshot' migration capabilityAndrey Gruzdev1-0/+102
Add new capability to 'qapi/migration.json' schema. Update migrate_caps_check() to validate enabled capability set against introduced one. Perform checks for required kernel features and compatibility with guest memory backends. Signed-off-by: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20210129101407.103458-2-andrey.gruzdev@virtuozzo.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2021-01-28qapi: More complex uses of QAPI_LIST_APPENDEric Blake1-14/+6
These cases require a bit more thought to review; in each case, the code was appending to a list, but not with a FOOList **tail variable. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210113221013.390592-6-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [Flawed change to qmp_guest_network_get_interfaces() dropped] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-13migration: Add yank featureLukas Straub1-0/+22
Register yank functions on sockets to shut them down. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <484c6a14cc2506bebedd5a237259b91363ff8f88.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-01Merge remote-tracking branch ↵Peter Maydell1-0/+1
'remotes/ehabkost-gl/tags/machine-next-pull-request' into staging Machine queue, 2020-12-23 Cleanup: * qdev code cleanup (Eduardo Habkost) Bug fix: * hostmem: Free host_nodes list right after visited (Keqian Zhu) # gpg: Signature made Wed 23 Dec 2020 21:25:58 GMT # gpg: using RSA key 5A322FD5ABC4D3DBACCFD1AA2807936F984DC5A6 # gpg: issuer "ehabkost@redhat.com" # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost-gl/tags/machine-next-pull-request: bugfix: hostmem: Free host_nodes list right after visited qdev: Avoid unnecessary DeviceState* variable at set_prop_arraylen() qdev: Rename qdev_get_prop_ptr() to object_field_prop_ptr() qdev: Move qdev_prop_tpm declaration to tpm_prop.h qdev: Make qdev_class_add_property() more flexible qdev: Make PropertyInfo.create return ObjectProperty* qdev: Move dev->realized check to qdev_property_set() qdev: Wrap getters and setters in separate helpers qdev: Add name argument to PropertyInfo.create method qdev: Add name parameter to qdev_class_add_property() qdev: Avoid using prop->name unnecessarily qdev: Get just property name at error_set_from_qdev_prop_error() sparc: Use DEFINE_PROP for nwindows property qdev: Reuse DEFINE_PROP in all DEFINE_PROP_* macros qdev: Move softmmu properties to qdev-properties-system.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-01Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into ↵Peter Maydell1-18/+11
staging QAPI patches patches for 2020-12-19 # gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits) qobject: Make QString immutable block: Use GString instead of QString to build filenames keyval: Use GString to accumulate value strings json: Use GString instead of QString to accumulate strings migration: Replace migration's JSON writer by the general one qobject: Factor JSON writer out of qobject_to_json() qobject: Factor quoted_str() out of to_json() qobject: Drop qstring_get_try_str() qobject: Drop qobject_get_try_str() Revert "qobject: let object_property_get_str() use new API" block: Avoid qobject_get_try_str() qmp: Fix tracing of non-string command IDs qobject: Move internals to qobject-internal.h hw/rdma: Replace QList by GQueue Revert "qstring: add qstring_free()" qobject: Change qobject_to_json()'s value to GString qobject: Use GString instead of QString to accumulate JSON qobject: Make qobject_to_json_pretty() take a pretty argument monitor: Use GString instead of QString for output buffer hmp: Simplify how qmp_human_monitor_command() gets output ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-12-19qapi: Use QAPI_LIST_PREPEND() where possibleEric Blake1-5/+2
Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit a8aa94b5f8 "qga: update schema for guest-get-disks 'dependents' field" and commit a10b453a52 "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-19migration: Refactor migrate_cap_addEric Blake1-13/+9
Instead of taking a list parameter and returning a new head at a distance, just return the new item for the caller to insert into a list via QAPI_LIST_PREPEND. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-4-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-18qdev: Move softmmu properties to qdev-properties-system.hEduardo Habkost1-0/+1
Move the property types and property macros implemented in qdev-properties-system.c to a new qdev-properties-system.h header. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20201211220529.2290218-16-ehabkost@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2020-12-18migration: Don't allow migration if vm is in POSTMIGRATETuguoyi1-0/+6
The following steps will cause qemu assertion failure: - pause vm by executing 'virsh suspend' - create external snapshot of memory and disk using 'virsh snapshot-create-as' - doing the above operation again will cause qemu crash The backtrace looks like: at /build/qemu-5.0/migration/savevm.c:1401 at /build/qemu-5.0/migration/savevm.c:1453 When the first migration completes, bs->open_flags will set BDRV_O_INACTIVE flag by bdrv_inactivate_all(), and during the second migration the bdrv_inactivate_recurse assert that the bs->open_flags is already BDRV_O_INACTIVE enabled which cause crash. As Vladimir suggested, this patch makes migrate_prepare check the state of vm and return error if it is in RUN_STATE_POSTMIGRATE state. Signed-off-by: Tuguoyi <tu.guoyi@h3c.com> Message-Id: <6b704294ad2e405781c38fb38d68c744@h3c.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reported-by: Li Zhang <li.zhang@cloud.ionos.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-12-10migration, vl: start migration via qmp_migrate_incomingPaolo Bonzini1-25/+8
Make qemu_start_incoming_migration local to migration/migration.c. By using the runstate instead of a separate flag, vl need not do anything to setup deferred incoming migration. qmp_migrate_incoming also does not need the deferred_incoming flag anymore, because "-incoming PROTOCOL" will clear the "once" flag before the main loop starts. Therefore, later invocations of the migrate-incoming command will fail with the existing "The incoming migration has already been started" error message. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-12-10vl: move various initialization routines out of qemu_initPaolo Bonzini1-0/+4
Some very simple initialization routines can be nested in existing subsystem-level functions, do that to simplify qemu_init. Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-11-12migration: handle CANCELLING state in migration_completion()Longpeng(Mike)1-0/+2
The following sequence may cause the VM abort during migration: 1. RUN_STATE_RUNNING,MIGRATION_STATUS_ACTIVE 2. before call migration_completion(), we send migrate_cancel QMP command, the state machine is changed to: RUN_STATE_RUNNING,MIGRATION_STATUS_CANCELLING 3. call migration_completion(), and the state machine is switch to: RUN_STATE_RUNNING,MIGRATION_STATUS_COMPLETED 4. call migration_iteration_finish(), because the migration status is COMPLETED, so it will try to set the runstate to POSTMIGRATE, but RUNNING-->POSTMIGRATE is an invalid transition, so abort(). The migration_completion() should not change the migration state to COMPLETED if it is already changed to CANCELLING. Signed-off-by: Longpeng(Mike) <longpeng2@huawei.com> Message-Id: <20201105091726.148-1-longpeng2@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-12migration: fix uninitialized variable warning in migrate_send_rp_req_pages()Chen Qun1-1/+1
After the WITH_QEMU_LOCK_GUARD macro is added, the compiler cannot identify that the statements in the macro must be executed. As a result, some variables assignment statements in the macro may be considered as unexecuted by the compiler. When the -Wmaybe-uninitialized capability is enabled on GCC9,the compiler showed warning: migration/migration.c: In function ‘migrate_send_rp_req_pages’: migration/migration.c:384:8: warning: ‘received’ may be used uninitialized in this function [-Wmaybe-uninitialized] 384 | if (received) { | ^ Add a default value for 'received' to prevented the warning. Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201111142203.2359370-6-kuhn.chenqun@huawei.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-11-01qapi: Add VFIO devices migration stats in Migration statsKirti Wankhede1-0/+17
Added amount of bytes transferred to the VM at destination by all VFIO devices Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2020-10-27Merge remote-tracking branch ↵Peter Maydell1-7/+52
'remotes/dgilbert/tags/pull-migration-20201026a' into staging migration pull: 2020-10-26 Another go at Peter's postcopy fixes Cleanups from Bihong Yu and Peter Maydell. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> # gpg: Signature made Mon 26 Oct 2020 16:17:03 GMT # gpg: using RSA key 45F5C71B4A0CB7FB977A9FA90516331EBC5BFDE7 # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>" [full] # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A 9FA9 0516 331E BC5B FDE7 * remotes/dgilbert/tags/pull-migration-20201026a: migration-test: Only hide error if !QTEST_LOG migration/postcopy: Release fd before going into 'postcopy-pause' migration: Sync requested pages after postcopy recovery migration: Maintain postcopy faulted addresses migration: Introduce migrate_send_rp_message_req_pages() migration: Pass incoming state into qemu_ufd_copy_ioctl() migration: using trace_ to replace DPRINTF migration: Delete redundant spaces migration: Open brace '{' following function declarations go on the next line migration: Do not initialise statics and globals to 0 or NULL migration: Add braces {} for if statement migration: Open brace '{' following struct go on the same line migration: Add spaces around operator migration: Don't use '#' flag of printf format migration: Do not use C99 // comments migration: Drop unused VMSTATE_FLOAT64 support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-26migration/postcopy: Release fd before going into 'postcopy-pause'Peter Xu1-3/+3
Logically below race could trigger with the old code: test program migration thread ------------ ---------------- wait_until('postcopy-pause') postcopy_pause() set_state('postcopy-pause') do_postcopy_recover() arm s->to_dst_file with new fd release s->to_dst_file [1] Here [1] could have released the just-installed recoverying channel. Then the migration could hang without really resuming. Instead, it should be very safe to release the fd before setting the state into 'postcopy-pause', because there's no reason for any other thread to touch it during 'postcopy-active'. Dave reported a very rare postcopy recovery hang that the migration-test program waited for the migration to complete in migrate_postcopy_complete(). We do suspect it's the same thing that we're gonna fix here. Hard to tell. However since we've noticed this, fix this irrelevant of the hang report. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-6-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26migration: Maintain postcopy faulted addressesPeter Xu1-1/+40
Maintain a list of faulted addresses on the destination host for which we're waiting on. This is implemented using a GTree rather than a real list to make sure even there're plenty of vCPUs/threads that are faulting, the lookup will still be fast with O(log(N)) (because we'll do that after placing each page). It should bring a slight overhead, but ideally that shouldn't be a big problem simply because in most cases the requested page list will be short. Actually we did similar things for postcopy blocktime measurements. This patch didn't use that simply because: (1) blocktime measurement is towards vcpu threads only, but here we need to record all faulted addresses, including main thread and external thread (like, DPDK via vhost-user). (2) blocktime measurement will require UFFD_FEATURE_THREAD_ID, but here we don't want to add that extra dependency on the kernel version since not necessary. E.g., we don't need to know which thread faulted on which page, we also don't care about multiple threads faulting on the same page. But we only care about what addresses are faulted so waiting for a page copying from src. (3) blocktime measurement is not enabled by default. However we need this by default especially for postcopy recover. Another thing to mention is that this patch introduced a new mutex to serialize the receivedmap and the page_requested tree, however that serialization does not cover other procedures like UFFDIO_COPY. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-4-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26migration: Introduce migrate_send_rp_message_req_pages()Peter Xu1-2/+8
This is another layer wrapper for sending a page request to the source VM. The new migrate_send_rp_message_req_pages() will be used elsewhere in coming patches. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201021212721.440373-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26migration: Add spaces around operatorBihong Yu1-2/+2
Signed-off-by: Bihong Yu <yubihong@huawei.com> Reviewed-by: Chuan Zheng <zhengchuan@huawei.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <1603163448-27122-4-git-send-email-yubihong@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-10-26machine: remove deprecated -machine enforce-config-section optionPaolo Bonzini1-10/+0
Deprecated since 3.1 and complicates the initialization sequence, remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-09error: Remove NULL checks on error_propagate() calls (again)Markus Armbruster1-6/+2
Patch created mechanically by rerunning: $ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \ --macro-file scripts/cocci-macro-file.h \ --use-gitgrep . Cc: Jens Freimann <jfreimann@redhat.com> Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com> Cc: Juan Quintela <quintela@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200722084048.1726105-4-armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-28Revert "migration: Properly destroy variables on incoming side"Dr. David Alan Gilbert1-5/+2
This reverts commit c02039a6f3730ddcf683a0ba9a175688c6db71a0. This is breaking test 068 that does a loadvm twice. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-09-25migration/tls: save hostname into MigrationStateChuan Zheng1-0/+1
hostname is need in multifd-tls, save hostname into MigrationState. Signed-off-by: Chuan Zheng <zhengchuan@huawei.com> Signed-off-by: Yan Jin <jinyan12@huawei.com> Message-Id: <1600139042-104593-2-git-send-email-zhengchuan@huawei.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>