path: root/migration/colo.c
Age | Commit message | Author | Files | Lines
2021-12-15 | COLO: Move some trace code behind qemu_mutex_unlock_iothread() | Rao, Lei | 1 | -3/+3
There is no need to keep some of the trace code inside the critical section. Moving it after qemu_mutex_unlock_iothread() reduces the time the lock is held.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
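The pattern can be sketched with a plain pthread mutex standing in for the iothread lock. The value the trace needs is copied while the lock is held, and the trace is emitted only after unlocking. big_lock, shared_state and update_then_trace() are hypothetical names, and fprintf() stands in for a QEMU trace point.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical stand-in for the iothread lock and the state it guards. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_state;

/* Update the shared state under the lock, then trace after unlocking so
 * the trace call does not lengthen the critical section. */
int update_then_trace(int delta)
{
    int snapshot;

    pthread_mutex_lock(&big_lock);
    shared_state += delta;
    snapshot = shared_state;    /* copy what the trace needs while locked */
    pthread_mutex_unlock(&big_lock);

    /* Outside the lock: only the local copy is read here. */
    fprintf(stderr, "trace: shared_state=%d\n", snapshot);
    return snapshot;
}
```

The local copy is what makes the move safe. Once the lock is released, the trace must not read shared_state directly.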
2021-12-15 | migration/colo: Optimize COLO primary node start code path | Zhang Chen | 1 | -2/+0
Optimize the COLO primary start path from:
MIGRATION_STATUS_XXX --> MIGRATION_STATUS_ACTIVE --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED
to:
MIGRATION_STATUS_XXX --> MIGRATION_STATUS_COLO --> MIGRATION_STATUS_COMPLETED
There is no need to start primary COLO through MIGRATION_STATUS_ACTIVE.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
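The shortened start path can be written as a transition check. This is a small illustrative sketch: the enum values are hypothetical stand-ins for the MIGRATION_STATUS_* values, and the function is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MIGRATION_STATUS_* values involved.
 * MIG_PRIOR represents whatever state preceded COLO (the "XXX" above). */
typedef enum {
    MIG_PRIOR,
    MIG_ACTIVE,
    MIG_COLO,
    MIG_COMPLETED,
} MigState;

/* Transitions allowed on the shortened primary start path:
 * PRIOR -> COLO -> COMPLETED, with no detour through ACTIVE. */
bool primary_colo_step_ok(MigState from, MigState to)
{
    switch (from) {
    case MIG_PRIOR:
        return to == MIG_COLO;
    case MIG_COLO:
        return to == MIG_COMPLETED;
    default:
        return false;
    }
}
```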
2021-12-15 | Fixed a QEMU hang when guest poweroff in COLO mode | Rao, Lei | 1 | -0/+20
When the PVM guest powers off, the COLO thread may be waiting on a semaphore in colo_process_checkpoint(). So, we should wake up the COLO thread before migration shutdown.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
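The wake-up-before-shutdown idea can be sketched with POSIX unnamed semaphores, as available on Linux. ColoCtx and its functions are hypothetical and do not come from colo.c. The shutdown path sets a flag before calling sem_post(), so a thread parked in sem_wait() wakes up and exits instead of blocking forever.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical context for a checkpoint thread that sleeps on a
 * semaphore between checkpoints. */
typedef struct {
    sem_t checkpoint_sem;
    bool shutting_down;
} ColoCtx;

void colo_ctx_init(ColoCtx *c)
{
    sem_init(&c->checkpoint_sem, 0, 0);   /* not shared across processes */
    c->shutting_down = false;
}

/* Checkpoint thread: block until woken, then report whether to keep going. */
bool colo_wait_for_work(ColoCtx *c)
{
    while (sem_wait(&c->checkpoint_sem) != 0 && errno == EINTR) {
        /* retry if interrupted by a signal */
    }
    return !c->shutting_down;
}

void colo_request_checkpoint(ColoCtx *c)
{
    sem_post(&c->checkpoint_sem);
}

/* Shutdown path: set the flag, then post so a thread parked in
 * sem_wait() wakes up, sees the flag, and exits instead of hanging. */
void colo_shutdown(ColoCtx *c)
{
    c->shutting_down = true;
    sem_post(&c->checkpoint_sem);
}
```

POSIX lists sem_post() and sem_wait() among the functions that synchronize memory. The flag written before the post is therefore visible to the thread that returns from the wait.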
2021-12-15 | migration/colo: More accurate update checkpoint time | Zhang Chen | 1 | -3/+2
Previous operations (like vm_start() and replication_start_all()) consume extra time before the timer is updated, so this patch reduces the resulting inaccuracy in the checkpoint time.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-09 | Reset the auto-converge counter at every checkpoint. | Rao, Lei | 1 | -0/+4
If we don't reset the auto-converge counter, it keeps running while COLO is active, and the system eventually hangs because the CPU throttle reaches DEFAULT_MIGRATE_MAX_CPU_THROTTLE.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
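A toy model of auto-converge shows how the hang arises. The throttle ratchets up after every two "dirty rate too high" syncs and never comes down unless something resets it. The struct, the two-sync threshold and the 20/10/99 percentages are assumptions chosen for illustration. They are not values taken from this commit.

```c
#include <assert.h>

/* Assumed values for illustration only. */
enum { THROTTLE_INITIAL = 20, THROTTLE_INCREMENT = 10, THROTTLE_MAX = 99 };

/* Hypothetical auto-converge state. */
typedef struct {
    int dirty_rate_high_cnt;   /* syncs in a row where dirtying was too fast */
    int throttle_pct;          /* current CPU throttle percentage */
} AutoConverge;

/* One dirty-bitmap sync where the guest dirtied memory too fast: bump the
 * counter and, every second time, raise the throttle (capped at the max). */
void autoconverge_on_sync(AutoConverge *ac)
{
    if (++ac->dirty_rate_high_cnt >= 2) {
        ac->dirty_rate_high_cnt = 0;
        ac->throttle_pct = ac->throttle_pct == 0
            ? THROTTLE_INITIAL
            : ac->throttle_pct + THROTTLE_INCREMENT;
        if (ac->throttle_pct > THROTTLE_MAX) {
            ac->throttle_pct = THROTTLE_MAX;
        }
    }
}

/* Conceptually what a per-checkpoint reset achieves: start from zero. */
void autoconverge_reset(AutoConverge *ac)
{
    ac->dirty_rate_high_cnt = 0;
    ac->throttle_pct = 0;
}
```

Without a reset, enough syncs pin the throttle at THROTTLE_MAX. With a reset at every checkpoint, each round starts from zero again.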
2021-11-03 | Changed the last-mode to none of first start COLO | Rao, Lei | 1 | -5/+3
When we first started COLO, last-mode was reported as follows:
{ "execute": "query-colo-status" }
{"return": {"last-mode": "primary", "mode": "primary", "reason": "none"}}
This last-mode is unreasonable. After the patch, it is reported as follows:
{ "execute": "query-colo-status" }
{"return": {"last-mode": "none", "mode": "primary", "reason": "none"}}
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
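The intended semantics can be sketched with a hypothetical ColoStatus record. last_mode starts as NONE and gets a real value only after a failover has happened. The names are illustrative and are not the fields QEMU uses.

```c
#include <assert.h>

typedef enum {
    COLO_MODE_NONE,
    COLO_MODE_PRIMARY,
    COLO_MODE_SECONDARY,
} ColoMode;

/* Hypothetical status record behind a query-colo-status style reply. */
typedef struct {
    ColoMode mode;        /* current role */
    ColoMode last_mode;   /* role held before the most recent failover */
} ColoStatus;

/* First start: there is no previous mode yet, so report NONE rather
 * than copying the current mode. */
void colo_status_start(ColoStatus *s, ColoMode mode)
{
    s->mode = mode;
    s->last_mode = COLO_MODE_NONE;
}

/* Failover: remember the role we had, then leave COLO. */
void colo_status_failover(ColoStatus *s)
{
    s->last_mode = s->mode;
    s->mode = COLO_MODE_NONE;
}
```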
2021-11-03 | Removed the qemu_fclose() in colo_process_incoming_thread | Rao, Lei | 1 | -5/+0
After live migration, the related fd is cleaned up in migration_incoming_state_destroy(). So, the qemu_fclose() in colo_process_incoming_thread() is not necessary.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-11-03 | colo: fixed 'Segmentation fault' when the simplex mode PVM poweroff | Rao, Lei | 1 | -0/+1
The GDB stack is as follows:
Program terminated with signal SIGSEGV, Segmentation fault.
0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832
if (type->class->interfaces &&
[Current thread is 1 (Thread 0x7f756e97eb00 (LWP 1811577))]
(gdb) bt
0 object_class_dynamic_cast (class=0x55c8f5d2bf50, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:832
1 0x000055c8f2c3dd14 in object_dynamic_cast (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel") at qom/object.c:763
2 0x000055c8f2c3ddce in object_dynamic_cast_assert (obj=0x55c8f543ac00, typename=0x55c8f2f7379e "qio-channel", file=0x55c8f2f73780 "migration/qemu-file-channel.c", line=117, func=0x55c8f2f73800 <__func__.18724> "channel_shutdown") at qom/object.c:786
3 0x000055c8f2bbc6ac in channel_shutdown (opaque=0x55c8f543ac00, rd=true, wr=true, errp=0x0) at migration/qemu-file-channel.c:117
4 0x000055c8f2bba56e in qemu_file_shutdown (f=0x7f7558070f50) at migration/qemu-file.c:67
5 0x000055c8f2ba5373 in migrate_fd_cancel (s=0x55c8f4ccf3f0) at migration/migration.c:1699
6 0x000055c8f2ba1992 in migration_shutdown () at migration/migration.c:187
7 0x000055c8f29a5b77 in main (argc=69, argv=0x7fff3e9e8c08, envp=0x7fff3e9e8e38) at vl.c:4512
The root cause is that migrate_fd_cancel() still tries to shut down from_dst_file after qemu_fclose() has closed it in colo_process_checkpoint(). So, we should set s->rp_state.from_dst_file = NULL after qemu_fclose().
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
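The fix is an instance of the common "close and clear" idiom. The sketch below uses hypothetical types: Chan stands in for QEMUFile, free() for qemu_fclose(), and cancel_shutdown() for the shutdown attempt in migrate_fd_cancel(). Once the pointer is cleared, the cancel path can tell that the channel is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QEMUFile-like handle. */
typedef struct {
    int fd;
} Chan;

typedef struct {
    Chan *from_dst_file;
} ReturnPathState;

Chan *chan_open(int fd)
{
    Chan *c = malloc(sizeof(*c));
    if (c) {
        c->fd = fd;
    }
    return c;
}

/* Close and clear in one step so no later path can reach freed memory. */
void chan_close_and_clear(Chan **pc)
{
    free(*pc);      /* stands in for qemu_fclose() */
    *pc = NULL;
}

/* Models the cancel path: only shuts down a channel that still exists.
 * Returns true if a channel was touched. */
bool cancel_shutdown(ReturnPathState *rp)
{
    if (!rp->from_dst_file) {
        return false;
    }
    rp->from_dst_file->fd = -1;   /* stands in for qemu_file_shutdown() */
    return true;
}
```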
2021-11-03 | Some minor optimizations for COLO | Rao, Lei | 1 | -1/+1
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2021-06-11 | Remove migrate_set_block_enabled in checkpoint | Rao, Lei | 1 | -6/+0
We can detect disk migration in migrate_prepare(). If disk migration is enabled in COLO mode, we can report an error directly, and there is no need to disable block migration at every checkpoint.
Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Lukas Straub <lukasstraub2@web.de>
Tested-by: Lukas Straub <lukasstraub2@web.de>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2021-05-26 | replication: move include out of root directory | Paolo Bonzini | 1 | -1/+1
The replication.h file is included from migration/colo.c and tests/unit/test-replication.c, so it should be in include/.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-08 | Remove superfluous timer_del() calls | Peter Maydell | 1 | -1/+0
This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree.
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Acked-by: Corey Minyard <cminyard@mvista.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2020-10-09 | error: Remove NULL checks on error_propagate() calls (again) | Markus Armbruster | 1 | -3/+1
Patch created mechanically by rerunning:
$ spatch --sp-file scripts/coccinelle/error_propagate_null.cocci \
    --macro-file scripts/cocci-macro-file.h \
    --use-gitgrep .
Cc: Jens Freimann <jfreimann@redhat.com>
Cc: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
Cc: Juan Quintela <quintela@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20200722084048.1726105-4-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2020-09-17 | migration/: fix some comment spelling errors | zhaolichang | 1 | -1/+1
I found many spelling errors in QEMU's comments, so I ran a spellcheck tool and found some in the migration folder.
Signed-off-by: zhaolichang <zhaolichang@huawei.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20200917075029.313-3-zhaolichang@huawei.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-06-01 | migration/colo.c: Move colo_notify_compares_event to the right place | Lukas Straub | 1 | -6/+6
If the secondary has to fail over during checkpointing, it is still in the old state (i.e. a different state from the primary). Thus we can't expose the primary state until after the checkpoint is sent.
This fixes sporadic connection resets of client connections during failover.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <d4555dd5146a54518c4d9d4efd996b7c745c6687.1589193382.git.lukasstraub2@web.de>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-01 | migration/colo.c: Relaunch failover even if there was an error | Lukas Straub | 1 | -5/+12
If vmstate_loading is true, secondary_vm_do_failover() sets the failover status to FAILOVER_STATUS_RELAUNCH and returns success without initiating failover. However, if there is an error during the vmstate_loading section, failover isn't relaunched. Instead, we then wait for failover on colo_incoming_sem.
Fix this by relaunching failover even if there was an error. Also, to make this work properly, set vmstate_loading to false when returning during the vmstate_loading section.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <f60b0a8e2fadaaec792e04819dfc46951842d6ba.1589193382.git.lukasstraub2@web.de>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-01 | migration/colo.c: Flush ram cache only after receiving device state | Lukas Straub | 1 | -0/+1
If we succeed in receiving the RAM state but fail to receive the device state, the two will not match.
Fix this by flushing the RAM cache only after the vmstate has been received.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <3289d007d494cb0e2f05b1cf4ae6a78d300fede3.1589193382.git.lukasstraub2@web.de>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
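The ordering can be sketched with a hypothetical Secondary struct using tiny fixed-size buffers. Incoming RAM is staged in a cache, and live RAM is overwritten only once the device state has also arrived. A failure in between therefore leaves the old RAM and the old device state, which still match each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { RAM_SIZE = 8 };

/* Hypothetical secondary-side state: live guest RAM plus a staging cache
 * that incoming checkpoint RAM is written into first. */
typedef struct {
    unsigned char ram[RAM_SIZE];
    unsigned char ram_cache[RAM_SIZE];
    int device_state;
} Secondary;

/* Apply one checkpoint: stage RAM, and flush it into live RAM only
 * after the device state has been received successfully. */
bool apply_checkpoint(Secondary *s, const unsigned char *ram_in,
                      bool device_ok, int device_in)
{
    memcpy(s->ram_cache, ram_in, RAM_SIZE);     /* stage RAM */
    if (!device_ok) {
        return false;                           /* keep old RAM and devices */
    }
    s->device_state = device_in;
    memcpy(s->ram, s->ram_cache, RAM_SIZE);     /* flush only now */
    return true;
}
```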
2020-06-01 | migration/colo.c: Use cpu_synchronize_all_states() | Lukas Straub | 1 | -1/+1
cpu_synchronize_all_pre_loadvm() marks all vCPUs as dirty, so the registers are loaded from CPUState before we continue running the VM. However, if we fail over during a checkpoint, CPUState is not initialized and the registers are loaded with garbage. This causes guest hangs and crashes.
Fix this by using cpu_synchronize_all_states(). In addition to marking the vCPUs as dirty, it initializes CPUState from the current CPU registers.
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <9675031ce557b73ebd10e7bd20ebbf57f30b177c.1589193382.git.lukasstraub2@web.de>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-06-01 | migration/colo.c: Use event instead of semaphore | Lukas Straub | 1 | -4/+5
If multiple packets miscompare in a short timeframe, the semaphore value is increased multiple times. This causes multiple checkpoints even when one would be sufficient.
Fix this by using an event instead of a semaphore to trigger checkpoints. Checkpoint requests are now ignored until the checkpoint event is sent to colo-compare (which releases the miscompared packets).
Benchmark results (iperf3):
Client-to-server TCP:
  without patch: ~66 Mbit/s
  with patch: ~61 Mbit/s
Server-to-client TCP:
  without patch: ~702 Kbit/s
  with patch: ~16 Mbit/s
Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <fd601ba1beb524aada54ba66e87ebfc12cf4574b.1589193382.git.lukasstraub2@web.de>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
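The difference between the two primitives can be shown in a single thread. A counting semaphore remembers every post. An event is a flag, so repeated sets collapse into one pending request. The Event type below is a hypothetical pthread-based model. QEMU has its own QemuEvent, which this sketch does not reproduce.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical event: a flag guarded by a mutex. Any number of sets
 * before the consumer clears it collapse into one pending request. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool is_set;
} Event;

void event_init(Event *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->is_set = false;
}

void event_destroy(Event *e)
{
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}

void event_set(Event *e)
{
    pthread_mutex_lock(&e->lock);
    e->is_set = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Blocking use, as a checkpoint thread would: wait, then clear. */
void event_wait_and_reset(Event *e)
{
    pthread_mutex_lock(&e->lock);
    while (!e->is_set) {
        pthread_cond_wait(&e->cond, &e->lock);
    }
    e->is_set = false;
    pthread_mutex_unlock(&e->lock);
}

/* Non-blocking variant used below to count pending requests. */
bool event_try_consume(Event *e)
{
    pthread_mutex_lock(&e->lock);
    bool was_set = e->is_set;
    e->is_set = false;
    pthread_mutex_unlock(&e->lock);
    return was_set;
}

/* Checkpoints run after n miscompare notifications, semaphore version. */
int checkpoints_with_semaphore(int n)
{
    sem_t sem;
    int runs = 0;

    sem_init(&sem, 0, 0);
    for (int i = 0; i < n; i++) {
        sem_post(&sem);             /* each miscompare adds one */
    }
    while (sem_trywait(&sem) == 0) {
        runs++;                     /* each post triggers a checkpoint */
    }
    sem_destroy(&sem);
    return runs;
}

/* Same scenario with the event: repeated sets are idempotent. */
int checkpoints_with_event(int n)
{
    Event ev;
    int runs = 0;

    event_init(&ev);
    for (int i = 0; i < n; i++) {
        event_set(&ev);
    }
    while (event_try_consume(&ev)) {
        runs++;
    }
    event_destroy(&ev);
    return runs;
}
```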
2020-05-07 | migration/colo: Add missing error-propagation code | Philippe Mathieu-Daudé | 1 | -0/+3
Running the coccinelle script produced:
$ spatch \
    --macro-file scripts/cocci-macro-file.h --include-headers \
    --sp-file scripts/coccinelle/find-missing-error_propagate.cocci \
    --keep-comments --smpl-spacing --dir .
HANDLING: ./migration/colo.c
[[manual check required: error_propagate() might be missing in migrate_set_block_enabled() ./migration/colo.c:439:4]]
Add the missing error_propagate() after review.
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-Id: <20200413205250.687-1-f4bug@amsat.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2020-04-29 | migration/colo: Fix qmp_xen_colo_do_checkpoint() error handling | Markus Armbruster | 1 | -1/+7
The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call.
qmp_xen_colo_do_checkpoint() passes @errp first to replication_do_checkpoint_all(), and then to colo_notify_filters_event(). If both fail, this will trip the assertion in error_setv().
Similar code in secondary_vm_do_failover() calls colo_notify_filters_event() only after replication_do_checkpoint_all() succeeded. Do the same here.
Fixes: 0e8818f023616677416840d6ddc880db8de3c967
Cc: Zhang Chen <chen.zhang@intel.com>
Cc: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20200422130719.28225-12-armbru@redhat.com>
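The rule can be shown with a minimal model. It assumes a hypothetical Error type whose setter asserts that *errp is still NULL, as the commit says error_setv() does. replicate() and notify_filters() are hypothetical stand-ins for the two calls named above. The second step runs only if the first succeeded, which matches the fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Minimal hypothetical model of the Error ** convention: errp is NULL
 * (caller ignores errors) or points to a variable that must hold NULL. */
typedef struct Error {
    const char *msg;
} Error;

static void set_error(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);          /* setting an error twice is a bug */
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->msg = msg;
}

/* Hypothetical steps standing in for replication_do_checkpoint_all()
 * and colo_notify_filters_event(). */
static bool replicate(bool fail, Error **errp)
{
    if (fail) {
        set_error(errp, "replication failed");
        return false;
    }
    return true;
}

static bool notify_filters(bool fail, Error **errp)
{
    if (fail) {
        set_error(errp, "filter notify failed");
        return false;
    }
    return true;
}

/* Fixed shape: the second call happens only if the first succeeded, so
 * errp never reaches a setter while it already holds an error. */
bool do_checkpoint(bool fail_rep, bool fail_notify, Error **errp)
{
    if (!replicate(fail_rep, errp)) {
        return false;
    }
    return notify_filters(fail_notify, errp);
}
```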
2020-03-25 | migration/colo: fix use after free of local_err | Vladimir Sementsov-Ogievskiy | 1 | -0/+1
local_err is used again in secondary_vm_do_failover() after replication_stop_all(), so we must zero it. Otherwise, trying to set the non-NULL local_err will crash.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-Id: <20200324153630.11882-5-vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
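The same kind of minimal Error model illustrates this bug. All names are hypothetical. After the error is reported and freed, the local pointer is reset to NULL so it can be reused for the next step. Without the reset, the next set_error() trips its assertion on the dangling pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical Error model: the setter requires *errp == NULL. */
typedef struct Error {
    const char *msg;
} Error;

static void set_error(Error **errp, const char *msg)
{
    assert(*errp == NULL);
    *errp = malloc(sizeof(**errp));
    assert(*errp != NULL);
    (*errp)->msg = msg;
}

/* Reports and frees the error, in the spirit of error_report_err().
 * The caller's variable still holds the freed address afterwards. */
static void report_and_free(Error *err)
{
    fprintf(stderr, "error: %s\n", err->msg);
    free(err);
}

/* Two failing steps sharing one local_err, as in a failover path.
 * Returns the number of errors reported. */
int failover_with_shared_local_err(void)
{
    Error *local_err = NULL;
    int reported = 0;

    set_error(&local_err, "replication stop failed");
    if (local_err) {
        report_and_free(local_err);
        local_err = NULL;   /* the fix: otherwise the next set_error() asserts */
        reported++;
    }

    set_error(&local_err, "filter notify failed");
    if (local_err) {
        report_and_free(local_err);
        local_err = NULL;
        reported++;
    }
    return reported;
}
```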
2020-03-13 | COLO: Optimize memory back-up process | zhanghailiang | 1 | -0/+3
This patch reduces the VM downtime of the initial process. Previously, we copied all of the memory in the COLO preparation stage, during which the VM must be stopped, and this is time-consuming. Here we optimize it with a trick: while COLO is enabled, back up every page during the migration process. Although this slows down migration, it reduces the downtime that backing up all of the SVM's memory would otherwise cause in the COLO preparation stage.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
minor typo fixes
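The trade-off can be shown with a toy model that uses hypothetical types and tiny pages. With COLO enabled, each page is copied into the backup cache as it arrives, so no bulk copy is needed later while the VM is stopped. bulk_backup() models the previous approach of copying everything at once.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { PAGE_BYTES = 4, NUM_PAGES = 4 };

/* Hypothetical secondary: guest RAM plus the COLO backup cache. */
typedef struct {
    unsigned char ram[NUM_PAGES][PAGE_BYTES];
    unsigned char cache[NUM_PAGES][PAGE_BYTES];
    bool colo_enabled;
} Incoming;

/* Receive one page during ordinary migration. With COLO enabled, the
 * page is also backed up immediately: a little extra work per page,
 * in exchange for no full copy while the VM is stopped. */
void receive_page(Incoming *in, int idx, const unsigned char *data)
{
    memcpy(in->ram[idx], data, PAGE_BYTES);
    if (in->colo_enabled) {
        memcpy(in->cache[idx], data, PAGE_BYTES);
    }
}

/* The previous approach: one bulk copy of all RAM at COLO start. */
void bulk_backup(Incoming *in)
{
    memcpy(in->cache, in->ram, sizeof(in->ram));
}
```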
2020-02-28 | migration/colo: wrap incoming checkpoint process into new helper | zhanghailiang | 1 | -127/+133
Split the incoming checkpoint process into a helper.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-08-16 | sysemu: Split sysemu/runstate.h off sysemu/sysemu.h | Markus Armbruster | 1 | -0/+1
sysemu/sysemu.h is a rather unfocused dumping ground for stuff related to the system-emulator. Evidence:
* It's included widely: in my "build everything" tree, changing sysemu/sysemu.h still triggers a recompile of some 1100 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h, down from 5400 due to the previous two commits).
* It pulls in more than a dozen additional headers.
Split stuff related to run state management into its own header sysemu/runstate.h.
Touching sysemu/sysemu.h now recompiles some 850 objects. qemu/uuid.h also drops from 1100 to 850, and qapi/qapi-types-run-state.h from 4400 to 4200. Touching new sysemu/runstate.h recompiles some 500 objects.
Since I'm touching MAINTAINERS to add sysemu/runstate.h anyway, also add qemu/main-loop.h.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-30-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
[Unbreak OS-X build]
2019-08-16 | Include qemu/main-loop.h less | Markus Armbruster | 1 | -0/+1
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more.
Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20190812052359.30071-21-armbru@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-08-16 | Include exec/memory.h slightly less | Markus Armbruster | 1 | -0/+1
Drop unnecessary inclusions from headers. Downgrade a few more to exec/hwaddr.h.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20190812052359.30071-17-armbru@redhat.com>
2019-07-02 | migration/colo.c: Add missed filter notify for Xen COLO. | Zhang Chen | 1 | -0/+2
We need to notify the net filters to do a checkpoint for Xen COLO, as on the KVM side.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2019-05-14 | migration/colo.c: Remove redundant input parameter | Zhang Chen | 1 | -1/+1
colo_do_failover() does not need the input parameter.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Message-Id: <20190426090730.2691-2-chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-25 | Migration/colo.c: Make user obtain the last COLO mode info after failover | Zhang Chen | 1 | -0/+16
Add last_colo_mode to save the status after failover. This lets users get the last COLO mode with query-colo-status after failover.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 | Migration/colo.c: Add the necessary checks for colo_do_failover | Zhang Chen | 1 | -2/+8
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 | Migration/colo.c: Add new COLOExitReason to handle all failover state | Zhang Chen | 1 | -11/+13
In this patch we add a processing state to COLOExitReason, because we need to tell whether COLO is in the failover processing state or the failover error state. This way, we can handle all failover states. We also improve the description of COLOExitReason.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-25 | Migration/colo.c: Fix COLO failover status error | Zhang Chen | 1 | -3/+4
When COLO failover has finished, the status is FAILOVER_STATUS_COMPLETED. The original code misunderstood FAILOVER_STATUS_REQUIRE.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2019-03-06 | Migration/colo.c: Make COLO node running after failover | Zhang Chen | 1 | -1/+0
Delay closing COLO so that the VM is started automatically after failover.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190303145021.2962-4-chen.zhang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2019-03-06 | Migration/colo.c: Fix double close bug when occur COLO failover | Zhang Chen | 1 | -0/+1
migration_incoming_state_destroy() checks mis->to_src_file and, when a COLO failover occurs, closes mis->to_src_file a second time.
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190303145021.2962-2-chen.zhang@intel.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-12-14 | qapi: add conditions to REPLICATION type/commands on the schema | Marc-André Lureau | 1 | -12/+4
Add #if defined(CONFIG_REPLICATION) in generated code, and adjust the code accordingly.
Made conditional:
* xen-set-replication, query-xen-replication-status, xen-colo-do-checkpoint
  Before the patch, we first register the commands unconditionally in generated code (requires a stub), then conditionally unregister in qmp_unregister_commands_hack().
  Afterwards, we register only when CONFIG_REPLICATION. The command fails exactly the same, with CommandNotFound.
  Improvement, because now query-qmp-schema is accurate, and we're one step closer to killing qmp_unregister_commands_hack().
* enum BlockdevDriver value "replication" in command blockdev-add
* BlockdevOptions variant @replication and related structures.
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20181213123724.4866-23-marcandre.lureau@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-11-21 | migration/colo.c: Fix compilation issue when disable replication | Zhang Chen | 1 | -1/+27
This compilation issue occurs when the user configures QEMU with --disable-replication.
Reported-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Message-Id: <20181101021226.6353-1-zhangckid@gmail.com>
Tested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-10-19 | COLO: quick failover process by kick COLO thread | zhanghailiang | 1 | -0/+8
The COLO thread may be sleeping in qemu_sem_wait(&s->colo_checkpoint_sem) when failover work begins. It's better to wake it up to speed up the process.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | COLO: notify net filters about checkpoint/failover event | zhanghailiang | 1 | -0/+15
Notify all net filters about the checkpoint and failover event.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | savevm: split the process of different stages for loadvm/savevm | Zhang Chen | 1 | -9/+32
There are several stages in the loadvm/savevm process. In each stage, incoming migration processes different types of sections. We want to control these stages more accurately, which benefits COLO performance: we don't have to save QEMU_VM_SECTION_START sections every time we do a checkpoint. We also want to separate the saving/loading of memory from that of device state.
So we add three new helper functions, including qemu_load_device_state() and qemu_savevm_live_state(), to handle the different stages of migration. We also make qemu_loadvm_state_main() and qemu_save_device_state() public, and simplify the code of qemu_save_device_state() by calling the wrapper qemu_savevm_state_header().
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | qapi: Add new command to query colo status | Zhang Chen | 1 | -0/+21
Libvirt or other high-level software can use this command to query the COLO status.
You can test the command like this:
{'execute':'query-colo-status'}
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | qapi/migration.json: Rename COLO unknown mode to none mode. | Zhang Chen | 1 | -1/+1
As suggested by Markus Armbruster, rename COLO unknown mode to none mode.
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | qmp event: Add COLO_EXIT event to notify users while exited COLO | zhanghailiang | 1 | -0/+31
If errors happen during the VM's COLO FT stage, it's important to notify users of this event. Together with 'x-colo-lost-heartbeat', users can then intervene in COLO's failover work immediately. Even if users don't want to get involved in COLO's failover decision, it is still necessary to notify them that we exited COLO mode.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | COLO: Remove colo_state migration struct | Zhang Chen | 1 | -1/+12
The incoming side needs to know whether migration is going into COLO state before normal migration starts. Instead of using a VMStateDescription to send colo_state from the source side to the destination side, we use MIG_CMD_ENABLE_COLO to indicate whether COLO is enabled.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | COLO: Add block replication into colo process | Zhang Chen | 1 | -0/+43
Make sure the master starts block replication only after the slave's block replication has started. We also need to activate the VM's block devices before going into COLO state.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-10-19 | COLO: integrate colo compare with colo frame | Zhang Chen | 1 | -2/+40
For COLO FT, the PVM and SVM run at the same time and only sync their state when needed. So here, let the SVM run while no checkpoint is in progress, and change DEFAULT_MIGRATE_X_CHECKPOINT_DELAY to 200*100.
Also, we forgot to release colo_checkpoint_sem and colo_delay_timer; fix that here.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Zhang Chen <zhangckid@gmail.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
2018-08-22 | migration: implement bi-directional RDMA QIOChannel | Lidong Chen | 1 | -0/+2
This patch implements a bi-directional RDMA QIOChannel. Because different threads may access RDMAQIOChannel concurrently, this patch uses RCU to protect it.
Signed-off-by: Lidong Chen <lidongchen@tencent.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-03-02 | Include less of the generated modular QAPI headers | Markus Armbruster | 1 | -1/+1
In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects.
The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need.
To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-09 | Include qapi/error.h exactly where needed | Markus Armbruster | 1 | -0/+1
This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree.
While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first.
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
2017-09-04 | qapi: Mechanically convert FOO_lookup[...] to FOO_str(...) | Markus Armbruster | 1 | -7/+7
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1503564371-26090-14-git-send-email-armbru@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>