aboutsummaryrefslogtreecommitdiff
path: root/migration/migration.c
diff options
context:
space:
mode:
authorPeter Xu <peterx@redhat.com>2023-09-18 14:28:15 -0300
committerStefan Hajnoczi <stefanha@redhat.com>2023-09-27 13:58:02 -0400
commitcf02f29e1e3843784630d04783e372fa541a77e5 (patch)
tree91f805bcbef017b703d131752edef83b243c3f87 /migration/migration.c
parent5dfd80e38b63dc5bf2202bc87a9b1a3e1460efb9 (diff)
downloadqemu-cf02f29e1e3843784630d04783e372fa541a77e5.zip
qemu-cf02f29e1e3843784630d04783e372fa541a77e5.tar.gz
qemu-cf02f29e1e3843784630d04783e372fa541a77e5.tar.bz2
migration: Fix race that dest preempt thread close too early
We hit intermit CI issue on failing at migration-test over the unit test preempt/plain: qemu-system-x86_64: Unable to read from socket: Connection reset by peer Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1 ** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0) (test program exited with status code -6) Fabiano debugged into it and found that the preempt thread can quit even without receiving all the pages, which can cause guest not receiving all the pages and corrupt the guest memory. To make sure preempt thread finished receiving all the pages, we can rely on the page_requested_count being zero because preempt channel will only receive requested page faults. Note, not all the faulted pages are required to be sent via the preempt channel/thread; imagine the case when a requested page is just queued into the background main channel for migration, the src qemu will just still send it via the background channel. Here instead of spinning over reading the count, we add a condvar so the main thread can wait on it if that unusual case happened, without burning the cpu for no good reason, even if the duration is short; so even if we spin in this rare case is probably fine. It's just better to not do so. The condvar is only used when that special case is triggered. Some memory ordering trick is needed to guarantee it from happening (against the preempt thread status field), so the main thread will always get a kick when that triggers correctly. Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886 Debugged-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230918172822.19052-2-farosas@suse.de>
Diffstat (limited to 'migration/migration.c')
-rw-r--r--migration/migration.c3
1 files changed, 2 insertions, 1 deletions
diff --git a/migration/migration.c b/migration/migration.c
index d61e572..3ee1e6b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -153,6 +153,7 @@ void migration_object_init(void)
qemu_sem_init(&current_incoming->postcopy_qemufile_dst_done, 0);
qemu_mutex_init(&current_incoming->page_request_mutex);
+ qemu_cond_init(&current_incoming->page_request_cond);
current_incoming->page_requested = g_tree_new(page_request_addr_cmp);
migration_object_check(current_migration, &error_fatal);
@@ -367,7 +368,7 @@ int migrate_send_rp_req_pages(MigrationIncomingState *mis,
* things like g_tree_lookup() will return TRUE (1) when found.
*/
g_tree_insert(mis->page_requested, aligned, (gpointer)1);
- mis->page_requested_count++;
+ qatomic_inc(&mis->page_requested_count);
trace_postcopy_page_req_add(aligned, mis->page_requested_count);
}
}