migration: Unify reset of last_rb on destination node when recover

When postcopy recovery happens, we need to reset last_rb after each return of postcopy_pause_fault_thread(), because a return means the postcopy migration has just been resumed.

Unify this reset into a single place: right before we kick the fault thread again, when we receive the MIG_CMD_POSTCOPY_RESUME command from the source.

This does more than unify the code: the main thread on the destination can now call migrate_send_rp_req_pages_pending() as well, so the fault thread is no longer the only user of last_rb. Moving the reset earlier lets the first call to migrate_send_rp_req_pages_pending() see the reset value even when it is called from the main thread.

(NOTE: this is not a real fix for 0c26781c09 mentioned below; it only marks that anyone picking up 0c26781c09 should take this patch too. The real fix will come later.)

Fixes: 0c26781c09 ("migration: Sync requested pages after postcopy recovery")
Tested-by: Christian Schoenebeck <qemu_oss@crudebyte.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20201102153010.11979-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
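A minimal sketch of why the destination must clear last_rb once the source has cleared its own copy. It assumes a simplified return-path protocol in which a page request carries the block name only when the block differs from the previous request, and a nameless request is resolved against the block the source remembers, which the source drops on recovery (as the new comment "the source should have it reset already" implies). All types and functions here (Block, PageReq, Dest, Source, dest_build_req(), source_resolve(), recover()) are hypothetical and are not QEMU's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a RAMBlock: only its name matters here. */
typedef struct {
    const char *idstr;
} Block;

/* Hypothetical page request as sent on the return path. */
typedef struct {
    bool has_name;      /* true if the block name travels with the request */
    const char *name;   /* valid only when has_name is true */
    size_t offset;
} PageReq;

/* Destination side: caches the block of the last request it sent. */
typedef struct {
    const Block *last_rb;
} Dest;

/* Source side: remembers the block named by the last named request. */
typedef struct {
    const Block *blocks;
    size_t nblocks;
    const Block *last_req_rb;
} Source;

/* Build a request, sending the block name only when the block changed. */
static PageReq dest_build_req(Dest *d, const Block *rb, size_t offset)
{
    PageReq req = { .has_name = false, .name = NULL, .offset = offset };

    assert(rb != NULL);
    if (rb != d->last_rb) {
        req.has_name = true;
        req.name = rb->idstr;
        d->last_rb = rb;
    }
    return req;
}

/* Resolve the block a request refers to; NULL means "no previous block". */
static const Block *source_resolve(Source *s, const PageReq *req)
{
    if (req->has_name) {
        s->last_req_rb = NULL;
        for (size_t i = 0; i < s->nblocks; i++) {
            if (strcmp(s->blocks[i].idstr, req->name) == 0) {
                s->last_req_rb = &s->blocks[i];
                break;
            }
        }
    }
    /* A nameless request depends entirely on the source's cached block. */
    return s->last_req_rb;
}

/* Recovery: the source always drops its cache; the destination may not. */
static void recover(Dest *d, Source *s, bool reset_dest)
{
    s->last_req_rb = NULL;
    if (reset_dest) {
        d->last_rb = NULL;
    }
}
```

With reset_dest set to false, the first request after recover() for the same block goes out without a name while the source has already forgotten its cached block, so source_resolve() returns NULL; with reset_dest set to true, the request carries the name again and resolves normally.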
author: Peter Xu <peterx@redhat.com> 2020-11-02 10:30:09 -0500
committer: Dr. David Alan Gilbert <dgilbert@redhat.com> 2020-11-02 18:25:39 +0000
commit: cc5ab87200257199eba91aba9baf141ae0e91d0c
tree: b83d80c48dc0ff3c316c434ba9e75dc17fc8e5c1
parent: b139d11ae198aba0e009daddf7a3370ce84b2d09
2 files changed, 6 insertions, 2 deletions
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index d3bb3a7..d99842e 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -903,7 +903,6 @@ static void *postcopy_ram_fault_thread(void *opaque)
              * the channel is rebuilt.
              */
             if (postcopy_pause_fault_thread(mis)) {
-                mis->last_rb = NULL;
                 /* Continue to read the userfaultfd */
             } else {
                 error_report("%s: paused but don't allow to continue",
@@ -985,7 +984,6 @@ retry:
                 /* May be network failure, try to wait for recovery */
                 if (ret == -EIO && postcopy_pause_fault_thread(mis)) {
                     /* We got reconnected somehow, try to continue */
-                    mis->last_rb = NULL;
                     goto retry;
                 } else {
                     /* This is a unavoidable fault */
diff --git a/migration/savevm.c b/migration/savevm.c
index 21ccba9..e883499 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2062,6 +2062,12 @@ static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis)
     }
 
     /*
+     * Reset the last_rb before we resend any page req to source again, since
+     * the source should have it reset already.
+     */
+    mis->last_rb = NULL;
+
+    /*
      * This means source VM is ready to resume the postcopy migration.
      * It's time to switch state and release the fault thread to
      * continue service page faults.
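The ordering concern in the commit message can be sketched as one possible interleaving. This hypothetical model (Incoming, send_req(), resume_old_order(), resume_new_order()) does not use real threads and is not QEMU code; it only counts whether the first request resent after resume carries a block name, under the same assumption that a name is sent only when the block differs from last_rb.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the destination state shared by both users. */
typedef struct {
    const void *last_rb;   /* block of the last page request sent */
    int named_reqs;        /* requests that carried a block name */
    int nameless_reqs;     /* requests that relied on the source's cache */
} Incoming;

/* Shared by the main thread and the fault thread in this model. */
static void send_req(Incoming *mis, const void *rb)
{
    assert(rb != NULL);
    if (rb != mis->last_rb) {
        mis->last_rb = rb;
        mis->named_reqs++;
    } else {
        mis->nameless_reqs++;
    }
}

/*
 * Before the patch (one possible interleaving): the fault thread has been
 * woken, but the main thread resends a pending request before the fault
 * thread gets to clear last_rb.
 */
static void resume_old_order(Incoming *mis, const void *pending_rb)
{
    send_req(mis, pending_rb);  /* main thread: still sees stale last_rb */
    mis->last_rb = NULL;        /* fault thread: reset arrives too late */
}

/*
 * After the patch: the resume handler clears last_rb before any resend,
 * so every later request, from either thread, starts from the reset value.
 */
static void resume_new_order(Incoming *mis, const void *pending_rb)
{
    mis->last_rb = NULL;        /* reset in the resume handler */
    send_req(mis, pending_rb);  /* main thread: names the block again */
    /* the fault thread would be released after this point (not modeled) */
}
```

In the old ordering the main-thread resend goes out nameless even though the source has already dropped its cached block; resetting in the resume handler, before any resend and before the fault thread is released, closes that window regardless of which thread sends first.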