path: root/migration/ram.c
2018-05-25  migration: use g_free for ram load bitmap  (Peter Xu, 1 file changed, -2/+2)

Buffers allocated with bitmap_new() should be freed with g_free(). Both reported by Coverity:

*** CID 1391300: API usage errors (ALLOC_FREE_MISMATCH)
/migration/ram.c: 3517 in ram_dirty_bitmap_reload()
3511      * the last one to sync, we need to notify the main send thread.
3512      */
3513     ram_dirty_bitmap_reload_notify(s);
3514
3515     ret = 0;
3516 out:
>>> CID 1391300: API usage errors (ALLOC_FREE_MISMATCH)
>>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free".
3517     free(le_bitmap);
3518     return ret;
3519 }
3520
3521 static int ram_resume_prepare(MigrationState *s, void *opaque)
3522 {

*** CID 1391292: API usage errors (ALLOC_FREE_MISMATCH)
/migration/ram.c: 249 in ramblock_recv_bitmap_send()
243      * Mark as an end, in case the middle part is screwed up due to
244      * some "misterious" reason.
245      */
246     qemu_put_be64(file, RAMBLOCK_RECV_BITMAP_ENDING);
247     qemu_fflush(file);
248
>>> CID 1391292: API usage errors (ALLOC_FREE_MISMATCH)
>>> Calling "free" frees "le_bitmap" using "free" but it should have been freed using "g_free".
249     free(le_bitmap);
250
251     if (qemu_file_get_error(file)) {
252         return qemu_file_get_error(file);
253     }
254

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20180525015042.31778-1-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
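A minimal sketch of the allocator/deallocator pairing the fix enforces (variable names are illustrative, not the literal ram.c code):

    /* Sketch: bitmap_new() allocates through GLib (g_try_malloc0), so the
     * buffer must be released with g_free(); plain free() is the
     * ALLOC_FREE_MISMATCH Coverity complains about above. */
    long nbits = block->used_length >> TARGET_PAGE_BITS;
    unsigned long *le_bitmap = bitmap_new(nbits);

    /* ... fill the bitmap and put it on the migration stream ... */

    g_free(le_bitmap);      /* correct */
    /* free(le_bitmap); */  /* wrong: mismatched allocator */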
2018-05-15  migration: setup ramstate for resume  (Peter Xu, 1 file changed, -1/+44)

After updating the dirty bitmaps of the ramblocks, we also need to update the critical fields in RAMState to make sure it is ready for a resume.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-18-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15migration: synchronize dirty bitmap for resumePeter Xu1-0/+47
This patch implements the first part of core RAM resume logic for postcopy. ram_resume_prepare() is provided for the work. When the migration is interrupted by network failure, the dirty bitmap on the source side will be meaningless, because even the dirty bit is cleared, it is still possible that the sent page was lost along the way to destination. Here instead of continue the migration with the old dirty bitmap on source, we ask the destination side to send back its received bitmap, then invert it to be our initial dirty bitmap. The source side send thread will issue the MIG_CMD_RECV_BITMAP requests, once per ramblock, to ask for the received bitmap. On destination side, MIG_RP_MSG_RECV_BITMAP will be issued, along with the requested bitmap. Data will be received on the return-path thread of source, and the main migration thread will be notified when all the ramblock bitmaps are synchronized. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180502104740.12123-17-peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
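A hedged sketch of the reload step described above, assuming a per-ramblock received bitmap has already been read back over the return path (helper names such as bitmap_complement() and bitmap_count_one() follow QEMU's bitmap API, but treat the exact calls as an assumption; the real code also accumulates the count across all blocks):

    /* Sketch: every page the destination has NOT yet received must be
     * considered dirty again, so the received bitmap is simply inverted
     * to re-seed the source's dirty bitmap. */
    static void dirty_bitmap_reload_sketch(RAMState *rs, RAMBlock *block,
                                           const unsigned long *recv_bitmap)
    {
        long nbits = block->used_length >> TARGET_PAGE_BITS;

        bitmap_complement(block->bmap, recv_bitmap, nbits); /* dirty = ~received */
        rs->migration_dirty_pages = bitmap_count_one(block->bmap, nbits);
    }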
2018-05-15  migration: new message MIG_RP_MSG_RECV_BITMAP  (Peter Xu, 1 file changed, -0/+144)

Introduce the new return-path message MIG_RP_MSG_RECV_BITMAP to send the received bitmap of a ramblock back to the source. This is the reply to MIG_CMD_RECV_BITMAP: it contains not only the header (including the ramblock name) but also the whole received bitmap of that ramblock on the destination side.

When the source receives such a reply (MIG_RP_MSG_RECV_BITMAP), it parses it and converts it into the dirty bitmap by inverting the bits.

One thing to mention is that when we send the recv bitmap, we additionally:

- convert the bitmap to little endian, to support hosts using different endianness on src/dst;
- align it to 8 bytes, to support hosts using different word sizes (32/64 bits) on src/dst.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180502104740.12123-13-peterx@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
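A hedged illustration of those two points (not the exact ram.c code; bitmap_to_le() and the receivedmap field are taken from the surrounding commits and QEMU's bitmap API, so treat the precise calls as assumptions):

    /* Sketch: send one RAMBlock's received bitmap in a fixed wire format.
     * The buffer size is rounded up to a multiple of 8 bytes and the bits
     * are converted to little endian, so 32-bit/64-bit and BE/LE hosts
     * produce and parse the same layout. */
    static void recv_bitmap_send_sketch(QEMUFile *file, RAMBlock *block)
    {
        long nbits  = block->used_length >> TARGET_PAGE_BITS;
        size_t size = ROUND_UP(BITS_TO_LONGS(nbits) * sizeof(unsigned long), 8);
        unsigned long *le_bitmap = g_malloc0(size);   /* padding stays zeroed */

        bitmap_to_le(le_bitmap, block->receivedmap, nbits); /* fix endianness */
        qemu_put_be64(file, size);                          /* aligned length  */
        qemu_put_buffer(file, (uint8_t *)le_bitmap, size);  /* then the bits   */
        g_free(le_bitmap);
    }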
2018-05-15  migration: Define MultifdRecvParams sooner  (Juan Quintela, 1 file changed, -15/+31)

Once there, we don't need the struct names anywhere, just the typedefs. And now also document all fields.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-05-15  migration: Transmit initial package through the multifd channels  (Juan Quintela, 1 file changed, -5/+99)

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

--

Be network agnostic.
Add error checking for all values.
2018-05-15  migration: Delay start of migration main routines  (Juan Quintela, 1 file changed, -0/+3)

We need to make sure that we have started all the multifd threads.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15  migration: Create multifd channels  (Juan Quintela, 1 file changed, -10/+42)

On both sides. We still don't transmit anything through them.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15  migration: Be sure all recv channels are created  (Juan Quintela, 1 file changed, -0/+11)

We need them before we start migration.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15  migration: terminate_* can be called for other threads  (Juan Quintela, 1 file changed, -14/+30)

Once there, make the count field always be accessed with atomic operations. To allow blocking operations, we need to know that the thread is running, so create a bool to indicate that.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

--

Once here, s/terminate_multifd_*_threads/multifd_*_terminate_threads/; this is consistent with every other function.
2018-05-15  migration: Introduce multifd_recv_new_channel()  (Juan Quintela, 1 file changed, -0/+6)

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-05-15  migration: Set error state in case of error  (Juan Quintela, 1 file changed, -2/+24)
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-05-15  migration: fix saving normal page even if it's been compressed  (Xiao Guangrong, 1 file changed, -1/+1)

Fix the bug introduced by da3f56cb2e767016 (migration: remove ram_save_compressed_page()): it should be 'return' rather than 'res'. Sorry for this stupid mistake :(

Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180428081045.8878-1-xiaoguangrong@tencent.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-04-25  migration: remove ram_save_compressed_page()  (Xiao Guangrong, 1 file changed, -37/+8)

Now we can reuse the path in ram_save_page() to post the page out as normal; the only thing remaining in ram_save_compressed_page() is the compression itself, which we can move out to the caller.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-11-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25  migration: introduce save_normal_page()  (Xiao Guangrong, 1 file changed, -20/+30)

It directly sends the page to the stream, without checking for zero pages and without using xbzrle or compression.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-10-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25  migration: move calling save_zero_page to the common place  (Xiao Guangrong, 1 file changed, -46/+59)

save_zero_page() is always the first approach to try; move it to the common place before calling ram_save_compressed_page() and ram_save_page().

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-9-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
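A hedged sketch of the resulting ordering in the shared send path. The function names come from this log; the control flow is deliberately simplified, and the variables (rs, pss, block, offset, last_stage) are the ones used throughout ram.c's send path, so treat this as a fragment under those assumptions rather than the literal code:

    /* Sketch: zero-page detection runs first in the common path, so that
     * neither the compression path nor the normal path repeats it. */
    int pages = save_zero_page(rs, block, offset);
    if (pages > 0) {
        /* Page was all zeroes: nothing more to send for it. */
    } else if (migrate_use_compression()) {
        pages = ram_save_compressed_page(rs, pss, last_stage);
    } else {
        pages = ram_save_page(rs, pss, last_stage);
    }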
2018-04-25  migration: move calling control_save_page to the common place  (Xiao Guangrong, 1 file changed, -8/+8)

The function is called by both ram_save_page() and ram_save_target_page(), so move it to the common caller to clean up the code.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-8-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25  migration: move some code to ram_save_host_page  (Xiao Guangrong, 1 file changed, -24/+19)

Move some code from ram_save_target_page() to ram_save_host_page() to make it more readable for later patches that dramatically clean ram_save_target_page() up.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-7-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25  migration: introduce control_save_page()  (Xiao Guangrong, 1 file changed, -85/+89)

Abstract out the common function control_save_page() to clean up the code; no logic is changed.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-6-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25  migration: detect compression and decompression errors  (Xiao Guangrong, 1 file changed, -17/+39)

Currently the page being compressed is allowed to be updated by the VM on the source QEMU; correspondingly, the destination QEMU simply ignores decompression errors. However, that means we completely miss the chance to catch real errors, and the VM gets corrupted silently.

To make the migration more robust, we copy the page to a buffer first to avoid it being written by the VM, then detect and handle both compression and decompression errors properly.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-5-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
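A rough, self-contained sketch of the idea using plain zlib rather than the ram.c helpers (function and buffer names are illustrative): snapshot the guest page into a private buffer so concurrent guest writes cannot race with the compressor, and treat a non-Z_OK return code as a real error instead of ignoring it.

    #include <string.h>
    #include <zlib.h>

    /* Sketch: compress a snapshot of a guest page and report failures. */
    static int compress_page_copy(uint8_t *dst, uLongf *dst_len,
                                  const uint8_t *guest_page, size_t page_size,
                                  uint8_t *scratch /* page_size bytes */)
    {
        /* Copy first: the guest may keep writing the live page while we run. */
        memcpy(scratch, guest_page, page_size);

        int ret = compress2(dst, dst_len, scratch, page_size, Z_BEST_SPEED);
        if (ret != Z_OK) {
            /* Previously such errors were silently ignored; now they fail hard. */
            return -1;
        }
        return 0;
    }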
2018-04-25  migration: stop decompression to allocate and free memory frequently  (Xiao Guangrong, 1 file changed, -30/+82)

The current code uses uncompress() to decompress memory. It manages memory internally, which causes a huge amount of memory to be allocated and freed very frequently; worse, frequently returning memory to the kernel will flush TLBs.

So, we maintain the memory ourselves and reuse it for each decompression.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-4-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-04-25  migration: stop compression to allocate and free memory frequently  (Xiao Guangrong, 1 file changed, -9/+32)

The current code uses compress2() to compress memory. It manages memory internally, which causes a huge amount of memory to be allocated and freed very frequently.

Worse, frequently returning memory to the kernel will flush TLBs and trigger invalidation callbacks on mmu-notifiers, which interact with the KVM MMU and dramatically reduce the performance of the VM.

So, we maintain the memory ourselves and reuse it for each compression.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-3-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
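A hedged sketch of the technique with plain zlib (not the exact ram.c code): initialize one z_stream per compression thread once, then reuse it with deflateReset() for every page, so no per-page heap allocation or kernel round trip happens. The mirror image with inflateInit()/inflateReset() applies to the decompression side described in the previous commit.

    #include <zlib.h>

    /* Per-thread, long-lived compression state: set up once, reused forever. */
    static z_stream strm;

    static int comp_state_init(int level)
    {
        return deflateInit(&strm, level) == Z_OK ? 0 : -1;
    }

    /* Compress one page using the persistent stream; no malloc/free per call. */
    static ssize_t comp_one_page(uint8_t *out, size_t out_size,
                                 const uint8_t *page, size_t page_size)
    {
        strm.next_in   = (Bytef *)page;
        strm.avail_in  = page_size;
        strm.next_out  = out;
        strm.avail_out = out_size;

        if (deflate(&strm, Z_FINISH) != Z_STREAM_END) {
            return -1;                /* real error, do not ignore it */
        }
        ssize_t written = out_size - strm.avail_out;
        deflateReset(&strm);          /* ready for the next page, no realloc */
        return written;
    }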
2018-04-25  migration: stop compressing page in migration thread  (Xiao Guangrong, 1 file changed, -16/+16)

As compression is heavy work, do not do it in the migration thread; instead, post the page out as a normal page.

Reviewed-by: Wei Wang <wei.w.wang@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Message-Id: <20180330075128.26919-2-xiaoguangrong@tencent.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-03-20  Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging  (Peter Maydell, 1 file changed, -0/+5)

virtio,vhost,pci,pc: features, cleanups

SRAT tables for DIMM devices
new virtio net flags for speed/duplex
post-copy migration support in vhost
cleanups in pci

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Tue 20 Mar 2018 14:40:43 GMT
# gpg:                using RSA key 281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (51 commits)
  postcopy shared docs
  libvhost-user: Claim support for postcopy
  postcopy: Allow shared memory
  vhost: Huge page align and merge
  vhost+postcopy: Wire up POSTCOPY_END notify
  vhost-user: Add VHOST_USER_POSTCOPY_END message
  libvhost-user: mprotect & madvises for postcopy
  vhost+postcopy: Call wakeups
  vhost+postcopy: Add vhost waker
  postcopy: postcopy_notify_shared_wake
  postcopy: helper for waking shared
  vhost+postcopy: Resolve client address
  postcopy-ram: add a stub for postcopy_request_shared_page
  vhost+postcopy: Helper to send requests to source for shared pages
  vhost+postcopy: Stash RAMBlock and offset
  vhost+postcopy: Send address back to qemu
  libvhost-user+postcopy: Register new regions with the ufd
  migration/ram: ramblock_recv_bitmap_test_byte_offset
  postcopy+vhost-user: Split set_mem_table for postcopy
  vhost+postcopy: Transmit 'listen' to slave
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

# Conflicts:
#	scripts/update-linux-headers.sh
2018-03-20  migration/ram: ramblock_recv_bitmap_test_byte_offset  (Dr. David Alan Gilbert, 1 file changed, -0/+5)

Utility for testing the map when you already know the offset in the RAMBlock.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-03-13  migration: introduce postcopy-only pending  (Vladimir Sementsov-Ogievskiy, 1 file changed, -4/+5)

There will be savevm states (dirty-bitmap) which can migrate only in the postcopy stage. The corresponding pending value is introduced here.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20180313180320.339796-6-vsementsov@virtuozzo.com
2018-03-09  migration: do not transfer ram during bulk storage migration  (Peter Lieven, 1 file changed, -0/+8)

This patch makes the bulk phase of a block migration take place before we start transferring RAM. As the bulk block migration can take a long time, it is pointless to transfer RAM during that phase.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <1520507908-16743-2-git-send-email-pl@kamp.de>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-03-02  Include less of the generated modular QAPI headers  (Markus Armbruster, 1 file changed, -1/+1)

In my "build everything" tree, a change to the types in qapi-schema.json triggers a recompile of about 4800 out of 5100 objects.

The previous commit split up qmp-commands.h, qmp-event.h, qmp-visit.h, qapi-types.h. Each of these headers still includes all its shards. Reduce compile time by including just the shards we actually need.

To illustrate the benefits: adding a type to qapi/migration.json now recompiles some 2300 instead of 4800 objects. The next commit will improve it further.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180211093607.27351-24-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
[eblake: rebase to master]
Signed-off-by: Eric Blake <eblake@redhat.com>
2018-02-14  migration: better error handling with QEMUFile  (Peter Xu, 1 file changed, -4/+17)

If postcopy goes down for some reason, we can always see this on the destination:

  qemu-system-x86_64: RP: Received invalid message 0x0000 length 0x0000

However, in most cases that's not the real issue. The problem is that qemu_get_be16() has no way to show whether the returned data is valid or not, and we always assume it is valid. That's possibly not wise.

The best approach to solve this would be to refactor the QEMUFile interface to allow the APIs to return an error if there is one. However, that needs quite a bit of work and testing. For now, let's explicitly check the validity before using the data in all the places that call qemu_get_*().

This patch tries to fix most of the cases I can see. Only with this can we make sure we are processing valid data, and only then can we capture the channel-down events correctly.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20180208103132.28452-2-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
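A hedged sketch of the pattern (a fragment, not the literal return-path code; rp_file and header_type are illustrative names): since qemu_get_be16() cannot itself report failure, the stream's error state has to be checked before the value is trusted.

    /* Sketch: read a 16-bit field from the return path, then check the
     * QEMUFile error state before interpreting the value at all. */
    uint16_t header_type = qemu_get_be16(rp_file);

    if (qemu_file_get_error(rp_file)) {
        /* Channel broke (or EOF): the value above is garbage, bail out. */
        return qemu_file_get_error(rp_file);
    }
    /* header_type is only meaningful past this point. */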
2018-02-14  migration: Fix early failure cleanup  (Dr. David Alan Gilbert, 1 file changed, -5/+7)

Avoid a crash in cleanup after a very early migration failure (possibly due to my 688a3dcba980bf01344a 'Route errors down ...').

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20180212160340.15333-2-dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2018-02-09  Include qapi/error.h exactly where needed  (Markus Armbruster, 1 file changed, -0/+2)

This cleanup makes the number of objects depending on qapi/error.h drop from 1910 (out of 4743) to 1612 in my "build everything" tree.

While there, separate #include from file comment with a blank line, and drop a useless comment on why qemu/osdep.h is included first.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20180201111846.21846-5-armbru@redhat.com>
[Semantic conflict with commit 34e304e975 resolved, OSX breakage fixed]
2018-02-06  migration: Drop current address parameter from save_zero_page()  (Juan Quintela, 1 file changed, -6/+5)

It already has the RAMBlock and offset, so it can calculate the address itself.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-01-15  migration: Guard ram_bytes_remaining against early call  (Dr. David Alan Gilbert, 1 file changed, -1/+2)

Calling ram_bytes_remaining() during the early part of setup is unsafe because ram_state isn't yet initialised. This can happen in the sequence:

  migrate
  migrate_cancel
  info migrate

if the migrate sticks trying to connect (e.g. to an unresponsive destination due to the connect timeout). Here 'info migrate' sees a state of CANCELLING and so assumes the migration has partially happened.

Partial fix for RH bz: https://bugzilla.redhat.com/show_bug.cgi?id=1525899

Reported-by: Xianxian Wang <xianwang@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
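A minimal sketch of the kind of guard described, assuming ram_state is the lazily allocated global RAMState pointer in ram.c (the committed fix may be phrased slightly differently):

    /* Sketch: ram_state is only allocated once migration setup has run,
     * so an early 'info migrate' must not dereference a NULL pointer. */
    uint64_t ram_bytes_remaining(void)
    {
        return ram_state ? ram_state->migration_dirty_pages * TARGET_PAGE_SIZE
                         : 0;
    }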
2017-11-22  migration/ram.c: do not set 'postcopy_running' in POSTCOPY_INCOMING_END  (Daniel Henrique Barboza, 1 file changed, -2/+14)

When migrating a VM with 'migrate_set_capability postcopy-ram on', a postcopy_state is set during the process, ending up in the state POSTCOPY_INCOMING_END when the migration is over. This postcopy_state is taken into account inside ram_load to decide how it will load the memory pages. The same ram_load is called by the loadvm command.

Inside ram_load, the logic to see if we're in the postcopy_running state is:

  postcopy_running = postcopy_state_get() >= POSTCOPY_INCOMING_LISTENING

postcopy_state_get() returns this enum type:

  typedef enum {
      POSTCOPY_INCOMING_NONE = 0,
      POSTCOPY_INCOMING_ADVISE,
      POSTCOPY_INCOMING_DISCARD,
      POSTCOPY_INCOMING_LISTENING,
      POSTCOPY_INCOMING_RUNNING,
      POSTCOPY_INCOMING_END
  } PostcopyState;

In the case where ram_load is executed and postcopy_state is POSTCOPY_INCOMING_END, postcopy_running will be set to 'true' and ram_load will behave as if a postcopy is in progress. This scenario isn't achievable in a migration, but it is reproducible when executing savevm/loadvm after migrating with 'postcopy-ram on', causing loadvm to fail with Error -22:

Source:
  (qemu) migrate_set_capability postcopy-ram on
  (qemu) migrate tcp:127.0.0.1:4444

Dest:
  (qemu) migrate_set_capability postcopy-ram on
  (qemu)
  ubuntu1704-intel login:
  Ubuntu 17.04 ubuntu1704-intel ttyS0
  ubuntu1704-intel login:
  (qemu)
  (qemu) savevm test1
  (qemu) loadvm test1
  Unknown combination of migration flags: 0x4 (postcopy mode)
  error while loading state for instance 0x0 of device 'ram'
  Error -22 while loading VM state
  (qemu)

This patch fixes the problem by changing the existing logic for postcopy_advised and postcopy_running in ram_load, making them 'false' if we're in the POSTCOPY_INCOMING_END state.

Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
CC: Juan Quintela <quintela@redhat.com>
CC: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reported-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
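A hedged sketch of the corrected predicates (illustrative only; the committed fix may be phrased differently):

    /* Sketch: POSTCOPY_INCOMING_END means the postcopy run is already over,
     * so ram_load must not treat it as "postcopy currently running". */
    PostcopyState ps = postcopy_state_get();

    bool postcopy_advised = ps >= POSTCOPY_INCOMING_ADVISE &&
                            ps < POSTCOPY_INCOMING_END;
    bool postcopy_running = ps >= POSTCOPY_INCOMING_LISTENING &&
                            ps < POSTCOPY_INCOMING_END;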
2017-10-29  migration: Make xbzrle_cache_size a migration parameter  (Juan Quintela, 1 file changed, -7/+0)

Right now it is a variable in MigrationState instead of a MigrationParameter. The change allows setting it like the rest of the migration parameters: from the command line, with query_migration_parameters, set_migrate_parameters, etc.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-29  migration: No need to return the size of the cache  (Juan Quintela, 1 file changed, -6/+4)

After the previous commits, we make sure that the value passed is right, or we raise an error. So now we just return whether there was an error or the passed value was set up correctly.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

--

Improve error message.
Always return 0 for success.
2017-10-29  migration: Don't play games with the requested cache size  (Juan Quintela, 1 file changed, -5/+6)

Now that we check that the value passed is a power of 2, we don't need to play games when computing the size that the cache is going to take.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-10-23  migration: add bitmap for received page  (Alexey Perevalov, 1 file changed, -0/+40)

This patch adds the ability to track already-received pages. It is necessary for calculating vCPU block time in the postcopy migration feature and for recovery after a postcopy migration failure. It is also needed to solve the shared-memory issue in postcopy live migration: information about received pages will be transferred to the software virtual bridge (e.g. OVS-VSWITCHD) to avoid fallocate (unmap) for already-received pages. The fallocate syscall is required for remapped shared memory, because remapping itself blocks ioctl(UFFDIO_COPY); the ioctl in that case ends with an EEXIST error (the struct page exists after the remap).

The bitmap is placed into RAMBlock alongside the other postcopy/precopy-related bitmaps.

Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
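A hedged sketch of how such a per-RAMBlock received-page bitmap can be kept; the field and helper names follow the surrounding commits but should be read as illustrative rather than the exact ram.c definitions:

    /* Sketch: one bit per target page in the RAMBlock, set once the page
     * body has actually arrived on the destination. */
    void ramblock_recv_bitmap_set(RAMBlock *rb, void *host_addr)
    {
        size_t page = ((uintptr_t)host_addr - (uintptr_t)rb->host)
                      >> TARGET_PAGE_BITS;
        set_bit_atomic(page, rb->receivedmap);
    }

    bool ramblock_recv_bitmap_test(RAMBlock *rb, void *host_addr)
    {
        size_t page = ((uintptr_t)host_addr - (uintptr_t)rb->host)
                      >> TARGET_PAGE_BITS;
        return test_bit(page, rb->receivedmap);
    }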
2017-10-23  migration: postcopy_place_page factoring out  (Alexey Perevalov, 1 file changed, -2/+2)

We need to mark copied pages as close as possible to the place where they are tracked. That will be necessary in a further patch.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23  migration: new ram_init_bitmaps()  (Peter Xu, 1 file changed, -21/+30)

Rearrange the bitmap initialization and the first sync. While at it, make sure the locks are taken/released in the correct order (I moved the RCU unlock upwards, though it may not matter much).

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23  migration: clean up xbzrle cache init/destroy  (Peter Xu, 1 file changed, -46/+77)

Let's further simplify ram_init_all() and ram_save_cleanup() by abstracting all the XBZRLE-related code into its own functions.

When allocating the xbzrle cache, we are always very careful about -ENOMEM, which makes sense. Replace the last g_malloc0() with g_try_malloc0(), then refactor the logic a bit. This patch should also fix some memory leaks that previously occurred when a memory allocation for XBZRLE failed.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
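A hedged sketch of the allocation pattern the message describes (illustrative names; the XBZRLE struct fields and cache_init()/cache_fini() are taken from ram.c and xbzrle cache code as referenced in this log, but treat the exact signatures as assumptions): every allocation either succeeds or the function unwinds whatever it has already allocated.

    /* Sketch: allocate all XBZRLE buffers up front; on any failure, free
     * what was already allocated so nothing leaks. */
    static int xbzrle_init_sketch(uint64_t cache_size, size_t page_size,
                                  Error **errp)
    {
        XBZRLE.cache = cache_init(cache_size, page_size, errp);
        if (!XBZRLE.cache) {
            return -1;
        }
        XBZRLE.encoded_buf = g_try_malloc0(page_size);
        XBZRLE.current_buf = g_try_malloc0(page_size);
        if (!XBZRLE.encoded_buf || !XBZRLE.current_buf) {
            error_setg(errp, "Error allocating XBZRLE buffers");
            g_free(XBZRLE.encoded_buf);   /* g_free(NULL) is a no-op */
            g_free(XBZRLE.current_buf);
            cache_fini(XBZRLE.cache);
            XBZRLE.cache = NULL;
            return -1;
        }
        return 0;
    }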
2017-10-23  migration: provide ram_state_cleanup  (Peter Xu, 1 file changed, -3/+10)

There are two mutexes that are created but never destroyed for RAMState. Fix that. While we are at it, provide a helper function to clean up RAMState.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23  migration: provide ram_state_init()  (Peter Xu, 1 file changed, -10/+26)

The old ram_state_init() does not really initialize only the RAMState; it also includes lots of other RAM-related stuff. Rename it to ram_init_all(), and instead provide a real ram_state_init().

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-10-23  migration: Make cache_init() take an error parameter  (Juan Quintela, 1 file changed, -13/+5)

Once there, take a total size instead of a size in pages. We move the check that new_size is bigger than one page out of xbzrle_cache_resize().

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

--

Fix typo spotted by Peter Xu.
2017-10-23  migration: Move xbzrle cache resize error handling to xbzrle_cache_resize  (Juan Quintela, 1 file changed, -2/+20)

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
2017-09-27  migration: disable auto-converge during bulk block migration  (Peter Lieven, 1 file changed, -1/+5)

Auto-converge and block migration currently do not play well together. During block migration the auto-converge logic detects that RAM migration makes no progress and thus throttles down the VM until it nearly stalls completely. Avoid this by disabling the throttling logic during the bulk phase of the block migration.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Message-Id: <1506421996-12513-1-git-send-email-pl@kamp.de>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2017-09-22  migration: fix ram_save_pending  (Vladimir Sementsov-Ogievskiy, 1 file changed, -2/+6)

Fill the postcopy-able pending only if RAM postcopy is enabled. This is necessary because there will be other postcopy-able states, and when RAM postcopy is disabled it should not spoil the common postcopy-related pending.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
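A hedged sketch of the accounting rule described above (a fragment; the pointer parameter names are illustrative, not the exact ram_save_pending signature): remaining RAM only counts as postcopy-able pending when the postcopy-ram capability is actually enabled.

    /* Sketch: report remaining RAM either as postcopy-able or as
     * precopy-only pending, depending on the postcopy-ram capability. */
    uint64_t remaining = ram_bytes_remaining();

    if (migrate_postcopy_ram()) {
        *postcopiable_pending     += remaining; /* can finish in postcopy      */
    } else {
        *non_postcopiable_pending += remaining; /* must finish before switchover */
    }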
2017-09-22  migration: add has_postcopy savevm handler  (Vladimir Sementsov-Ogievskiy, 1 file changed, -0/+6)

Currently, postcopy-able states are recognized by a non-NULL save_live_complete_postcopy handler. But when we have several different postcopy-able states, that is not convenient: RAM postcopy may be disabled while some other postcopy is enabled, and in that case the RAM state should behave as if it is not postcopy-able. This patch adds a separate has_postcopy handler to specify the behaviour of a savevm state.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
2017-09-22  migration: Create multifd migration threads  (Juan Quintela, 1 file changed, -0/+202)

Creation of the threads; nothing inside them yet.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

--

Use pointers instead of long array names.
Move to using semaphores instead of conditions, as Paolo suggested.
Put all the state inside one struct.
Use a counter for the number of threads created; needed during cancellation.
Add an error return to thread creation.
Add an id field.
Rename functions to multifd_save/load_setup/cleanup.
Change recv parameters to a pointer to struct.
Change back to a struct.
Use Error * for _cleanup.
2017-08-02  migration: fix comment disorder in RAMState  (Peter Xu, 1 file changed, -2/+2)

The comments for "migration_dirty_pages" and "bitmap_mutex" are swapped. Fix that.

Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1501666880-10159-2-git-send-email-peterx@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>