aboutsummaryrefslogtreecommitdiff
path: root/migration
AgeCommit message (Collapse)AuthorFilesLines
2018-07-24migration: fix duplicate initialization for expected_downtime and cleanup_bhLidong Chen1-2/+0
migrate_fd_connect duplicate initialize expected_downtime and cleanup_bh. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Message-Id: <1532434585-14732-2-git-send-email-lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24migration: disallow recovery for release-ramPeter Xu1-0/+19
Postcopy recovery won't work well with release-ram capability since release-ram will drop the page buffer as long as the page is put into the send buffer. So if there is a network failure happened, any page buffers that have not yet reached the destination VM but have already been sent from the source VM will be lost forever. Let's refuse the client from resuming such a postcopy migration. Luckily release-ram was designed to only be used when src and destination VMs are on the same host, so it should be fine. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24migration: update recv bitmap only on dest vmPeter Xu1-2/+9
We shouldn't update the received bitmap if we're the source VM. This fixes a breakage when release-ram is enabled on postcopy. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180723123305.24792-2-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24migrate: Fix cancelling state warningDr. David Alan Gilbert1-0/+1
We've been getting the warning: migration_iteration_finish: Unknown ending state 2 on a cancel. I think that's originally due to 39b9e17905c; although I've only seen the warning, I think that in some cases that we could find the VM stays paused after a cancel where it should restart. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180719092257.12703-1-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-24migration: fix potential overflow in multifd sendPeter Xu1-1/+1
I would guess it won't happen normally, but this should ease Coverity. >>> CID 1394385: Integer handling issues (OVERFLOW_BEFORE_WIDEN) >>> Potentially overflowing expression "pages->used * 8192U" with type "unsigned int" (32 bits, unsigned) is evaluated using 32-bit arithmetic, and then used in a context that expects an expression of type "uint64_t" (64 bits, unsigned). 854 transferred = pages->used * TARGET_PAGE_SIZE + p->packet_len; Fixes: CID 1394385 CC: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180720034713.11711-1-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: reorder MIG_CMD_POSTCOPY_RESUMEPeter Xu1-1/+1
It was accidently added before MIG_CMD_PACKAGED so it might break command compatibility when we run postcopy migration between old/new QEMUs. Fix that up quickly before the QEMU 3.0 release. Reported-by: Lukáš Doktor <ldoktor@redhat.com> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710094424.30754-1-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: show pause/recover state on dst hostPeter Xu1-0/+2
These two states will be missing when doing "query-migrate" on destination VM. Add these states so that we can get the query results as expected. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-5-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: fix incorrect bitmap size calculationPeter Xu1-2/+2
The calculation on size of received bitmap is incorrect for postcopy recovery. Here we wanted to let the size to cover all the valid bits in the bitmap, we should use DIV_ROUND_UP() instead of a division. For example, a RAMBlock with size=4K (which contains only one single 4K page) will have nbits=1, then nbits/8=0, then the real bitmap won't be sent to source at all. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-4-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: loosen recovery check when load vmPeter Xu1-10/+6
We were checking against -EIO, assuming that it will cover all IO failures. But actually it is not. One example is that in qemu_loadvm_section_start_full() we can have tons of places that will return -EINVAL even if the error is caused by IO failures on the network. Let's loosen the recovery check logic here to cover all the error cases happened by removing the explicit check against -EIO. After all we won't lose anything here if any other failure happened. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-3-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: simplify check to use qemu file bufferPeter Xu1-6/+11
Firstly, renaming the old matching_page_sizes variable to matches_target_page_size, which suites more to what it did (it only checks against target page size rather than multiple page sizes). Meanwhile, simplify the check logic a bit, and enhance the comments. Should have no functional change. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180710091902.28780-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: unify incoming processingPeter Xu4-13/+16
This is the 2nd patch to unbreak postcopy recovery. Let's unify the migration_incoming_process() call at a single place rather than calling it in connection setup codes. This fixes a problem that we will go into incoming migration procedure even if we are trying to recovery from a paused postcopy migration. Fixes: 36c2f8be2c ("migration: Delay start of migration main routines") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-5-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: unbreak postcopy recoveryPeter Xu1-5/+18
The whole postcopy recovery logic was accidentally broken. We need to fix it in two steps. This is the first step that we should do the recovery when needed. It was bypassed before after commit 36c2f8be2c. Introduce postcopy_try_recovery() helper for the postcopy recovery logic. Call it both in migration_fd_process_incoming() and migration_ioc_process_incoming(). Fixes: 36c2f8be2c ("migration: Delay start of migration main routines") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-4-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: move income process out of multifdPeter Xu3-8/+10
Move the call to migration_incoming_process() out of multifd code. It's a bit strange that we can migration generic calls in multifd code. Instead, let multifd_recv_new_channel() return a boolean showing whether it's ready to continue the incoming migration. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-3-peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-10migration: delay postcopy paused statePeter Xu1-3/+3
Before this patch we firstly setup the postcopy-paused state then we clean up the QEMUFile handles. That can be racy if there is a very fast "migrate-recover" command running in parallel. Fix that up. Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180627132246.5576-2-peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-07-04dirty-bitmap: fix double lock on bitmap enablingVladimir Sementsov-Ogievskiy1-2/+2
Bitmap lock/unlock were added to bdrv_enable_dirty_bitmap in 8b1402ce80d, but some places were not updated correspondingly, which leads to trying to take this lock twice, which is dead-lock. Fix this. Actually, iotest 199 (about dirty bitmap postcopy migration) is broken now, and this fixes it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20180625165745.25259-3-vsementsov@virtuozzo.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-06-28Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180627' ↵Peter Maydell5-27/+508
into staging migration/next for 20180627 # gpg: Signature made Wed 27 Jun 2018 13:53:53 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180627: migration: fix crash in when incoming client channel setup fails postcopy: drop ram_pages parameter from postcopy_ram_incoming_init() migration: Stop sending whole pages through main channel migration: Remove not needed semaphore and quit migration: Wait for blocking IO migration: Start sending messages migration: Create ram_save_multifd_page migration: Create multifd_bytes ram_counter migration: Synchronize multifd threads with main thread migration: Add block where to send/receive packets migration: Multifd channels always wait on the sem migration: Add multifd traces for start/end thread migration: Abstract the number of bytes sent migration: Calculate mbps only during transfer time migration: Create multifd packet migration: Create multipage support Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-27migration: fix crash in when incoming client channel setup failsDaniel P. Berrangé1-1/+2
The way we determine if we can start the incoming migration was changed to use migration_has_all_channels() in: commit 428d89084c709e568f9cd301c2f6416a54c53d6d Author: Juan Quintela <quintela@redhat.com> Date: Mon Jul 24 13:06:25 2017 +0200 migration: Create migration_has_all_channels This method in turn calls multifd_recv_all_channels_created() which is hardcoded to always return 'true' when multifd is not in use. This is a latent bug... ...activated in a following commit where that return result ends up acting as the flag to indicate whether it is possible to start processing the migration: commit 36c2f8be2c4eb0003ac77a14910842b7ddd7337e Author: Juan Quintela <quintela@redhat.com> Date: Wed Mar 7 08:40:52 2018 +0100 migration: Delay start of migration main routines This means that if channel initialization fails with normal migration, it'll never notice and attempt to start the incoming migration regardless and crash on a NULL pointer. This can be seen, for example, if a client connects to a server requiring TLS, but has an invalid x509 certificate: qemu-system-x86_64: The certificate hasn't got a known issuer qemu-system-x86_64: migration/migration.c:386: process_incoming_migration_co: Assertion `mis->from_src_file' failed. #0 0x00007fffebd24f2b in raise () at /lib64/libc.so.6 #1 0x00007fffebd0f561 in abort () at /lib64/libc.so.6 #2 0x00007fffebd0f431 in _nl_load_domain.cold.0 () at /lib64/libc.so.6 #3 0x00007fffebd1d692 in () at /lib64/libc.so.6 #4 0x0000555555ad027e in process_incoming_migration_co (opaque=<optimized out>) at migration/migration.c:386 #5 0x0000555555c45e8b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116 #6 0x00007fffebd3a6a0 in __start_context () at /lib64/libc.so.6 #7 0x0000000000000000 in () To handle the non-multifd case, we check whether mis->from_src_file is non-NULL. With this in place, the migration server drops the rejected client and stays around waiting for another, hopefully valid, client to arrive. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20180619163552.18206-1-berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-27postcopy: drop ram_pages parameter from postcopy_ram_incoming_init()David Hildenbrand3-6/+4
Not needed. Don't expose last_ram_page(). Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20180620202736.21399-1-david@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-27migration: Stop sending whole pages through main channelJuan Quintela1-8/+3
We have to flush() the QEMUFile because now we sent really few data through that channel. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Remove not needed semaphore and quitJuan Quintela1-9/+5
We know quit with shutdwon in the QIO. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add comment Use shutdown() instead of unref()
2018-06-27migration: Wait for blocking IOJuan Quintela1-13/+0
We have three conditions here: - channel fails -> error - we have to quit: we close the channel and reads fails - normal read that success, we are in bussiness So forget the complications of waiting in a semaphore. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Start sending messagesJuan Quintela1-5/+24
Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Create ram_save_multifd_pageJuan Quintela2-1/+108
The function still don't use multifd, but we have simplified ram_save_page, xbzrle and RDMA stuff is gone. We have added a new counter. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- Add last_page parameter Add commets for done and address Remove multifd field, it is the same than normal pages Merge next patch, now we send multiple pages at a time Remove counter for multifd pages, it is identical to normal pages Use iovec's instead of creating the equivalent. Clear memory used by pages (dave) Use g_new0(danp) define MULTIFD_CONTINUE now pages member is a pointer Fix off-by-one in number of pages in one packet Remove RAM_SAVE_FLAG_MULTIFD_PAGE s/multifd_pages_t/MultiFDPages_t/ add comment explaining what it means
2018-06-27migration: Create multifd_bytes ram_counterJuan Quintela1-0/+1
This will include how many bytes they are sent through multifd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Synchronize multifd threads with main threadJuan Quintela2-30/+121
We synchronize all threads each RAM_SAVE_FLAG_EOS. Bitmap synchronizations don't happen inside a ram section, so we are safe about two channels trying to overwrite the same memory. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- seq needs to be atomic now, will also be accessed from main thread. Fix the if (true || ...) leftover We are back to non-atomics
2018-06-27migration: Add block where to send/receive packetsJuan Quintela2-5/+46
Once there add tracepoints. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Multifd channels always wait on the semJuan Quintela1-2/+11
Either for quit, sync or packet, we first wake them. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Add multifd traces for start/end threadJuan Quintela2-0/+26
We want to know how many pages/packets each channel has sent. Add counters for those. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- sort trace-events (dave)
2018-06-27migration: Abstract the number of bytes sentJuan Quintela1-3/+11
Right now we use the "position" inside the QEMUFile, but things like RDMA already do weird things to be able to maintain that counter right, and multifd will have some similar problems. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Calculate mbps only during transfer timeJuan Quintela1-2/+4
We used to include in this calculation the setup time, but that can be quite big in rdma or multifd. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27migration: Create multifd packetJuan Quintela1-1/+144
We still don't put anything there. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> -- fix magic (dave) check offset/ramblock (dave) s/seq/packet_num/ and make it 64bit
2018-06-27migration: Create multipage supportJuan Quintela1-0/+57
We only create/destry the page list here. We will use it later. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-27trace: forbid floating point typesStefan Hajnoczi1-1/+1
Only one existing trace event uses a floating point type. Unfortunately float and double cannot be supported since SystemTap does not have floating point types. Remove float and double from the whitelist and document this limitation. Update the migrate_transferred trace event to use uint64_t instead of double. Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-id: 20180621150254.4922-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-15migration: calculate expected_downtime with ram_bytes_remaining()Balamuruhan S2-2/+2
expected_downtime value is not accurate with dirty_pages_rate * page_size, using ram_bytes_remaining() would yeild it resonable. consider to read the remaining ram just after having updated the dirty pages count later migration_bitmap_sync_range() in migration_bitmap_sync() and reuse the `remaining` field in ram_counters to hold ram_bytes_remaining() for calculating expected_downtime. Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com> Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Message-Id: <20180612085009.17594-2-bala24@linux.vnet.ibm.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration/postcopy: Wake rate limit sleep on postcopy requestDr. David Alan Gilbert1-1/+8
Use the 'urgent request' mechanism added in the previous patch for entries added to the postcopy request queue for RAM. Ignore the rate limiting while we have requests. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-4-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: Wake rate limiting for urgent requestsDr. David Alan Gilbert3-4/+41
Rate limiting sleeps the migration thread for a while when it runs out of bandwidth; but sometimes we want to wake up to get on with something more urgent (like a postcopy request). Here we use a semaphore with a timedwait instead of a simple sleep; Incrementing the sempahore will wake it up sooner. Anything that consumes these urgent events must decrement the sempahore. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration/postcopy: Add max-postcopy-bandwidth parameterDr. David Alan Gilbert1-1/+34
Limit the background transfer bandwidth during the postcopy phase to the value set on this new parameter. The default, 0, corresponds to the existing behaviour which is unlimited bandwidth. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180613102642.23995-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: introduce migration_update_ratesXiao Guangrong1-13/+22
It is used to slightly clean the code up, no logic is changed Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-5-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: fix counting xbzrle cache_miss_rateXiao Guangrong1-1/+1
Sync up xbzrle_cache_miss_prev only after migration iteration goes forward Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Message-Id: <20180604095520.8563-4-xiaoguangrong@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration/block-dirty-bitmap: fix dirty_bitmap_loadVladimir Sementsov-Ogievskiy1-0/+3
dirty_bitmap_load_header return code is obtained but not handled. Fix this. Bug was introduced in b35ebdf076d697bc "migration: add postcopy migration of dirty bitmaps" with the whole function. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180530112424.204835-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: Poison ramblock loops in migrationDr. David Alan Gilbert2-1/+6
The migration code should be using the RAMBLOCK_FOREACH_MIGRATABLE and qemu_ram_foreach_block_migratable not the all-block versions; poison them so that we can't accidentally use them. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-3-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15migration: Fixes for non-migratable RAMBlocksDr. David Alan Gilbert2-3/+3
There are still a few cases where migration code is using the macros and functions that do all RAMBlocks rather than just the migratable blocks; fix those up. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Message-Id: <20180605162545.80778-2-dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-15typedefs: add QJSONGreg Kurz1-2/+0
Since commit 83ee768d6247b, we now have two places that define the QJSON type: $ git grep 'typedef struct QJSON QJSON' include/migration/vmstate.h:typedef struct QJSON QJSON; migration/qjson.h:typedef struct QJSON QJSON; This breaks docker-test-build@centos6: In file included from /tmp/qemu-test/src/migration/savevm.c:59: /tmp/qemu-test/src/migration/qjson.h:16: error: redefinition of typedef 'QJSON' /tmp/qemu-test/src/include/migration/vmstate.h:30: note: previous declaration of 'QJSON' was here make: *** [migration/savevm.o] Error 1 This happens because CentOS 6 has an old GCC 4.4.7. Even if redefining a typedef with the same type is permitted since GCC 4.6, unless -pedantic is passed, we don't really need to do that on purpose. Let's have a single definition in <qemu/typedefs.h> instead. Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <152844714981.11789.3657734445739553287.stgit@bahia.lan> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-04Merge remote-tracking branch 'remotes/juanquintela/tags/migration/20180604' ↵Peter Maydell7-45/+90
into staging migration/next for 20180604 # gpg: Signature made Mon 04 Jun 2018 05:14:24 BST # gpg: using RSA key F487EF185872D723 # gpg: Good signature from "Juan Quintela <quintela@redhat.com>" # gpg: aka "Juan Quintela <quintela@trasno.org>" # Primary key fingerprint: 1899 FF8E DEBF 58CC EE03 4B82 F487 EF18 5872 D723 * remotes/juanquintela/tags/migration/20180604: migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect migration: remove unnecessary variables len in QIOChannelRDMA migration: Don't activate block devices if using -S migration: discard non-migratable RAMBlocks migration: introduce decompress-error-check Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell5-5/+6
acpi, vhost, misc: fixes, features vDPA support, fix to vhost blk RO bit handling, some include path cleanups, NFIT ACPI table. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 01 Jun 2018 17:25:19 BST # gpg: using RSA key 281F0DB8D28D5469 # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: (31 commits) vhost-blk: turn on pre-defined RO feature bit ACPI testing: test NFIT platform capabilities nvdimm, acpi: support NFIT platform capabilities tests/.gitignore: add entry for generated file arch_init: sort architectures ui: use local path for local headers qga: use local path for local headers colo: use local path for local headers migration: use local path for local headers usb: use local path for local headers sd: fix up include vhost-scsi: drop an unused include ppc: use local path for local headers rocker: drop an unused include e1000e: use local path for local headers ioapic: fix up includes ide: use local path for local headers display: use local path for local headers trace: use local path for local headers migration: drop an unused include ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-04migration: not wait RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnectLidong Chen2-11/+2
When cancel migration during RDMA precopy, the source qemu main thread hangs sometime. The backtrace is: (gdb) bt #0 0x00007f249eabd43d in write () from /lib64/libpthread.so.0 #1 0x00007f24a1ce98e4 in rdma_get_cm_event (channel=0x4675d10, event=0x7ffe2f643dd0) at src/cma.c:2189 #2 0x00000000007b6166 in qemu_rdma_cleanup (rdma=0x6784000) at migration/rdma.c:2296 #3 0x00000000007b7cae in qio_channel_rdma_close (ioc=0x3bfcc30, errp=0x0) at migration/rdma.c:2999 #4 0x00000000008db60e in qio_channel_close (ioc=0x3bfcc30, errp=0x0) at io/channel.c:273 #5 0x00000000007a8765 in channel_close (opaque=0x3bfcc30) at migration/qemu-file-channel.c:98 #6 0x00000000007a71f9 in qemu_fclose (f=0x527c000) at migration/qemu-file.c:334 #7 0x0000000000795b96 in migrate_fd_cleanup (opaque=0x3b46280) at migration/migration.c:1162 #8 0x000000000093a71b in aio_bh_call (bh=0x3db7a20) at util/async.c:90 #9 0x000000000093a7b2 in aio_bh_poll (ctx=0x3b121c0) at util/async.c:118 #10 0x000000000093f2ad in aio_dispatch (ctx=0x3b121c0) at util/aio-posix.c:436 #11 0x000000000093ab41 in aio_ctx_dispatch (source=0x3b121c0, callback=0x0, user_data=0x0) at util/async.c:261 #12 0x00007f249f73c7aa in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 #13 0x000000000093dc5e in glib_pollfds_poll () at util/main-loop.c:215 #14 0x000000000093dd4e in os_host_main_loop_wait (timeout=28000000) at util/main-loop.c:263 #15 0x000000000093de05 in main_loop_wait (nonblocking=0) at util/main-loop.c:522 #16 0x00000000005bc6a5 in main_loop () at vl.c:1944 #17 0x00000000005c39b5 in main (argc=56, argv=0x7ffe2f6443f8, envp=0x3ad0030) at vl.c:4752 It does not get the RDMA_CM_EVENT_DISCONNECTED event after rdma_disconnect sometime. According to IB Spec once active side send DREQ message, it should wait for DREP message and only once it arrived it should trigger a DISCONNECT event. DREP message can be dropped due to network issues. For that case the spec defines a DREP_timeout state in the CM state machine, if the DREP is dropped we should get a timeout and a TIMEWAIT_EXIT event will be trigger. Unfortunately the current kernel CM implementation doesn't include the DREP_timeout state and in above scenario we will not get DISCONNECT or TIMEWAIT_EXIT events. So it should not invoke rdma_get_cm_event which may hang forever, and the event channel is also destroyed in qemu_rdma_cleanup. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04migration: remove unnecessary variables len in QIOChannelRDMALidong Chen1-8/+7
Because qio_channel_rdma_writev and qio_channel_rdma_readv maybe invoked by different threads concurrently, this patch removes unnecessary variables len in QIOChannelRDMA and use local variable instead. Signed-off-by: Lidong Chen <lidongchen@tencent.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Lidong Chen <jemmy858585@gmail.com>
2018-06-04migration: Don't activate block devices if using -SDr. David Alan Gilbert1-7/+27
Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it's expecting the locks are still untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' already will do that anyway. This change is tied to the new migration capability 'late-block-activate' that defaults to off, keeping the old behaviour by default. bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04migration: discard non-migratable RAMBlocksCédric Le Goater3-18/+42
On the POWER9 processor, the XIVE interrupt controller can control interrupt sources using MMIO to trigger events, to EOI or to turn off the sources. Priority management and interrupt acknowledgment is also controlled by MMIO in the presenter sub-engine. These MMIO regions are exposed to guests in QEMU with a set of 'ram device' memory mappings, similarly to VFIO, and the VMAs are populated dynamically with the appropriate pages using a fault handler. But, these regions are an issue for migration. We need to discard the associated RAMBlocks from the RAM state on the source VM and let the destination VM rebuild the memory mappings on the new host in the post_load() operation just before resuming the system. To achieve this goal, the following introduces a new RAMBlock flag RAM_MIGRATABLE which is updated in the vmstate_register_ram() and vmstate_unregister_ram() routines. This flag is then used by the migration to identify RAMBlocks to discard on the source. Some checks are also performed on the destination to make sure nothing invalid was sent. This change impacts the boston, malta and jazz mips boards for which migration compatibility is broken. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
2018-06-04migration: introduce decompress-error-checkXiao Guangrong3-1/+12
QEMU 3.0 enables strict check for compression & decompression to make the migration more robust, that depends on the source to fix the internal design which triggers the unexpected error conditions To make it work for migrating old version QEMU to 2.13 QEMU, we introduce this parameter to disable the error check on the destination which is the default behavior of the machine type which is older than 2.13, alternately, the strict check can be enabled explicitly as followings: -M pc-q35-2.11 -global migration.decompress-error-check=true Signed-off-by: Xiao Guangrong <xiaoguangrong@tencent.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>