aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2025-01-09multifd: bugfix for incorrect migration data with qatzip compressionYuan Liu1-0/+1
When QPL compression is enabled on the migration channel and the same dirty page changes from a normal page to a zero page in the iterative memory copy, the dirty page will not be updated to a zero page again on the target side, resulting in incorrect memory data on the source and target sides. The root cause is that the target side does not record the normal pages to the receivedmap. The solution is to add ramblock_recv_bitmap_set_offset in target side to record the normal pages. Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Jason Zeng <jason.zeng@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20241218091413.140396-4-yuan1.liu@intel.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09multifd: bugfix for incorrect migration data with QPL compressionYuan Liu1-0/+1
When QPL compression is enabled on the migration channel and the same dirty page changes from a normal page to a zero page in the iterative memory copy, the dirty page will not be updated to a zero page again on the target side, resulting in incorrect memory data on the source and target sides. The root cause is that the target side does not record the normal pages to the receivedmap. The solution is to add ramblock_recv_bitmap_set_offset in target side to record the normal pages. Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Jason Zeng <jason.zeng@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20241218091413.140396-3-yuan1.liu@intel.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09multifd: bugfix for migration using compression methodsYuan Liu1-2/+1
When compression is enabled on the migration channel and the pages processed are all zero pages, these pages will not be sent and updated on the target side, resulting in incorrect memory data on the source and target sides. The root cause is that all compression methods call multifd_send_prepare_common to determine whether to compress dirty pages, but multifd_send_prepare_common does not update the IOV of MultiFDPacket_t when all dirty pages are zero pages. The solution is to always update the IOV of MultiFDPacket_t regardless of whether the dirty pages are all zero pages. Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.") Cc: qemu-stable@nongnu.org #9.0+ Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Jason Zeng <jason.zeng@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20241218091413.140396-2-yuan1.liu@intel.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09s390x: Fix CSS migrationFabiano Rosas1-1/+1
Commit a55ae46683 ("s390: move css_migration_enabled from machine to css.c") disabled CSS migration globally instead of doing it per-instance. CC: Paolo Bonzini <pbonzini@redhat.com> CC: qemu-stable@nongnu.org #9.1 Fixes: a55ae46683 ("s390: move css_migration_enabled from machine to css.c") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2704 Reviewed-by: Thomas Huth <thuth@redhat.com> Message-Id: <20250109185249.23952-8-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Fix arrays of pointers in JSON writerFabiano Rosas2-9/+50
Currently, if an array of pointers contains a NULL pointer, that pointer will be encoded as '0' in the stream. Since the JSON writer doesn't define a "pointer" type, that '0' will now be an uint8, which is different from the original type being pointed to, e.g. struct. (we're further calling uint8 "nullptr", but that's irrelevant to the issue) That mixed-type array shouldn't be compressed, otherwise data is lost as the code currently makes the whole array have the type of the first element: css = {NULL, NULL, ..., 0x5555568a7940, NULL}; {"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css", "version": 1, "fields": [ ..., {"name": "css", "array_len": 256, "type": "nullptr", "size": 1}, ..., ]} In the above, the valid pointer at position 254 got lost among the compressed array of nullptr. While we could disable the array compression when a NULL pointer is found, the JSON part of the stream still makes part of downtime, so we should avoid writing unecessary bytes to it. Keep the array compression in place, but if NULL and non-NULL pointers are mixed break the array into several type-contiguous pieces : css = {NULL, NULL, ..., 0x5555568a7940, NULL}; {"name": "s390_css", "instance_id": 0, "vmsd_name": "s390_css", "version": 1, "fields": [ ..., {"name": "css", "array_len": 254, "type": "nullptr", "size": 1}, {"name": "css", "type": "struct", "struct": {"vmsd_name": "s390_css_img", ... }, "size": 768}, {"name": "css", "type": "nullptr", "size": 1}, ..., ]} Now each type-discontiguous region will become a new JSON entry. The reader should interpret this as a concatenation of values, all part of the same field. Parsing the JSON with analyze-script.py now shows the proper data being pointed to at the places where the pointer is valid and "nullptr" where there's NULL: "s390_css (14)": { ... "css": [ "nullptr", "nullptr", ... "nullptr", { "chpids": [ { "in_use": "0x00", "type": "0x00", "is_virtual": "0x00" }, ... ] }, "nullptr", } Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20250109185249.23952-7-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Dump correct JSON format for nullptr replacementPeter Xu1-27/+91
QEMU plays a trick with null pointers inside an array of pointers in a VMSD field. See 07d4e69147 ("migration/vmstate: fix array of ptr with nullptrs") for more details on why. The idea makes sense in general, but it may overlooked the JSON writer where it could write nothing in a "struct" in the JSON hints section. We hit some analyze-migration.py issues on s390 recently, showing that some of the struct field contains nothing, like: {"name": "css", "array_len": 256, "type": "struct", "struct": {}, "size": 1} As described in details by Fabiano: https://lore.kernel.org/r/87pll37cin.fsf@suse.de It could be that we hit some null pointers there, and JSON was gone when they're null pointers. To fix it, instead of hacking around only at VMStateInfo level, do that from VMStateField level, so that JSON writer can also be involved. In this case, JSON writer will replace the pointer array (which used to be a "struct") to be the real representation of the nullptr field. Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20250109185249.23952-6-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Rename vmstate_info_nullptrFabiano Rosas2-1/+24
Rename vmstate_info_nullptr from "uint64_t" to "nullptr". This vmstate actually reads and writes just a byte, so the proper name would be uint8. However, since this is a marker for a NULL pointer, it's convenient to have a more explicit name that can be identified by the consumers of the JSON part of the stream. Change the name to "nullptr" and add support for it in the analyze-migration.py script. Arbitrarily use the name of the type as the value of the field to avoid the script showing 0x30 or '0', which could be confusing for readers. Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20250109185249.23952-5-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Fix parsing of s390 streamFabiano Rosas1-1/+17
The parsing for the S390StorageAttributes section is currently leaving an unconsumed token that is later interpreted by the generic code as QEMU_VM_EOF, cutting the parsing short. The migration will issue a STATTR_FLAG_DONE between iterations, which the script consumes correctly, but there's a final STATTR_FLAG_EOS at .save_complete that the script is ignoring. Since the EOS flag is a u64 0x1ULL and the stream is big endian, on little endian hosts a byte read from it will be 0x0, the same as QEMU_VM_EOF. Fixes: 81c2c9dd5d ("tests/qtest/migration-test: Fix analyze-migration.py for s390x") Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20250109185249.23952-4-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Remove unused argument in vmsd_desc_field_endFabiano Rosas1-2/+2
Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20250109185249.23952-3-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Add more error handling to analyze-migration.pyFabiano Rosas1-29/+44
The analyze-migration script was seen failing in s390x in misterious ways. It seems we're reaching the VMSDFieldStruct constructor without any fields, which would indicate an empty .subsection entry, a VMSTATE_STRUCT with no fields or a vmsd with no fields. We don't have any of those, at least not without the unmigratable flag set, so this should never happen. Add some debug statements so that we can see what's going on the next time the issue happens. Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20250109185249.23952-2-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/block: Rewrite disk activationPeter Xu9-91/+140
This patch proposes a flag to maintain disk activation status globally. It mostly rewrites disk activation mgmt for QEMU, including COLO and QMP command xen_save_devices_state. Backgrounds =========== We have two problems on disk activations, one resolved, one not. Problem 1: disk activation recover (for switchover interruptions) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When migration is either cancelled or failed during switchover, especially when after the disks are inactivated, QEMU needs to remember re-activate the disks again before vm starts. It used to be done separately in two paths: one in qmp_migrate_cancel(), the other one in the failure path of migration_completion(). It used to be fixed in different commits, all over the places in QEMU. So these are the relevant changes I saw, I'm not sure if it's complete list: - In 2016, commit fe904ea824 ("migration: regain control of images when migration fails to complete") - In 2017, commit 1d2acc3162 ("migration: re-active images while migration been canceled after inactive them") - In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation in more failure scenarios") Now since we have a slightly better picture maybe we can unify the reactivation in a single path. One side benefit of doing so is, we can move the disk operation outside QMP command "migrate_cancel". It's possible that in the future we may want to make "migrate_cancel" be OOB-compatible, while that requires the command doesn't need BQL in the first place. This will already do that and make migrate_cancel command lightweight. Problem 2: disk invalidation on top of invalidated disks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is an unresolved bug for current QEMU. Link in "Resolves:" at the end. It turns out besides the src switchover phase (problem 1 above), QEMU also needs to remember block activation on destination. Consider two continuous migration in a row, where the VM was always paused. In that scenario, the disks are not activated even until migration completed in the 1st round. When the 2nd round starts, if QEMU doesn't know the status of the disks, it needs to try inactivate the disk again. Here the issue is the block layer API bdrv_inactivate_all() will crash a QEMU if invoked on already inactive disks for the 2nd migration. For detail, see the bug link at the end. Implementation ============== This patch proposes to maintain disk activation with a global flag, so we know: - If we used to inactivate disks for migration, but migration got cancelled, or failed, QEMU will know it should reactivate the disks. - On incoming side, if the disks are never activated but then another migration is triggered, QEMU should be able to tell that inactivate is not needed for the 2nd migration. We used to have disk_inactive, but it only solves the 1st issue, not the 2nd. Also, it's done in completely separate paths so it's extremely hard to follow either how the flag changes, or the duration that the flag is valid, and when we will reactivate the disks. Convert the existing disk_inactive flag into that global flag (also invert its naming), and maintain the disk activation status for the whole lifecycle of qemu. That includes the incoming QEMU. Put both of the error cases of source migration (failure, cancelled) together into migration_iteration_finish(), which will be invoked for either of the scenario. So from that part QEMU should behave the same as before. However with such global maintenance on disk activation status, we not only cleanup quite a few temporary paths that we try to maintain the disk activation status (e.g. in postcopy code), meanwhile it fixes the crash for problem 2 in one shot. For freshly started QEMU, the flag is initialized to TRUE showing that the QEMU owns the disks by default. For incoming migrated QEMU, the flag will be initialized to FALSE once and for all showing that the dest QEMU doesn't own the disks until switchover. That is guaranteed by the "once" variable. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-7-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/block: Fix possible race with block_inactivePeter Xu2-6/+5
Src QEMU sets block_inactive=true very early before the invalidation takes place. It means if something wrong happened during setting the flag but before reaching qemu_savevm_state_complete_precopy_non_iterable() where it did the invalidation work, it'll make block_inactive flag inconsistent. For example, think about when qemu_savevm_state_complete_precopy_iterable() can fail: it will have block_inactive set to true even if all block drives are active. Fix that by only update the flag after the invalidation is done. No Fixes for any commit, because it's not an issue if bdrv_activate_all() is re-entrant upon all-active disks - false positive block_inactive can bring nothing more than "trying to active the blocks but they're already active". However let's still do it right to avoid the inconsistent flag v.s. reality. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-6-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/block: Apply late-block-active behavior to postcopyPeter Xu1-13/+12
Postcopy never cared about late-block-active. However there's no mention in the capability that it doesn't apply to postcopy. Considering that we _assumed_ late activation is always good, do that too for postcopy unconditionally, just like precopy. After this patch, we should have unified the behavior across all. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-5-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/block: Make late-block-active the defaultPeter Xu1-19/+19
Migration capability 'late-block-active' controls when the block drives will be activated. If enabled, block drives will only be activated until VM starts, either src runstate was "live" (RUNNING, or SUSPENDED), or it'll be postponed until qmp_cont(). Let's do this unconditionally. There's no harm to delay activation of block drives. Meanwhile there's no ABI breakage if dest does it, because src QEMU has nothing to do with it, so it's no concern on ABI breakage. IIUC we could avoid introducing this cap when introducing it before, but now it's still not too late to just always do it. Cap now prone to removal, but it'll be for later patches. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-4-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09qmp/cont: Only activate disks if migration completedPeter Xu1-12/+14
As the comment says, the activation of disks is for the case where migration has completed, rather than when QEMU is still during migration (RUN_STATE_INMIGRATE). Move the code over to reflect what the comment is describing. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-3-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration: Add helper to get target runstatePeter Xu1-4/+17
In 99% cases, after QEMU migrates to dest host, it tries to detect the target VM runstate using global_state_get_runstate(). There's one outlier so far which is Xen that won't send global state. That's the major reason why global_state_received() check was always there together with global_state_get_runstate(). However it's utterly confusing why global_state_received() has anything to do with "let's start VM or not". Provide a helper to explain it, then we have an unified entry for getting the target dest QEMU runstate after migration. Suggested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20241206230838.1111496-2-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Fix compat with QEMU < 9.0Fabiano Rosas1-6/+9
Commit f5f48a7891 ("migration/multifd: Separate SYNC request with normal jobs") changed the multifd source side to stop sending data along with the MULTIFD_FLAG_SYNC, effectively introducing the concept of a SYNC-only packet. Relying on that, commit d7e58f412c ("migration/multifd: Don't send ram data during SYNC") later came along and skipped reading data from SYNC packets. In a versions timeline like this: 8.2 f5f48a7 9.0 9.1 d7e58f41 9.2 The issue arises that QEMUs < 9.0 still send data along with SYNC, but QEMUs > 9.1 don't gather that data anymore. This leads to various kinds of migration failures due to desync/missing data. Stop checking for a SYNC packet on the destination and unconditionally unfill the packet. >From now on: old -> new: the source sends data + sync, destination reads normally new -> new: source sends only sync, destination reads zeros new -> old: source sends only sync, destination reads zeros CC: qemu-stable@nongnu.org Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720 Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241213160120.23880-2-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Document the reason to sync for save_setup()Peter Xu1-0/+25
It's not straightforward to see why src QEMU needs to sync multifd during setup() phase. After all, there's no page queued at that point. For old QEMUs, there's a solid reason: EOS requires it to work. While it's clueless on the new QEMUs which do not take EOS message as sync requests. One will figure that out only when this is conditionally removed. In fact, the author did try it out. Logically we could still avoid doing this on new machine types, however that needs a separate compat field and that can be an overkill in some trivial overhead in setup() phase. Let's instead document it completely, to avoid someone else tries this again and do the debug one more time, or anyone confused on why this ever existed. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206224755.1108686-8-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Cleanup src flushes on condition checkPeter Xu3-7/+47
The src flush condition check is over complicated, and it's getting more out of control if postcopy will be involved. In general, we have two modes to do the sync: legacy or modern ways. Legacy uses per-section flush, modern uses per-round flush. Mapped-ram always uses the modern, which is per-round. Introduce two helpers, which can greatly simplify the code, and hopefully make it readable again. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206224755.1108686-7-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Remove sync processing on postcopyPeter Xu1-8/+0
Multifd never worked with postcopy, at least yet so far. Remove the sync processing there, because it's confusing, and they should never appear. Now if RAM_SAVE_FLAG_MULTIFD_FLUSH is observed, we fail hard instead of trying to invoke multifd code. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20241206224755.1108686-6-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messagesPeter Xu3-17/+30
RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated to a sync request on src. Unify such message into one place, and conditionally send the message only if necessary. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20241206224755.1108686-5-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/ram: Move RAM_SAVE_FLAG* into ram.hPeter Xu3-28/+28
Firstly, we're going to use the multifd flag soon in multifd code, so ram.c isn't gonna work. Secondly, we have a separate RDMA flag dangling around, which is definitely not obvious. There's one comment that helps, but not too much. Put all RAM save flags altogether, so nothing will get overlooked. Add a section explain why we can't use bits over 0x200. Remove RAM_SAVE_FLAG_FULL as it's already not used in QEMU, as the comment explained. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20241206224755.1108686-4-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Allow to sync with sender threads onlyPeter Xu3-10/+37
Teach multifd_send_sync_main() to sync with threads only. We already have such requests, which is when mapped-ram is enabled with multifd. In that case, no SYNC messages will be pushed to the stream when multifd syncs the sender threads because there's no destination threads waiting for that. The whole point of the sync is to make sure all threads finished their jobs. So fundamentally we have a request to do the sync in different ways: - Either to sync the threads only, - Or to sync the threads but also with the destination side. Mapped-ram did it already because of the use_packet check in the sync handler of the sender thread. It works. However it may stop working when e.g. VFIO may start to reuse multifd channels to push device states. In that case VFIO has similar request on "thread-only sync" however we can't check a flag because such sync request can still come from RAM which needs the on-wire notifications. Paving way for that by allowing the multifd_send_sync_main() to specify what kind of sync the caller needs. We can use it for mapped-ram already. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206224755.1108686-3-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Further remove the SYNC on completePeter Xu1-3/+10
Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at ram_save_complete(), because the sync on the destination side is not needed due to the last iteration of find_dirty_block() having already done it. However, that commit overlooked that multifd_ram_flush_and_sync() on the source side is also not needed at ram_save_complete(), for the same reason. Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the multifd_ram_flush_and_sync() means that currently the recv threads will hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is received. Luckily, multifd is still all working fine because recv side cleanup code (mostly multifd_recv_sync_main()) is smart enough to make sure even if recv threads are stuck at SYNC it'll get kicked out. And since this is the completion phase of migration, nothing else will be sent after the SYNCs. This needs to be fixed because in the future VFIO will have data to push after ram_save_complete() and we don't want the recv thread to be stuck in the MULTIFD_FLAG_SYNC message. Remove the unnecessary (and buggy) invocation of multifd_ram_flush_and_sync(). For very old binaries (multifd_flush_after_each_section==true), the flush_and_sync is still needed because each EOS received on destination will enforce all-channel sync once. Stable branches do not need this patch, as no real bug I can think of that will go wrong there.. so not attaching Fixes to be clear on the backport not needed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20241206224755.1108686-2-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09migration/multifd: Fix compile error caused by page_size usageShameer Kolothum1-1/+1
>From Commit 90fa121c6c07 ("migration/multifd: Inline page_size and page_count") onwards page_size is not part of MutiFD*Params but uses an inline constant instead. However, it missed updating an old usage, causing a compile error. Fixes: 90fa121c6c07 ("migration/multifd: Inline page_size and page_count") Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241203124943.52572-1-shameerali.kolothum.thodi@huawei.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-01-09system: Inline machine_containers[] in qemu_create_machine_containers()Philippe Mathieu-Daudé1-9/+7
Only qemu_create_machine_containers() uses the machine_containers[] array, restrict the scope to this single user. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250102211800.79235-9-philmd@linaro.org>
2025-01-09qom: remove unused InterfaceInfo::concrete_class fieldPaolo Bonzini2-2/+4
The "concrete_class" field of InterfaceClass is only ever written, and as far as I can tell is not particularly useful when debugging either; remove it. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-ID: <20250107111308.21886-1-pbonzini@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-01-09qom: Remove container_get()Peter Xu2-34/+0
Now there's no user of container_get(), remove it. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20241121192202.4155849-14-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-09qom: Use object_get_container()Peter Xu6-8/+8
Use object_get_container() whenever applicable across the tree. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20241121192202.4155849-13-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-09qom: Add object_get_container()Peter Xu2-0/+20
Add a helper to fetch a root container (under object_get_root()). Sanity check on the type of the object. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-ID: <20241121192202.4155849-12-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-09qdev: Use machine_get_container()Peter Xu8-15/+12
Use machine_get_container() whenever applicable across the tree. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20241121192202.4155849-11-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-09qdev: Add machine_get_container()Peter Xu2-0/+21
Add a helper to fetch machine containers. Add some sanity check around. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Xu <peterx@redhat.com> Message-ID: <20241121192202.4155849-10-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-09qdev: Make qdev_get_machine() not use container_get()Peter Xu1-1/+6
Currently, qdev_get_machine() has a slight misuse on container_get(), as the helper says "get a container" but in reality the goal is to get the machine object. It is still a "container" but not strictly. Note that it _may_ get a container (at "/machine") in our current unit test of test-qdev-global-props.c before all these changes, but it's probably unexpected and worked by accident. Switch to an explicit object_resolve_path_component(), with a side benefit that qdev_get_machine() can happen a lot, and we don't need to split the string ("/machine") every time. This also paves way for making the helper container_get() never try to return a non-container at all. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20241121192202.4155849-9-peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2025-01-09qdev: Implement qdev_create_fake_machine() for user emulationPhilippe Mathieu-Daudé4-1/+37
When a QDev instance is realized, qdev_get_machine() ends up called. In the next commit, qdev_get_machine() will require a "machine" container to be always present. To satisfy this QOM containers design, Implement qdev_create_fake_machine() which creates a fake "machine" container for user emulation. On system emulation, qemu_create_machine() is called from qemu_init(). For user emulation, since the TCG accelerator always calls tcg_init_machine(), we use it to hook our fake machine creation. Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250102211800.79235-2-philmd@linaro.org>
2025-01-09qdev: Remove opts memberAkihiko Odaki3-10/+7
It is no longer used. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250104-reuse-v18-14-c349eafd8673@daynix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-01-09hw/pci: Use -1 as the default value for rombarAkihiko Odaki3-5/+4
vfio_pci_size_rom() distinguishes whether rombar is explicitly set to 1 by checking dev->opts, bypassing the QOM property infrastructure. Use -1 as the default value for rombar to tell if the user explicitly set it to 1. The property is also converted from unsigned to signed. -1 is signed so it is safe to give it a new meaning. The values in [2 ^ 31, 2 ^ 32) become invalid, but nobody should have typed these values by chance. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250104-reuse-v18-13-c349eafd8673@daynix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2025-01-09Merge tag 'pull-xenfv-20250109-1' of https://gitlab.com/dwmw2/qemu into stagingStefan Hajnoczi4-25/+73
Xen emulation fixes # -----BEGIN PGP SIGNATURE----- # # iQJGBAABCAAwFiEEMUsIrNDeSBEzpfKGm+mA/QrAFUQFAmd/qNYSHGR3bXdAYW1h # em9uLmNvLnVrAAoJEJvpgP0KwBVEtHsP/1qdpeVDCW1LAdGsOl9vixBXTR5/85G4 # m1KilpAPyxla8WfChRIagIdSAYGP5gN+yzbZ74AGb8HxumqJdl0bj6Gtqms2r8EQ # 4T7IU1iNONDkncApkHdQW9BdKg4Atq3dY8dEaN1UxzCfRjHC/KS5vHPN3OzGKqJ1 # tAk8wOcDtp7cfW+utw2ssjVR14cfJLQCR7/ehBfeFkC0DSd8p/yTJ31bFnLyPpBn # vh03MrslqV+h47D0uQxKwx5rtvNQhhIc/eRR/RymY3BSzAqRiyed/hTvsrRy4y/Z # EXB8ACQ6U2Ikrj//VXimSTx5aQDeGIU8nD6zvNRWZ1rTmTtD3n5dOxL2U9U5DBHb # TtlYhyochV6zO76mbINyjkSkGdj8ZZgF+5w5IIEhjazfHdWDuMdG0IjcRxl0r2Qz # 4jaoVjxMUT/MLI4noSVYFF29/aWYxsk/nsYCPOM2X4WuzK4/ragIWbpZZqOIFn4X # NyEc7xD2z9iL3MZe0Ygsa1eRpi/Gak0ih6W/u6ngON2EGESdF4T+CI+zTp6I4xtp # jOrAGltp6012pRJibHrKKdpnTYuQCRj3kSFAEP+JhNSBDUhbZ5lJWTnxiW7BkBO4 # BujmX3TMFsdt4jDqNQzht84Tgf4JEAYbGCks9msFcoYdZovKcyG3kgfZyAVfEap2 # kvCgGk7JMz1A # =5kvA # -----END PGP SIGNATURE----- # gpg: Signature made Thu 09 Jan 2025 05:45:42 EST # gpg: using RSA key 314B08ACD0DE481133A5F2869BE980FD0AC01544 # gpg: issuer "dwmw@amazon.co.uk" # gpg: Good signature from "David Woodhouse <dwmw@amazon.co.uk>" [unknown] # gpg: aka "David Woodhouse <dwmw@amazon.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 314B 08AC D0DE 4811 33A5 F286 9BE9 80FD 0AC0 1544 * tag 'pull-xenfv-20250109-1' of https://gitlab.com/dwmw2/qemu: hw/xen: Check if len is 0 before memcpy() hw/i386/pc: Fix level interrupt sharing for Xen event channel GSI Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-01-09Merge tag 'pull-loongarch-20250109' of https://gitlab.com/bibo-mao/qemu into ↵Stefan Hajnoczi12-33/+142
staging loongarch queue # -----BEGIN PGP SIGNATURE----- # # iHUEABYKAB0WIQQNhkKjomWfgLCz0aQfewwSUazn0QUCZ39pJgAKCRAfewwSUazn # 0YpMAQCNV9KJJ8f8EaXAw5a87mnmlcP0vRi5gZiyv1ZV9gRqPgEAhzCn/rnzpzd+ # H3B1fRlD1xmaQ8IqRugQ4vfDBd9CyQY= # =OG4d # -----END PGP SIGNATURE----- # gpg: Signature made Thu 09 Jan 2025 01:13:58 EST # gpg: using EDDSA key 0D8642A3A2659F80B0B3D1A41F7B0C1251ACE7D1 # gpg: Good signature from "bibo mao <maobibo@loongson.cn>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 7044 3A00 19C0 E97A 31C7 13C4 8E86 8FB7 A176 9D4C # Subkey fingerprint: 0D86 42A3 A265 9F80 B0B3 D1A4 1F7B 0C12 51AC E7D1 * tag 'pull-loongarch-20250109' of https://gitlab.com/bibo-mao/qemu: hw/intc/loongarch_extioi: Add irq routing support from physical id hw/intc/loongarch_extioi: Remove num-cpu property hw/intc/loongarch_extioi: Get cpu number from possible_cpu_arch_ids target/loongarch: Only support 64bit pte width hw/loongarch/boot: Support Linux raw boot image hw/core/loader: Use ssize_t for efi zboot unpacker Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-01-09hw/xen: Check if len is 0 before memcpy()Akihiko Odaki1-0/+4
data->data can be NULL when len is 0. Strictly speaking, the behavior of memcpy() in such a scenario is undefined so UBSan complaints. Satisfy UBSan by checking if len is 0 before memcpy(). Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
2025-01-09hw/i386/pc: Fix level interrupt sharing for Xen event channel GSIDavid Woodhouse3-25/+69
The system GSIs are not designed for sharing. One device might assert a shared interrupt with qemu_set_irq() and another might deassert it, and the level from the first device is lost. This could be solved by refactoring the x86 GSI code to use an OrIrq device, but that still wouldn't be ideal. The best answer would be to have a 'resample' callback which is invoked when the interrupt is acked at the interrupt controller, and causes the devices to re-trigger the interrupt if it should still be pending. This is the model that VFIO in Linux uses, with a 'resampler' eventfd that actually unmasks the interrupt on the hardware device and thus triggers a new interrupt from it if needed. As things stand, QEMU currently doesn't use that VFIO interface correctly, and just bashes on the resampler for every MMIO access to the device "just in case". Which requires unmapping and trapping the MMIO while an interrupt is pending! For the Xen callback GSI, QEMU does something similar — a flag is set which triggers a poll on *every* vmexst to see if the GSI should be deasserted. Proper resampler support would solve all of that, but is a task for later which has already been on the TODO list for a while. Since the Xen event channel GSI support *already* has hooks into the PC gsi_handler() code for routing GSIs to PIRQs, we can use that for a simpler bug fix. So... remember the externally-driven state of the line (from e.g. PCI INTx) and set the logical OR of that with the GSI. As a bonus, we now only need to enable the polling of vcpu_info on vmexit if the Xen callback GSI is the *only* reason the corresponding line is asserted. Closes: https://gitlab.com/qemu-project/qemu/-/issues/2731 Fixes: ddf0fd9ae1fd ("hw/xen: Support HVM_PARAM_CALLBACK_TYPE_GSI callback") Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Michael S. Tsirkin <mst@redhat.com>
2025-01-09hw/intc/loongarch_extioi: Add irq routing support from physical idBibo Mao1-4/+26
The simliar with IPI interrupt controller, physical cpu id is used for irq routing for extioi interrupt controller. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-01-09hw/intc/loongarch_extioi: Remove num-cpu propertyBibo Mao2-2/+0
Since cpu number can be acquired from possible_cpu_arch_ids(), num-cpu property is not necessary. Here remove num-cpu property for object TYPE_LOONGARCH_EXTIOI_COMMON object. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-01-09hw/intc/loongarch_extioi: Get cpu number from possible_cpu_arch_idsBibo Mao3-8/+17
Supported CPU number can be acquired from function possible_cpu_arch_ids(), cpu-num property is not necessary. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-01-09target/loongarch: Only support 64bit pte widthBibo Mao4-15/+26
iFrom LoongArch Reference Manual pte width can be 64bit, 128bit or more. Instead real hardware only supports 64bit pte width. For 12bit pte, there is no detail definition for all 128bit from manual. Here only 64bit pte width is supported for simplicity, will add this in later if real hw support it and there is definition for all the bits from manual. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-01-09hw/loongarch/boot: Support Linux raw boot imageJiaxun Yang1-0/+69
Support booting such image by parsing header as per Linux's specification [1]. This enabled booting vmlinux.efi/vmlinuz.efi shipped by distros without supplying BIOS. [1]: https://docs.kernel.org/arch/loongarch/booting.html Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-01-09hw/core/loader: Use ssize_t for efi zboot unpackerJiaxun Yang3-4/+4
Convert to use sszie_t to represent size internally to avoid large image overflowing the size. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Bibo Mao <maobibo@loongson.cn>
2025-01-08Merge tag 'pull-request-2025-01-08' of https://gitlab.com/thuth/qemu into ↵Stefan Hajnoczi18-213/+24
staging * Fix compilation problem in s390x tcg tests * Remove obsolete versioned s390x machine types 2.4 up to 2.8 * Remove deprecated -runas command line option * Fix the x86_64_hotplug_cpu functional test # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEEJ7iIR+7gJQEY8+q5LtnXdP5wLbUFAmd+OikRHHRodXRoQHJl # ZGhhdC5jb20ACgkQLtnXdP5wLbXTig/+Kgt0H5KQzCvGuRDYYD20Iq9MLy8iTJdj # bmPyjyMWI88Mm4qjZUO2Kcv/MSEP3MhnaTLEE/PJYgE2D8Cq2AFMLQf4ES01Uzwt # kFBSHvAlTOgcMFcu6QiWtetlqbqixuPs146J9vvUde1xTWCpB2FuB+wNneSTl6J1 # Ilrb7SM2TzbZ1s3NPuvBSVltQlA8MYa5VdQMvljAZGe2GE5vlpNjXmJMiZ/QJ7lp # +LVZppbB5lYxH0SYf56r2tR8fxqfQi6W4WoGeuBf9FnDf815cbcwagpOnaJiQFzx # iG1WusLt1tlW2arBDP7dzrjNjByjMFhwJvKaoXuSnGS29ivPuVq33mBkcFh80cv3 # 8rElZR+M5tmGfbC/mMwH7FSesUrEH18ZwRyc9RGqZ/QtqoBaukBJpxk6KuyNUbWn # 3x0ZUbGhAItGGNaZ54T70LdM3dTmEISMOnp5YTlv8oUFoROhfvWn20FMIge0Ib7t # Epu/nSiHzzdtURHZ1Z8EKNmCOmiKRW4WpF34hzQ+4uid5BAsCNMVMe0JZy5plN9L # V78CZ4qiqipmHIpKnI81x8L2admyJ3nklYKSRkaCmZFhpSyh8rdMUjCgt9A6xFfe # 34Jha7+MrJq9nnoeYIH/dS1Jhv+9vygNahDy9XaWXIHFPd7Nrsr79BWSC3fe1w5O # DlsdrHyGHvU= # =II7e # -----END PGP SIGNATURE----- # gpg: Signature made Wed 08 Jan 2025 03:41:13 EST # gpg: using RSA key 27B88847EEE0250118F3EAB92ED9D774FE702DB5 # gpg: issuer "thuth@redhat.com" # gpg: Good signature from "Thomas Huth <th.huth@gmx.de>" [full] # gpg: aka "Thomas Huth <thuth@redhat.com>" [full] # gpg: aka "Thomas Huth <huth@tuxfamily.org>" [full] # gpg: aka "Thomas Huth <th.huth@posteo.de>" [unknown] # Primary key fingerprint: 27B8 8847 EEE0 2501 18F3 EAB9 2ED9 D774 FE70 2DB5 * tag 'pull-request-2025-01-08' of https://gitlab.com/thuth/qemu: tests/functional/test_x86_64_hotplug_cpu: Fix race condition during unplug docs/about/deprecated: Remove paragraph about initial deprecation in 2.10 Remove the deprecated "-runas" command line option hw/s390x: Remove the "adapter_routes_max_batch" property from the flic hw/s390x/s390-virtio-ccw: Remove the deprecated 2.8 machine type hw/s390x: Remove the cpu_model_allowed flag and related code hw/s390x/s390-virtio-ccw: Remove the deprecated 2.7 machine type hw/s390x/css-bridge: Remove the "css_dev_path" property hw/s390x/ipl: Remove the "iplbext_migration" property hw/s390x: Remove the "ri_allowed" switch hw/s390x/s390-virtio-ccw: Remove the deprecated 2.6 machine type hw/s390x/s390-skeys: Remove the "migration-enabled" property hw/s390x/s390-virtio-ccw: Remove the deprecated 2.4 and 2.5 machine types tests/tcg/s390x: Use the SLOF libc headers for the multiarch tests Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-01-07tests/functional/test_x86_64_hotplug_cpu: Fix race condition during unplugThomas Huth1-2/+4
When unplugging the CPU, the test tries to check for a successful unplug by changing to the /sys/devices/system/cpu/cpu1 directory to see whether that fails. However, the "cd" could be faster than the unplug operation in the kernel, so there is a race condition and the test sometimes fails here. Fix it by trying to change the directory in a loop until the the CPU has really been unplugged. While we're at it, also add a "cd .." before unplugging to make the console output a little bit less confusing (since the path is echoed in the shell prompt). Reported-by: Stefan Hajnoczi <stefanha@gmail.com> Message-ID: <20250107115245.52755-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2025-01-07docs/about/deprecated: Remove paragraph about initial deprecation in 2.10Thomas Huth1-6/+0
When we introduced the deprecation rule of keeping deprecated features for two more releases, we had to state that we would not remove features by surprise that had already been marked as deprecated before. Nowadays, this paragraph is not needed anymore, so we can remove it now. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20250103145702.597139-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2025-01-07Remove the deprecated "-runas" command line optionThomas Huth4-29/+7
It has been marked as deprecated two releases ago, so it should be fine now to remove this command line option. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20250103155411.721759-1-thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>