riscv-gnu-toolchain/qemu.git - Unnamed repository; edit this file 'description' to name the repository.

diff options

author	Peter Xu <peterx@redhat.com>	2025-02-20 08:24:59 -0500
committer	Fabiano Rosas <farosas@suse.de>	2025-03-10 12:09:24 -0300
commit	d657a14de5d597bbfe7b54e4c4f0646f440e98ad (patch)
tree	789d7b6d05e9900611c018fbc4a000ad2e1511fe /docs/sphinx/compat.py
parent	5136598e2667f35ef3dc1d757616a266bd5eb3a2 (diff)
download	qemu-d657a14de5d597bbfe7b54e4c4f0646f440e98ad.zip qemu-d657a14de5d597bbfe7b54e4c4f0646f440e98ad.tar.gz qemu-d657a14de5d597bbfe7b54e4c4f0646f440e98ad.tar.bz2

migration: Fix UAF for incoming migration on MigrationState

On the incoming migration side, QEMU uses a coroutine to load all the VM states. Inside, it may reference MigrationState on global states like migration capabilities, parameters, error state, shared mutexes and more. However there's nothing yet to make sure MigrationState won't get destroyed (e.g. after migration_shutdown()). Meanwhile there's also no API available to remove the incoming coroutine in migration_shutdown(), avoiding it to access the freed elements. There's a bug report showing this can happen and crash dest QEMU when migration is cancelled on source. When it happens, the dest main thread is trying to cleanup everything: #0 qemu_aio_coroutine_enter #1 aio_dispatch_handler #2 aio_poll #3 monitor_cleanup #4 qemu_cleanup #5 qemu_default_main Then it found the migration incoming coroutine, schedule it (even after migration_shutdown()), causing crash: #0 __pthread_kill_implementation #1 __pthread_kill_internal #2 __GI_raise #3 __GI_abort #4 __assert_fail_base #5 __assert_fail #6 qemu_mutex_lock_impl #7 qemu_lockable_mutex_lock #8 qemu_lockable_lock #9 qemu_lockable_auto_lock #10 migrate_set_error #11 process_incoming_migration_co #12 coroutine_trampoline To fix it, take a refcount after an incoming setup is properly done when qmp_migrate_incoming() succeeded the 1st time. As it's during a QMP handler which needs BQL, it means the main loop is still alive (without going into cleanups, which also needs BQL). Releasing the refcount now only until the incoming migration coroutine finished or failed. Hence the refcount is valid for both (1) setup phase of incoming ports, mostly IO watches (e.g. qio_channel_add_watch_full()), and (2) the incoming coroutine itself (process_incoming_migration_co()). Note that we can't unref in migration_incoming_state_destroy(), because both qmp_xen_load_devices_state() and load_snapshot() will use it without an incoming migration. Those hold BQL so they're not prone to this issue. PS: I suspect nobody uses Xen's command at all, as it didn't register yank, hence AFAIU the command should crash on master when trying to unregister yank in migration_incoming_state_destroy().. but that's another story. Also note that in some incoming failure cases we may not always unref the MigrationState refcount, which is a trade-off to keep things simple. We could make it accurate, but it can be an overkill. Some examples: - Unlike most of the rest protocols, socket_start_incoming_migration() may create net listener after incoming port setup sucessfully. It means we can't unref in migration_channel_process_incoming() as a generic path because socket protocol might keep using MigrationState. - For either socket or file, multiple IO watches might be created, it means logically each IO watch needs to take one refcount for MigrationState so as to be 100% accurate on ownership of refcount taken. In general, we at least need per-protocol handling to make it accurate, which can be an overkill if we know incoming failed after all. Add a short comment to explain that when taking the refcount in qmp_migrate_incoming(). Bugzilla: https://issues.redhat.com/browse/RHEL-69775 Tested-by: Yan Fu <yafu@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250220132459.512610-1-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>

Diffstat (limited to 'docs/sphinx/compat.py')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: