aboutsummaryrefslogtreecommitdiff
path: root/scripts/qapi/expr.py
diff options
context:
space:
mode:
authorPeter Xu <peterx@redhat.com>2025-06-13 10:12:05 -0400
committerFabiano Rosas <farosas@suse.de>2025-07-11 10:37:37 -0300
commitd2a81ca8c6fbe6ed691889d953d0b5fe2c7e4671 (patch)
treed55c657158fdfa1310f904c69937b4f1301e2521 /scripts/qapi/expr.py
parentf62b7a0a29192891f0139f2ad8009f55a41c8641 (diff)
downloadqemu-d2a81ca8c6fbe6ed691889d953d0b5fe2c7e4671.zip
qemu-d2a81ca8c6fbe6ed691889d953d0b5fe2c7e4671.tar.gz
qemu-d2a81ca8c6fbe6ed691889d953d0b5fe2c7e4671.tar.bz2
migration/postcopy: Push blocktime start/end into page req mutex
The postcopy blocktime feature was tricky that it used quite some atomic operations over quite a few arrays and vars, without explaining how that would be thread safe. The thread safety here is about concurrency between the fault thread and the fault resolution threads, possible to access the same chunk of data. All these atomic ops can be expensive too before knowing clearly how it works. OTOH, postcopy has one page_request_mutex used to serialize the received bitmap updates. So far it's ok - we don't yet have a lot of threads contending the lock. It might change after multifd will be supported, but that's a separate story. What is important is, with that mutex, it's pretty lightweight to move all the blocktime maintenance into the mutex critical section. It's because the blocktime layer is lightweighted: almost "remember which vcpu faulted on which address", and "ok we get some fault resolved, calculate how long it takes". It's also an optional feature for now (but I have thought of changing that, maybe in the future). Let's push the blocktime layer into the mutex, so that it's always thread-safe even without any atomic ops. To achieve that, I'll need to add a tid parameter on fault path so that it'll start to pass the faulted thread ID into deeper the stack, but not too deep. When at it, add a comment for the shared fault handler (for example, vhost-user devices running with postcopy), to mention a TODO. One reason it might not be trivial is that vhost-user's userfaultfds should be opened by vhost-user process, so it's pretty hard to control making sure the TID feature will be around. It wasn't supported before, so keep it like that for now. Now we should be as ease when everything is protected by a mutex that we always take anyway. One side effect: we can finally remove one ramblock_recv_bitmap_test() in mark_postcopy_blocktime_begin(), which was pretty weird and which also includes a weird (but maybe necessary.. but maybe not?) operation to inject a blocktime entry then quickly erase it.. When we're with the mutex, and when we make sure it's invoked after checking the receive bitmap, it's not needed anymore. Instead, we assert. As another side effect, this paves way for removing all atomic ops in all the mem accesses in blocktime layer. Note that we need a stub for mark_postcopy_blocktime_begin() for Windows builds. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
Diffstat (limited to 'scripts/qapi/expr.py')
0 files changed, 0 insertions, 0 deletions