aboutsummaryrefslogtreecommitdiff
path: root/util/async.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-19util/async: Only call icount_notify_exit() if icount is enabledPhilippe Mathieu-Daudé1-7/+9
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20231208113529.74067-6-philmd@linaro.org>
2024-01-08system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()Stefan Hajnoczi1-1/+1
The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void bql_lock(void) - void bql_unlock(void) - bool bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240102153529.486531-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-21aio: remove aio_context_acquire()/aio_context_release() APIStefan Hajnoczi1-10/+0
Delete these functions because nothing calls these functions anymore. I introduced these APIs in commit 98563fc3ec44 ("aio: add aio_context_acquire() and aio_context_release()") in 2014. It's with a sigh of relief that I delete these APIs almost 10 years later. Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an understanding of where the code needed to go in order to remove the limitations that the original dataplane and the IOThread/AioContext approach that followed it. Emanuele Giuseppe Esposito had the splendid determination to convert large parts of the codebase so that they no longer needed the AioContext lock. This was a painstaking process, both in the actual code changes required and the iterations of code review that Emanuele eked out of Kevin and me over many months. Kevin Wolf tackled multitudes of graph locking conversions to protect in-flight I/O from run-time changes to the block graph as well as the clang Thread Safety Analysis annotations that allow the compiler to check whether the graph lock is being used correctly. And me, well, I'm just here to add some pizzazz to the QEMU multi-queue block layer :). Thank you to everyone who helped with this effort, including Eric Blake, code reviewer extraordinaire, and others who I've forgotten to mention. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20231205182011.1976568-11-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21block: remove AioContext lockingStefan Hajnoczi1-4/+0
This is the big patch that removes aio_context_acquire()/aio_context_release() from the block layer and affected block layer users. There isn't a clean way to split this patch and the reviewers are likely the same group of people, so I decided to do it in one patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Message-ID: <20231205182011.1976568-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-12-21aio: make aio_context_acquire()/aio_context_release() a no-opStefan Hajnoczi1-2/+2
aio_context_acquire()/aio_context_release() has been replaced by fine-grained locking to protect state shared by multiple threads. The AioContext lock still plays the role of balancing locking in AIO_WAIT_WHILE() and many functions in QEMU either require that the AioContext lock is held or not held for this reason. In other words, the AioContext lock is purely there for consistency with itself and serves no real purpose anymore. Stop actually acquiring/releasing the lock in aio_context_acquire()/aio_context_release() so that subsequent patches can remove callers across the codebase incrementally. I have performed "make check" and qemu-iotests stress tests across x86-64, ppc64le, and aarch64 to confirm that there are no failures as a result of eliminating the lock. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20231205182011.1976568-5-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30aio: remove aio_disable_external() APIStefan Hajnoczi1-2/+1
All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-02async: avoid use-after-free on re-entrancy guardAlexander Bulekov1-6/+8
A BH callback can free the BH, causing a use-after-free in aio_bh_call. Fix that by keeping a local copy of the re-entrancy guard pointer. Buglink: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=58513 Fixes: 9c86c97f12 ("async: Add an optional reentrancy guard to the BH API") Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Message-Id: <20230501141956.3444868-1-alxndr@bu.edu> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-04-29Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingRichard Henderson1-0/+14
* Fix compilation issues under Debian 10 * Update kernel headers to 6.3rc5 * Suppress GCC13 false positive in aio_bh_poll() * Add new x86 feature bits * Coverity fixes * More steps towards removing qatomic_mb_set/read * Fix reduced-phys-bits value for AMD SEV # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmRNC0IUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroNo0wgArWNGKZpbmQ0e5L6ajMvaaPmg4mVL # a2SJGU0TwTp0fUgZr14z2iwzIpSqQrsqhzTIAzOTs0OICDBPBuNvnRucMa+SVQGO # Tc89YAwBVDo66dAKhWi+WR9tx7sTFCso0nbsBfczzdnwAw3g1MJ87Ueqc5tlPGBK # E7YSAD6l4UuogoN5BLU7bSsG/X7bwcyzeUXRB4ik+Z9abWd4DH9qiROnBKLMmBLK # nAi47h8b8MltWORpO+wf6HtkMKi37SAzl9VLHVuHcRhIdY/JhWCRhYSo0HXhgX66 # JLVkyxFpIndT0dUW/xnqATGez92FRZyTxHbxbAcWM0SoC1jOVfUXB+7Gdw== # =vxou # -----END PGP SIGNATURE----- # gpg: Signature made Sat 29 Apr 2023 01:19:14 PM BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: cpus-common: stop using mb_set/mb_read async: Suppress GCC13 false positive in aio_bh_poll() tests: vhost-user-test: release mutex on protocol violation Update linux headers to v6.3rc5 update-linux-headers.sh: Add missing kernel headers. Fix libvhost-user.c compilation. target/i386: Add support for PREFETCHIT0/1 in CPUID enumeration target/i386: Add support for AVX-NE-CONVERT in CPUID enumeration target/i386: Add support for AVX-VNNI-INT8 in CPUID enumeration target/i386: Add support for AVX-IFMA in CPUID enumeration target/i386: Add support for AMX-FP16 in CPUID enumeration target/i386: Add support for CMPCCXADD in CPUID enumeration i386/cpu: Update how the EBX register of CPUID 0x8000001F is set i386/sev: Update checks and information related to reduced-phys-bits qemu-options.hx: Update the reduced-phys-bits documentation qapi, i386/sev: Change the reduced-phys-bits value from 5 to 1 Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-04-29async: Suppress GCC13 false positive in aio_bh_poll()Cédric Le Goater1-0/+14
GCC13 reports an error : ../util/async.c: In function ‘aio_bh_poll’: include/qemu/queue.h:303:22: error: storing the address of local variable ‘slice’ in ‘*ctx.bh_slice_list.sqh_last’ [-Werror=dangling-pointer=] 303 | (head)->sqh_last = &(elm)->field.sqe_next; \ | ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~ ../util/async.c:169:5: note: in expansion of macro ‘QSIMPLEQ_INSERT_TAIL’ 169 | QSIMPLEQ_INSERT_TAIL(&ctx->bh_slice_list, &slice, next); | ^~~~~~~~~~~~~~~~~~~~ ../util/async.c:161:17: note: ‘slice’ declared here 161 | BHListSlice slice; | ^~~~~ ../util/async.c:161:17: note: ‘ctx’ declared here But the local variable 'slice' is removed from the global context list in following loop of the same routine. Add a pragma to silent GCC. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com> Message-Id: <20230420202939.1982044-1-clg@kaod.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-04-28async: Add an optional reentrancy guard to the BH APIAlexander Bulekov1-1/+17
Devices can pass their MemoryReentrancyGuard (from their DeviceState), when creating new BHes. Then, the async API will toggle the guard before/after calling the BH call-back. This prevents bh->mmio reentrancy issues. Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Message-Id: <20230427211013.2994127-3-alxndr@bu.edu> [thuth: Fix "line over 90 characters" checkpatch.pl error] Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-03-07async: clarify usage of barriers in the polling casePaolo Bonzini1-2/+8
Explain that aio_context_notifier_poll() relies on aio_notify_accept() to catch all the memory writes that were done before ctx->notified was set to true. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-03-07async: update documentation of the memory barriersPaolo Bonzini1-14/+19
Ever since commit 8c6b0356b539 ("util/async: make bh_aio_poll() O(1)", 2020-02-22), synchronization between qemu_bh_schedule() and aio_bh_poll() is happening when the bottom half is enqueued in the bh_list; not when the flags are set. Update the documentation to match. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-01-20coroutine: Use Coroutine typedef name instead of structure tagMarkus Armbruster1-2/+2
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221221131435.3851212-6-armbru@redhat.com>
2022-12-15async: Register/unregister aiocontext in graph lock listEmanuele Giuseppe Esposito1-0/+4
Add/remove the AioContext in aio_context_list in graph-lock.c when it is created/destroyed. This allows using the graph locking operations from this AioContext. In order to allow linking util/async.c with binaries that don't include the block layer, introduce stubs for (un)register_aiocontext(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-5-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-06replay: notify vCPU when BH is scheduledPavel Dovgalyuk1-0/+8
vCPU execution should be suspended when new BH is scheduled. This is needed to avoid guest timeouts caused by the long cycles of the execution. In replay mode execution may hang when vCPU sleeps and block event comes to the queue. This patch adds notification which wakes up vCPU or interrupts execution of guest code. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> -- v2: changed first_cpu to current_cpu (suggested by Richard Henderson) v4: moved vCPU notification to aio_bh_enqueue (suggested by Paolo Bonzini) Message-Id: <165364837317.688121.17680519919871405281.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-05-09util/event-loop-base: Introduce options to set the thread pool sizeNicolas Saenz Julienne1-0/+20
The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-03-04util/async: replace __thread with QEMU TLS macrosStefan Hajnoczi1-5/+7
QEMU TLS macros must be used to make TLS variables safe with coroutines. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220222140150.27240-3-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-01-12aio-posix: split poll check from ready handlerStefan Hajnoczi1-2/+8
Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-21iothread: add aio-max-batch parameterStefano Garzarella1-0/+2
The `aio-max-batch` parameter will be propagated to AIO engines and it will be used to control the maximum number of queued requests. When there are in queue a number of requests equal to `aio-max-batch`, the engine invokes the system call to forward the requests to the kernel. This parameter allows us to control the maximum batch size to reduce the latency that requests might accumulate while queued in the AIO engine queue. If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its default maximum batch size value. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-05util/async: print leaked BH name when AioContext finalizesStefan Hajnoczi1-2/+14
BHs must be deleted before the AioContext is finalized. If not, it's a bug and probably indicates that some part of the program still expects the BH to run in the future. That can lead to memory leaks, inconsistent state, or just hangs. Unfortunately the assert(flags & BH_DELETED) call in aio_ctx_finalize() is difficult to debug because the assertion failure contains no information about the BH! Use the QEMUBH name field added in the previous patch to show a useful error when a leaked BH is detected. Suggested-by: Eric Ernst <eric.g.ernst@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210414200247.917496-3-stefanha@redhat.com>
2021-07-05util/async: add a human-readable name to BHs for debuggingStefan Hajnoczi1-2/+7
It can be difficult to debug issues with BHs in production environments. Although BHs can usually be identified by looking up their ->cb() function pointer, this requires debug information for the program. It is also not possible to print human-readable diagnostics about BHs because they have no identifier. This patch adds a name to each BH. The name is not unique per instance but differentiates between cb() functions, which is usually enough. It's done by changing aio_bh_new() and friends to macros that stringify cb. The next patch will use the name field when reporting leaked BHs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210414200247.917496-2-stefanha@redhat.com>
2021-06-18async: the main AioContext is only "current" if under the BQLPaolo Bonzini1-0/+20
If we want to wake up a coroutine from a worker thread, aio_co_wake() currently does not work. In that scenario, aio_co_wake() calls aio_co_enter(), but there is no current AioContext and therefore qemu_get_current_aio_context() returns the main thread. aio_co_wake() then attempts to call aio_context_acquire() instead of going through aio_co_schedule(). The default case of qemu_get_current_aio_context() was added to cover synchronous I/O started from the vCPU thread, but the main and vCPU threads are quite different. The main thread is an I/O thread itself, only running a more complicated event loop; the vCPU thread instead is essentially a worker thread that occasionally calls qemu_mutex_lock_iothread(). It is only in those critical sections that it acts as if it were the home thread of the main AioContext. Therefore, this patch detaches qemu_get_current_aio_context() from iothreads, which is a useless complication. The AioContext pointer is stored directly in the thread-local variable, including for the main loop. Worker threads (including vCPU threads) optionally behave as temporary home threads if they have taken the big QEMU lock, but if that is not the case they will always schedule coroutines on remote threads via aio_co_schedule(). With this change, the stub qemu_mutex_iothread_locked() must be changed from true to false. The previous value of true was needed because the main thread did not have an AioContext in the thread-local variable, but now it does have one. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210609122234.544153-1-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message per Vladimir's review] Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09util/async: Add aio_co_reschedule_self()Kevin Wolf1-0/+30
Add a function that can be used to move the currently running coroutine to a different AioContext (and therefore potentially a different thread). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201005155855.256490-12-kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-09-23qemu/atomic.h: rename atomic_ to qatomic_Stefan Hajnoczi1-14/+14
clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-08-13async: always set ctx->notified in aio_notify()Stefan Hajnoczi1-11/+21
aio_notify() does not set ctx->notified when called with ctx->aio_notify_me disabled. Therefore aio_notify_me needs to be enabled during polling. This is suboptimal since expensive event_notifier_set(&ctx->notifier) and event_notifier_test_and_clear(&ctx->notifier) calls are required when ctx->aio_notify_me is enabled. Change aio_notify() so that aio->notified is always set, regardless of ctx->aio_notify_me. This will make polling cheaper since ctx->aio_notify_me can remain disabled. Move the event_notifier_test_and_clear() to the fd handler function (which is now no longer an empty function so "dummy" has been dropped from its name). The next patch takes advantage of this by optimizing polling in util/aio-posix.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200806131802.569478-3-stefanha@redhat.com [Paolo Bonzini pointed out that the smp_wmb() in aio_notify_accept() should be smp_wb() but the comment should be smp_wmb() instead of smp_wb(). Fixed. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-08-13async: rename event_notifier_dummy_cb/poll()Stefan Hajnoczi1-4/+4
The event_notifier_*() prefix can be confused with the EventNotifier APIs that are also called event_notifier_*(). Rename the functions to aio_context_notifier_*() to make it clear that they relate to the AioContext::notifier field. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200806131802.569478-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-18aio-posix: disable fdmon-io_uring when GSource is usedStefan Hajnoczi1-0/+1
The glib event loop does not call fdmon_io_uring_wait() so fd handlers waiting to be submitted build up in the list. There is no benefit is using io_uring when the glib GSource is being used, so disable it instead of implementing a more complex fix. This fixes a memory leak where AioHandlers would build up and increasing amounts of CPU time were spent iterating them in aio_pending(). The symptom is that guests become slow when QEMU is built with io_uring support. Buglink: https://bugs.launchpad.net/qemu/+bug/1877716 Fixes: 73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-04-09async: use explicit memory barriersPaolo Bonzini1-4/+12
When using C11 atomics, non-seqcst reads and writes do not participate in the total order of seqcst operations. In util/async.c and util/aio-posix.c, in particular, the pattern that we use write ctx->notify_me write bh->scheduled read bh->scheduled read ctx->notify_me if !bh->scheduled, sleep if ctx->notify_me, notify needs to use seqcst operations for both the write and the read. In general this is something that we do not want, because there can be many sources that are polled in addition to bottom halves. The alternative is to place a seqcst memory barrier between the write and the read. This also comes with a disadvantage, in that the memory barrier is implicit on strongly-ordered architectures and it wastes a few dozen clock cycles. Fortunately, ctx->notify_me is never written concurrently by two threads, so we can assert that and relax the writes to ctx->notify_me. The resulting solution works and performs well on both aarch64 and x86. Note that the atomic_set/atomic_read combination is not an atomic read-modify-write, and therefore it is even weaker than C11 ATOMIC_RELAXED; on x86, ATOMIC_RELAXED compiles to a locked operation. Analyzed-by: Ying Fang <fangying1@huawei.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Ying Fang <fangying1@huawei.com> Message-Id: <20200407140746.8041-6-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22util/async: make bh_aio_poll() O(1)Stefan Hajnoczi1-99/+138
The ctx->first_bh list contains all created BHs, including those that are not scheduled. The list is iterated by the event loop and therefore has O(n) time complexity with respected to the number of created BHs. Rewrite BHs so that only scheduled or deleted BHs are enqueued. Only BHs that actually require action will be iterated. One semantic change is required: qemu_bh_delete() enqueues the BH and therefore invokes aio_notify(). The tests/test-aio.c:test_source_bh_delete_from_cb() test case assumed that g_main_context_iteration(NULL, false) returns false after qemu_bh_delete() but it now returns true for one iteration. Fix up the test case. This patch makes aio_compute_timeout() and aio_bh_poll() drop from a CPU profile reported by perf-top(1). Previously they combined to 9% CPU utilization when AioContext polling is commented out and the guest has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200221093951.1414693-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30util/async: add aio interfaces for io_uringAarushi Mehta1-0/+36
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-7-stefanha@redhat.com Message-Id: <20200120141858.587874-7-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-10-24util/async: avoid useless castFrediano Ziglio1-1/+0
event_notifier_dummy_cb is already compatible with EventNotifierHandler. Signed-off-by: Frediano Ziglio <fziglio@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20191023122652.2999-1-fziglio@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-10-04win32: work around main-loop busy loop on socket/fd eventMarc-André Lureau1-1/+5
Commit 05e514b1d4d5bd4209e2c8bbc76ff05c85a235f3 introduced an AIO context optimization to avoid calling event_notifier_test_and_clear() on ctx->notifier. On Windows, the same notifier is being used to wakeup the wait on socket events (see commit d3385eb448e38f828c78f8f68ec5d79c66a58b5d). The ctx->notifier event is added to the gpoll sources in aio_set_event_notifier(), aio_ctx_check() should clear the event regardless of ctx->notified, since Windows sets the event by itself, bypassing the aio->notified. This fixes qemu not clearing the event resulting in a busy loop. Paolo suggested to me on irc to call event_notifier_test_and_clear() after select() >0 from aio-win32.c's aio_prepare. Unfortunately, not all fds associated with ctx->notifiers are in AIO fd handlers set. (qemu_set_nonblock() in util/oslib-win32.c calls qemu_fd_register()). This is essentially a v2 of a patch that was sent earlier: https://lists.gnu.org/archive/html/qemu-devel/2017-01/msg00420.html that resurfaced when James investigated Spice performance issues on Windows: https://gitlab.freedesktop.org/spice/spice/issues/36 In order to test that patch, I simply tried running test-char on win32, and it hangs. Applying that patch solves it. QIO idle sources are not dispatched. I haven't investigated much further, I suspect source priorities and busy looping still come into play. This version keeps the "notified" field, so event_notifier_poll() should still work as expected. Cc: James Le Cuirot <chewi@gentoo.org> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-22util/async: hold AioContext ref to prevent use-after-freeStefan Hajnoczi1-0/+8
The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the following: 1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields. 2. The one-shot BH executes in another AioContext. All it does is call aio_co_wakeup(preadv_co). 3. The preadv coroutine is re-entered and returns. There is a race condition in aio_co_wake() where the preadv coroutine returns and the test case destroys the preadv IOThread. aio_co_wake() can still be running in the other AioContext and it performs an access to the freed IOThread AioContext. Here is the race in aio_co_schedule(): QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines, co, co_scheduled_next); <-- race: co may execute before we invoke qemu_bh_schedule()! qemu_bh_schedule(ctx->co_schedule_bh); So if co causes ctx to be freed then we're in trouble. Fix this problem by holding a reference to ctx. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20190723190623.21537-1-stefanha@redhat.com Message-Id: <20190723190623.21537-1-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster1-1/+0
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2018-09-25util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cbSergio Lopez1-1/+1
AIO Coroutines shouldn't by managed by an AioContext different than the one assigned when they are created. aio_co_enter avoids entering a coroutine from a different AioContext, calling aio_co_schedule instead. Scheduled coroutines are then entered by co_schedule_bh_cb using qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the current AioContext obtained with qemu_get_current_aio_context. Eventually, co->ctx will be set to the AioContext passed as an argument to qemu_aio_coroutine_enter. This means that, if an IO Thread's AioConext is being processed by the Main Thread (due to aio_poll being called with a BDS AioContext, as it happens in AIO_WAIT_WHILE among other places), the AioContext from some coroutines may be wrongly replaced with the one from the Main Thread. This is the root cause behind some crashes, mainly triggered by the drain code at block/io.c. The most common are these abort and failed assertion: util/async.c:aio_co_schedule 456 if (scheduled) { 457 fprintf(stderr, 458 "%s: Co-routine was already scheduled in '%s'\n", 459 __func__, scheduled); 460 abort(); 461 } util/qemu-coroutine-lock.c: 286 assert(mutex->holder == self); But it's also known to cause random errors at different locations, and even SIGSEGV with broken coroutine backtraces. By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can pass the correct AioContext as an argument, making sure co->ctx is not wrongly altered. Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-27linux-aio: properly bubble up errors from initializationNishanth Aravamudan1-3/+11
laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add a aio_setup_linux_aio() function which is called early in raw_open_common. If this fails, propagate the error up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls. Additionally, when the AioContext changes, we need to associate a LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context callback and call the new aio_setup_linux_aio(), which will allocate a new AioContext if needed, and return errors on failures. If it fails for any reason, fallback to threaded AIO with an error message, as the device is already in-use by the guest. Add an assert that aio_get_linux_aio() cannot return NULL. Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-18iothread: fix epollfd leak in the process of delIOThreadJie Wang1-0/+1
When we call addIOThread, the epollfd created in aio_context_setup, but not close it in the process of delIOThread, so the epollfd will leak. Reorder the code in aio_epoll_disable and reuse it. Signed-off-by: Jie Wang <wangjie88@huawei.com> Message-Id: <1526517763-11108-1-git-send-email-wangjie88@huawei.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> [Mention change to aio_epoll_disable in commit message. - Fam] Signed-off-by: Fam Zheng <famz@redhat.com>
2017-11-21coroutine: abort if we try to schedule or enter a pending coroutineJeff Cody1-0/+13
The previous patch fixed a race condition, in which there were coroutines being executing doubly, or after coroutine deletion. We can detect common scenarios when this happens, and print an error message and abort before we corrupt memory / data, or segfault. This patch will abort if an attempt to enter a coroutine is made while it is currently pending execution, either in a specific AioContext bh, or pending execution via a timer. It will also abort if a coroutine is scheduled, before a prior scheduled run has occurred. We cannot rely on the existing co->caller check for recursive re-entry to catch this, as the coroutine may run and exit with COROUTINE_TERMINATE before the scheduled coroutine executes. (This is the scenario that was occurring and fixed in the previous patch). This patch also re-orders the Coroutine struct elements in an attempt to optimize caching. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-11-08util/async: use atomic_mb_set in qemu_bh_cancelSergio Lopez1-1/+1
Commit b7a745d added a qemu_bh_cancel call to the completion function as an optimization to prevent it from unnecessarily rescheduling itself. This completion function is scheduled from worker_thread, after setting the state of a ThreadPoolElement to THREAD_DONE. This was considered to be safe, as the completion function restarts the loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW memory barrier, the read of req->state may actually happen _before_ the call, seeing it still as THREAD_QUEUED, and ending the completion function without having processed a pending TPE linked at pool->head: worker thread | I/O thread ------------------------------------------------------------------------ | speculatively read req->state req->state = THREAD_DONE; | qemu_bh_schedule(p->completion_bh) | bh->scheduled = 1; | | qemu_bh_cancel(p->completion_bh) | bh->scheduled = 0; | if (req->state == THREAD_DONE) | // sees THREAD_QUEUED The source of the misunderstanding was that qemu_bh_cancel is now being used by the _consumer_ rather than the producer, and therefore now needs to have acquire semantics just like e.g. aio_bh_poll. In some situations, if there are no other independent requests in the same aio context that could eventually trigger the scheduling of the completion function, the omitted TPE and all operations pending on it will get stuck forever. [Added Sergio's updated wording about the HW memory barrier. --Stefan] Signed-off-by: Sergio Lopez <slp@redhat.com> Message-id: 20171108063447.2842-1-slp@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-04-11async: Introduce aio_co_enterFam Zheng1-1/+6
They start the coroutine on the specified context. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-03-14cpus: define QEMUTimerListNotifyCB for QEMU system emulationPaolo Bonzini1-1/+1
There is no change for now, because the callback just invokes qemu_notify_event. Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-02-21async: remove unnecessary inc/dec pairsPaolo Bonzini1-6/+6
Pull the increment/decrement pair out of aio_bh_poll and into the callers. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-18-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio-posix: partially inline aio_dispatch into aio_pollPaolo Bonzini1-1/+1
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-17-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: explicitly acquire aiocontext in bottom halves that need itPaolo Bonzini1-2/+2
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-15-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio: push aio_context_acquire/release down to dispatchingPaolo Bonzini1-0/+2
The AioContext data structures are now protected by list_lock and/or they are walked with FOREACH_RCU primitives. There is no need anymore to acquire the AioContext for the entire duration of aio_dispatch. Instead, just acquire it before and after invoking the callbacks. The next step is then to push it further down. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-12-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio: introduce aio_co_schedule and aio_co_wakePaolo Bonzini1-0/+65
aio_co_wake provides the infrastructure to start a coroutine on a "home" AioContext. It will be used by CoMutex and CoQueue, so that coroutines don't jump from one context to another when they go to sleep on a mutex or waitqueue. However, it can also be used as a more efficient alternative to one-shot bottom halves, and saves the effort of tracking which AioContext a coroutine is running on. aio_co_schedule is the part of aio_co_wake that starts a coroutine on a remove AioContext, but it is also useful to implement e.g. bdrv_set_aio_context callbacks. The implementation of aio_co_schedule is based on a lock-free multiple-producer, single-consumer queue. The multiple producers use cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom half) grabs all items added so far, inverts the list to make it FIFO, and goes through it one item at a time until it's empty. The data structure was inspired by OSv, which uses it in the very code we'll "port" to QEMU for the thread-safe CoMutex. Most of the new code is really tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21block: move AioContext, QEMUTimer, main-loop to libqemuutilPaolo Bonzini1-0/+423
AioContext is fairly self contained, the only dependency is QEMUTimer but that in turn doesn't need anything else. So move them out of block-obj-y to avoid introducing a dependency from io/ to block-obj-y. main-loop and its dependency iohandler also need to be moved, because later in this series io/ will call iohandler_get_aio_context. [Changed copyright "the QEMU team" to "other QEMU contributors" as suggested by Daniel Berrange and agreed by Paolo. --Stefan] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>