aboutsummaryrefslogtreecommitdiff
path: root/include/block/aio.h
AgeCommit message (Collapse)AuthorFilesLines
2023-01-20coroutine: Use Coroutine typedef name instead of structure tagMarkus Armbruster1-4/+3
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221221131435.3851212-6-armbru@redhat.com>
2023-01-20coroutine: Split qemu/coroutine-core.h off qemu/coroutine.hMarkus Armbruster1-1/+1
qemu/coroutine.h and qemu/lockable.h include each other. They need each other only in macro expansions, so we could simply drop both inclusions to break the loop, and add suitable includes to files that expand the macros. Instead, move a part of qemu/coroutine.h to new qemu/coroutine-core.h so that qemu/coroutine-core.h doesn't need qemu/lockable.h, and qemu/lockable.h only needs qemu/coroutine-core.h. Result: qemu/coroutine.h includes qemu/lockable.h includes qemu/coroutine-core.h. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221131435.3851212-5-armbru@redhat.com> [Semantic rebase conflict with 7c10cb38cc "accel/tcg: Add debuginfo support" resolved]
2022-12-15graph-lock: Introduce a lock to protect block graph operationsPaolo Bonzini1-0/+9
Block layer graph operations are always run under BQL in the main loop. This is proved by the assertion qemu_in_main_thread() and its wrapper macro GLOBAL_STATE_CODE. However, there are also concurrent coroutines running in other iothreads that always try to traverse the graph. Currently this is protected (among various other things) by the AioContext lock, but once this is removed, we need to make sure that reads do not happen while modifying the graph. We distinguish between writer (main loop, under BQL) that modifies the graph, and readers (all other coroutines running in various AioContext), that go through the graph edges, reading ->parents and->children. The writer (main loop) has "exclusive" access, so it first waits for any current read to finish, and then prevents incoming ones from entering while it has the exclusive access. The readers (coroutines in multiple AioContext) are free to access the graph as long the writer is not modifying the graph. In case it is, they go in a CoQueue and sleep until the writer is done. If a coroutine changes AioContext, the counter in the original and new AioContext are left intact, since the writer does not care where the reader is, but only if there is one. As a result, some AioContexts might have a negative reader count, to balance the positive count of the AioContext that took the lock. This also means that when an AioContext is deleted it may have a nonzero reader count. In that case we transfer the count to a global shared counter so that the writer is always aware of all readers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20221207131838.239125-3-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-09util/event-loop-base: Introduce options to set the thread pool sizeNicolas Saenz Julienne1-0/+10
The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-01-12aio-posix: split poll check from ready handlerStefan Hajnoczi1-1/+3
Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take 10s of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring. The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread. Before: 168k IOPS, IOThread syscalls: 9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16 9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8 9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8 9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3 9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8 9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32 174k IOPS (+3.6%), IOThread syscalls: 9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32 9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8 9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8 9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32 Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_read() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-21iothread: add aio-max-batch parameterStefano Garzarella1-0/+12
The `aio-max-batch` parameter will be propagated to AIO engines and it will be used to control the maximum number of queued requests. When there are in queue a number of requests equal to `aio-max-batch`, the engine invokes the system call to forward the requests to the kernel. This parameter allows us to control the maximum batch size to reduce the latency that requests might accumulate while queued in the AIO engine queue. If `aio-max-batch` is equal to 0 (default value), the AIO engine will use its default maximum batch size value. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-3-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-05util/async: add a human-readable name to BHs for debuggingStefan Hajnoczi1-3/+28
It can be difficult to debug issues with BHs in production environments. Although BHs can usually be identified by looking up their ->cb() function pointer, this requires debug information for the program. It is also not possible to print human-readable diagnostics about BHs because they have no identifier. This patch adds a name to each BH. The name is not unique per instance but differentiates between cb() functions, which is usually enough. It's done by changing aio_bh_new() and friends to macros that stringify cb. The next patch will use the name field when reporting leaked BHs. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20210414200247.917496-2-stefanha@redhat.com>
2021-06-18async: the main AioContext is only "current" if under the BQLPaolo Bonzini1-1/+4
If we want to wake up a coroutine from a worker thread, aio_co_wake() currently does not work. In that scenario, aio_co_wake() calls aio_co_enter(), but there is no current AioContext and therefore qemu_get_current_aio_context() returns the main thread. aio_co_wake() then attempts to call aio_context_acquire() instead of going through aio_co_schedule(). The default case of qemu_get_current_aio_context() was added to cover synchronous I/O started from the vCPU thread, but the main and vCPU threads are quite different. The main thread is an I/O thread itself, only running a more complicated event loop; the vCPU thread instead is essentially a worker thread that occasionally calls qemu_mutex_lock_iothread(). It is only in those critical sections that it acts as if it were the home thread of the main AioContext. Therefore, this patch detaches qemu_get_current_aio_context() from iothreads, which is a useless complication. The AioContext pointer is stored directly in the thread-local variable, including for the main loop. Worker threads (including vCPU threads) optionally behave as temporary home threads if they have taken the big QEMU lock, but if that is not the case they will always schedule coroutines on remote threads via aio_co_schedule(). With this change, the stub qemu_mutex_iothread_locked() must be changed from true to false. The previous value of true was needed because the main thread did not have an AioContext in the thread-local variable, but now it does have one. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210609122234.544153-1-pbonzini@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: tweak commit message per Vladimir's review] Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09util/async: Add aio_co_reschedule_self()Kevin Wolf1-0/+10
Add a function that can be used to move the currently running coroutine to a different AioContext (and therefore potentially a different thread). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201005155855.256490-12-kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-09-23qemu/atomic.h: rename atomic_ to qatomic_Stefan Hajnoczi1-4/+4
clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when a QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
2020-05-18aio-posix: disable fdmon-io_uring when GSource is usedStefan Hajnoczi1-0/+3
The glib event loop does not call fdmon_io_uring_wait() so fd handlers waiting to be submitted build up in the list. There is no benefit is using io_uring when the glib GSource is being used, so disable it instead of implementing a more complex fix. This fixes a memory leak where AioHandlers would build up and increasing amounts of CPU time were spent iterating them in aio_pending(). The symptom is that guests become slow when QEMU is built with io_uring support. Buglink: https://bugs.launchpad.net/qemu/+bug/1877716 Fixes: 73fd282e7b6dd4e4ea1c3bbb3d302c8db51e4ccf ("aio-posix: add io_uring fd monitoring implementation") Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Message-id: 20200511183630.279750-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-04-09aio-wait: delegate polling of main AioContext if BQL not heldPaolo Bonzini1-19/+10
Any thread that is not a iothread returns NULL for qemu_get_current_aio_context(). As a result, it would also return true for in_aio_context_home_thread(qemu_get_aio_context()), causing AIO_WAIT_WHILE to invoke aio_poll() directly. This is incorrect if the BQL is not held, because aio_poll() does not expect to run concurrently from multiple threads, and it can actually happen when savevm writes to the vmstate file from the migration thread. Therefore, restrict in_aio_context_home_thread to return true for the main AioContext only if the BQL is held. The function is moved to aio-wait.h because it is mostly used there and to avoid a circular reference between main-loop.h and block/aio.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200407140746.8041-5-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-03-09aio-posix: remove idle poll handlers to improve scalabilityStefan Hajnoczi1-0/+8
When there are many poll handlers it's likely that some of them are idle most of the time. Remove handlers that haven't had activity recently so that the polling loop scales better for guests with a large number of devices. This feature only takes effect for the Linux io_uring fd monitoring implementation because it is capable of combining fd monitoring with userspace polling. The other implementations can't do that and risk starving fds in favor of poll handlers, so don't try this optimization when they are in use. IOPS improves from 10k to 105k when the guest has 100 virtio-blk-pci,num-queues=32 devices and 1 virtio-blk-pci,num-queues=1 device for rw=randread,iodepth=1,bs=4k,ioengine=libaio on NVMe. [Clarified aio_poll_handlers locking discipline explanation in comment after discussion with Paolo Bonzini <pbonzini@redhat.com>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-8-stefanha@redhat.com Message-Id: <20200305170806.1313245-8-stefanha@redhat.com>
2020-03-09aio-posix: support userspace polling of fd monitoringStefan Hajnoczi1-0/+19
Unlike ppoll(2) and epoll(7), Linux io_uring completions can be polled from userspace. Previously userspace polling was only allowed when all AioHandler's had an ->io_poll() callback. This prevented starvation of fds by userspace pollable handlers. Add the FDMonOps->need_wait() callback that enables userspace polling even when some AioHandlers lack ->io_poll(). For example, it's now possible to do userspace polling when a TCP/IP socket is monitored thanks to Linux io_uring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-7-stefanha@redhat.com Message-Id: <20200305170806.1313245-7-stefanha@redhat.com>
2020-03-09aio-posix: add io_uring fd monitoring implementationStefan Hajnoczi1-0/+9
The recent Linux io_uring API has several advantages over ppoll(2) and epoll(2). Details are given in the source code. Add an io_uring implementation and make it the default on Linux. Performance is the same as with epoll(7) but later patches add optimizations that take advantage of io_uring. It is necessary to change how aio_set_fd_handler() deals with deleting AioHandlers since removing monitored file descriptors is asynchronous in io_uring. fdmon_io_uring_remove() marks the AioHandler deleted and aio_set_fd_handler() will let it handle deletion in that case. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-6-stefanha@redhat.com Message-Id: <20200305170806.1313245-6-stefanha@redhat.com>
2020-03-09aio-posix: simplify FDMonOps->update() prototypeStefan Hajnoczi1-7/+6
The AioHandler *node, bool is_new arguments are more complicated to think about than simply being given AioHandler *old_node, AioHandler *new_node. Furthermore, the new Linux io_uring file descriptor monitoring mechanism added by the new patch requires access to both the old and the new nodes. Make this change now in preparation. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-5-stefanha@redhat.com Message-Id: <20200305170806.1313245-5-stefanha@redhat.com>
2020-03-09aio-posix: extract ppoll(2) and epoll(7) fd monitoringStefan Hajnoczi1-2/+34
The ppoll(2) and epoll(7) file descriptor monitoring implementations are mixed with the core util/aio-posix.c code. Before adding another implementation for Linux io_uring, extract out the existing ones so there is a clear interface and the core code is simpler. The new interface is AioContext->fdmon_ops, a pointer to a FDMonOps struct. See the patch for details. Semantic changes: 1. ppoll(2) now reflects events from pollfds[] back into AioHandlers while we're still on the clock for adaptive polling. This was already happening for epoll(7), so if it's really an issue then we'll need to fix both in the future. 2. epoll(7)'s fallback to ppoll(2) while external events are disabled was broken when the number of fds exceeded the epoll(7) upgrade threshold. I guess this code path simply wasn't tested and no one noticed the bug. I didn't go out of my way to fix it but the correct code is simpler than preserving the bug. I also took some liberties in removing the unnecessary AioContext->epoll_available (just check AioContext->epollfd != -1 instead) and AioContext->epoll_enabled (it's implicit if our AioContext->fdmon_ops callbacks are being invoked) fields. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/20200305170806.1313245-4-stefanha@redhat.com Message-Id: <20200305170806.1313245-4-stefanha@redhat.com>
2020-02-22aio-posix: make AioHandler deletion O(1)Stefan Hajnoczi1-1/+5
It is not necessary to scan all AioHandlers for deletion. Keep a list of deleted handlers instead of scanning the full list of all handlers. The AioHandler->deleted field can be dropped. Let's check if the handler has been inserted into the deleted list instead. Add a new QLIST_IS_INSERTED() API for this check. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200214171712.541358-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-22util/async: make bh_aio_poll() O(1)Stefan Hajnoczi1-2/+18
The ctx->first_bh list contains all created BHs, including those that are not scheduled. The list is iterated by the event loop and therefore has O(n) time complexity with respected to the number of created BHs. Rewrite BHs so that only scheduled or deleted BHs are enqueued. Only BHs that actually require action will be iterated. One semantic change is required: qemu_bh_delete() enqueues the BH and therefore invokes aio_notify(). The tests/test-aio.c:test_source_bh_delete_from_cb() test case assumed that g_main_context_iteration(NULL, false) returns false after qemu_bh_delete() but it now returns true for one iteration. Fix up the test case. This patch makes aio_compute_timeout() and aio_bh_poll() drop from a CPU profile reported by perf-top(1). Previously they combined to 9% CPU utilization when AioContext polling is commented out and the guest has 2 virtio-blk,num-queues=1 and 99 virtio-blk,num-queues=32 devices. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20200221093951.1414693-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block/io_uring: implements interfaces for io_uringAarushi Mehta1-1/+15
Aborts when sqe fails to be set as sqes cannot be returned to the ring. Adds slow path for short reads for older kernels Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-5-stefanha@redhat.com Message-Id: <20200120141858.587874-5-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster1-1/+0
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2018-10-19qemu-timer: introduce timer attributesArtem Pisarenko1-7/+52
Attributes are simple flags, associated with individual timers for their whole lifetime. They intended to be used to mark individual timers for special handling when they fire. New/init functions family in timer interface updated and refactored (new 'attribute' argument added, timer_list replaced with timer_list_group+type combinations, comments improved to avoid info duplication). Also existing aio interface extended with attribute-enabled variants of functions, which create/initialize timers. Signed-off-by: Artem Pisarenko <artem.k.pisarenko@gmail.com> Message-Id: <f47b81dbce734e9806f9516eba8ca588e6321c2f.1539764043.git.artem.k.pisarenko@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-06-27linux-aio: properly bubble up errors from initializationNishanth Aravamudan1-0/+3
laio_init() can fail for a couple of reasons, which will lead to a NULL pointer dereference in laio_attach_aio_context(). To solve this, add a aio_setup_linux_aio() function which is called early in raw_open_common. If this fails, propagate the error up. The signature of aio_get_linux_aio() was not modified, because it seems preferable to return the actual errno from the possible failing initialization calls. Additionally, when the AioContext changes, we need to associate a LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context callback and call the new aio_setup_linux_aio(), which will allocate a new AioContext if needed, and return errors on failures. If it fails for any reason, fallback to threaded AIO with an error message, as the device is already in-use by the guest. Add an assert that aio_get_linux_aio() cannot return NULL. Signed-off-by: Nishanth Aravamudan <naravamudan@digitalocean.com> Message-id: 20180622193700.6523-1-naravamudan@digitalocean.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-05-18iothread: fix epollfd leak in the process of delIOThreadJie Wang1-0/+8
When we call addIOThread, the epollfd created in aio_context_setup, but not close it in the process of delIOThread, so the epollfd will leak. Reorder the code in aio_epoll_disable and reuse it. Signed-off-by: Jie Wang <wangjie88@huawei.com> Message-Id: <1526517763-11108-1-git-send-email-wangjie88@huawei.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> [Mention change to aio_epoll_disable in commit message. - Fam] Signed-off-by: Fam Zheng <famz@redhat.com>
2018-03-02aio: rename aio_context_in_iothread() to in_aio_context_home_thread()Stefan Hajnoczi1-2/+5
The name aio_context_in_iothread() is misleading because it also returns true when called on the main AioContext from the main loop thread, which is not an IOThread. This patch renames it to in_aio_context_home_thread() and expands the doc comment to make the semantics clearer. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2017-05-12aio: add missing aio_notify() to aio_enable_external()Stefan Hajnoczi1-2/+8
The main loop uses aio_disable_external()/aio_enable_external() to temporarily disable processing of external AioContext clients like device emulation. This allows monitor commands to quiesce I/O and prevent the guest from submitting new requests while a monitor command is in progress. The aio_enable_external() API is currently broken when an IOThread is in aio_poll() waiting for fd activity when the main loop re-enables external clients. Incrementing ctx->external_disable_cnt does not wake the IOThread from ppoll(2) so fd processing remains suspended and leads to unresponsive emulated devices. This patch adds an aio_notify() call to aio_enable_external() so the IOThread is kicked out of ppoll(2) and will re-arm the file descriptors. The bug can be reproduced as follows: $ qemu -M accel=kvm -m 1024 \ -object iothread,id=iothread0 \ -device virtio-scsi-pci,iothread=iothread0,id=virtio-scsi-pci0 \ -drive if=none,id=drive0,aio=native,cache=none,format=raw,file=test.img \ -device scsi-hd,id=scsi-hd0,drive=drive0 \ -qmp tcp::5555,server,nowait $ scripts/qmp/qmp-shell localhost:5555 (qemu) blockdev-snapshot-sync device=drive0 snapshot-file=sn1.qcow2 mode=absolute-paths format=qcow2 After blockdev-snapshot-sync completes the SCSI disk will be unresponsive. This leads to request timeouts inside the guest. Reported-by: Qianqian Zhu <qizhu@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20170508180705.20609-1-stefanha@redhat.com Suggested-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-04-11async: Introduce aio_co_enterFam Zheng1-0/+9
They start the coroutine on the specified context. Signed-off-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2017-02-21aio-posix: partially inline aio_dispatch into aio_pollPaolo Bonzini1-5/+1
This patch prepares for the removal of unnecessary lockcnt inc/dec pairs. Extract the dispatching loop for file descriptor handlers into a new function aio_dispatch_handlers, and then inline aio_dispatch into aio_poll. aio_dispatch can now become void. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170213135235.12274-17-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-02-21aio: introduce aio_co_schedule and aio_co_wakePaolo Bonzini1-0/+32
aio_co_wake provides the infrastructure to start a coroutine on a "home" AioContext. It will be used by CoMutex and CoQueue, so that coroutines don't jump from one context to another when they go to sleep on a mutex or waitqueue. However, it can also be used as a more efficient alternative to one-shot bottom halves, and saves the effort of tracking which AioContext a coroutine is running on. aio_co_schedule is the part of aio_co_wake that starts a coroutine on a remove AioContext, but it is also useful to implement e.g. bdrv_set_aio_context callbacks. The implementation of aio_co_schedule is based on a lock-free multiple-producer, single-consumer queue. The multiple producers use cmpxchg to add to a LIFO stack. The consumer (a per-AioContext bottom half) grabs all items added so far, inverts the list to make it FIFO, and goes through it one item at a time until it's empty. The data structure was inspired by OSv, which uses it in the very code we'll "port" to QEMU for the thread-safe CoMutex. Most of the new code is really tests. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170213135235.12274-3-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-16aio: document lockingPaolo Bonzini1-16/+16
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170112180800.21085-10-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-16aio: make ctx->list_lock a QemuLockCnt, subsuming ctx->walking_bhPaolo Bonzini1-7/+5
This will make it possible to walk the list of bottom halves without holding the AioContext lock---and in turn to call bottom half handlers without holding the lock. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170112180800.21085-4-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-16aio: rename bh_lock to list_lockPaolo Bonzini1-1/+1
This will be used for AioHandlers too. There is going to be little or no contention, so it is better to reuse the same lock. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20170112180800.21085-2-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03aio: self-tune polling timeStefan Hajnoczi1-2/+8
This patch is based on the algorithm for the kvm.ko halt_poll_ns parameter in Linux. The initial polling time is zero. If the event loop is woken up within the maximum polling time it means polling could be effective, so grow polling time. If the event loop is woken up beyond the maximum polling time it means polling is not effective, so shrink polling time. If the event loop makes progress within the current polling time then the sweet spot has been reached. This algorithm adjusts the polling time so it can adapt to variations in workloads. The goal is to reach the sweet spot while also recognizing when polling would hurt more than help. Two new trace events, poll_grow and poll_shrink, are added for observing polling time adjustment. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03aio: add .io_poll_begin/end() callbacksStefan Hajnoczi1-0/+20
The begin and end callbacks can be used to prepare for the polling loop and clean up when polling stops. Note that they may only be called once for multiple aio_poll() calls if polling continues to succeed. Once polling fails the end callback is invoked before aio_poll() resumes file descriptor monitoring. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03aio: add polling mode to AioContextStefan Hajnoczi1-0/+16
The AioContext event loop uses ppoll(2) or epoll_wait(2) to monitor file descriptors or until a timer expires. In cases like virtqueues, Linux AIO, and ThreadPool it is technically possible to wait for events via polling (i.e. continuously checking for events without blocking). Polling can be faster than blocking syscalls because file descriptors, the process scheduler, and system calls are bypassed. The main disadvantage to polling is that it increases CPU utilization. In classic polling configuration a full host CPU thread might run at 100% to respond to events as quickly as possible. This patch implements a timeout so we fall back to blocking syscalls if polling detects no activity. After the timeout no CPU cycles are wasted on polling until the next event loop iteration. The run_poll_handlers_begin() and run_poll_handlers_end() trace events are added to aid performance analysis and troubleshooting. If you need to know whether polling mode is being used, trace these events to find out. Note that the AioContext is now re-acquired before disabling notify_me in the non-polling case. This makes the code cleaner since notify_me was enabled outside the non-polling AioContext release region. This change is correct since it's safe to keep notify_me enabled longer (disabling is an optimization) but potentially causes unnecessary event_notifer_set() calls. I think the chance of performance regression is small here. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03aio: add AioPollFn and io_poll() interfaceStefan Hajnoczi1-1/+4
The new AioPollFn io_poll() argument to aio_set_fd_handler() and aio_set_event_handler() is used in the next patch. Keep this code change separate due to the number of files it touches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-01-03aio: add flag to skip fds to aio_dispatch()Stefan Hajnoczi1-1/+5
Polling mode will not call ppoll(2)/epoll_wait(2). Therefore we know there are no fds ready and should avoid looping over fd handlers in aio_dispatch(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 20161201192652.9509-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-12-22block: drop remaining legacy aio functions in commentYaowei Bai1-2/+2
Commit 87f68d318222563822b5c6b28192215fc4b4e441 (block: drop aio functions that operate on the main AioContext) drops qemu_aio_wait function references mostly while leaves these behind, clean up them. Signed-off-by: Yaowei Bai <baiyaowei@cmss.chinamobile.com> Message-Id: <1480566640-27264-3-git-send-email-baiyaowei@cmss.chinamobile.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-10-28aio: convert from RFifoLock to QemuRecMutexPaolo Bonzini1-2/+1
It is simpler and a bit faster, and QEMU does not need the contention callbacks (and thus the fairness) anymore. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1477565348-5458-21-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28iothread: release AioContext around aio_pollPaolo Bonzini1-3/+0
This is the first step towards having fine-grained critical sections in dataplane threads, which will resolve lock ordering problems between address_space_* functions (which need the BQL when doing MMIO, even after we complete RCU-based dispatch) and the AioContext. Because AioContext does not use contention callbacks anymore, the unit test has to be changed. Previously applied as a0710f7995f914e3044e5899bd8ff6c43c62f916 and then reverted. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1477565348-5458-19-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-28aio: introduce qemu_get_current_aio_contextPaolo Bonzini1-0/+18
This will be used by BDRV_POLL_WHILE (and thus by bdrv_drain) to choose how to wait for I/O completion. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <1477565348-5458-12-git-send-email-pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2016-10-07async: add aio_bh_schedule_oneshotPaolo Bonzini1-0/+6
qemu_bh_delete is already clearing bh->scheduled at the same time as it's setting bh->deleted. Since it's not using any memory barriers, there is no synchronization going on for bh->deleted, and this makes the bh->deleted checks superfluous in aio_compute_timeout, aio_bh_poll and aio_ctx_check. Just remove them, and put the (bh->scheduled && bh->deleted) combo to work in a new function aio_bh_schedule_oneshot. The new function removes the need to save the QEMUBH pointer between the creation and the execution of the bottom half. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-26AioContext: correct commentsCao jin1-1/+1
Correct comments of field notify_me Cc: Kevin Wolf <kwolf@redhat.com> Cc: Max Reitz <mreitz@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Message-id: 1468575858-22975-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-18aio-posix: remove useless parameterCao jin1-1/+1
Parameter **errp of aio_context_setup() is useless, remove it and clean up the related code. Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Fam Zheng <famz@redhat.com> Cc: Eric Blake <eblake@redhat.com> Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1468578524-23433-1-git-send-email-caoj.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18linux-aio: share one LinuxAioState within an AioContextPaolo Bonzini1-0/+13
This has better performance because it executes fewer system calls and does not use a bottom half per disk. Originally proposed by Ming Lei. [Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix build error as reported by Peter Maydell <peter.maydell@linaro.org>. --Stefan] Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> squash! linux-aio: share one LinuxAioState within an AioContext
2016-03-22Use scripts/clean-includes to drop redundant qemu/typedefs.hMarkus Armbruster1-1/+0
Re-run scripts/clean-includes to apply the previous commit's corrections and updates. Besides redundant qemu/typedefs.h, this only finds a redundant config-host.h include in ui/egl-helpers.c. No idea how that escaped the previous runs. Some manual whitespace trimming around dropped includes squashed in. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-11-09aio: Introduce aio-epoll.cFam Zheng1-0/+5
To minimize code duplication, epoll is hooked into aio-posix's aio_poll() instead of rolling its own. This approach also has both compile-time and run-time switchability. 1) When QEMU starts with a small number of fds in the event loop, ppoll is used. 2) When QEMU starts with a big number of fds, or when more devices are hot plugged, epoll kicks in when the number of fds hits the threshold. 3) Some fds may not support epoll, such as tty based stdio. In this case, it falls back to ppoll. A rough benchmark with scsi-disk on virtio-scsi dataplane (epoll gets enabled from 64 onward). Numbers are in MB/s. =============================================== | master | epoll | | scsi disks # | read randrw | read randrw -------------|----------------|---------------- 1 | 86 36 | 92 45 8 | 87 43 | 86 41 64 | 71 32 | 70 38 128 | 48 24 | 58 31 256 | 37 19 | 57 28 =============================================== To comply with aio_{disable,enable}_external, we always use ppoll when aio_external_disabled() is true. [Removed #ifdef CONFIG_EPOLL around AioContext epollfd field declaration since the field is also referenced outside CONFIG_EPOLL code. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1446177989-6702-4-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09aio: Introduce aio_context_setupFam Zheng1-0/+8
This is the place to initialize platform specific bits of AioContext. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1446177989-6702-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-09aio: Introduce aio_external_disabledFam Zheng1-0/+11
This allows AioContext users to check the enable/disable state of external clients. Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1446177989-6702-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-11-06bottom halves: introduce bh call functionPavel Dovgalyuk1-0/+5
This patch introduces aio_bh_call function. It is used to execute bottom halves as callbacks without adding them to the queue. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Message-Id: <20150917162450.8676.56980.stgit@PASHA-ISP.def.inno> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>