aboutsummaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)AuthorFilesLines
2023-09-11Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into stagingStefan Hajnoczi2-4/+1
Block layer patches - Optimise reqs_lock to make multiqueue actually scale - virtio: Drop out of coroutine context in virtio_load() - iotests: Fix reference output for some tests after recent changes - vpc: Avoid dynamic stack allocation - Code cleanup, improved documentation # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmT7VYgRHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9YfOg/7BoYF6lkB7DF/jH3XLY6f8zoI+OVM7dg1 # QFEjyVO+uZiJVh0CeBNI9WgnBe7f5vXMbiStyGbWKo3BLUsjnwoQcW/Sxpw61bR2 # jZYK6UHe0RhFqTQpbt8G1iCmlpRS+sX+Cy+lxcVcbqxcnLRXCOjT6ivyA4bGbYIC # q9BHg/9hBmjuM05NTV6Axy8qjqBGVaIWE9ALTnw8H//waBr4/ydJPTl7EWHe3+tO # Stm73evgPG7aLHM6W4qdFW4gwAQ8f+f42Q+0NH1YavB/pN3LTN1B6sLQY/51du+0 # d/JCsXex0IZQXmNPhqv1h01vhOyU9WBmlwpPG2iZv3a06SXk1ys3rQt/L7uIcsZg # Z58CpcUJ517FERnkl0BWXzYhsdcW2K+RdlaiL5PX6H1A2B9LT05ouZfD47hh7kKv # oX+Ulk05PFr3JRCKQF6QDEejRKXt169bGzInTlns/wXinD/V4sCkUnr9aWQuhoWk # KhQm7WMscTTIyHP2FznO4x9kq0ALsoX/NKqBW2wgJUtqRzsd4XxPp5CXEsAir8Vt # dpne/DaV5iDI1mGFJrvkctJN545tEoezBtUzC8/9rZGE0cxHAkhvQVZUDo7xVmrq # PlGQ1ko9cNui/Gf9B6qDqaJJwSyw0S6vHurGVQJRwbyly57Fi5aisWkr4w7Rc4eA # 7u9B1RvwF/Q= # =2wGD # -----END PGP SIGNATURE----- # gpg: Signature made Fri 08 Sep 2023 13:10:32 EDT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: virtio: Drop out of coroutine context in virtio_load() vmstate: Mark VMStateInfo.get/put() coroutine_mixed_fn block: Make more BlockDriver definitions static block/meson.build: Restore alphabetical order of files block: Remove unnecessary variable in bdrv_block_device_info block: Remove bdrv_query_block_node_info vmdk: Clean up bdrv_open_child() return value check qemu-img: Update documentation for compressed images block: Be more verbose in create fallback block/iscsi: Document why we use raw malloc() qemu-img: omit errno value in error message block: change reqs_lock to QemuMutex block: minimize bs->reqs_lock section in tracked_request_end() iotests: adapt test output for new qemu_cleanup() behavior block/vpc: Avoid dynamic stack allocation Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-08block: Remove bdrv_query_block_node_infoFabiano Rosas1-3/+0
The last call site of this function has been removed by commit c04d0ab026 ("qemu-img: Let info print block graph"). Reviewed-by: Claudio Fontana <cfontana@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20230901184605.32260-2-farosas@suse.de> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08block: change reqs_lock to QemuMutexStefan Hajnoczi1-1/+1
CoMutex has poor performance when lock contention is high. The tracked requests list is accessed frequently and performance suffers in QEMU multi-queue block layer scenarios. It is not necessary to use CoMutex for the requests lock. The lock is always released across coroutine yield operations. It is held for relatively short periods of time and it is not beneficial to yield when the lock is held by another coroutine. Change the lock type from CoMutex to QemuMutex to improve multi-queue block layer performance. fio randread bs=4k iodepth=64 with 4 IOThreads handling a virtio-blk device with 8 virtqueues improves from 254k to 517k IOPS (+203%). Full benchmark results and configuration details are available here: https://gitlab.com/stefanha/virt-playbooks/-/commit/980c40845d540e3669add1528739503c2e817b57 In the future we may wish to introduce thread-local tracked requests lists to avoid lock contention completely. That would be much more involved though. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230808155852.2745350-3-stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-08Merge tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu into stagingStefan Hajnoczi1-1/+1
trivial patches for 2023-09-08 # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEe3O61ovnosKJMUsicBtPaxppPlkFAmT68tMPHG1qdEB0bHMu # bXNrLnJ1AAoJEHAbT2saaT5ZbEwH/2XcX1f4KcEJbgUn0JVhGQ5GH2c2jepZlkTZ # 2dhvdEECbOPMg73hty0fyyWlyuLWdJ9cMpONfMtzmHTH8RKEOAbpn/zusyo3H+48 # 6cunyUpBqbmb7MHPchrN+JmvtvaSPSazsj2Zdkh+Y4WlfEYj+yVysQ4zQlBlRyHv # iOTi6OdjxXg1QcbtJxAUhp+tKaRJzagiCpLkoyW2m8DIuV9cLVHMJsE3OMgfKNgK # /S+O1fLcaDhuSCrHAbZzArF3Tr4bfLqSwDtGCJfQpqKeIQDJuI+41GLIlm1nYY70 # IFJzEWMOrX/rcMG1CQnUFZOOyDSO+NfILwNnU+eyM49MUekmY54= # =mmPS # -----END PGP SIGNATURE----- # gpg: Signature made Fri 08 Sep 2023 06:09:23 EDT # gpg: using RSA key 7B73BAD68BE7A2C289314B22701B4F6B1A693E59 # gpg: issuer "mjt@tls.msk.ru" # gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>" [full] # gpg: aka "Michael Tokarev <mjt@corpit.ru>" [full] # gpg: aka "Michael Tokarev <mjt@debian.org>" [full] # Primary key fingerprint: 6EE1 95D1 886E 8FFB 810D 4324 457C E0A0 8044 65C5 # Subkey fingerprint: 7B73 BAD6 8BE7 A2C2 8931 4B22 701B 4F6B 1A69 3E59 * tag 'pull-trivial-patches' of https://gitlab.com/mjt0k/qemu: (22 commits) qxl: don't assert() if device isn't yet initialized hw/net/vmxnet3: Fix guest-triggerable assert() tests/qtest/usb-hcd: Remove the empty "init" tests target/ppc: use g_free() in test_opcode_table() hw/ppc: use g_free() in spapr_tce_table_post_load() trivial: Simplify the spots that use TARGET_BIG_ENDIAN as a numeric value accel/tcg: Fix typo in translator_io_start() description tests/qtest/test-hmp: Fix migrate_set_parameter xbzrle-cache-size test docs tests: Fix use of migrate_set_parameter qemu-options.hx: Rephrase the descriptions of the -hd* and -cdrom options hw/display/xlnx_dp: update comments block: spelling fixes misc/other: spelling fixes qga/: spelling fixes tests/: spelling fixes scripts/: spelling fixes include/: spelling fixes audio: spelling fixes xen: spelling fix riscv: spelling fixes ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-08Merge tag 'pull-nbd-2023-09-07-v2' of https://repo.or.cz/qemu/ericb into stagingStefan Hajnoczi1-2/+1
NBD patches for 2023-09-07 - Andrey Drobyshev - fix regression in iotest 197 under -nbd - Stefan Hajnoczi - allow coroutine read and write context to split across threads - Philippe Mathieu-Daudé - remove a VLA allocation - Denis V. Lunev - fix regression in iotest 233 with qemu-nbd -v --fork # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCAAdFiEEccLMIrHEYCkn0vOqp6FrSiUnQ2oFAmT7EsUACgkQp6FrSiUn # Q2qiKgf9EqCWPmcsH2nvXrDvZmDc0/I4tineaNY+hSdPtSb6RFA1IH8AvzkrkPYU # 9ojX6QFp1Z30fUs+pwweQhBMYta03QyjCFhsbPRmDq391dtIDCeww3o+RD1kw/pg # 2ZC+P9N1U3pi2Hi8FhxH17GYYgOQnHMKM9gt1V7JOQvFsDFWbTo9sFj8p/BPoWxV # I3TeLQDWqVnNjf57lG2pwhdKc8DbKoqRmA3XNiXiKI5inEBeRJsTdMMGn4YWpwJE # Y5imM/PbyCqRKQ6MYyJenVk4QVTe1IKO6D4vf1ZHLDBEiaw9NaeYHlk6lnDC4O9v # PeTycAwND6cMKYlKMyEzcJXv9IdRBw== # =jAZi # -----END PGP SIGNATURE----- # gpg: Signature made Fri 08 Sep 2023 08:25:41 EDT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * tag 'pull-nbd-2023-09-07-v2' of https://repo.or.cz/qemu/ericb: qemu-nbd: document -v behavior in respect to --fork in man qemu-nbd: Restore "qemu-nbd -v --fork" output qemu-nbd: invent nbd_client_release_pipe() helper qemu-nbd: put saddr into into struct NbdClientOpts qemu-nbd: move srcpath into struct NbdClientOpts qemu-nbd: define struct NbdClientOpts when HAVE_NBD_DEVICE is not defined qemu-nbd: improve error message for dup2 error util/iov: Avoid dynamic stack allocation io: follow coroutine AioContext in qio_channel_yield() io: check there are no qio_channel_yield() coroutines during ->finalize() nbd: drop unused nbd_start_negotiate() aio_context argument nbd: drop unused nbd_receive_negotiate() aio_context argument qemu-iotests/197: use more generic commands for formats other than qcow2 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-09-08include/: spelling fixesMichael Tokarev1-1/+1
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
2023-09-07nbd: drop unused nbd_receive_negotiate() aio_context argumentStefan Hajnoczi1-2/+1
aio_context is always NULL, so drop it. Suggested-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230830224802.493686-2-stefanha@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2023-09-07hw/ufs: Initial commit for emulated Universal-Flash-StorageJeuk Kim1-0/+1090
Universal Flash Storage (UFS) is a high-performance mass storage device with a serial interface. It is primarily used as a high-performance data storage device for embedded applications. This commit contains code for UFS device to be recognized as a UFS PCI device. Patches to handle UFS logical unit and Transfer Request will follow. Signed-off-by: Jeuk Kim <jeuk20.kim@samsung.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 10232660d462ee5cd10cf673f1a9a1205fc8276c.1693980783.git.jeuk20.kim@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-08-30block/io: align requests to subcluster_sizeAndrey Drobyshev1-4/+4
When target image is using subclusters, and we align the request during copy-on-read, it makes sense to align to subcluster_size rather than cluster_size. Otherwise we end up with unnecessary allocations. This commit renames bdrv_round_to_clusters() to bdrv_round_to_subclusters() and utilizes subcluster_size field of BlockDriverInfo to make necessary alignments. It affects copy-on-read as well as mirror job (which is using bdrv_round_to_clusters()). This change also fixes the following bug with failing assert (covered by the test in the subsequent commit): qemu-img create -f qcow2 base.qcow2 64K qemu-img create -f qcow2 -o extended_l2=on,backing_file=base.qcow2,backing_fmt=qcow2 img.qcow2 64K qemu-io -c "write -P 0xaa 0 2K" img.qcow2 qemu-io -C -c "read -P 0x00 2K 62K" img.qcow2 qemu-io: ../block/io.c:1236: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230711172553.234055-3-andrey.drobyshev@virtuozzo.com>
2023-08-30block: add subcluster_size field to BlockDriverInfoAndrey Drobyshev1-0/+5
This is going to be used in the subsequent commit as requests alignment (in particular, during copy-on-read). This value only makes sense for the formats which support subclusters (currently QCOW2 only). If this field isn't set by driver's own bdrv_get_info() implementation, we simply set it equal to the cluster size thus treating each cluster as having a single subcluster. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230711172553.234055-2-andrey.drobyshev@virtuozzo.com>
2023-07-19nbd: Use enum for various negotiation modesEric Blake1-0/+11
Deciphering the hard-coded list of integer return values from nbd_start_negotiate() will only get more confusing when adding support for 64-bit extended headers. Better is to name things in an enum. Although the function in question is private to client.c, putting the enum in a public header and including an enum-to-string conversion will allow its use in more places in upcoming patches. The enum is intentionally laid out so that operators like <= can be used to group multiple modes with similar characteristics, and where the least powerful mode has value 0, even though this patch does not exploit that. No semantic change intended. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd: s/handle/cookie/ to match NBD specEric Blake1-5/+6
Externally, libnbd exposed the 64-bit opaque marker for each client NBD packet as the "cookie", because it was less confusing when contrasted with 'struct nbd_handle *' holding all libnbd state. It also avoids confusion between the noun 'handle' as a way to identify a packet and the verb 'handle' for reacting to things like signals. Upstream NBD changed their spec to favor the name "cookie" based on libnbd's recommendations[1], so we can do likewise. [1] https://github.com/NetworkBlockDevice/nbd/commit/ca4392eb2b Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-6-eblake@redhat.com> [eblake: typo fix] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd/server: Prepare for alternate-size headersEric Blake1-4/+4
Upstream NBD now documents[1] an extension that supports 64-bit effect lengths in requests. As part of that extension, the size of the reply headers will change in order to permit a 64-bit length in the reply for symmetry[2]. Additionally, where the reply header is currently 16 bytes for simple reply, and 20 bytes for structured reply; with the extension enabled, there will only be one extended reply header, of 32 bytes, with both structured and extended modes sending identical payloads for chunked replies. Since we are already wired up to use iovecs, it is easiest to allow for this change in header size by splitting each structured reply across multiple iovecs, one for the header (which will become wider in a future patch according to client negotiation), and the other(s) for the chunk payload, and removing the header from the payload struct definitions. Rename the affected functions with s/structured/chunk/ to make it obvious that the code will be reused in extended mode. Interestingly, the client side code never utilized the packed types, so only the server code needs to be updated. [1] https://github.com/NetworkBlockDevice/nbd/blob/extension-ext-header/doc/proto.md as of NBD commit e6f3b94a934 [2] Note that on the surface, this is because some future server might permit a 4G+ NBD_CMD_READ and need to reply with that much data in one transaction. But even though the extended reply length is widened to 64 bits, for now the NBD spec is clear that servers will not reply with more than a maximum payload bounded by the 32-bit NBD_INFO_BLOCK_SIZE field; allowing a client and server to mutually agree to transactions larger than 4G would require yet another extension. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2023-07-19nbd: Consistent typedef usage in headerEric Blake1-18/+13
We had a mix of struct declarations followed by typedefs, and direct struct definitions as part of a typedef. Pick a single style. Also float forward declarations of opaque types to the top of the file, rather than interspersed with function declarations, which will help a future patch that wants to expose yet another opaque type that will be referenced in NBDRequest. No semantic impact. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20230608135653.2918540-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [eblake: alter patch per mailing list feedback] Signed-off-by: Eric Blake <eblake@redhat.com>
2023-06-28block: use bdrv_co_debug_event in coroutine contextPaolo Bonzini1-0/+7
bdrv_co_debug_event was recently introduced, with bdrv_debug_event becoming a wrapper for use in unknown context. Because most of the time bdrv_debug_event is used on a BdrvChild via the wrapper macro BLKDBG_EVENT, introduce a similar macro BLKDBG_CO_EVENT that calls bdrv_co_debug_event, and switch whenever possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230601115145.196465-13-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-28graph-lock: Unlock the AioContext while pollingKevin Wolf1-2/+4
If the caller keeps the AioContext lock for a block node in an iothread, polling in bdrv_graph_wrlock() deadlocks if the condition isn't fulfilled immediately. Now that all callers make sure to actually have the AioContext locked when they call bdrv_replace_child_noperm() like they should, we can change bdrv_graph_wrlock() to take a BlockDriverState whose AioContext lock the caller holds (NULL if it doesn't) and unlock it temporarily while polling. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230605085711.21261-11-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-06-01block: remove bdrv_co_io_plug() APIStefan Hajnoczi2-14/+0
No block driver implements .bdrv_co_io_plug() anymore. Get rid of the function pointers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/linux-aio: convert to blk_io_plug_call() APIStefan Hajnoczi1-7/+0
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Note that a dev_max_batch check is dropped in laio_io_unplug() because the semantics of unplug_fn() are different from .bdrv_co_unplug(): 1. unplug_fn() is only called when the last blk_io_unplug() call occurs, not every time blk_io_unplug() is called. 2. unplug_fn() is per-thread, not per-BlockDriverState, so there is no way to get per-BlockDriverState fields like dev_max_batch. Therefore this condition cannot be moved to laio_unplug_fn(). It is not obvious that this condition affects performance in practice, so I am removing it instead of trying to come up with a more complex mechanism to preserve the condition. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20230530180959.1108766-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-06-01block/io_uring: convert to blk_io_plug_call() APIStefan Hajnoczi1-7/+0
Stop using the .bdrv_co_io_plug() API because it is not multi-queue block layer friendly. Use the new blk_io_plug_call() API to batch I/O submission instead. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Message-id: 20230530180959.1108766-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-30aio: remove aio_disable_external() APIStefan Hajnoczi1-57/+0
All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block/export: don't require AioContext lock around blk_exp_ref/unref()Stefan Hajnoczi1-0/+2
The FUSE export calls blk_exp_ref/unref() without the AioContext lock. Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they work without the AioContext lock. This way it's less error-prone. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-15-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block: drain from main loop thread in bdrv_co_yield_to_drain()Stefan Hajnoczi1-36/+36
For simplicity, always run BlockDevOps .drained_begin/end/poll() callbacks in the main loop thread. This makes it easier to implement the callbacks and avoids extra locks. Move the function pointer declarations from the I/O Code section to the Global State section for BlockDevOps, BdrvChildClass, and BlockDriver. Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate. The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This is now only allowed from coroutine context, so update the test case to run in a coroutine. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-30block-coroutine-wrapper: Take AioContext lock in no_co_wrappersKevin Wolf1-0/+3
All of the functions that currently take a BlockDriverState, BdrvChild or BlockBackend as their first parameter expect the associated AioContext to be locked when they are called. In the case of no_co_wrappers, they are called from bottom halves directly in the main loop, so no other caller can be expected to take the lock for them. This can result in assertion failures because a lock that isn't taken is released in nested event loops. Looking at the first parameter is already done by co_wrappers to decide where the coroutine should run, so doing the same in no_co_wrappers is only consistent. Take the lock in the generated bottom halves to fix the problem. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-2-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19blockjob: Adhere to rate limit even when reentered earlyKevin Wolf1-4/+10
When jobs are sleeping, for example to enforce a given rate limit, they can be reentered early, in particular in order to get paused, to update the rate limit or to get cancelled. Before this patch, they behave in this case as if they had fully completed their rate limiting delay. This means that requests are sped up beyond their limit, violating the constraints that the user gave us. Change the block jobs to sleep in a loop until the necessary delay is completed, while still allowing cancelling them immediately as well pausing (handled by the pause point in job_sleep_ns()) and updating the rate limit. This change is also motivated by iotests cases being prone to fail because drain operations pause and unpause them so often that block jobs complete earlier than they are supposed to. In particular, the next commit would fail iotests 030 without this change. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-8-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-19block: Call .bdrv_co_create(_opts) unlockedKevin Wolf2-6/+6
These are functions that modify the graph, so they must be able to take a writer lock. This is impossible if they already hold the reader lock. If they need a reader lock for some of their operations, they should take it internally. Many of them go through blk_*(), which will always take the lock itself. Direct calls of bdrv_*() need to take the reader lock. Note that while locking for bdrv_co_*() calls is checked by TSA, this is not the case for the mixed_coroutine_fns bdrv_*(). Holding the lock is still required when they are called from coroutine context like here! This effectively reverts 4ec8df0183, but adds some internal locking instead. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230510203601.418015-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-15block: add accounting for zone append operationSam Li1-0/+1
Taking account of the new zone append write operation for zoned devices, BLOCK_ACCT_ZONE_APPEND enum is introduced as other I/O request type (read, write, flush). Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-id: 20230508051916.178322-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: introduce zone append write for zoned devicesSam Li3-1/+10
A zone append command is a write operation that specifies the first logical block of a zone as the write position. When writing to a zoned block device using zone append, the byte offset of the call may point at any position within the zone to which the data is being appended. Upon completion the device will respond with the position where the data has been written in the zone. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508051510.177850-3-faithilikerun@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15file-posix: add tracking of the zone write pointersSam Li2-0/+19
Since Linux doesn't have a user API to issue zone append operations to zoned devices from user space, the file-posix driver is modified to add zone append emulation using regular writes. To do this, the file-posix driver tracks the wp location of all zones of the device. It uses an array of uint64_t. The most significant bit of each wp location indicates if the zone type is conventional zones. The zones wp can be changed due to the following operations issued: - zone reset: change the wp to the start offset of that zone - zone finish: change to the end location of that zone - write to a zone - zone append Signed-off-by: Sam Li <faithilikerun@gmail.com> Message-id: 20230508051510.177850-2-faithilikerun@gmail.com [Fix errno propagation from handle_aiocb_zone_mgmt() --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block: add zoned BlockDriver check to block layerSam Li1-0/+5
Putting zoned/non-zoned BlockDrivers on top of each other is not allowed. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-6-faithilikerun@gmail.com Message-id: 20230324090605.28361-6-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org> and clarify that the check is about zoned BlockDrivers. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/block-backend: add block layer APIs resembling Linux ZonedBlockDevice ↵Sam Li3-1/+35
ioctls Add zoned device option to host_device BlockDriver. It will be presented only for zoned host block devices. By adding zone management operations to the host_block_device BlockDriver, users can use the new block layer APIs including Report Zone and four zone management operations (open, close, finish, reset, reset_all). Qemu-io uses the new APIs to perform zoned storage commands of the device: zone_report(zrp), zone_open(zo), zone_close(zc), zone_reset(zrs), zone_finish(zf). For example, to test zone_report, use following command: $ ./build/qemu-io --image-opts -n driver=host_device, filename=/dev/nullb0 -c "zrp offset nr_zones" Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-4-faithilikerun@gmail.com Message-id: 20230324090605.28361-4-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org> and remove spurious ret = -errno in raw_co_zone_mgmt(). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/file-posix: introduce helper functions for sysfs attributesSam Li1-0/+3
Use get_sysfs_str_val() to get the string value of device zoned model. Then get_sysfs_zoned_model() can convert it to BlockZoneModel type of QEMU. Use get_sysfs_long_val() to get the long value of zoned device information. Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-3-faithilikerun@gmail.com Message-id: 20230324090605.28361-3-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-15block/block-common: add zoned device structsSam Li1-0/+43
Signed-off-by: Sam Li <faithilikerun@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Dmitry Fomichev <dmitry.fomichev@wdc.com> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20230508045533.175575-2-faithilikerun@gmail.com Message-id: 20230324090605.28361-2-faithilikerun@gmail.com [Adjust commit message prefix as suggested by Philippe Mathieu-Daudé <philmd@linaro.org>. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-05-10block: Mark bdrv_refresh_limits() and callers GRAPH_RDLOCKKevin Wolf2-2/+6
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_refresh_limits() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-21-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_recurse_can_replace() and callers GRAPH_RDLOCKKevin Wolf3-6/+7
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_recurse_can_replace() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-20-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_query_block_graph_info() and callers GRAPH_RDLOCKKevin Wolf1-3/+4
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_query_block_graph_info() need to hold a reader lock for the graph because it accesses the children list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-19-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark BlockDriver callbacks for amend job GRAPH_RDLOCKEmanuele Giuseppe Esposito1-6/+6
This adds GRAPH_RDLOCK annotations to declare that callers of amend callbacks in BlockDriver need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-17-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_co_debug_event() GRAPH_RDLOCKEmanuele Giuseppe Esposito2-6/+7
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_debug_event() need to hold a reader lock for the graph. Unfortunately we cannot use a co_wrapper_bdrv_rdlock (i.e. make the coroutine wrapper a no_coroutine_fn), because the function is called (using the BLKDBG_EVENT macro) by mixed functions that run both in coroutine and non-coroutine context (for example many of the functions in qcow2-cluster.c and qcow2-refcount.c). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-16-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_co_get_info() and callers GRAPH_RDLOCKEmanuele Giuseppe Esposito2-4/+7
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_get_info() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-15-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Mark bdrv_co_get_allocated_file_size() and callers GRAPH_RDLOCKEmanuele Giuseppe Esposito2-3/+6
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_co_get_allocated_file_size() need to hold a reader lock for the graph. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20230504115750.54437-14-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: .bdrv_open is non-coroutine and unlockedKevin Wolf1-4/+4
Drivers were a bit confused about whether .bdrv_open can run in a coroutine and whether or not it holds a graph lock. It cannot keep a graph lock from the caller across the whole function because it both changes the graph (requires a writer lock) and does I/O (requires a reader lock). Therefore, it should take these locks internally as needed. The functions used to be called in coroutine context during image creation. This was buggy for other reasons, and as of commit 32192301, all block drivers go through no_co_wrappers. So it is not called in coroutine context any more. Fix qcow2 and qed to work with the correct assumptions: The graph lock needs to be taken internally instead of just assuming it's already there, and the coroutine path is dead code that can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-9-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10graph-lock: Fix GRAPH_RDLOCK_GUARD*() to be reader lockKevin Wolf1-8/+8
GRAPH_RDLOCK_GUARD() and GRAPH_RDLOCK_GUARD_MAINLOOP() only take a reader lock for the graph, so the correct annotation for them to use is TSA_ASSERT_SHARED rather than TSA_ASSERT. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20230504115750.54437-8-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10graph-lock: Add GRAPH_UNLOCKED(_PTR)Kevin Wolf1-0/+2
For some functions, it is part of their interface to be called without holding the graph lock. Add a new macro to document this. The macro expands to TSA_EXCLUDES(), which is a relatively weak check because it passes in cases where the compiler just doesn't know if the lock is held. Function pointers can't be checked at all. Therefore, its primary purpose is documentation. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-7-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: bdrv/blk_co_unref() for calls in coroutine contextKevin Wolf1-1/+2
These functions must not be called in coroutine context, because they need write access to the graph. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: Consistently call bdrv_activate() outside coroutineKevin Wolf1-1/+5
Migration code can call bdrv_activate() in coroutine context, whereas other callers call it outside of coroutines. As it calls other code that is not supposed to run in coroutines, standardise on running outside of coroutines. This adds a no_co_wrapper to switch to the main loop before calling bdrv_activate(). Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230504115750.54437-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10aio-wait: avoid AioContext lock in aio_wait_bh_oneshot()Stefan Hajnoczi1-1/+1
There is no need for the AioContext lock in aio_wait_bh_oneshot(). It's easy to remove the lock from existing callers and then switch from AIO_WAIT_WHILE() to AIO_WAIT_WHILE_UNLOCKED() in aio_wait_bh_oneshot(). Document that the AioContext lock should not be held across aio_wait_bh_oneshot(). Holding a lock across aio_poll() can cause deadlock so we don't want callers to do that. This is a step towards getting rid of the AioContext lock. Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230404153307.458883-1-stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-05-10block: add missing coroutine_fn annotationsPaolo Bonzini1-2/+2
After the recent introduction of many new coroutine callbacks, a couple calls from non-coroutine_fn to coroutine_fn have sneaked in; fix them. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20230406101752.242125-1-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-28async: Add an optional reentrancy guard to the BH APIAlexander Bulekov1-2/+16
Devices can pass their MemoryReentrancyGuard (from their DeviceState), when creating new BHes. Then, the async API will toggle the guard before/after calling the BH call-back. This prevents bh->mmio reentrancy issues. Signed-off-by: Alexander Bulekov <alxndr@bu.edu> Reviewed-by: Darren Kenny <darren.kenny@oracle.com> Message-Id: <20230427211013.2994127-3-alxndr@bu.edu> [thuth: Fix "line over 90 characters" checkpatch.pl error] Signed-off-by: Thomas Huth <thuth@redhat.com>
2023-04-25thread-pool: avoid passing the pool parameter every timeEmanuele Giuseppe Esposito1-6/+4
thread_pool_submit_aio() is always called on a pool taken from qemu_get_current_aio_context(), and that is the only intended use: each pool runs only in the same thread that is submitting work to it, it can't run anywhere else. Therefore simplify the thread_pool_submit* API and remove the ThreadPool function parameter. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-5-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-25thread-pool: use ThreadPool from the running threadEmanuele Giuseppe Esposito1-0/+5
Use qemu_get_current_aio_context() where possible, since we always submit work to the current thread anyways. We want to also be sure that the thread submitting the work is the same as the one processing the pool, to avoid adding synchronization to the pool list. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-4-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-04-25io_uring: use LuringState from the running threadEmanuele Giuseppe Esposito2-8/+11
Remove usage of aio_context_acquire by always submitting asynchronous AIO to the current thread's LuringState. In order to prevent mistakes from the caller side, avoid passing LuringState in luring_io_{plug/unplug} and luring_co_submit, and document the functions to make clear that they work in the current thread's AioContext. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20230203131731.851116-3-eesposit@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>