aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2021-07-21iotests: Improve and rename test 291 to qemu-img-bitmapEric Blake1-1/+1
Enhance the test to demonstrate existing less-than-stellar behavior of qemu-img with a qcow2 image containing an inconsistent bitmap: we don't diagnose the problem until after copying the entire image (a potentially long time), and when we do diagnose the failure, we still end up leaving an empty bitmap in the destination. This mess will be cleaned up in the next patch. While at it, rename the test now that we support useful iotest names, and fix a missing newline in the error message thus exposed. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210709153951.2801666-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nir Soffer <nsoffer@redhat.com>
2021-07-21linux-aio: limit the batch size using `aio-max-batch` parameterStefano Garzarella1-1/+8
When there are multiple queues attached to the same AIO context, some requests may experience high latency, since in the worst case the AIO engine queue is only flushed when it is full (MAX_EVENTS) or there are no more queues plugged. Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase the number of in-flight requests. But this change also increased the potential maximum batch to 1024 elements. When there is a single queue attached to the AIO context, the issue is mitigated from laio_io_unplug() that will flush the queue every time is invoked since there can't be others queue plugged. Let's use the new `aio-max-batch` IOThread parameter to mitigate this issue, limiting the number of requests in a batch. We also define a default value (32): this value is obtained running some benchmarks and it represents a good tradeoff between the latency increase while a request is queued and the cost of the io_submit(2) system call. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-4-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-20block/export: Conditionally ignore set-context errorMax Reitz1-1/+4
When invoking block-export-add with some iothread and fixed-iothread=false, and changing the node's iothread fails, the error is supposed to be ignored. However, it is still stored in *errp, which is wrong. If a second error occurs, the "*errp must be NULL" assertion in error_setv() fails: qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion `*errp == NULL' failed. So if fixed-iothread=false, we should ignore the error by passing NULL to bdrv_try_set_aio_context(). Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef ("block/export: add iothread and fixed-iothread options") Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210624083825.29224-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/vvfat: fix: drop backingVladimir Sementsov-Ogievskiy1-39/+4
Most probably this fake backing child doesn't work anyway (see notes about it in a8a4d15c1c34d). Still, since 25f78d9e2de528473d52 drivers are required to set .supports_backing if they want to call bdrv_set_backing_hd, so now vvfat just doesn't work because of this check. Let's finally drop this fake backing file. Fixes: 25f78d9e2de528473d52acfcf7acdfb64e3453d4 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210715124853.13335-1-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Remove workaroundLukas Straub1-11/+1
Remove the workaround introduced in commit 6ecbc6c52672db5c13805735ca02784879ce8285 "replication: Avoid blk_make_empty() on read-only child". It is not needed anymore since s->hidden_disk is guaranteed to be writable when secondary_do_checkpoint() runs. Because replication_start(), _do_checkpoint() and _stop() are only called by COLO migration code and COLO-migration activates all disks via bdrv_invalidate_cache_all() before it calls these functions. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <d3acfad43879e9f376bffa7dd797ae74d0a7c81a.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Properly attach childrenLukas Straub1-3/+27
The replication driver needs access to the children block-nodes of it's child so it can issue bdrv_make_empty() and bdrv_co_pwritev() to manage the replication. However, it does this by directly copying the BdrvChilds, which is wrong. Fix this by properly attaching the block-nodes with bdrv_attach_child() and requesting the required permissions. This ultimatively fixes a potential crash in replication_co_writev(), because it may write to s->secondary_disk if it is in state BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write permissions first. And now the workaround in secondary_do_checkpoint() can be removed. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <5d0539d729afb8072d0d7cde977c5066285591b4.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Reduce usage of s->hidden_disk and s->secondary_diskLukas Straub1-17/+28
In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1eb9dc179267207d9c7eccaeb30761758e32e9ab.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Remove s->active_diskLukas Straub1-17/+17
s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <2534f867ea9be5b666dfce19744b7d4e2b96c976.1626619393.git.lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/mirror: fix active mirror dead-lock in mirror_wait_on_conflictsVladimir Sementsov-Ogievskiy1-0/+12
It's possible that requests start to wait each other in mirror_wait_on_conflicts(). To avoid it let's use same technique as in block/io.c in bdrv_wait_serialising_requests_locked() / bdrv_find_conflicting_request(): don't wait on intersecting request if it is already waiting for some other request. For details of the dead-lock look at testIntersectingActiveIO() test-case which we actually fixing now. Fixes: d06107ade0ce74dc39739bac80de84b51ec18546 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/mirror: set .co for active-write MirrorOp objectsVladimir Sementsov-Ogievskiy1-0/+1
This field is unused, but it very helpful for debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-19blkdebug: protect rules and suspended_reqs with a lockEmanuele Giuseppe Esposito1-10/+39
First, categorize the structure fields to identify what needs to be protected and what doesn't. We essentially need to protect only .state, and the 3 lists in BDRVBlkdebugState. Then, add the lock and mark the functions accordingly. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-7-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19block/blkdebug: remove new_state field and instead use a local variableEmanuele Giuseppe Esposito1-7/+6
There seems to be no benefit in using a field. Replace it with a local variable, and move the state update before the yields. The state update has do be done before the yields because now using a local variable does not allow the new updated state to be visible by the other yields. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFEEmanuele Giuseppe Esposito1-1/+6
That would be unsafe in case a rule other than the current one is removed while the coroutine has yielded. Keep FOREACH_SAFE because suspend_request deletes the current rule. After this patch, *all* matching rules are deleted before suspending the coroutine, rather than just one. This doesn't affect the existing testcases. Use actions_count to see how many yield to issue. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-5-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19blkdebug: track all actionsEmanuele Giuseppe Esposito1-9/+8
Add a counter for each action that a rule can trigger. This is mainly used to keep track of how many coroutine_yield() we need to perform after processing all rules in the list. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-4-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19blkdebug: move post-resume handling to resume_req_by_tagEmanuele Giuseppe Esposito1-13/+18
We want to move qemu_coroutine_yield() after the loop on rules, because QLIST_FOREACH_SAFE is wrong if the rule list is modified while the coroutine has yielded. Therefore move the suspended request to the heap and clean it up from the remove side. All that is left is for blkdebug_debug_event to handle the yielding. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210614082931.24925-3-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19blkdebug: refactor removal of a suspended requestEmanuele Giuseppe Esposito1-11/+22
Extract to a separate function. Do not rely on FOREACH_SAFE, which is only "safe" if the *current* node is removed---not if another node is removed. Instead, just walk the entire list from the beginning when asked to resume all suspended requests with a given tag. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20210614082931.24925-2-eesposit@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-12nbd: register yank function earlierLukas Straub1-3/+5
Although unlikely, qemu might hang in nbd_send_request(). Allow recovery in this case by registering the yank function before calling it. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20210704000730.1befb596@gecko.fritz.box> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-07-11Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell2-1/+2
staging * More SVM fixes (Lara) * Module annotation database (Gerd) * Memory leak fixes (myself) * Build fixes (myself) * --with-devices-* support (Alex) # gpg: Signature made Fri 09 Jul 2021 17:23:52 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (48 commits) meson: Use input/output for entitlements target configure: allow the selection of alternate config in the build configs: rename default-configs to configs and reorganise hw/arm: move CONFIG_V7M out of default-devices hw/arm: add dependency on OR_IRQ for XLNX_VERSAL meson: Introduce target-specific Kconfig meson: switch function tests from compilation to linking vl: fix leak of qdict_crumple return value target/i386: fix exceptions for MOV to DR target/i386: Added DR6 and DR7 consistency checks target/i386: Added MSRPM and IOPM size check monitor/tcg: move tcg hmp commands to accel/tcg, register them dynamically usb: build usb-host as module monitor/usb: register 'info usbhost' dynamically usb: drop usb_host_dev_is_scsi_storage hook monitor: allow register hmp commands accel: build tcg modular accel: add tcg module annotations accel: build qtest modular accel: add qtest module annotations ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-10Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell6-316/+611
Block layer patches - Make blockdev-reopen stable - Remove deprecated qemu-img backing file without format - rbd: Convert to coroutines and add write zeroes support - rbd: Updated MAINTAINERS - export/fuse: Allow other users access to the export - vhost-user: Fix backends without multiqueue support - Fix drive-backup transaction endless drained section # gpg: Signature made Fri 09 Jul 2021 13:49:22 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (28 commits) block: Make blockdev-reopen stable API iotests: Test reopening multiple devices at the same time block: Support multiple reopening with x-blockdev-reopen block: Acquire AioContexts during bdrv_reopen_multiple() block: Add bdrv_reopen_queue_free() qcow2: Fix dangling pointer after reopen for 'file' qemu-img: Improve error for rebase without backing format qemu-img: Require -F with -b backing image qcow2: Prohibit backing file changes in 'qemu-img amend' blockdev: fix drive-backup transaction endless drained section vhost-user: Fix backends without multiqueue support MAINTAINERS: add block/rbd.c reviewer block/rbd: fix type of task->complete iotests/fuse-allow-other: Test allow-other iotests/308: Test +w on read-only FUSE exports export/fuse: Let permissions be adjustable export/fuse: Give SET_ATTR_SIZE its own branch export/fuse: Add allow-other option export/fuse: Pass default_permissions for mount util/uri: do not check argument of uri_free() ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-07-09modules: add block module annotationsGerd Hoffmann1-0/+1
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Jose R. Ziviani <jziviani@suse.de> Message-Id: <20210624103836.2382472-14-kraxel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-09meson: fix missing preprocessor symbolsPaolo Bonzini1-1/+1
While most libraries do not need a CONFIG_* symbol because the "when:" clauses are enough, some do. Add them back or stop using them if possible. In the case of libpmem, the statement to add the CONFIG_* symbol was still in configure, but could not be triggered because it checked for "no" instead of "disabled" (and it would be wrong anyway since the test for the library has not been done yet). Reported-by: Li Zhijian <lizhijian@cn.fujitsu.com> Fixes: 587d59d6cc ("configure, meson: convert virgl detection to meson", 2021-07-06) Fixes: 83ef16821a ("configure, meson: convert libdaxctl detection to meson", 2021-07-06) Fixes: e36e8c70f6 ("configure, meson: convert libpmem detection to meson", 2021-07-06) Fixes: 53c22b68e3 ("configure, meson: convert liburing detection to meson", 2021-07-06) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-07-09block: Acquire AioContexts during bdrv_reopen_multiple()Kevin Wolf1-0/+7
As the BlockReopenQueue can contain nodes in multiple AioContexts, only one of which may be locked when AIO_WAIT_WHILE() can be called, we can't let the caller lock the right contexts. Instead, individually lock the AioContext of a single node when iterating the queue. Reintroduce bdrv_reopen() as a wrapper for reopening a single node that drains the node and temporarily drops the AioContext lock for bdrv_reopen_multiple(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210708114709.206487-4-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09qcow2: Fix dangling pointer after reopen for 'file'Kevin Wolf1-0/+29
Without an external data file, s->data_file is a second pointer with the same value as bs->file. When changing bs->file to a different BdrvChild and freeing the old BdrvChild, s->data_file must also be updated, otherwise it points to freed memory and causes crashes. This problem was caught by iotests case 245. Fixes: df2b7086f169239ebad5d150efa29c9bb6d4f820 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210708114709.206487-2-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09qcow2: Prohibit backing file changes in 'qemu-img amend'Eric Blake1-9/+4
This was deprecated back in bc5ee6da7 (qcow2: Deprecate use of qemu-img amend to change backing file), and no one in the meantime has given any reasons why it should be supported. Time to make change attempts a hard error (but for convenience, specifying the _same_ backing chain is not forbidden). Update a couple of iotests to match. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210503213600.569128-2-eblake@redhat.com> Reviewed-by: Connor Kuehl <ckuehl@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: fix type of task->completePeter Lieven1-1/+1
task->complete is a bool not an integer. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20210707180449.32665-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09export/fuse: Let permissions be adjustableMax Reitz1-11/+62
Allow changing the file mode, UID, and GID through SETATTR. Without allow_other, UID and GID are not allowed to be changed, because it would not make sense. Also, changing group or others' permissions is not allowed either. For read-only exports, +w cannot be set. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09export/fuse: Give SET_ATTR_SIZE its own branchMax Reitz1-9/+11
In order to support changing other attributes than the file size in fuse_setattr(), we have to give each its own independent branch. This also applies to the only attribute we do support right now. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210625142317.271673-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09export/fuse: Add allow-other optionMax Reitz1-5/+23
Without the allow_other mount option, no user (not even root) but the one who started qemu/the storage daemon can access the export. Allow users to configure the export such that such accesses are possible. While allow_other is probably what users want, we cannot make it an unconditional default, because passing it is only possible (for non-root users) if the global fuse.conf configuration file allows it. Thus, the default is an 'auto' mode, in which we first try with allow_other, and then fall back to without. FuseExport.allow_other reports whether allow_other was actually used as a mount option or not. Currently, this information is not used, but a future patch will let this field decide whether e.g. an export's UID and GID can be changed through chmod. One notable thing about 'auto' mode is that libfuse may print error messages directly to stderr, and so may fusermount (which it executes). Our export code cannot really filter or hide them. Therefore, if 'auto' fails its first attempt and has to fall back, fusermount will print an error message that mounting with allow_other failed. This behavior necessitates a change to iotest 308, namely we need to filter out this error message (because if the first attempt at mounting with allow_other succeeds, there will be no such message). Furthermore, common.rc's _make_test_img should use allow-other=off for FUSE exports, because iotests generally do not need to access images from other users, so allow-other=on or allow-other=auto have no advantage. OTOH, allow-other=on will not work on systems where user_allow_other is disabled, and with allow-other=auto, we get said error message that we would need to filter out again. Just disabling allow-other is simplest. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09export/fuse: Pass default_permissions for mountMax Reitz1-2/+6
We do not do any permission checks in fuse_open(), so let the kernel do them. We already let fuse_getattr() report the proper UNIX permissions, so this should work the way we want. This causes a change in 308's reference output, because now opening a non-writable export with O_RDWR fails already, instead of only actually attempting to write to it. (That is an improvement.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210625142317.271673-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09util/uri: do not check argument of uri_free()Heinrich Schuchardt2-6/+2
uri_free() checks if its argument is NULL in uri_clean() and g_free(). There is no need to check the argument before the call. Signed-off-by: Heinrich Schuchardt <xypron.glpk@gmx.de> Message-Id: <20210629063602.4239-1-xypron.glpk@gmx.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: drop qemu_rbd_refresh_limitsPeter Lieven1-9/+0
librbd supports 1 byte alignment for all aio operations. Currently, there is no API call to query limits from the Ceph ObjectStore backend. So drop the bdrv_refresh_limits completely until there is such an API call. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-7-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: add write zeroes supportPeter Lieven1-1/+31
This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores BDRV_REQ_MAY_UNMAP for older librbd versions. The rationale for this is as follows (citing Ilya Dryomov current RBD maintainer): ---8<--- a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes() and as a consequence always unmap if librbd is too old It's not clear what qemu's expectation is but in general Write Zeroes is allowed to unmap. The only guarantee is that subsequent reads return zeroes, everything else is a hint. This is how it is specified in the kernel and in the NVMe spec. In particular, block/nvme.c implements it as follows: if (flags & BDRV_REQ_MAY_UNMAP) { cdw12 |= (1 << 25); } This sets the Deallocate bit. But if it's not set, the device may still deallocate: """ If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then for each specified logical block, the controller: - should deallocate that logical block; ... If the Deallocate bit is cleared to '0' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then, for each specified logical block, the controller: - may deallocate that logical block; """ https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags Again, it's not clear what qemu expects here, but without it we end up in a ridiculous situation where specifying the "don't allow slow fallback" switch immediately fails all efficient zeroing requests on a device where Write Zeroes is always efficient: $ qemu-io -c 'help write' | grep -- '-[zun]' -n, -- with -z, don't allow slow fallback -u, -- with -z, allow unmapping -z, -- write zeroes using blk_co_pwrite_zeroes $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar write failed: Operation not supported --->8--- Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-6-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: migrate from aio to coroutinesPeter Lieven1-162/+90
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-5-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: update s->image_size in qemu_rbd_getlengthPeter Lieven1-3/+2
While at it just call rbd_get_size and avoid rbd_image_info_t. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-4-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: store object_size in BDRVRBDStatePeter Lieven1-11/+7
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-3-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: bump librbd requirement to luminous releasePeter Lieven1-112/+8
Ceph Luminous (version 12.2.z) is almost 4 years old at this point. Bump the requirement to get rid of the ifdef'ry in the code. Qemu 6.1 dropped the support for RHEL-7 which was the last supported OS that required an older librbd. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-2-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09block/rbd: Add support for rbd image encryptionOr Ozeri1-4/+357
Starting from ceph Pacific, RBD has built-in support for image-level encryption. Currently supported formats are LUKS version 1 and 2. There are 2 new relevant librbd APIs for controlling encryption, both expect an open image context: rbd_encryption_format: formats an image (i.e. writes the LUKS header) rbd_encryption_load: loads encryptor/decryptor to the image IO stack This commit extends the qemu rbd driver API to support the above. Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20210627114635.39326-1-oro@il.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-06block/io: Merge discard request alignmentsAkihiko Odaki1-0/+2
Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-3-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-06block/file-posix: Optimize for macOSAkihiko Odaki1-2/+25
This commit introduces "punch hole" operation and optimizes transfer block size for macOS. Thanks to Konstantin Nazarov for detailed analysis of a flaw in an old version of this change: https://gist.github.com/akihikodaki/87df4149e7ca87f18dc56807ec5a1bc5#gistcomment-3654667 Signed-off-by: Akihiko Odaki <akihiko.odaki@gmail.com> Message-id: 20210705130458.97642-1-akihiko.odaki@gmail.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-02Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell3-21/+17
Block layer patches - Supporting changing 'file' in x-blockdev-reopen - ssh: add support for sha256 host key fingerprints - vhost-user-blk: Implement reconnection during realize - introduce QEMU_AUTO_VFREE - Don't require password of encrypted backing file for image creation - Code cleanups # gpg: Signature made Wed 30 Jun 2021 17:00:55 BST # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * remotes/kevin/tags/for-upstream: (24 commits) vhost-user-blk: Implement reconnection during realize vhost-user-blk: Factor out vhost_user_blk_realize_connect() vhost: Distinguish errors in vhost_dev_get_config() vhost-user-blk: Add Error parameter to vhost_user_blk_start() vhost: Return 0/-errno in vhost_dev_init() vhost: Distinguish errors in vhost_backend_init() vhost: Add Error parameter to vhost_dev_init() block/ssh: add support for sha256 host key fingerprints block/commit: use QEMU_AUTO_VFREE introduce QEMU_AUTO_VFREE iotests: Test replacing files with x-blockdev-reopen block: Allow changing bs->file on reopen block: BDRVReopenState: drop replace_backing_bs field block: move supports_backing check to bdrv_set_file_or_backing_noperm() block: bdrv_reopen_parse_backing(): simplify handling implicit filters block: bdrv_reopen_parse_backing(): don't check frozen child block: bdrv_reopen_parse_backing(): don't check aio context block: introduce bdrv_set_file_or_backing_noperm() block: introduce bdrv_remove_file_or_backing_child() block: comment graph-modifying function not updating permissions ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-30block/ssh: add support for sha256 host key fingerprintsDaniel P. Berrangé1-0/+3
Currently the SSH block driver supports MD5 and SHA1 for host key fingerprints. This is a cryptographically sensitive operation and so these hash algorithms are inadequate by modern standards. This adds support for SHA256 which has been supported in libssh since the 0.8.1 release. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20210622115156.138458-1-berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-29block/nbd: Use qcrypto_tls_creds_check_endpoint()Philippe Mathieu-Daudé1-3/+3
Avoid accessing QCryptoTLSCreds internals by using the qcrypto_tls_creds_check_endpoint() helper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2021-06-29block/commit: use QEMU_AUTO_VFREEVladimir Sementsov-Ogievskiy1-16/+9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210628121133.193984-3-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-29block: Move read-only check during truncation earlierEric Blake1-5/+5
No need to start a tracked request that will always fail. The choice to check read-only after bdrv_inc_in_flight() predates 1bc5f09f2e (block: Use tracked request for truncate), but waiting for serializing requests can make the effect more noticeable. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210609163034.997943-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-06-28Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell3-56/+101
staging * Some Meson test conversions * KVM dirty page ring buffer fix * KVM TSC scaling support * Fixes for SG_IO with /dev/sdX devices * (Non)support for host devices on iOS * -smp cleanups # gpg: Signature made Fri 25 Jun 2021 15:16:18 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (28 commits) machine: reject -smp dies!=1 for non-PC machines machine: pass QAPI struct to mc->smp_parse machine: add error propagation to mc->smp_parse machine: move common smp_parse code to caller machine: move dies from X86MachineState to CpuTopology file-posix: handle EINTR during ioctl block: detect DKIOCGETBLOCKCOUNT/SIZE before use block: try BSD disk size ioctls one after another block: check for sys/disk.h block: feature detection for host block support file-posix: try BLKSECTGET on block devices too, do not round to power of 2 block: add max_hw_transfer to BlockLimits block-backend: align max_transfer to request alignment osdep: provide ROUND_DOWN macro scsi-generic: pass max_segments via max_iov field in BlockLimits file-posix: fix max_iov for /dev/sg devices KVM: Fix dirty ring mmap incorrect size due to renaming accident configure, meson: convert libusbredir detection to meson configure, meson: convert libcacard detection to meson configure, meson: convert libusb detection to meson ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-28Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-06-25' ↵Peter Maydell3-156/+305
into staging block: Make block-copy API thread-safe # gpg: Signature made Fri 25 Jun 2021 13:40:24 BST # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * remotes/vsementsov/tags/pull-jobs-2021-06-25: block-copy: atomic .cancelled and .finished fields in BlockCopyCallState block-copy: add CoMutex lock block-copy: move progress_set_remaining in block_copy_task_end block-copy: streamline choice of copy_range vs. read/write block-copy: small refactor in block_copy_task_entry and block_copy_common co-shared-resource: protect with a mutex progressmeter: protect with a mutex blockjob: let ratelimit handle a speed of 0 block-copy: let ratelimit handle a speed of 0 ratelimit: treat zero speed as unlimited Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-06-25block-copy: atomic .cancelled and .finished fields in BlockCopyCallStateEmanuele Giuseppe Esposito1-15/+22
By adding acquire/release pairs, we ensure that .ret and .error_is_read fields are written by block_copy_dirty_clusters before .finished is true, and that they are read by API user after .finished is true. The atomic here are necessary because the fields are concurrently modified in coroutines, and read outside. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: add CoMutex lockEmanuele Giuseppe Esposito1-39/+116
Group various structures fields, to better understand what we need to protect with a lock and what doesn't need it. Then, add a CoMutex to protect concurrent access of block-copy data structures. This mutex also protects .copy_bitmap, because its thread-safe API does not prevent it from assigning two tasks to the same bitmap region. Exceptions to the lock: - .sleep_state is handled in the series "coroutine: new sleep/wake API" and thus here left as TODO. - .finished, .cancelled and reads to .ret and .error_is_read will be protected in the following patch, because are used also outside coroutines. - .skip_unallocated is atomic. Including it under the mutex would increase the critical sections and make them also much more complex. We can have it as atomic since it is only written from outside and read by block-copy coroutines. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-5-eesposit@redhat.com> [vsementsov: fix typo in comment] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: move progress_set_remaining in block_copy_task_endEmanuele Giuseppe Esposito1-3/+3
Moving this function in task_end ensures to update the progress anyways, even if there is an error. It also helps in next patch, allowing task_end to have only one critical section. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210624072043.180494-4-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-06-25block-copy: streamline choice of copy_range vs. read/writePaolo Bonzini1-86/+90
Put the logic to determine the copy size in a separate function, so that there is a simple state machine for the possible methods of copying data from one BlockDriverState to the other. Use .method instead of .copy_range as in-out argument, and include also .zeroes as an additional copy method. While at it, store the common computation of block_copy_max_transfer into a new field of BlockCopyState, and make sure that we always obey max_transfer; that's more efficient even for the COPY_RANGE_READ_WRITE case. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20210624072043.180494-3-eesposit@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>