path: root/block/rbd.c
Age | Commit message | Author | Files | Lines
2024-06-28 | block: remove separate bdrv_file_open callback | Paolo Bonzini | 1 | -1/+2
bdrv_file_open and bdrv_open are completely equivalent, they are never checked except to see which one to invoke. So merge them into a single one. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12 | block: Mark bdrv_apply_auto_read_only() and callers GRAPH_RDLOCK | Kevin Wolf | 1 | -0/+2
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_apply_auto_read_only() need to hold a reader lock for the graph because it calls bdrv_can_set_read_only(), which indirectly accesses the parents list of a node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230929145157.45443-19-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-10-12 | block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCK | Kevin Wolf | 1 | -0/+2
This adds GRAPH_RDLOCK annotations to declare that callers of bdrv_get_parent_name() need to hold a reader lock for the graph because it accesses the parents list of a node. For some places, we know that they will hold the lock, but we don't have the GRAPH_RDLOCK annotations yet. In this case, add assume_graph_lock() with a FIXME comment. These places will be removed once everything is properly annotated. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230929145157.45443-13-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-09-29 | block: Clean up local variable shadowing | Markus Armbruster | 1 | -1/+1
Local variables shadowing other local variables or parameters make the code needlessly hard to understand. Tracked down with -Wshadow=local. Clean up: delete inner declarations when they are actually redundant, else rename variables. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230921121312.1301864-7-armbru@redhat.com>
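A minimal sketch of the kind of shadowing that -Wshadow=local reports, and of the rename fix. The function names are hypothetical and are not taken from block/rbd.c.

```c
/* Before: the inner loop counter shadows the outer one. The code works,
 * but a reader has to track which 'i' each use refers to. */
int count_cells_shadowed(int rows, int cols)
{
    int n = 0;

    for (int i = 0; i < rows; i++) {
        for (int i = 0; i < cols; i++) {    /* -Wshadow=local warns here */
            n++;
        }
    }
    return n;
}

/* After: the inner variable is renamed, so every name is unambiguous. */
int count_cells_renamed(int rows, int cols)
{
    int n = 0;

    for (int row = 0; row < rows; row++) {
        for (int col = 0; col < cols; col++) {
            n++;
        }
    }
    return n;
}
```

Both functions compute the same result; the change is purely about readability, which matches the intent described in the commit message.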
2023-02-23 | block/rbd: Add support for layered encryption | Or Ozeri | 1 | -1/+152
Starting from ceph Reef, RBD has built-in support for layered encryption, where each ancestor image (in a cloned image setting) can be possibly encrypted using a unique passphrase. A new function, rbd_encryption_load2, was added to librbd API. This new function supports an array of passphrases (via "spec" structs). This commit extends the qemu rbd driver API to use this new librbd API, in order to support this new layered encryption feature. Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20230129113120.722708-4-oro@oro.sl.cloud9.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23 | block/rbd: Add luks-any encryption opening option | Or Ozeri | 1 | -0/+19
Ceph RBD encryption API required specifying the encryption format for loading encryption. The supported formats were LUKS (v1) and LUKS2. Starting from Reef release, RBD also supports loading with "luks-any" format, which works for both versions of LUKS. This commit extends the qemu rbd driver API to enable qemu users to use this luks-any wildcard format. Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20230129113120.722708-3-oro@oro.sl.cloud9.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-23 | block/rbd: Remove redundant stack variable passphrase_len | Or Ozeri | 1 | -10/+6
Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20230129113120.722708-2-oro@oro.sl.cloud9.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01 | block: Convert bdrv_get_info() to co_wrapper_mixed | Emanuele Giuseppe Esposito | 1 | -2/+3
bdrv_get_info() is categorized as an I/O function, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since it traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-11-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-02-01 | block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed | Emanuele Giuseppe Esposito | 1 | -2/+2
BlockDriver->bdrv_getlength is categorized as an I/O callback, and it currently doesn't run in a coroutine. We should let it take a graph rdlock since the callback traverses the block nodes graph, which however is only possible in a coroutine. Therefore turn it into a co_wrapper to move the actual function into a coroutine where the lock can be taken. Because now this function creates a new coroutine and polls, we need to take the AioContext lock where it is missing, for the only reason that internally co_wrapper calls AIO_WAIT_WHILE and it expects to release the AioContext lock. This is especially messy when a co_wrapper creates a coroutine and polls in bdrv_open_driver, because this function has so many callers in so many contexts that it can easily lead to deadlocks. Therefore the new rule for bdrv_open_driver is that the caller must always hold the AioContext lock of the given bs (except if it is a coroutine), because the function calls bdrv_refresh_total_sectors() which is now a co_wrapper. Once the rwlock is finalized and placed in every place it needs to be, we will poll using AIO_WAIT_WHILE_UNLOCKED and remove the AioContext lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230113204212.359076-7-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2023-01-20 | include/block: Untangle inclusion loops | Markus Armbruster | 1 | -0/+1
We have two inclusion loops:
    block/block.h -> block/block-global-state.h -> block/block-common.h -> block/blockjob.h -> block/block.h
    block/block.h -> block/block-io.h -> block/block-common.h -> block/blockjob.h -> block/block.h
I believe these go back to Emanuele's reorganization of the block API, merged a few months ago in commit d7e2fe4aac8. Fortunately, breaking them is merely a matter of deleting unnecessary includes from headers, and adding them back in places where they are now missing.
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221221133551.3967339-2-armbru@redhat.com>
2022-12-14 | qapi block: Elide redundant has_FOO in generated C | Markus Armbruster | 1 | -11/+6
The has_FOO for pointer-valued FOO are redundant, except for arrays. They are also a nuisance to work with. Recent commit "qapi: Start to elide redundant has_FOO in generated C" provided the means to elide them step by step. This is the step for qapi/block*.json. Said commit explains the transformation in more detail. There is one instance of the invariant violation mentioned there: qcow2_signal_corruption() passes false, "" when node_name is an empty string. Take care to pass NULL then. The previous two commits cleaned up two more. Additionally, helper bdrv_latency_histogram_stats() loses its output parameters and returns a value instead. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Hanna Reitz <hreitz@redhat.com> Cc: qemu-block@nongnu.org Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221104160712.3005652-11-armbru@redhat.com> [Fixes for #ifndef LIBRBD_SUPPORTS_ENCRYPTION and MacOS squashed in]
2022-06-24 | block/rbd: report a better error when namespace does not exist | Stefano Garzarella | 1 | -0/+24
If the namespace does not exist, rbd_create() fails with -ENOENT and QEMU reports a generic "error rbd create: No such file or directory":
    $ qemu-img create rbd:rbd/namespace/image 1M
    Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576
    qemu-img: rbd:rbd/namespace/image: error rbd create: No such file or directory
Unfortunately rados_ioctx_set_namespace() does not fail if the namespace does not exist, so let's use rbd_namespace_exists() in qemu_rbd_connect() to check if the namespace exists, reporting a more understandable error:
    $ qemu-img create rbd:rbd/namespace/image 1M
    Formatting 'rbd:rbd/namespace/image', fmt=raw size=1048576
    qemu-img: rbd:rbd/namespace/image: namespace 'namespace' does not exist
Reported-by: Tingting Mao <timao@redhat.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220517071012.6120-1-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-22 | block/rbd: fix write zeroes with growing images | Stefano Garzarella | 1 | -12/+14
Commit d24f80234b ("block/rbd: increase dynamically the image size") added a workaround to support growing images (eg. qcow2), resizing the image before write operations that exceed the current size. We recently added support for write zeroes and without the workaround we can have problems with qcow2. So let's move the resize into qemu_rbd_start_co() and do it when the command is RBD_AIO_WRITE or RBD_AIO_WRITE_ZEROES. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2020993 Fixes: c56ac27d2a ("block/rbd: add write zeroes support") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20220317162638.41192-1-sgarzare@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
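The resize-before-write rule described above can be sketched as follows. This is a self-contained illustration, not the code in qemu_rbd_start_co(): the struct, the enum and every sketch_* name are hypothetical stand-ins, and the resize is simulated in memory instead of going through librbd.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's command codes. */
enum sketch_cmd { SK_READ, SK_WRITE, SK_WRITE_ZEROES, SK_DISCARD };

/* Hypothetical stand-in for the driver state; only the cached size matters. */
struct sketch_image {
    int64_t image_size;
    int resize_calls;   /* instrumentation for the example only */
};

/* Simulated resize; a real driver would ask the storage backend. */
static int sketch_resize(struct sketch_image *img, int64_t new_size)
{
    img->image_size = new_size;
    img->resize_calls++;
    return 0;
}

/* Grow the image first when a write-like command ends past its current end. */
int sketch_start_request(struct sketch_image *img, enum sketch_cmd cmd,
                         int64_t offset, int64_t bytes)
{
    assert(offset >= 0 && bytes >= 0);

    if ((cmd == SK_WRITE || cmd == SK_WRITE_ZEROES) &&
        offset + bytes > img->image_size) {
        int r = sketch_resize(img, offset + bytes);
        if (r < 0) {
            return r;
        }
    }
    /* ... the actual I/O would be submitted here ... */
    return 0;
}
```

Putting the check in the common request path, rather than in the write handler only, is what lets write zeroes benefit from the same workaround.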
2022-02-01 | block/rbd: workaround for ceph issue #53784 | Peter Lieven | 1 | -2/+40
librbd had a bug until early 2022 that affected all versions of ceph that supported fast-diff. This bug results in reporting of incorrect offsets if the offset parameter to rbd_diff_iterate2 is not object aligned. This patch works around this bug for pre Quincy versions of librbd. Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20220113144426.4036493-3-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
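One plausible way to avoid passing a non-object-aligned offset is to round the request start down to an object boundary and extend the length by the same amount. The sketch below shows only that arithmetic; it is an assumption about the general shape of such a workaround, not a copy of the patch, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round *offset down to a multiple of object_size and grow *bytes so the
 * request still covers the originally requested range. */
void sketch_align_to_object(int64_t object_size, int64_t *offset, int64_t *bytes)
{
    assert(object_size > 0 && *offset >= 0 && *bytes >= 0);

    int64_t head = *offset % object_size;   /* distance past the boundary */

    *offset -= head;
    *bytes += head;
}
```

A caller would then have to ignore the extra "head" bytes when interpreting the reported extents, since they lie before the range that was originally asked for.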
2022-02-01 | block/rbd: fix handling of holes in .bdrv_co_block_status | Peter Lieven | 1 | -5/+5
The assumption that we can't hit a hole if we do not diff against a snapshot was wrong. We can see a hole in an image if we diff against base if there exists an older snapshot of the image and we have discarded blocks in the image where the snapshot has data. Fix this by simply handling a hole like an unallocated area. There are no callbacks for unallocated areas so just bail out if we hit a hole. Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b Suggested-by: Ilya Dryomov <idryomov@gmail.com> Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20220113144426.4036493-2-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/rbd: implement bdrv_co_block_status | Peter Lieven | 1 | -0/+112
The qemu rbd driver currently lacks support for bdrv_co_block_status. This results mainly in incorrect progress during block operations (e.g. qemu-img convert with an rbd image as source). This patch utilizes the rbd_diff_iterate2 call from librbd to detect allocated and unallocated (all-zero) areas. To avoid querying the ceph OSDs for the answer, this is only done if the image has the fast-diff feature, which depends on the object-map and exclusive-lock features. In this case it is guaranteed that the information is present in memory in the librbd client and thus very fast. If fast-diff is not available, all areas are reported to be allocated, which is the current behaviour if bdrv_co_block_status is not implemented. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20211012152231.24868-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-09-29 | block: use int64_t instead of int in driver discard handlers | Vladimir Sementsov-Ogievskiy | 1 | -2/+2
We are generally moving to int64_t for both offset and bytes parameters on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error).
So, convert driver discard handlers bytes parameter to int64_t.
The only caller of all updated functions is bdrv_co_pdiscard in block/io.c. It is already prepared to work with 64bit requests, but pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver.
Let's look at all updated functions:
blkdebug: all calculations are still OK, thanks to bdrv_check_qiov_request(). both rule_check and bdrv_co_pdiscard are 64bit
blklogwrites: pass to blk_log_writes_co_log which is 64bit
blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
copy-before-write: pass to bdrv_co_pdiscard which is 64bit and to cbw_do_copy_before_write which is 64bit
file-posix: one handler calls raw_account_discard() is 64bit and both handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass to RawPosixAIOData::aio_nbytes, which is 64bit (and calls raw_account_discard())
gluster: somehow, third argument of glfs_discard_async is size_t. Let's set max_pdiscard accordingly.
iscsi: iscsi_allocmap_set_invalid is 64bit, !is_byte_request_lun_aligned is 64bit. list.num is uint32_t. Let's clarify max_pdiscard and pdiscard_alignment.
mirror_top: pass to bdrv_mirror_top_do_write() which is 64bit
nbd: protocol limitation. max_pdiscard is already set strict enough, keep it as is for now.
nvme: buf.nlb is uint32_t and we do shift. So, add corresponding limits to nvme_refresh_limits().
preallocate: pass to bdrv_co_pdiscard() which is 64bit.
rbd: pass to qemu_rbd_start_co() which is 64bit.
qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(), qcow2_cluster_discard() is 64bit.
raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
throttle: pass to bdrv_co_pdiscard() which is 64bit and to throttle_group_co_io_limits_intercept() which is 64bit as well.
test-block-iothread: bytes argument is unused
Great! Now all drivers are prepared to handle 64bit discard requests, or else have explicit max_pdiscard limits.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29 | block: use int64_t instead of int in driver write_zeroes handlers | Vladimir Sementsov-Ogievskiy | 1 | -2/+2
We are generally moving to int64_t for both offset and bytes parameters on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error).
So, convert driver write_zeroes handlers bytes parameter to int64_t.
The only caller of all updated function is bdrv_co_do_pwrite_zeroes().
bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s max_write_zeroes is limited to INT_MAX. So, updated functions all are safe, they will not get "bytes" larger than before.
Still, let's look through all updated functions, and add assertions to the ones which are actually unprepared to values larger than INT_MAX. For these drivers also set explicit max_pwrite_zeroes limit.
Let's go:
blkdebug: calculations can't overflow, thanks to bdrv_check_qiov_request() in generic layer. rule_check() and bdrv_co_pwrite_zeroes() both have 64bit argument.
blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.
blkreplay, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK
copy-before-write: Calls cbw_do_copy_before_write() and bdrv_co_pwrite_zeroes, both have 64bit argument.
file-posix: both handler calls raw_do_pwrite_zeroes, which is updated. In raw_do_pwrite_zeroes() calculations are OK due to bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes which is uint64_t. Check also where that uint64_t gets handed: handle_aiocb_write_zeroes_block() passes a uint64_t[2] to ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate() which takes off_t (and we compile to always have 64-bit off_t), as does handle_aiocb_write_zeroes_unmap. All look safe.
gluster: bytes go to GlusterAIOCB::size which is int64_t and to glfs_zerofill_async works with off_t.
iscsi: Aha, here we deal with iscsi_writesame16_task() that has uint32_t num_blocks argument and iscsi_writesame16_task() has uint16_t argument. Make comments, add assertions and clarify max_pwrite_zeroes calculation. iscsi_allocmap_() functions already has int64_t argument is_byte_request_lun_aligned is simple to update, do it.
mirror_top: pass to bdrv_mirror_top_do_write which has uint64_t argument
nbd: Aha, here we have protocol limitation, and NBDRequest::len is uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are OK for now.
nvme: Again, protocol limitation. And no inherent limit for write-zeroes at all. But from code that calculates cdw12 it's obvious that we do have limit and alignment. Let's clarify it. Also, obviously the code is not prepared to handle bytes=0. Let's handle this case too. trace events already 64bit
preallocate: pass to handle_write() and bdrv_co_pwrite_zeroes(), both 64bit.
rbd: pass to qemu_rbd_start_co() which is 64bit.
qcow2: offset + bytes and alignment still works good (thanks to bdrv_check_qiov_request()), so tail calculation is OK qcow2_subcluster_zeroize() has 64bit argument, should be OK trace events updated
qed: qed_co_request wants int nb_sectors. Also in code we have size_t used for request length which may be 32bit. So, let's just keep INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and don't care.
raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both 64bit.
throttle: Both throttle_group_co_io_limits_intercept() and bdrv_co_pwrite_zeroes() are 64bit.
vmdk: pass to vmdk_pwritev which is 64bit
quorum: pass to quorum_co_pwritev() which is 64bit
Hooray! At this point all block drivers are prepared to support 64bit write-zero requests, or have explicitly set max_pwrite_zeroes.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: use <= rather than < in assertions relying on max_pwrite_zeroes] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29 | block: use int64_t instead of uint64_t in driver write handlers | Vladimir Sementsov-Ogievskiy | 1 | -3/+3
We are generally moving to int64_t for both offset and bytes parameters on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error).
So, convert driver write handlers parameters which are already 64bit to signed type.
While being here, convert also flags parameter to be BdrvRequestFlags.
Now let's consider all callers. Simple
    git grep '\->bdrv_\(aio\|co\)_pwritev\(_part\)\?'
shows that there are three callers of driver function:
bdrv_driver_pwritev() and bdrv_driver_pwritev_compressed() in block/io.c, both pass int64_t, checked by bdrv_check_qiov_request() to be non-negative.
qcow2_save_vmstate() does bdrv_check_qiov_request().
Still, the functions may be called directly, not only by drv->... Let's check:
    git grep '\.bdrv_\(aio\|co\)_pwritev\(_part\)\?\s*=' | \
    awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
    while read func; do git grep "$func(" | \
    grep -v "$func(BlockDriverState"; done
shows several callers:
qcow2: qcow2_co_truncate() write at most up to @offset, which is checked in generic qcow2_co_truncate() by bdrv_check_request(). qcow2_co_pwritev_compressed_task() pass the request (or part of the request) that already went through normal write path, so it should be OK
qcow: qcow_co_pwritev_compressed() pass int64_t, it's updated by this patch
quorum: quorum_co_pwrite_zeroes() pass int64_t and int - OK
throttle: throttle_co_pwritev_compressed() pass int64_t, it's updated by this patch
vmdk: vmdk_co_pwritev_compressed() pass int64_t, it's updated by this patch
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-09-29 | block: use int64_t instead of uint64_t in driver read handlers | Vladimir Sementsov-Ogievskiy | 1 | -3/+3
We are generally moving to int64_t for both offset and bytes parameters on all io paths.
Main motivation is realization of 64-bit write_zeroes operation for fast zeroing large disk chunks, up to the whole disk.
We chose signed type, to be consistent with off_t (which is signed) and with possibility for signed return type (where negative value means error).
So, convert driver read handlers parameters which are already 64bit to signed type.
While being here, convert also flags parameter to be BdrvRequestFlags.
Now let's consider all callers. Simple
    git grep '\->bdrv_\(aio\|co\)_preadv\(_part\)\?'
shows that there are three callers of driver function:
bdrv_driver_preadv() in block/io.c, passes int64_t, checked by bdrv_check_qiov_request() to be non-negative.
qcow2_load_vmstate() does bdrv_check_qiov_request().
do_perform_cow_read() has uint64_t argument. And a lot of things in qcow2 driver are uint64_t, so converting it is big job. But we must not work with requests that don't satisfy bdrv_check_qiov_request(), so let's just assert it here.
Still, the functions may be called directly, not only by drv->... Let's check:
    git grep '\.bdrv_\(aio\|co\)_preadv\(_part\)\?\s*=' | \
    awk '{print $4}' | sed 's/,//' | sed 's/&//' | sort | uniq | \
    while read func; do git grep "$func(" | \
    grep -v "$func(BlockDriverState"; done
The only one such caller:
    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, &data, 1);
    ...
    ret = bdrv_replace_test_co_preadv(bs, 0, 1, &qiov, 0);
in tests/unit/test-bdrv-drain.c, and it's OK obviously.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210903102807.27127-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix typos] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-07-09 | block/rbd: fix type of task->complete | Peter Lieven | 1 | -1/+1
task->complete is a bool not an integer. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20210707180449.32665-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 | block/rbd: drop qemu_rbd_refresh_limits | Peter Lieven | 1 | -9/+0
librbd supports 1 byte alignment for all aio operations. Currently, there is no API call to query limits from the Ceph ObjectStore backend. So drop the bdrv_refresh_limits completely until there is such an API call. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-7-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 | block/rbd: add write zeroes support | Peter Lieven | 1 | -1/+31
This patch wittingly sets BDRV_REQ_NO_FALLBACK and silently ignores BDRV_REQ_MAY_UNMAP for older librbd versions.
The rationale for this is as follows (citing Ilya Dryomov, current RBD maintainer):
---8<---
a) remove the BDRV_REQ_MAY_UNMAP check in qemu_rbd_co_pwrite_zeroes() and as a consequence always unmap if librbd is too old
It's not clear what qemu's expectation is but in general Write Zeroes is allowed to unmap. The only guarantee is that subsequent reads return zeroes, everything else is a hint. This is how it is specified in the kernel and in the NVMe spec.
In particular, block/nvme.c implements it as follows:
    if (flags & BDRV_REQ_MAY_UNMAP) {
        cdw12 |= (1 << 25);
    }
This sets the Deallocate bit. But if it's not set, the device may still deallocate:
"""
If the Deallocate bit (CDW12.DEAC) is set to '1' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then for each specified logical block, the controller:
- should deallocate that logical block;
...
If the Deallocate bit is cleared to '0' in a Write Zeroes command, and the namespace supports clearing all bytes to 0h in the values read (e.g., bits 2:0 in the DLFEAT field are set to 001b) from a deallocated logical block and its metadata (excluding protection information), then, for each specified logical block, the controller:
- may deallocate that logical block;
"""
https://nvmexpress.org/wp-content/uploads/NVM-Express-NVM-Command-Set-Specification-2021.06.02-Ratified-1.pdf
b) set BDRV_REQ_NO_FALLBACK in supported_zero_flags
Again, it's not clear what qemu expects here, but without it we end up in a ridiculous situation where specifying the "don't allow slow fallback" switch immediately fails all efficient zeroing requests on a device where Write Zeroes is always efficient:
    $ qemu-io -c 'help write' | grep -- '-[zun]'
     -n, -- with -z, don't allow slow fallback
     -u, -- with -z, allow unmapping
     -z, -- write zeroes using blk_co_pwrite_zeroes
    $ qemu-io -f rbd -c 'write -z -u -n 0 1M' rbd:foo/bar
    write failed: Operation not supported
--->8---
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-6-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
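The block/nvme.c fragment quoted in this message can be exercised in isolation. The sketch below wraps it in a small function; the function name, the macro names and the numeric value of the "may unmap" flag are assumptions made for illustration, and only the use of bit 25 as the Deallocate bit comes from the quoted text.

```c
#include <stdint.h>

#define SKETCH_REQ_MAY_UNMAP 0x4u        /* hypothetical flag value */
#define SKETCH_WZ_DEALLOCATE (1u << 25)  /* CDW12.DEAC, per the quoted code */

/* Return cdw12 with the Deallocate bit set when the caller allows unmapping.
 * When the flag is absent the bit stays clear, which per the quoted spec text
 * still permits (but does not ask) the device to deallocate. */
uint32_t sketch_apply_may_unmap(uint32_t cdw12, unsigned flags)
{
    if (flags & SKETCH_REQ_MAY_UNMAP) {
        cdw12 |= SKETCH_WZ_DEALLOCATE;
    }
    return cdw12;
}
```

This is the sense in which the flag is "a hint": it changes one bit in the command, while the only hard guarantee remains that later reads return zeroes.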
2021-07-09 | block/rbd: migrate from aio to coroutines | Peter Lieven | 1 | -162/+90
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-5-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 | block/rbd: update s->image_size in qemu_rbd_getlength | Peter Lieven | 1 | -3/+2
While at it just call rbd_get_size and avoid rbd_image_info_t. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-4-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 | block/rbd: store object_size in BDRVRBDState | Peter Lieven | 1 | -11/+7
Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-3-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 | block/rbd: bump librbd requirement to luminous release | Peter Lieven | 1 | -112/+8
Ceph Luminous (version 12.2.z) is almost 4 years old at this point. Bump the requirement to get rid of the ifdef'ry in the code. Qemu 6.1 dropped the support for RHEL-7 which was the last supported OS that required an older librbd. Signed-off-by: Peter Lieven <pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Message-Id: <20210702172356.11574-2-idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-09 | block/rbd: Add support for rbd image encryption | Or Ozeri | 1 | -4/+357
Starting from ceph Pacific, RBD has built-in support for image-level encryption. Currently supported formats are LUKS version 1 and 2. There are 2 new relevant librbd APIs for controlling encryption, both expect an open image context:
rbd_encryption_format: formats an image (i.e. writes the LUKS header)
rbd_encryption_load: loads encryptor/decryptor to the image IO stack
This commit extends the qemu rbd driver API to support the above.
Signed-off-by: Or Ozeri <oro@il.ibm.com> Message-Id: <20210627114635.39326-1-oro@il.ibm.com> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-05-14 | block/rbd: Add an escape-aware strchr helper | Connor Kuehl | 1 | -11/+21
Sometimes the parser needs to further split a token it has collected from the token input stream. Right now, it does a cursory check to see if the relevant characters appear in the token to determine if it should break it down further. However, qemu_rbd_next_tok() will escape characters as it removes tokens from the token stream and plain strchr() won't. This can make the initial strchr() check slightly misleading since it implies qemu_rbd_next_tok() will find the token and split on it, except the reality is that qemu_rbd_next_tok() will pass over it if it is escaped. Use a custom strchr to avoid mixing escaped and unescaped string operations. Furthermore, this code is identical to how qemu_rbd_next_tok() seeks its next token, so incorporate this custom strchr into the body of that function to reduce duplication. Reported-by: Han Han <hhan@redhat.com> Fixes: https://bugzilla.redhat.com/1873913 Signed-off-by: Connor Kuehl <ckuehl@redhat.com> Message-Id: <20210421212343.85524-3-ckuehl@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
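A minimal sketch of an escape-aware strchr, assuming backslash is the escape character. The function name is hypothetical and the body is an illustration of the idea described above, not necessarily the helper that was merged.

```c
#include <stddef.h>

/* Return a pointer to the first unescaped occurrence of delim in src, or NULL
 * if there is none. A backslash makes the following character literal, so an
 * escaped delimiter is stepped over instead of matched. */
char *sketch_strchr_unescaped(char *src, char delim)
{
    for (char *p = src; *p != '\0'; p++) {
        if (*p == delim) {
            return p;
        }
        if (*p == '\\' && p[1] != '\0') {
            p++;    /* skip the escaped character */
        }
    }
    return NULL;
}
```

With the input `pool\/name/image` and delimiter `/`, plain strchr() would stop at the escaped slash, while this helper returns the second one, which is the position a tokenizer that honours escapes would actually split on.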
2021-04-09 | block/rbd: fix memory leak in qemu_rbd_co_create_opts() | Stefano Garzarella | 1 | -0/+1
When we allocate 'q_namespace', we forgot to set 'has_q_namespace' to true. This can cause several issues, including a memory leak, since qapi_free_BlockdevCreateOptions() does not deallocate that memory, as reported by valgrind:
    13 bytes in 1 blocks are definitely lost in loss record 7 of 96
       at 0x4839809: malloc (vg_replace_malloc.c:307)
       by 0x48CEBB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
       by 0x48E3FE3: g_strdup (in /usr/lib64/libglib-2.0.so.0.6600.8)
       by 0x180010: qemu_rbd_co_create_opts (rbd.c:446)
       by 0x1AE72C: bdrv_create_co_entry (block.c:492)
       by 0x241902: coroutine_trampoline (coroutine-ucontext.c:173)
       by 0x57530AF: ??? (in /usr/lib64/libc-2.32.so)
       by 0x1FFEFFFA6F: ???
Fix setting 'has_q_namespace' to true when we allocate 'q_namespace'.
Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-3-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-04-09 | block/rbd: fix memory leak in qemu_rbd_connect() | Stefano Garzarella | 1 | -4/+5
In qemu_rbd_connect(), 'mon_host' is allocated by qemu_rbd_mon_host() using g_strjoinv(), but it's only freed in the error path, leaking memory in the success path as reported by valgrind:
    80 bytes in 4 blocks are definitely lost in loss record 5,028 of 6,516
       at 0x4839809: malloc (vg_replace_malloc.c:307)
       by 0x5315BB8: g_malloc (in /usr/lib64/libglib-2.0.so.0.6600.8)
       by 0x532B6FF: g_strjoinv (in /usr/lib64/libglib-2.0.so.0.6600.8)
       by 0x87D07E: qemu_rbd_mon_host (rbd.c:538)
       by 0x87D07E: qemu_rbd_connect (rbd.c:562)
       by 0x87E1CE: qemu_rbd_open (rbd.c:740)
       by 0x840EB1: bdrv_open_driver (block.c:1528)
       by 0x8453A9: bdrv_open_common (block.c:1802)
       by 0x8453A9: bdrv_open_inherit (block.c:3444)
       by 0x8464C2: bdrv_open (block.c:3537)
       by 0x8108CD: qmp_blockdev_add (blockdev.c:3569)
       by 0x8EA61B: qmp_marshal_blockdev_add (qapi-commands-block-core.c:1086)
       by 0x90B528: do_qmp_dispatch_bh (qmp-dispatch.c:131)
       by 0x907EA4: aio_bh_poll (async.c:164)
Fix freeing 'mon_host' also when qemu_rbd_connect() ends correctly.
Fixes: 0a55679b4a5061f4d74bdb1a0e81611ba3390b00 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210329150129.121182-2-sgarzare@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
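The shape of this fix, releasing a temporary buffer on both the success and the error path, can be sketched with a single cleanup label. Everything below is a hypothetical stand-in: it uses malloc()/free() instead of GLib, the sketch_* names are invented, and the live-buffer counter exists only so the example can show that nothing is leaked.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

static int sketch_live_buffers;   /* buffers allocated and not yet freed */

/* Stand-in for building the monitor host string. */
static char *sketch_make_mon_host(void)
{
    char *s = malloc(32);

    if (s) {
        strcpy(s, "mon1:6789;mon2:6789");
        sketch_live_buffers++;
    }
    return s;
}

static void sketch_release(char *s)
{
    if (s) {
        free(s);
        sketch_live_buffers--;
    }
}

/* Both outcomes leave through 'out', so mon_host is always released. */
int sketch_connect(int simulate_failure)
{
    int r = 0;
    char *mon_host = sketch_make_mon_host();

    if (!mon_host) {
        return -ENOMEM;
    }
    if (simulate_failure) {
        r = -EIO;
        goto out;
    }
    /* ... on success the string has been consumed and is no longer needed ... */
out:
    sketch_release(mon_host);
    return r;
}
```

Routing the success path through the same label as the error path is a common C idiom for making this kind of leak structurally impossible.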
2020-12-19 | qobject: Change qobject_to_json()'s value to GString | Markus Armbruster | 1 | -1/+1
qobject_to_json() and qobject_to_json_pretty() build a GString, then convert it to QString. Just one of the callers actually needs a QString: qemu_rbd_parse_filename(). A few others need a string they can modify: qmp_send_response(), qga's send_response(), to_json_str(), and qmp_fd_vsend_fds(). The remainder just need a string. Change qobject_to_json() and qobject_to_json_pretty() to return the GString. qemu_rbd_parse_filename() now has to convert to QString. All others save a QString temporary. to_json_str() actually becomes a bit simpler, because GString provides more convenient modification functions. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201211171152.146877-6-armbru@redhat.com>
2020-09-15 | block/rbd: add 'namespace' to qemu_rbd_strong_runtime_opts[] | Stefano Garzarella | 1 | -0/+1
Commit 19ae9ae014 ("block/rbd: Add support for ceph namespaces") introduced namespace support for RBD, but we forgot to add the new 'namespace' option to qemu_rbd_strong_runtime_opts[]. The 'namespace' is used to identify the image, so it is a strong option since it can change the data of a BDS. Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1821528 Fixes: 19ae9ae014 ("block/rbd: Add support for ceph namespaces") Cc: Florian Florensa <fflorensa@online.net> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20200914190553.74871-1-sgarzare@redhat.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-09-15 | block/rbd: remove runtime_opts | John Snow | 1 | -42/+0
This saw its last use in 4bfb274165ba. Signed-off-by: John Snow <jsnow@redhat.com> Message-Id: <20200806211345.2925343-2-jsnow@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-07-10 | qapi: Smooth another visitor error checking pattern | Markus Armbruster | 1 | -5/+2
Convert

    visit_type_FOO(v, ..., &ptr, &err);
    ...
    if (err) {
        ...
    }

to

    visit_type_FOO(v, ..., &ptr, errp);
    ...
    if (!ptr) {
        ...
    }

for functions that set @ptr to non-null / null on success / error.

Eliminate error_propagate() calls that are now unnecessary. Delete @err variables that are now unused.

Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200707160613.848843-40-armbru@redhat.com>
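The converted pattern relies on the callee guaranteeing that the output pointer is null exactly when an error occurred, so the caller can test the pointer and pass its own error destination straight through. A self-contained sketch of that contract follows; `DemoError`, `DemoObj`, `demo_visit_type_obj()` and `demo_open()` are hypothetical and do not reproduce QEMU's Error or visitor APIs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, much simplified error and payload types. */
typedef struct DemoError { const char *msg; } DemoError;
typedef struct DemoObj { int value; } DemoObj;

static DemoError demo_err_invalid = { "invalid input" };

/*
 * Contract assumed for this sketch: *obj is non-null on success and
 * null on error; on error *errp is set when errp is not NULL.
 */
static void demo_visit_type_obj(const char *input, DemoObj **obj, DemoError **errp)
{
    char *end;
    long parsed = strtol(input, &end, 10);

    if (end == input || *end != '\0') {
        *obj = NULL;
        if (errp) {
            *errp = &demo_err_invalid;
        }
        return;
    }
    *obj = malloc(sizeof(**obj));
    assert(*obj != NULL);
    (*obj)->value = (int)parsed;
}

/* Caller in the converted style: no local error, no propagation step. */
static int demo_open(const char *input, int *value_out, DemoError **errp)
{
    DemoObj *obj;

    demo_visit_type_obj(input, &obj, errp);
    if (!obj) {
        return -1;              /* errp was already filled in by the callee */
    }
    *value_out = obj->value;
    free(obj);
    return 0;
}
```

Because the callee writes directly into the caller's `errp`, the intermediate local error and the explicit propagation call become unnecessary.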
2020-05-08 | block: Drop unused .bdrv_has_zero_init_truncate | Eric Blake | 1 | -1/+0
Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in ceaca56f and implemented in commit 1dcaf527 claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08 | rbd: Support BDRV_REQ_ZERO_WRITE for truncate | Eric Blake | 1 | -0/+3
Our .bdrv_has_zero_init_truncate always returns 1 because rbd always 0-fills; we can use that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-5-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-04-30 | block: Add flags to BlockDriver.bdrv_co_truncate() | Kevin Wolf | 1 | -0/+1
This adds a new BdrvRequestFlags parameter to the .bdrv_co_truncate() driver callbacks, and a supported_truncate_flags field in BlockDriverState that allows drivers to advertise support for request flags in the context of truncate. For now, we always pass 0 and no drivers declare support for any flag. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200424125448.63318-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-03-26 | block: pass BlockDriver reference to the .bdrv_co_create | Maxim Levitsky | 1 | -1/+2
This will allow the reuse of a single generic .bdrv_co_create implementation for several drivers. No functional changes. Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20200326011218.29230-2-mlevitsk@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-03-06 | block/rbd: Add support for ceph namespaces | Florian Florensa | 1 | -13/+31
Starting from ceph Nautilus, RBD has support for namespaces, allowing for finer-grained ACLs on images inside a pool, and tenant isolation. In the rbd cli tool documentation, the new image-spec and snap-spec are:
 - [pool-name/[namespace-name/]]image-name
 - [pool-name/[namespace-name/]]image-name@snap-name
A qemu without namespace support complains that it cannot find the image called namespace-name/image-name. Thus we only need to parse the image name once more to check whether it contains a '/', and if it does, use what comes before it as the name of the namespace, to be passed later to rados_ioctx_set_namespace. When rados_ioctx_set_namespace is called with an empty string or a null pointer as the namespace parameter, it pretty much does nothing, as it then falls back to the default namespace. The namespace is extracted inside qemu_rbd_parse_filename, stored in the qdict, and used in qemu_rbd_connect to make it work with both qemu-img and qemu itself. Signed-off-by: Florian Florensa <fflorensa@online.net> Message-Id: <20200110111513.321728-2-fflorensa@online.net> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
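The second parsing step described here, looking for a '/' in what remains after the pool name, can be sketched as follows. This is a simplified illustration under stated assumptions and not the code from qemu_rbd_parse_filename(): `demo_copy_range()` and `demo_split_namespace()` are hypothetical helpers, and escaping of special characters is ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes starting at start into a fresh NUL-terminated string. */
static char *demo_copy_range(const char *start, size_t len)
{
    char *copy = malloc(len + 1);
    assert(copy != NULL);
    memcpy(copy, start, len);
    copy[len] = '\0';
    return copy;
}

/*
 * Split "namespace-name/image-name" at the first '/'.
 * Without a '/', *ns_out is NULL, which stands for the default
 * namespace, and the whole input is treated as the image name.
 * The caller owns and frees both output strings.
 */
static void demo_split_namespace(const char *name, char **ns_out, char **image_out)
{
    const char *slash = strchr(name, '/');

    if (slash) {
        *ns_out = demo_copy_range(name, (size_t)(slash - name));
        *image_out = demo_copy_range(slash + 1, strlen(slash + 1));
    } else {
        *ns_out = NULL;
        *image_out = demo_copy_range(name, strlen(name));
    }
}
```

A null namespace result fits the behaviour described in the commit message, where a null or empty namespace leaves the default namespace in effect.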
2019-10-28 | block: Add @exact parameter to bdrv_co_truncate() | Max Reitz | 1 | -0/+1
We have two drivers (iscsi and file-posix) that (in some cases) return success from their .bdrv_co_truncate() implementation if the block device is larger than the requested offset, but cannot be shrunk. Some callers do not want that behavior, so this patch adds a new parameter that they can use to turn off that behavior. This patch just adds the parameter and lets the block/io.c and block/block-backend.c functions pass it around. All other callers always pass false and none of the implementations evaluate it, so that this patch does not change existing behavior. Future patches take care of that. Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190918095144.955-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-10-14 | replay: add BH oneshot event for block layer | Pavel Dovgalyuk | 1 | -2/+3
Replay is capable of recording normal BH events, but sometimes there are single use callbacks scheduled with aio_bh_schedule_oneshot function. This patch enables recording and replaying such callbacks. Block layer uses these events for calling the completion function. Replaying these calls makes the execution deterministic. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-08-19 | block: Implement .bdrv_has_zero_init_truncate() | Max Reitz | 1 | -0/+1
We need to implement .bdrv_has_zero_init_truncate() for every block driver that supports truncation and has a .bdrv_has_zero_init() implementation. Implement it the same way each driver implements .bdrv_has_zero_init(). This is at least not any more unsafe than what we had before. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20190724171239.8764-5-mreitz@redhat.com Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2019-07-02 | block/rbd: increase dynamically the image size | Stefano Garzarella | 1 | -3/+39
RBD APIs don't allow us to write beyond the size set with rbd_create() or rbd_resize(). In order to support growing images (e.g. qcow2), we resize the image before write operations that exceed the current size. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20190509145927.293369-1-sgarzare@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
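The grow-before-write idea can be illustrated with an in-memory image. This is a minimal sketch only: `DemoImage`, `demo_resize()` and `demo_write()` are hypothetical stand-ins, and no librbd call is used or modelled beyond the rule that a write may not extend past the current size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory image with an explicit size limit. */
typedef struct DemoImage {
    unsigned char *data;
    uint64_t size;
} DemoImage;

/* Change the image size; newly added bytes read back as zero. */
static int demo_resize(DemoImage *img, uint64_t new_size)
{
    assert(img != NULL);
    if (new_size == 0) {
        free(img->data);
        img->data = NULL;
        img->size = 0;
        return 0;
    }
    unsigned char *grown = realloc(img->data, (size_t)new_size);
    if (!grown) {
        return -1;
    }
    if (new_size > img->size) {
        memset(grown + img->size, 0, (size_t)(new_size - img->size));
    }
    img->data = grown;
    img->size = new_size;
    return 0;
}

/* Write len bytes at offset, growing the image first when needed. */
static int demo_write(DemoImage *img, uint64_t offset, const void *buf, size_t len)
{
    assert(img != NULL);
    if (offset + len > img->size) {
        int ret = demo_resize(img, offset + len);
        if (ret < 0) {
            return ret;     /* the write is not attempted if growing fails */
        }
    }
    memcpy(img->data + offset, buf, len);
    return 0;
}
```

Writes that stay within the current size take the fast path and never trigger a resize.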
2019-06-12 | Include qemu/module.h where needed, drop it from qemu-common.h | Markus Armbruster | 1 | -0/+1
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-4-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for hw/usb/dev-hub.c hw/misc/exynos4210_rng.c hw/misc/bcm2835_rng.c hw/misc/aspeed_scu.c hw/display/virtio-vga.c hw/arm/stm32f205_soc.c; ui/cocoa.m fixed up]
2019-02-25 | block: Add strong_runtime_opts to BlockDriver | Max Reitz | 1 | -0/+14
This new field can be set by block drivers to list the runtime options they accept that may influence the contents of the respective BDS. As of a follow-up patch, this list will be used by the common bdrv_refresh_filename() implementation to decide which options to put into BDS.full_open_options (and consequently whether a JSON filename has to be created), thus freeing the drivers of having to implement that logic themselves. Additionally, this patch adds the field to all of the block drivers that need it and sets it accordingly. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20190201192935.18394-22-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
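A list of this kind can be pictured as a NULL-terminated array of option names that common code consults. The sketch below is an assumption-laden illustration: the array contents and `demo_is_strong_opt()` are hypothetical, and only exact name matches are handled, which may be simpler than what the common block layer code actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only; not copied from any driver. */
static const char *const demo_strong_runtime_opts[] = {
    "pool",
    "namespace",
    "image",
    "snapshot",
    NULL,   /* terminator */
};

/* Return true when name is listed as an option that affects BDS contents. */
static bool demo_is_strong_opt(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; demo_strong_runtime_opts[i] != NULL; i++) {
        if (strcmp(demo_strong_runtime_opts[i], name) == 0) {
            return true;
        }
    }
    return false;
}
```

Common code could then keep only the options for which such a check returns true when it builds the set of options that identify a node's contents.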
2018-11-05 | block: Require auto-read-only for existing fallbacks | Kevin Wolf | 1 | -11/+4
Some block drivers have traditionally changed their node to read-only mode without asking the user. This behaviour has been marked deprecated since 2.11, expecting users to provide an explicit read-only=on option. Now that we have auto-read-only=on, enable these drivers to make use of the option. This is the only use of bdrv_set_read_only(), so we can make it a bit more specific and turn it into a bdrv_apply_auto_read_only() that is more convenient for drivers to use. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-11-05 | rbd: Close image in qemu_rbd_open() error path | Kevin Wolf | 1 | -0/+1
Commit e2b8247a322 introduced an error path in qemu_rbd_open() after calling rbd_open(), but neglected to close the image again in this error path. The error path should contain everything that the regular close function qemu_rbd_close() contains. This adds the missing rbd_close() call. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-10-19 | block: Use warn_report() & friends to report warnings | Markus Armbruster | 1 | -2/+2
Calling error_report() in a function that takes an Error ** argument is suspicious. Convert a few that are actually warnings to warn_report(). While there, split warnings consisting of multiple sentences to conform to conventions spelled out in warn_report()'s contract, and improve a rather useless warning in sheepdog.c. Cc: Kevin Wolf <kwolf@redhat.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Lieven <pl@kamp.de> Cc: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20181017082702.5581-4-armbru@redhat.com> Drop changes to "without an explicit read-only=on" warnings, because there's a series removing them pending. Also drop a cc: to a former Sheepdog maintainer. Reviewed-by: Kevin Wolf <kwolf@redhat.com>
2018-09-24 | block/rbd: Attempt to parse legacy filenames | Jeff Cody | 1 | -2/+52
When we converted rbd to get rid of the older key/value-centric encoding format, we broke compatibility with image files with backing file strings encoded in the old format. This leaves a bit of an ugly conundrum, and a hacky solution. If the initial attempt to parse the "proper" options fails, we assume that we may have an older key/value encoded filename. Fall back to attempting to parse the filename, and extract the required options from it. If that fails, pass along the original error message. We do not support mixed modern usage alongside legacy key/value pair usage. A deprecation warning has been added, although care should be taken when actually deprecating since the impact is not limited to command-line or qapi usage, but also affects opening existing images. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Message-id: 15b332e5432ad069441f7275a46080f465d789a0.1536704901.git.jcody@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>