aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2025-09-03block/curl: drop old/unuspported curl version checksMichael Tokarev1-12/+1
We currently require libcurl >=7.29.0 (since f9cd86fe72be3cd8). Drop older LIBCURL_VERSION_NUM checks from the driver. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-09-03block/curl: fix curl internal handles handlingMichael Tokarev1-5/+2
block/curl.c uses CURLMOPT_SOCKETFUNCTION to register a socket callback. According to the documentation, this callback is called not just with application-created sockets but also with internal curl sockets, - and for such sockets, user data pointer is not set by the application, so the result qemu crashing. Pass BDRVCURLState directly to the callback function as user pointer, instead of relying on CURLINFO_PRIVATE. This problem started happening with update of libcurl from 8.9 to 8.10 -- apparently with this change curl started using private handles more. (CURLINFO_PRIVATE is used in one more place, in curl_multi_check_completion() - it might need a similar fix too) Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3081 Cc: qemu-stable@qemu.org Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2025-08-12rbd: Fix .bdrv_get_specific_info implementationKevin Wolf1-36/+68
qemu_rbd_get_specific_info() has at least two problems: The first is that it issues a blocking rbd_read() call in order to probe the encryption format for the image while querying the node. This means that if the connection to the server goes down, not only I/O is stuck (which is unavoidable), but query-names-block-nodes will actually make the whole QEMU instance unresponsive. .bdrv_get_specific_info implementations shouldn't perform blocking operations, but only return what is already known. The second is that the information returned isn't even correct. If the image is already opened with encryption enabled at the RBD level, we'll probe for "double encryption", i.e. if the encrypted data contains another encryption header. If it doesn't (which is the normal case), we won't return the encryption format. If it does, we return misleading information because it looks like we're talking about the outer level (the encryption format of the image itself) while the information is about an encryption header in the guest data. Fix this by storing the encryption format in BDRVRBDState when the image is opened (and we do blocking operations anyway) and returning only the stored information in qemu_rbd_get_specific_info(). The information we'll store is either the actual encryption format that we enabled on the RBD level, or if the image is unencrypted, the result of the same probing as we previously did when querying the node. Probing image formats based on content that can be modified by the guest has long been known as problematic, but as long as we only output it to the user instead of making decisions based on it, it should be okay. It is undoubtedly useful in the context of 'qemu-img info' when you're trying to figure out which encryption options you have to use to open the image successfully. Fixes: 42e4ac9ef5a6 ("block/rbd: Add support for rbd image encryption") Buglink: https://issues.redhat.com/browse/RHEL-105440 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250811134010.81787-1-kwolf@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14file-posix: Fix aio=threads performance regression after enablign FUAKevin Wolf1-14/+15
For aio=threads, we're currently not implementing REQ_FUA in any useful way, but just do a separate raw_co_flush_to_disk() call. This changes behaviour compared to the old state, which used bdrv_co_flush() with its optimisations. As a quick fix, call bdrv_co_flush() again like before. Eventually, we can use pwritev2() to make use of RWF_DSYNC if available, but we'll still have to keep this code path as a fallback, so this fix is required either way. While the fix itself is a one-liner, some new graph locking annotations are needed to convince TSA that the locking is correct. Cc: qemu-stable@nongnu.org Fixes: 984a32f17e8d ("file-posix: Support FUA writes") Buglink: https://issues.redhat.com/browse/RHEL-96854 Reported-by: Tingting Mao <timao@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250625085019.27735-1-kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block/qapi: make @node-name in @BlockDeviceInfo non-optionalFiona Ebner1-3/+1
Since commit 15489c769b ("block: auto-generated node-names"), if the node name of a block driver state is not explicitly specified, it will be auto-generated. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250702123204.325470-3-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block/qapi: include child references in block device infoFiona Ebner1-0/+10
In combination with using a throttle filter to enforce IO limits for a guest device, knowing the 'file' child of a block device can be useful. If the throttle filter is only intended for guest IO, block jobs should not also be limited by the throttle filter, so the block operations need to be done with the 'file' child of the top throttle node as the target. In combination with mirroring, the name of that child is not fixed. Another scenario is when unplugging a guest device after mirroring below a top throttle node, where the mirror target is added explicitly via blockdev-add. After mirroring, the target becomes the new 'file' child of the throttle node. For unplugging, both the top throttle node and the mirror target need to be deleted, because only implicitly added child nodes are deleted automatically, and the current 'file' child of the throttle node was explicitly added (as the mirror target). In other scenarios, it could be useful to follow the backing chain. Note that iotests 191 and 273 use _filter_img_info, so the 'children' information is filtered out there. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250702123204.325470-2-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block: mark bdrv_open_child_common() and its callers GRAPH_UNLOCKEDFiona Ebner1-2/+4
The function bdrv_open_child_common() calls bdrv_graph_wrlock_drained(), which must be called with the graph unlocked. Mark it and its two callers bdrv_open_file_child() and bdrv_open_child() as GRAPH_UNLOCKED. This requires temporarily unlocking in vmdk_parse_extents() and making the locked section shorter in vmdk_open(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-48-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block/commit: mark commit_abort() as GRAPH_UNLOCKEDFiona Ebner1-1/+1
The function commit_abort() calls bdrv_drained_begin(), which must be called with the graph unlocked. Also mark the JobDriver's abort() callback as GRAPH_UNLOCKED_PTR, because that is the callback via which commit_abort() is reached. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-41-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block: mark blk_remove_bs() as GRAPH_UNLOCKEDFiona Ebner1-5/+10
The function blk_remove_bs() calls bdrv_graph_wrlock_drained() and can also call bdrv_drained_begin(), both of which which must be called with the graph unlocked. Marking blk_remove_bs() as GRAPH_UNLOCKED requires temporarily unlocking in hmp_drive_del(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-38-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block: mark bdrv_reopen_queue() and bdrv_reopen_multiple() as GRAPH_UNLOCKEDFiona Ebner1-1/+2
The function bdrv_reopen_queue() can call bdrv_drain_all_begin(), which must be called with the graph unlocked. The function bdrv_reopen_multiple() calls bdrv_reopen_prepare() which must be called with the graph unlocked. To mark bdrv_reopen_queue() as GRAPH_UNLOCKED, it is necessary to make the locked section in reopen_backing_file() shorter. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-35-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block/stream: mark stream_prepare() as GRAPH_UNLOCKEDFiona Ebner1-1/+1
The function stream_prepare() calls bdrv_drain_all_begin(), which must be called with the graph unlocked. Also mark the JobDriver's prepare() callback as GRAPH_UNLOCKED_PTR, because that is the callback via which stream_prepare() is reached. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-34-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block: drop wrapper for bdrv_set_backing_hd_drained()Fiona Ebner3-11/+10
Nearly all callers (outside of the tests) are already using the _drained() variant of the function. It doesn't seem worth keeping. Simply adapt the remaining callers of bdrv_set_backing_hd() and rename bdrv_set_backing_hd_drained() to bdrv_set_backing_hd(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-31-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block/commit: switch to bdrv_set_backing_hd_drained() variantFiona Ebner1-7/+25
This is in preparation to mark bdrv_set_backing_hd() as GRAPH_UNLOCKED. Switch to using the bdrv_set_backing_hd_drained() variant. For the first pair of calls to avoid draining and locking twice in a row within the individual calls. For the third call, so that the drained and locked section can also cover bdrv_cow_bs(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-27-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block/mirror: switch to bdrv_set_backing_hd_drained() variantFiona Ebner1-3/+9
This is in preparation to mark bdrv_set_backing_hd() as GRAPH_UNLOCKED. Switch to using the bdrv_set_backing_hd_drained() variant, so that the drained and locked section can also cover the calls to bdrv_skip_filters() and bdrv_cow_bs(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-26-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block: add bdrv_graph_wrlock_drained() convenience wrapperFiona Ebner13-73/+57
Many write-locked sections are also drained sections. A new bdrv_graph_wrunlock_drained() wrapper around bdrv_graph_wrunlock() is introduced, which will begin a drained section first. A global variable is used so bdrv_graph_wrunlock() knows if it also needs to end such a drained section. Both the aio_poll call in bdrv_graph_wrlock() and the aio_bh_poll() in bdrv_graph_wrunlock() can re-enter a write-locked section. While for the latter, ending the drain could be moved to before the call, the former requires that the variable is a counter and not just a boolean. Since the wrapper calls bdrv_drain_all_begin(), which must be called with the graph unlocked, mark the wrapper as GRAPH_UNLOCKED too. The switch to the new helpers was generated with the following commands and then manually checked: find . -name '*.c' -exec sed -i -z 's/bdrv_drain_all_begin();\n\s*bdrv_graph_wrlock();/bdrv_graph_wrlock_drained();/g' {} ';' find . -name '*.c' -exec sed -i -z 's/bdrv_graph_wrunlock();\n\s*bdrv_drain_all_end();/bdrv_graph_wrunlock();/g' {} ';' Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-25-f.ebner@proxmox.com> [kwolf: Removed redundant GRAPH_UNLOCKED] Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-07-14block: never use atomics to access bs->quiesce_counterFiona Ebner1-5/+2
All accesses of bs->quiesce_counter are in the main thread, either after a GLOBAL_STATE_CODE() macro or in a function with GRAPH_WRLOCK annotation. This is essentially a revert of 414c2ec358 ("block: access quiesce_counter with atomic ops"). At that time, neither the GLOBAL_STATE_CODE() macro nor the GRAPH_WRLOCK annotation existed. Even if the field was only accessed in the main thread back then (did not check if that is actually the case), it wouldn't have been easy to verify. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-24-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-12block: skip automatic zero-init of large array in ioq_submitDaniel P. Berrangé1-1/+1
The 'ioq_submit' method has a struct array that is 8k in size. Skip the automatic zero-init of this array to eliminate the performance overhead in the I/O hot path. The 'iocbs' array will selectively initialized when processing the I/O data. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20250610123709.835102-4-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-06-04block/io: remove duplicate GLOBAL_STATE_CODE() in bdrv_do_drained_end()Fiona Ebner1-1/+0
Both commit ab61335025 ("block: drain from main loop thread in bdrv_co_yield_to_drain()") and commit d05ab380db ("block: Mark drain related functions GRAPH_RDLOCK") introduced a GLOBAL_STATE_CODE() macro in bdrv_do_drained_end(). The assertion of being in the main thread cannot change here, so keep only the earlier instance. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-23-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of quorum_del_child()Fiona Ebner1-2/+0
The quorum_del_child() callback runs under the graph lock, so it is not allowed to drain. It is only called as the .bdrv_del_child() callback, which is only called in the bdrv_del_child() function, which also runs under the graph lock. The bdrv_del_child() function is called by qmp_x_blockdev_change(). A drained section was already introduced there by commit "block: move drain out of quorum_add_child()". This finally finishes moving out the drain to places that are not under the graph lock started in "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-17-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_root_unref_child()Fiona Ebner8-0/+32
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". bdrv_root_unref_child() is called by: 1. blk_remove_bs(), where a drained section is introduced. 2. bdrv_unref_child(), which runs under the graph lock, so the drain will be moved further up to its callers. 3. block_job_remove_all_bdrv(), where a drained section is introduced. For all callers of bdrv_unref_child() and its generated bdrv_co_unref_child() coroutine variant, a drained section is introduced, they are not explicilty listed here. The caller quorum_del_child() holds the graph lock, so it is not actually allowed to drain. This will be addressed in the next commit. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-16-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of quorum_add_child()Fiona Ebner1-2/+0
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The quorum_add_child() callback runs under the graph lock, so it is not allowed to drain. It is only called as the .bdrv_add_child() callback, which is only called in the bdrv_add_child() function, which also runs under the graph lock. The bdrv_add_child() function is called by qmp_x_blockdev_change(), where a drained section is introduced. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-15-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_attach_child()Fiona Ebner2-0/+7
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The function bdrv_attach_child() runs under the graph lock, so it is not allowed to drain. It is called by: 1. replication_start() 2. quorum_add_child() 3. bdrv_open_child_common() 4. Throughout test-bdrv-graph-mod.c and test-bdrv-drain.c unit tests. In all callers, a drained section is introduced. The function quorum_add_child() runs under the graph lock, so it is not actually allowed to drain. This will be addressed by the following commit. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-14-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_root_attach_child()Fiona Ebner5-0/+17
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The function bdrv_root_attach_child() runs under the graph lock, so it is not allowed to drain. It is called by: 1. blk_insert_bs(), where a drained section is introduced. 2. block_job_add_bdrv(), which holds the graph lock itself. block_job_add_bdrv() is called by: 1. mirror_start_job() 2. stream_start() 3. commit_start() 4. backup_job_create() 5. block_job_create() 6. In the test_blockjob_common_drain_node() unit test In all callers, a drained section is introduced. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-13-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: move drain outside of bdrv_set_backing_hd_drained()Fiona Ebner1-4/+2
This is part of resolving the deadlock mentioned in commit "block: move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK". The function bdrv_set_backing_hd_drained() holds the graph lock, so it is not allowed to drain. It is called by: 1. bdrv_set_backing_hd(), where a drained section is introduced, replacing the previously present bs-specific drains. 2. stream_prepare(), where a drained section is introduced replacing the previously present bs-specific drains. The drain_bs variable in bdrv_set_backing_hd_drained() is now superfluous and thus dropped. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Message-ID: <20250530151125.955508-12-f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block: mark change_aio_ctx() callback and instances as GRAPH_RDLOCK(_PTR)Fiona Ebner1-3/+3
This is a small step in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED. More concretely, it is in preparation to move the drain out of bdrv_change_aio_context() and marking that function as GRAPH_RDLOCK. Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-7-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-06-04block/snapshot: move drain outside of read-locked bdrv_snapshot_delete()Fiona Ebner1-11/+15
This is in preparation to mark bdrv_drained_begin() as GRAPH_UNLOCKED. More granular draining is not trivially possible, because bdrv_snapshot_delete() can recursively call itself. The return value of bdrv_all_delete_snapshot() changes from -1 to -errno propagated from failed sub-calls. This is fine for the existing callers of bdrv_all_delete_snapshot(). Signed-off-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250530151125.955508-4-f.ebner@proxmox.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-05-22file-posix: Probe paths and retry SG_IO on potential path errorsKevin Wolf1-1/+114
When scsi-block is used on a host multipath device, it runs into the problem that the kernel dm-mpath doesn't know anything about SCSI or SG_IO and therefore can't decide if a SG_IO request returned an error and needs to be retried on a different path. Instead of getting working failover, an error is returned to scsi-block and handled according to the configured error policy. Obviously, this is not what users want, they want working failover. QEMU can parse the SG_IO result and determine whether this could have been a path error, but just retrying the same request could just send it to the same failing path again and result in the same error. With a kernel that supports the DM_MPATH_PROBE_PATHS ioctl on dm-mpath block devices (queued in the device mapper tree for Linux 6.16), we can tell the kernel to probe all paths and tell us if any usable paths remained. If so, we can now retry the SG_IO ioctl and expect it to be sent to a working path. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20250522130803.34738-1-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-05-22file-posix: allow BLKZEROOUT with -t writebackStefan Hajnoczi1-11/+0
The Linux BLKZEROOUT ioctl is only invoked when BDRV_O_NOCACHE is set because old kernels did not invalidate the page cache. In that case mixing BLKZEROOUT with buffered I/O could lead to corruption. However, Linux 4.9 commit 22dd6d356628 ("block: invalidate the page cache when issuing BLKZEROOUT") made BLKZEROOUT coherent with the page cache. I have checked that Linux 4.9+ kernels are shipped at least as far back as Debian 10 (buster), openSUSE Leap 15.2, and RHEL/CentOS 8. Use BLKZEROOUT with buffered I/O, mostly so `qemu-img ... -t writeback` can offload write zeroes. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250417211053.98700-1-stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2025-05-14mirror: Reduce I/O when destination is detect-zeroes:unmapEric Blake1-4/+9
If we are going to punch holes in the mirror destination even for the portions where the source image is unallocated, it is nicer to treat the entire image as dirty and punch as we go, rather than pre-zeroing the entire image just to re-do I/O to the allocated portions of the image. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250513220142.535200-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14mirror: Skip writing zeroes when target is already zeroEric Blake1-14/+93
When mirroring, the goal is to ensure that the destination reads the same as the source; this goal is met whether the destination is sparse or fully-allocated (except when explicitly punching holes, then merely reading zero is not enough to know if it is sparse, so we still want to punch the hole). Avoiding a redundant write to zero (whether in the background because the zero cluster was marked in the dirty bitmap, or in the foreground because the guest is writing zeroes) when the destination already reads as zero makes mirroring faster, and avoids allocating the destination merely because the source reports as allocated. The effect is especially pronounced when the source is a raw file. That's because when the source is a qcow2 file, the dirty bitmap only visits the portions of the source that are allocated, which tend to be non-zero. But when the source is a raw file, bdrv_co_is_allocated_above() reports the entire file as allocated so mirror_dirty_init sets the entire dirty bitmap, and it is only later during mirror_iteration that we change to consulting the more precise bdrv_co_block_status_above() to learn where the source reads as zero. Remember that since a mirror operation can write a cluster more than once (every time the guest changes the source, the destination is also changed to keep up), and the guest can change whether a given cluster reads as zero, is discarded, or has non-zero data over the course of the mirror operation, we can't take the shortcut of relying on s->target_is_zero (which is static for the life of the job) in mirror_co_zero() to see if the destination is already zero, because that information may be stale. Any solution we use must be dynamic in the face of the guest writing or discarding a cluster while the mirror has been ongoing. We could just teach mirror_co_zero() to do a block_status() probe of the destination, and skip the zeroes if the destination already reads as zero, but we know from past experience that extra block_status() calls are not always cheap (tmpfs, anyone?), especially when they are random access rather than linear. Use of block_status() of the source by the background task in a linear fashion is not our bottleneck (it's a background task, after all); but since mirroring can be done while the source is actively being changed, we don't want a slow block_status() of the destination to occur on the hot path of the guest trying to do random-access writes to the source. So this patch takes a slightly different approach: any time we have to track dirty clusters, we can also track which clusters are known to read as zero. For sync=TOP or when we are punching holes from "detect-zeroes":"unmap", the zero bitmap starts out empty, but prevents a second write zero to a cluster that was already zero by an earlier pass; for sync=FULL when we are not punching holes, the zero bitmap starts out full if the destination reads as zero during initialization. Either way, I/O to the destination can now avoid redundant write zero to a cluster that already reads as zero, all without having to do a block_status() per write on the destination. With this patch, if I create a raw sparse destination file, connect it with QMP 'blockdev-add' while leaving it at the default "discard": "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the destination remains sparse rather than fully allocated. Meanwhile, a destination image that is already fully allocated remains so unless it was opened with "detect-zeroes": "unmap". And any time writing zeroes is skipped, the job counters are not incremented. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-26-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14mirror: Skip pre-zeroing destination if it is already zeroEric Blake1-8/+16
When doing a sync=full mirroring, we can skip pre-zeroing the destination if it already reads as zeroes and we are not also trying to punch holes due to detect-zeroes. With this patch, there are fewer scenarios that have to pass in an explicit target-is-zero, while still resulting in a sparse destination remaining sparse. A later patch will then further improve things to skip writing to the destination for parts of the image where the source is zero; but even with just this patch, it is possible to see a difference for any source that does not report itself as fully allocated, coupled with a destination BDS that can quickly report that it already reads as zero. (For a source that reports as fully allocated, such as a file, the rest of mirror_dirty_init() still sets the entire dirty bitmap to true, so even though we avoided the pre-zeroing, we are not yet avoiding all redundant I/O). Iotest 194 detects the difference made by this patch: for a file source (where block status reports the entire image as allocated, and therefore we end up writing zeroes everywhere in the destination anyways), the job length remains the same. But for a qcow2 source and a destination that reads as all zeroes, the dirty bitmap changes to just tracking the allocated portions of the source, which results in faster completion and smaller job statistics. For the test to pass with both ./check -file and -qcow2, a new python filter is needed to mask out the now-varying job amounts (this matches the shell filters _filter_block_job_{offset,len} in common.filter). A later test will also be added which further validates expected sparseness, so it does not matter that 194 is no longer explicitly looking at how many bytes were copied. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-25-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14mirror: Drop redundant zero_target parameterEric Blake1-8/+5
The two callers to a mirror job (drive-mirror and blockdev-mirror) set zero_target precisely when sync mode == FULL, with the one exception that drive-mirror skips zeroing the target if it was newly created and reads as zero. But given the previous patch, that exception is equally captured by target_is_zero. Meanwhile, there is another slight wrinkle, fortunately caught by iotest 185: if the caller uses "sync":"top" but the source has no backing file, the code in blockdev.c was changing sync to be FULL, but only after it had set zero_target=false. In mirror.c, prior to recent patches, this didn't matter: the only places that inspected sync were setting is_none_mode (both TOP and FULL had set that to false), and mirror_start() setting base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL. But now that we are passing sync around, the slammed sync mode would result in a new pre-zeroing pass even when the user had passed "sync":"top" in an effort to skip pre-zeroing. Fortunately, the assignment of base when bs has no backing chain still works out to NULL if we don't slam things. So with the forced change of sync ripped out of blockdev.c, the sync mode is passed through the full callstack unmolested, and we can now reliably reconstruct the same settings as what used to be passed in by zero_target=false, without the redundant parameter. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-24-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: Fix regression in iotest 185] Signed-off-by: Eric Blake <eblake@redhat.com>
2025-05-14mirror: Allow QMP override to declare target already zeroEric Blake1-4/+23
QEMU has an optimization for a just-created drive-mirror destination that is not possible for blockdev-mirror (which can't create the destination) - any time we know the destination starts life as all zeroes, we can skip a pre-zeroing pass on the destination. Recent patches have added an improved heuristic for detecting if a file contains all zeroes, and we plan to use that heuristic in upcoming patches. But since a heuristic cannot quickly detect all scenarios, and there may be cases where the caller is aware of information that QEMU cannot learn quickly, it makes sense to have a way to tell QEMU to assume facts about the destination that can make the mirror operation faster. Given our existing example of "qemu-img convert --target-is-zero", it is time to expose this override in QMP for blockdev-mirror as well. This patch results in some slight redundancy between the older s->zero_target (set any time mode==FULL and the destination image was not just created - ie. clear if drive-mirror is asking to skip the pre-zero pass) and the newly-introduced s->target_is_zero (in addition to the QMP override, it is set when drive-mirror creates the destination image); this will be cleaned up in the next patch. There is also a subtlety that we must consider. When drive-mirror is passing target_is_zero on behalf of a just-created image, we know the image is sparse (skipping the pre-zeroing keeps it that way), so it doesn't matter whether the destination also has "discard":"unmap" and "detect-zeroes":"unmap". But now that we are letting the user set the knob for target-is-zero, if the user passes a pre-existing file that is fully allocated, it is fine to leave the file fully allocated under "detect-zeroes":"on", but if the file is open with "detect-zeroes":"unmap", we should really be trying harder to punch holes in the destination for every region of zeroes copied from the source. The easiest way to do this is to still run the pre-zeroing pass (turning the entire destination file sparse before populating just the allocated portions of the source), even though that currently results in double I/O to the portions of the file that are allocated. A later patch will add further optimizations to reduce redundant zeroing I/O during the mirror operation. Since "target-is-zero":true is designed for optimizations, it is okay to silently ignore the parameter rather than erroring if the user ever sets the parameter in a scenario where the mirror job can't exploit it (for example, when doing "sync":"top" instead of "sync":"full", we can't pre-zero, so setting the parameter won't make a speed difference). Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250509204341.3553601-23-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14mirror: Pass full sync mode rather than bool to internalsEric Blake1-12/+12
Out of the five possible values for MirrorSyncMode, INCREMENTAL and BITMAP are already rejected up front in mirror_start, leaving NONE, TOP, and FULL as the remaining values that the code was collapsing into a single bool is_none_mode. Furthermore, mirror_dirty_init() is only reachable for modes TOP and FULL, as further guided by s->zero_target. However, upcoming patches want to further optimize the pre-zeroing pass of a sync=full mirror in mirror_dirty_init(), while avoiding that pass on a sync=top action. Instead of throwing away context by collapsing these two values into s->is_none_mode=false, it is better to pass s->sync_mode throughout the entire operation. For active commit, the desired semantics match sync mode TOP. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-22-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14mirror: Minor refactoringEric Blake1-11/+11
Commit 5791ba52 (v9.2) pre-initialized ret in mirror_dirty_init to silence a false positive compiler warning, even though in all code paths where ret is used, it was guaranteed to be reassigned beforehand. But since the function returns -errno, and -1 is not always the right errno, it's better to initialize to -EIO. An upcoming patch wants to track two bitmaps in do_sync_target_write(); this will be easier if the current variables related to the dirty bitmap are renamed. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-21-eblake@redhat.com>
2025-05-14block: Add new bdrv_co_is_all_zeroes() functionEric Blake1-0/+62
There are some optimizations that require knowing if an image starts out as reading all zeroes, such as making blockdev-mirror faster by skipping the copying of source zeroes to the destination. The existing bdrv_co_is_zero_fast() is a good building block for answering this question, but it tends to give an answer of 0 for a file we just created via QMP 'blockdev-create' or similar (such as 'qemu-img create -f raw'). Why? Because file-posix.c insists on allocating a tiny header to any file rather than leaving it 100% sparse, due to some filesystems that are unable to answer alignment probes on a hole. But teaching file-posix.c to read the tiny header doesn't scale - the problem of a small header is also visible when libvirt sets up an NBD client to a just-created file on a migration destination host. So, we need a wrapper function that handles a bit more complexity in a common manner for all block devices - when the BDS is mostly a hole, but has a small non-hole header, it is still worth the time to read that header and check if it reads as all zeroes before giving up and returning a pessimistic answer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-19-eblake@redhat.com>
2025-05-14block: Let bdrv_co_is_zero_fast consolidate adjacent extentsEric Blake1-12/+15
Some BDS drivers have a cap on how much block status they can supply in one query (for example, NBD talking to an older server cannot inspect more than 4G per query; and qcow2 tends to cap its answers rather than cross a cluster boundary of an L1 table). Although the existing callers of bdrv_co_is_zero_fast are not passing in that large of a 'bytes' parameter, an upcoming caller wants to query the entire image at once, and will thus benefit from being able to treat adjacent zero regions in a coalesced manner, rather than claiming the region is non-zero merely because pnum was truncated and didn't match the incoming bytes. While refactoring this into a loop, note that there is no need to assign pnum prior to calling bdrv_co_common_block_status_above() (it is guaranteed to be assigned deeper in the callstack). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-18-eblake@redhat.com>
2025-05-14file-posix, gluster: Handle zero block status hint betterEric Blake2-2/+3
Although the previous patch to change 'bool want_zero' into a bitmask made no semantic change, it is now time to differentiate. When the caller specifically wants to know what parts of the file read as zero, we need to use lseek and actually reporting holes, rather than short-circuiting and advertising full allocation. This change will be utilized in later patches to let mirroring optimize for the case when the destination already reads as zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-17-eblake@redhat.com>
2025-05-14block: Expand block status mode from bool to flagsEric Blake21-70/+71
This patch is purely mechanical, changing bool want_zero into an unsigned int for bitwise-or of flags. As of this patch, all implementations are unchanged (the old want_zero==true is now mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but the callers in io.c that used to pass want_zero==false are now prepared for future driver changes that can now distinguish bewteen BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually change the file-posix driver along those lines, now that we have more-specific hints. As for the background why this patch is useful: right now, the file-posix driver recognizes that if allocation is being queried, the entire image can be reported as allocated (there is no backing file to refer to) - but this throws away information on whether the entire image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0 returns -ENXIO, a bit more complicated to prove if the raw file was created with 'qemu-img create' since we intentionally allocate a small chunk of all-zero data to help with alignment probing). Later patches will add a generic algorithm for seeing if an entire file reads as zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-16-eblake@redhat.com>
2025-05-14Merge tag 'pull-block-jobs-2025-04-29-v3' of ↵Stefan Hajnoczi5-36/+92
https://gitlab.com/vsementsov/qemu into staging block-job patches - deprecate some old block-job- APIs - on-cbw-error option for backup - more efficient zero handling in block commit # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCgAdFiEEi5wmzbL9FHyIDoahVh8kwfGfefsFAmgiEpwACgkQVh8kwfGf # eftuPw//UWU7MN7Kd8Tc7x/5xJuVOiuOUp8iu78EBtvJy7+yy6lZxDmrVSpob3pI # fiIjZRd0LTO0/hu5nLeTqyGs8cthKNO+hHO1i8xIQuOVC3WqdbCYkiXYUjcHJCeT # ZD2xR2l3F/cjBHXnp7w8K2wuqd4OGjvUpw/JG3mvkDp6uAMJBp+qccAtiCXKLAGv # a4qvFt02TIi7IZYoEyRN+NGuwYvmwrD0TPSbWDzroYsmdZyz93dZniiWkV8elheW # iCDzv4AG1yquAbw6INW3BRWblBYWCLSvrtMVN9XAYf8R+b75bDghUzdHPyFiitsL # aenMMPaNeH1z0jB7oSLrRWx12eCfuRy5UTeil+RQsH9HsGCu7C5yBWkuAyZwlVk1 # Qdu3SQ6HGk6BYET0TSRgk/fivmVq14vYxCFWbwclBEuN1HyNxwDJHZE3YxsqGZnM # KM1rByFViOCA+bjw00dFrn18wO8XRWHmRjed8KMAOZvc3jvJUdlr5OR3zfw3RR8l # bpBETylF7d7IpPs6LnxX08SAMBGLYzQe4rvguxjQ/2YB8C9KBkTodygKUYXR3Afw # Wp+vOVmG03XzOdaffuB9VAfyZrE7QmhbdWZTQVBcoqu/oHUbukHboB5p68L3oHXy # 0AxHjMyaW5d01JELU0Mlj1+R8e+nK2kTq17v+ghmdX/LyySUyzc= # =tjus # -----END PGP SIGNATURE----- # gpg: Signature made Mon 12 May 2025 11:24:12 EDT # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>" [unknown] # gpg: aka "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * tag 'pull-block-jobs-2025-04-29-v3' of https://gitlab.com/vsementsov/qemu: blockdev-backup: Add error handling option for copy-before-write jobs qapi/block-core: deprecate some block-job- APIs qapi: synchronize jobs and block-jobs documentation block: add test non-active commit with zeroed data block: allow commit to unmap zero blocks block: refactor error handling of commit_iteration block: move commit_run loop to separate function block: get type of block allocation in commit_run Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-12blockdev-backup: Add error handling option for copy-before-write jobsRaman Dzehtsiar4-2/+8
This patch extends the blockdev-backup QMP command to allow users to specify how to behave when IO errors occur during copy-before-write operations. Previously, the behavior was fixed and could not be controlled by the user. The new 'on-cbw-error' option can be set to one of two values: - 'break-guest-write': Forwards the IO error to the guest and triggers the on-source-error policy. This preserves snapshot integrity at the expense of guest IO operations. - 'break-snapshot': Allows the guest OS to continue running normally, but invalidates the snapshot and aborts related jobs. This prioritizes guest operation over backup consistency. This enhancement provides more flexibility for backup operations in different environments where requirements for guest availability versus backup consistency may vary. The default behavior remains unchanged to maintain backward compatibility. Signed-off-by: Raman Dzehtsiar <Raman.Dzehtsiar@gmail.com> Message-ID: <20250414090025.828660-1-Raman.Dzehtsiar@gmail.com> Acked-by: Markus Armbruster <armbru@redhat.com> [vsementsov: fix long lines] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Tested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2025-05-09Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into stagingStefan Hajnoczi1-18/+23
Pull request Farhan Ali's s390x host PCI support for the block/nvme.c driver. # -----BEGIN PGP SIGNATURE----- # # iQEzBAABCgAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmgcviUACgkQnKSrs4Gr # c8hRswgAupxH5Zhx50F7GzwZyu9TCF2sphEPd2VuFVxze8Sg6mXnJq5BFTjv9IuC # 0trPppfDyKFKujDk+FA3pl9bT45btm0xctNbFYNRS3HXrVUyMQLy73MlFF2twa5g # U3uiX2d7DAYOdi5O1Cn3bhlByDh4qSko7YyUDFKio+WU57cdJxEd+pUqwyVXrU3E # AMC2ZmJdKFGGC+tWxBIAuWNc5apq9yzbiywR8z62/Z2IC+Bym0RpvCbdklqcZb8O # tpGxDKN8bY6s+hy1NZmA8eBA/iCiu6SUFmNpoe2vSwCFEk9R3gi+UNcuTVt3FaWO # lgzoZSOelmI3JkF0UBqvKsPXt3fdJw== # =KII7 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 08 May 2025 10:22:29 EDT # gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8 # gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [ultimate] # gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [ultimate] # Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8 * tag 'block-pull-request' of https://gitlab.com/stefanha/qemu: block/nvme: Use host PCI MMIO API include: Add a header to define host PCI MMIO functions util: Add functions for s390x mmio read/write Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-08block/nvme: Use host PCI MMIO APIFarhan Ali1-18/+23
Use the host PCI MMIO functions to read/write to NVMe registers, rather than directly accessing them. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Farhan Ali <alifm@linux.ibm.com> Message-id: 20250430185012.2303-4-alifm@linux.ibm.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-06block: Fix type conflict of the copy_file_range stubKohei Tokunaga1-2/+5
Emscripten doesn't provide copy_file_range implementation but it declares this function in its headers. Meson correctly detects the missing implementation and unsets HAVE_COPY_FILE_RANGE. However, the stub defined in file-posix.c causes a type conflict with the declaration from Emscripten during compilation. To fix this error, this commit updates the stub implementation in file-posix.c to exactly match the declaration in Emscripten's headers. The manpage also aligns with this signature. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/938d2beba15d4bd496a600ee401995fbaa385c62.1745820062.git.ktokunaga.mail@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-06block: Add including of ioctl header for Emscripten buildKohei Tokunaga1-0/+4
Including <sys/ioctl.h> is still required on Emscripten, just like on other platforms, to make the ioctl function available. Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Link: https://lore.kernel.org/r/49b6ecdbd23ff83e3f191ef8a9f7cc2feeaea43f.1745820062.git.ktokunaga.mail@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2025-05-01block: allow commit to unmap zero blocksVincent Vanlaer1-9/+29
Non-active block commits do not discard blocks only containing zeros, causing images to lose sparseness after the commit. This commit fixes that by writing zero blocks using blk_co_pwrite_zeroes rather than writing them out as any other arbitrary data. Signed-off-by: Vincent Vanlaer <libvirt-e6954efa@volkihar.be> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20241026163010.2865002-5-libvirt-e6954efa@volkihar.be> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2025-05-01block: refactor error handling of commit_iterationVincent Vanlaer1-27/+36
Signed-off-by: Vincent Vanlaer <libvirt-e6954efa@volkihar.be> Message-Id: <20241026163010.2865002-4-libvirt-e6954efa@volkihar.be> [vsementsov]: move action declaration to the top of the function Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2025-04-30file-posix: Fix crash on discard_granularity == 0Kevin Wolf1-1/+1
Block devices that don't support discard have a discard_granularity of 0. Currently, this results in a division by zero when we try to make sure that it's a multiple of request_alignment. Only try to update bs->bl.pdiscard_alignment when we got a non-zero discard_granularity from sysfs. Fixes: f605796aae4 ('file-posix: probe discard alignment on Linux block devices') Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-ID: <20250429155654.102735-1-kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-04-29block: move commit_run loop to separate functionVincent Vanlaer1-37/+52
Signed-off-by: Vincent Vanlaer <libvirt-e6954efa@volkihar.be> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20241026163010.2865002-3-libvirt-e6954efa@volkihar.be> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2025-04-29block: get type of block allocation in commit_runVincent Vanlaer1-3/+9
bdrv_co_common_block_status_above not only returns whether the block is allocated, but also if it contains zeroes. Signed-off-by: Vincent Vanlaer <libvirt-e6954efa@volkihar.be> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20241026163010.2865002-2-libvirt-e6954efa@volkihar.be> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>