path: root/block
Age | Commit message | Author | Files | Lines
2022-01-14 | Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging | Peter Maydell | 5 | -28/+46
Block layer patches
- qemu-storage-daemon: Add vhost-user-blk help
- block-backend: Fix use-after-free for BDS pointers after aio_poll()
- qemu-img: Fix sparseness of output image with unaligned ranges
- vvfat: Fix crashes in read-write mode
- Fix device deletion events with -device JSON syntax
- Code cleanups
# gpg: Signature made Fri 14 Jan 2022 13:50:16 GMT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6
* remotes/kevin/tags/for-upstream:
  iotests/testrunner.py: refactor test_field_width
  block: drop BLK_PERM_GRAPH_MOD
  qemu-img: make is_allocated_sectors() more efficient
  iotests: Test qemu-img convert of zeroed data cluster
  vvfat: Fix vvfat_write() for writes before the root directory
  vvfat: Fix size of temporary qcow file
  iotests/308: Fix for CAP_DAC_OVERRIDE
  iotests/stream-error-on-reset: New test
  block-backend: prevent dangling BDS pointers across aio_poll()
  qapi/block: Restrict vhost-user-blk to CONFIG_VHOST_USER_BLK_SERVER
  qemu-storage-daemon: Add vhost-user-blk help
  docs: Correct 'vhost-user-blk' spelling
  softmmu: fix device deletion events with -device JSON syntax
  include/sysemu/blockdev.h: remove drive_get_max_devs
  include/sysemu/blockdev.h: remove drive_mark_claimed_by_board and inline drive_def
  block_int: make bdrv_backing_overridden static
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-14 | block: drop BLK_PERM_GRAPH_MOD | Vladimir Sementsov-Ogievskiy | 2 | -13/+3
First, this permission never protected a node from being changed, as generic child-replacing functions don't check it. Second, it's a strange thing: it represents a permission of a parent node to change its child. But generally, children are replaced by different mechanisms, like jobs or QMP commands, not by nodes. The graph-mod permission is hard to understand. All other permissions describe operations which are done by a parent node on its child: read, write, resize. Graph modification operations are something completely different. The only place where BLK_PERM_GRAPH_MOD is used as "perm" (not shared perm) is mirror_start_job, for s->target. Still, modern code should use bdrv_freeze_backing_chain() to protect against graph modification; if we don't do it somewhere, that may be considered a bug. So, it's a bit risky to drop GRAPH_MOD, and analyzing the possible loss of protection is hard. But one day we should do it, so let's do it now. One more bit of information is that locking the corresponding byte in file-posix doesn't make sense at all. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210902093754.2352-1-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-01-14 | vvfat: Fix vvfat_write() for writes before the root directory | Kevin Wolf | 1 | -8/+22
The calculation in sector2cluster() is done relative to the offset of the root directory. Any writes to blocks before the start of the root directory (in particular, writes to the FAT) result in negative values, which are not handled correctly in vvfat_write(). This changes sector2cluster() to return a signed value, and makes sure that vvfat_write() doesn't try to find mappings for negative cluster numbers. It also clarifies the code in vvfat_write() to make it more obvious that the cluster numbers can be negative. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211209152231.23756-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
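The sign handling described above can be illustrated with a minimal sketch. The struct `FatLayout`, both function names, and the sample numbers are hypothetical stand-ins, not the vvfat code; the sketch also chooses to round toward negative infinity so that every sector in front of the root directory yields a negative index, which is an assumption of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified layout: only what the calculation needs. */
typedef struct {
    uint32_t offset_to_root_dir;  /* first sector of the root directory */
    uint32_t sectors_per_cluster;
} FatLayout;

/*
 * Cluster index relative to the root directory. The result is signed:
 * sectors in front of the root directory (e.g. the FAT) map to values
 * below zero. C division truncates toward zero, so the quotient is
 * adjusted to round toward negative infinity instead.
 */
static int32_t sector_to_cluster(const FatLayout *l, int64_t sector_num)
{
    int64_t rel, spc, q;

    assert(l->sectors_per_cluster > 0);
    rel = sector_num - (int64_t)l->offset_to_root_dir;
    spc = (int64_t)l->sectors_per_cluster;
    q = rel / spc;
    if (rel % spc < 0) {
        q--;
    }
    return (int32_t)q;
}

/* Only non-negative clusters can have a file or directory mapping. */
static bool cluster_may_have_mapping(int32_t cluster)
{
    return cluster >= 0;
}
```

A write path built on this would look up a mapping only when `cluster_may_have_mapping()` returns true and treat everything else as metadata in front of the root directory.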
2022-01-14 | vvfat: Fix size of temporary qcow file | Kevin Wolf | 1 | -4/+3
The size of the qcow file was calculated so that only the FAT partition would fit on it, but not the whole disk. However, offsets relative to the whole disk are used to access it, so increase its size to be large enough for that. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211209151815.23495-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-01-14 | block-backend: prevent dangling BDS pointers across aio_poll() | Stefan Hajnoczi | 1 | -2/+17
The BlockBackend root child can change when aio_poll() is invoked. This happens when a temporary filter node is removed upon blockjob completion, for example. Functions in block/block-backend.c must be aware of this when using a blk_bs() pointer across aio_poll() because the BlockDriverState refcnt may reach 0, resulting in a stale pointer. One example is scsi_device_purge_requests(), which calls blk_drain() to wait for in-flight requests to cancel. If the backup blockjob is active, then the BlockBackend root child is a temporary filter BDS owned by the blockjob. The blockjob can complete during bdrv_drained_begin() and the last reference to the BDS is released when the temporary filter node is removed. This results in a use-after-free when blk_drain() calls bdrv_drained_end(bs) on the dangling pointer. Explicitly hold a reference to bs across block APIs that invoke aio_poll(). Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2021778 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2036178 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220111153613.25453-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
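The pattern of holding a reference across a polling call can be sketched in isolation. Everything below is a hypothetical model: `Node` stands in for a reference-counted graph node and `poll_and_drop_owner_ref()` stands in for a polling call during which the owner releases its reference. None of these names come from the QEMU sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted graph node. */
typedef struct Node {
    int refcnt;
} Node;

static int nodes_freed;  /* counts frees so the effect can be observed */

static Node *node_new(void)
{
    Node *n = calloc(1, sizeof(*n));
    assert(n != NULL);
    n->refcnt = 1;  /* reference held by the owner, e.g. a block job */
    return n;
}

static void node_ref(Node *n)
{
    n->refcnt++;
}

static void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        free(n);
        nodes_freed++;
    }
}

/* Stand-in for a polling call during which the owner drops its reference. */
static void poll_and_drop_owner_ref(Node *n)
{
    node_unref(n);
}

/*
 * Returns the reference count seen after polling. Reading it is safe only
 * because this function took its own reference first.
 */
static int drain_holding_ref(Node *n)
{
    int seen;

    node_ref(n);
    poll_and_drop_owner_ref(n);
    seen = n->refcnt;  /* still valid: our reference keeps the node alive */
    node_unref(n);     /* last reference: the node is freed here */
    return seen;
}
```

Without the `node_ref()` call, the node would already be freed when `seen` is read, which is the use-after-free shape the commit message describes.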
2022-01-14 | include/sysemu/blockdev.h: remove drive_mark_claimed_by_board and inline drive_def | Emanuele Giuseppe Esposito | 1 | -1/+1
drive_def is only a particular use case of qemu_opts_parse_noisily, so it can be inlined. Also remove drive_mark_claimed_by_board, as it is only declared but not implemented (nor used) anywhere. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20211215121140.456939-3-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-01-14 | Merge remote-tracking branch 'remotes/stefanha-gitlab/tags/block-pull-request' into staging | Peter Maydell | 9 | -48/+71
Pull request
# gpg: Signature made Wed 12 Jan 2022 17:13:54 GMT
# gpg: using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg: aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8
* remotes/stefanha-gitlab/tags/block-pull-request:
  virtio: unify dataplane and non-dataplane ->handle_output()
  virtio: use ->handle_output() instead of ->handle_aio_output()
  virtio-scsi: prepare virtio_scsi_handle_cmd for dataplane
  virtio-blk: drop unused virtio_blk_handle_vq() return value
  virtio: get rid of VirtIOHandleAIOOutput
  aio-posix: split poll check from ready handler
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-01-12 | aio-posix: split poll check from ready handler | Stefan Hajnoczi | 9 | -48/+71
Adaptive polling measures the execution time of the polling check plus handlers called when a polled event becomes ready. Handlers can take a significant amount of time, making it look like polling was running for a long time when in fact the event handler was running for a long time. For example, on Linux the io_submit(2) syscall invoked when a virtio-blk device's virtqueue becomes ready can take tens of microseconds. This can exceed the default polling interval (32 microseconds) and cause adaptive polling to stop polling. By excluding the handler's execution time from the polling check we make the adaptive polling calculation more accurate. As a result, the event loop now stays in polling mode where previously it would have fallen back to file descriptor monitoring.
The following data was collected with virtio-blk num-queues=2 event_idx=off using an IOThread.
Before: 168k IOPS, IOThread syscalls:
  9837.115 ( 0.020 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 16, iocbpp: 0x7fcb9f937db0) = 16
  9837.158 ( 0.002 ms): IO iothread1/620155 write(fd: 103, buf: 0x556a2ef71b88, count: 8) = 8
  9837.161 ( 0.001 ms): IO iothread1/620155 write(fd: 104, buf: 0x556a2ef71b88, count: 8) = 8
  9837.163 ( 0.001 ms): IO iothread1/620155 ppoll(ufds: 0x7fcb90002800, nfds: 4, tsp: 0x7fcb9f1342d0, sigsetsize: 8) = 3
  9837.164 ( 0.001 ms): IO iothread1/620155 read(fd: 107, buf: 0x7fcb9f939cc0, count: 512) = 8
  9837.174 ( 0.001 ms): IO iothread1/620155 read(fd: 105, buf: 0x7fcb9f939cc0, count: 512) = 8
  9837.176 ( 0.001 ms): IO iothread1/620155 read(fd: 106, buf: 0x7fcb9f939cc0, count: 512) = 8
  9837.209 ( 0.035 ms): IO iothread1/620155 io_submit(ctx_id: 140512552468480, nr: 32, iocbpp: 0x7fca7d0cebe0) = 32
After: 174k IOPS (+3.6%), IOThread syscalls:
  9809.566 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0cdd62be0) = 32
  9809.625 ( 0.001 ms): IO iothread1/623061 write(fd: 103, buf: 0x5647cfba5f58, count: 8) = 8
  9809.627 ( 0.002 ms): IO iothread1/623061 write(fd: 104, buf: 0x5647cfba5f58, count: 8) = 8
  9809.663 ( 0.036 ms): IO iothread1/623061 io_submit(ctx_id: 140539805028352, nr: 32, iocbpp: 0x7fd0d0388b50) = 32
Notice that ppoll(2) and eventfd read(2) syscalls are eliminated because the IOThread stays in polling mode instead of falling back to file descriptor monitoring. As usual, polling is not implemented on Windows so this patch ignores the new io_poll_ready() callback in aio-win32.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20211207132336.36627-2-stefanha@redhat.com [Fixed up aio_set_event_notifier() calls in tests/unit/test-fdmon-epoll.c added after this series was queued. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
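The effect of excluding handler time from the measurement can be sketched with plain arithmetic. This is a deliberately simplified model, not the adaptive polling algorithm itself: the struct, the function names, and the single "fits the budget" decision are assumptions made for illustration. The 32 microsecond budget and the roughly 36 microsecond io_submit(2) duration are taken from the message above; the 2 microsecond poll check in the usage note is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Timestamps (ns) for one polling iteration; all names are hypothetical. */
typedef struct {
    int64_t poll_start_ns;
    int64_t poll_done_ns;     /* poll check found the event ready */
    int64_t handler_done_ns;  /* ready handler returned */
} PollSample;

/* Old accounting: handler time is charged to polling. */
static int64_t poll_time_with_handler(const PollSample *s)
{
    assert(s->handler_done_ns >= s->poll_done_ns);
    return s->handler_done_ns - s->poll_start_ns;
}

/* New accounting: only the poll check itself is measured. */
static int64_t poll_time_without_handler(const PollSample *s)
{
    assert(s->poll_done_ns >= s->poll_start_ns);
    return s->poll_done_ns - s->poll_start_ns;
}

/* Simplified decision: keep polling while the measured time fits the budget. */
static bool stays_in_polling_mode(int64_t measured_ns, int64_t budget_ns)
{
    return measured_ns <= budget_ns;
}
```

With a sample of `{0, 2000, 38000}` and a budget of 32000 ns, the old accounting measures 38000 ns and leaves polling mode, while the new accounting measures 2000 ns and stays in it.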
2022-01-12 | block/file-posix: Simplify the XFS_IOC_DIOINFO handling | Thomas Huth | 1 | -21/+16
The handling for the XFS_IOC_DIOINFO ioctl is currently quite excessive: This is not a "real" feature like the other features that we provide with the "--enable-xxx" and "--disable-xxx" switches for the configure script, since this does not influence lots of code (it's only about one call to xfsctl() in file-posix.c), so people don't gain much with the ability to disable this with "--disable-xfsctl". It's also unfortunate that the ioctl will be disabled on Linux in case the user did not install the right xfsprogs-devel package before running configure. Thus let's simplify this by providing the ioctl definition on our own, so we can completely get rid of the header dependency and thus the related code in the configure script. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20211215125824.250091-1-thuth@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-12-28 | blockjob: drop BlockJob.blk field | Vladimir Sementsov-Ogievskiy | 1 | -7/+0
It's unused now (except for permission handling)[*]. The only reasonable user of it was the block-stream job, recently updated to use its own blk. Other block jobs prefer to use their own source-node-related objects. So, the arguments for dropping the field are:
- block jobs prefer not to use it
- block jobs usually have more than one node to operate on, and it is better to operate symmetrically (for example, having both source and target blks in the specific block-job state structure)
*: BlockJob.blk is used to keep some permissions. We simply move the permissions to the block-job child created in block_job_create() together with blk. In mirror, we just should not care anymore about restoring the state of blk. Most probably this code could have been dropped long ago, after dropping the bs->job pointer. Now it finally goes away together with BlockJob.blk itself. iotest 141 output is updated, as the "bdrv_has_blk(bs)" check in qmp_blockdev_del() doesn't fail (we don't have blk now). Still, the new error message looks even better. In iotest 283 we need to add a job id, otherwise "Invalid job ID" now happens earlier than the permission check (as permissions moved from blk to the block-job node). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
2021-12-28 | block/stream: add own blk | Vladimir Sementsov-Ogievskiy | 1 | -6/+18
block-stream is the only block job that reasonably uses BlockJob.blk. We are going to drop BlockJob.blk soon, so let block-stream have its own blk. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com>
2021-12-21 | nbd: allow reconnect on open, with corresponding new options | Vladimir Sementsov-Ogievskiy | 1 | -1/+44
This is useful when the start of the VM and the start of the NBD server are not easy to synchronize. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2021-12-09 | block/nvme: fix infinite loop in nvme_free_req_queue_cb() | Stefan Hajnoczi | 1 | -2/+3
When the request free list is exhausted the coroutine waits on q->free_req_queue for the next free request. Whenever a request is completed a BH is scheduled to invoke nvme_free_req_queue_cb() and wake up waiting coroutines.

1. nvme_get_free_req() waits for a free request:

    while (q->free_req_head == -1) {
        ...
        trace_nvme_free_req_queue_wait(q->s, q->index);
        qemu_co_queue_wait(&q->free_req_queue, &q->lock);
        ...
    }

2. nvme_free_req_queue_cb() wakes up the coroutine:

    while (qemu_co_enter_next(&q->free_req_queue, &q->lock)) {
       ^--- infinite loop when free_req_head == -1
    }

nvme_free_req_queue_cb() and the coroutine form an infinite loop when q->free_req_head == -1. Fix this by checking q->free_req_head in nvme_free_req_queue_cb(). If the free request list is exhausted, don't wake waiting coroutines. Eventually an in-flight request will complete and the BH will be scheduled again, guaranteeing forward progress. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20211208152246.244585-1-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
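The loop condition described in the fix can be modeled without coroutines. In the sketch below, `QueueModel`, `enter_next()`, and `wake_waiters()` are hypothetical names, and plain counters replace the free list and the coroutine queue; a `free_reqs` value of 0 plays the role of `free_req_head == -1`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counters instead of a free list and a coroutine queue. */
typedef struct {
    int free_reqs;  /* 0 models free_req_head == -1 */
    int waiters;    /* coroutines waiting for a free request */
} QueueModel;

/*
 * Models waking one waiter. A woken waiter takes a request if one is free;
 * otherwise it goes straight back to waiting. Returns false only when
 * nobody is waiting.
 */
static bool enter_next(QueueModel *q)
{
    if (q->waiters == 0) {
        return false;
    }
    if (q->free_reqs > 0) {
        q->free_reqs--;
        q->waiters--;
    }
    return true;
}

/* Wake waiters only while free requests exist; returns the number woken. */
static int wake_waiters(QueueModel *q)
{
    int woken = 0;

    assert(q->free_reqs >= 0 && q->waiters >= 0);
    while (q->free_reqs > 0 && enter_next(q)) {
        woken++;
    }
    return woken;
}
```

Dropping the `q->free_reqs > 0` test from the loop would make `wake_waiters()` spin forever whenever waiters exist and no request is free, since `enter_next()` keeps returning true in that state.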
2021-11-23 | block/vvfat.c fix leak when failure occurs | Daniella Lee | 1 | -4/+12
vvfat_open() calls enable_write_target() and init_directories(), and these functions allocate memory for BDRVVVFATState::qcow_filename, BDRVVVFATState::used_clusters, and BDRVVVFATState::cluster_buff. When the specified folder does not exist, this memory is leaked: after init_directories() is executed, vvfat_open() returns -EIO and bdrv_open_driver() jumps to the open_failed label, where g_free(bs->opaque) releases the BDRVVVFATState struct but not the members mentioned above.
command line: qemu-system-x86_64 -hdb <vdisk qcow file> -usb -device usb-storage,drive=fat16 -drive file=fat:rw:fat-type=16:"<path of a host folder that does not exist>", id=fat16,format=raw,if=none
enable_write_target called:
(gdb) bt
  at ../block/vvfat.c:3114
  flags=155650, errp=0x7fffffffd780) at ../block/vvfat.c:1236
  node_name=0x0, options=0x555556fa45d0, open_flags=155650, errp=0x7fffffffd890) at ../block.c:1558
  errp=0x7fffffffd890) at ../block.c:1852
  reference=0x0, options=0x555556fa45d0, flags=40962, parent=0x555556f98cd0, child_class=0x555556b1d6a0 <child_of_bds>, child_role=19, errp=0x7fffffffda90) at ../block.c:3779
  options=0x555556f9cfc0, bdref_key=0x555556239bb8 "file", parent=0x555556f98cd0, child_class=0x555556b1d6a0 <child_of_bds>, child_role=19, allow_none=true, errp=0x7fffffffda90) at ../block.c:3419
  reference=0x0, options=0x555556f9cfc0, flags=8194, parent=0x0, child_class=0x0, child_role=0, errp=0x555556c98c40 <error_fatal>) at ../block.c:3726
  options=0x555556f757b0, flags=0, errp=0x555556c98c40 <error_fatal>) at ../block.c:3872
  options=0x555556f757b0, flags=0, errp=0x555556c98c40 <error_fatal>) at ../block/block-backend.c:436
  bs_opts=0x555556f757b0, errp=0x555556c98c40 <error_fatal>) at ../blockdev.c:608
  errp=0x555556c98c40 <error_fatal>) at ../blockdev.c:992
  ......
Signed-off-by: Daniella Lee <daniellalee111@gmail.com> Message-Id: <20211119112553.352222-1-daniellalee111@gmail.com> [hreitz: Took commit message from v1] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-11-16 | file-posix: Fix alignment after reopen changing O_DIRECT | Kevin Wolf | 1 | -4/+16
At the end of a reopen, we already call bdrv_refresh_limits(), which should update bs->request_alignment according to the new file descriptor. However, raw_probe_alignment() relies on s->needs_alignment and just uses 1 if it isn't set. We neglected to update this field, so starting with cache=writeback and then reopening with cache=none means that we get an incorrect bs->request_alignment == 1 and unaligned requests fail instead of being automatically aligned. Fix this by recalculating s->needs_alignment in raw_refresh_limits() before calling raw_probe_alignment(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211104113109.56336-1-kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211115145409.176785-13-kwolf@redhat.com> [hreitz: Fix iotest 142 for block sizes greater than 512 by operating on a file with a size of 1 MB] Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20211116101431.105252-1-hreitz@redhat.com>
2021-11-16 | stream: Traverse graph after modification | Hanna Reitz | 1 | -2/+5
bdrv_cor_filter_drop() modifies the block graph. That means that other parties can also modify the block graph before it returns. Therefore, we cannot assume that the result of a graph traversal we did before remains valid afterwards. We should thus fetch `base` and `unfiltered_base` afterwards instead of before. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211111120829.81329-2-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211115145409.176785-2-kwolf@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-11-03 | Merge remote-tracking branch 'remotes/kwolf/tags/for-upstream' into staging | Richard Henderson | 7 | -27/+171
Block layer patches
- Fail gracefully when blockdev-snapshot creates loops
- ide: Fix IDENTIFY DEVICE for disks > 128 GiB
- file-posix: Fix return value translation for AIO discards
- file-posix: add 'aio-max-batch' option
- rbd: implement bdrv_co_block_status
- Code cleanups and build fixes
# gpg: Signature made Tue 02 Nov 2021 12:04:02 PM EDT
# gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg: issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
* remotes/kwolf/tags/for-upstream:
  block/nvme: Extract nvme_free_queue() from nvme_free_queue_pair()
  block/nvme: Display CQ/SQ pointer in nvme_free_queue_pair()
  block/nvme: Automatically free qemu_memalign() with QEMU_AUTO_VFREE
  block-backend: Silence clang -m32 compiler warning
  linux-aio: add `dev_max_batch` parameter to laio_io_unplug()
  linux-aio: add `dev_max_batch` parameter to laio_co_submit()
  file-posix: add `aio-max-batch` option
  block/export/fuse.c: fix musl build
  ide: Cap LBA28 capacity announcement to 2^28-1
  block/rbd: implement bdrv_co_block_status
  block: Fail gracefully when blockdev-snapshot creates loops
  block/file-posix: Fix return value translation for AIO discards
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-11-02 | block/nvme: Extract nvme_free_queue() from nvme_free_queue_pair() | Philippe Mathieu-Daudé | 1 | -2/+7
Instead of duplicating code, extract the common helper to free a single queue. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211006164931.172349-4-philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/nvme: Display CQ/SQ pointer in nvme_free_queue_pair() | Philippe Mathieu-Daudé | 2 | -2/+2
For debugging purposes it is helpful to know the CQ/SQ pointers. We already have a trace event in nvme_free_queue_pair(); extend it to report these pointer addresses. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211006164931.172349-3-philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/nvme: Automatically free qemu_memalign() with QEMU_AUTO_VFREE | Philippe Mathieu-Daudé | 1 | -7/+4
Since commit 4d324c0bf65 ("introduce QEMU_AUTO_VFREE"), buffers allocated by qemu_memalign() can be freed automatically when using the QEMU_AUTO_VFREE macro. Use it to simplify the code a bit. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20211006164931.172349-2-philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
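Scope-based freeing of this kind is commonly built on the GCC/Clang `cleanup` variable attribute. The sketch below shows that general mechanism only; `SKETCH_AUTO_FREE`, `sketch_auto_free()`, and `use_scratch_buffer()` are hypothetical names, plain `malloc()`/`free()` stand in for the aligned allocator, and nothing here is claimed to match how QEMU_AUTO_VFREE is defined.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cleanup_calls;  /* lets a caller observe that cleanup ran */

/* Cleanup handler: receives a pointer to the annotated variable. */
static void sketch_auto_free(void *p)
{
    void **pp = (void **)p;

    free(*pp);
    cleanup_calls++;
}

/* Hypothetical macro built on the GCC/Clang cleanup attribute. */
#define SKETCH_AUTO_FREE __attribute__((cleanup(sketch_auto_free)))

/* Fills a scratch buffer and returns its last byte; no explicit free(). */
static int use_scratch_buffer(size_t len)
{
    assert(len > 0);

    SKETCH_AUTO_FREE unsigned char *buf = malloc(len);

    if (!buf) {
        return -1;  /* cleanup still runs; free(NULL) is a no-op */
    }
    memset(buf, 0x5a, len);
    return buf[len - 1];  /* buf is released when it goes out of scope */
}
```

The benefit is that early returns no longer need a matching free call, which is the kind of simplification the commit message refers to.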
2021-11-02 | block-backend: Silence clang -m32 compiler warning | Hanna Reitz | 1 | -1/+1
Similarly to e7e588d432d31ecebc26358e47201dd108db964c, there is a warning in block/block-backend.c that qiov->size <= INT64_MAX is always true on machines where size_t is narrower than a uint64_t. In said commit, we silenced this warning by casting to uint64_t. The commit introducing this warning here (a93d81c84afa717b0a1a6947524d8d1fbfd6bbf5) anticipated it and so tried to address it the same way. However, it only did so in one of two places where this comparison occurs, and so we still need to fix up the other one. Fixes: a93d81c84afa717b0a1a6947524d8d1fbfd6bbf5 ("block-backend: convert blk_aio_ functions to int64_t bytes paramter") Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20211026090745.30800-1-hreitz@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
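The cast described above can be shown on its own. This is a minimal sketch with hypothetical function names; it only illustrates widening a `size_t` to `uint64_t` before comparing it with `INT64_MAX`, which the commit message names as the way the warning was silenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Where size_t is narrower than 64 bits, a direct "size <= INT64_MAX" is
 * always true, which is what the compiler warns about. Widening to
 * uint64_t first gives the comparison the same form on every target.
 */
static bool size_fits_int64(size_t size)
{
    return (uint64_t)size <= INT64_MAX;
}

/* Hypothetical caller that asserts the invariant before converting. */
static int64_t size_to_int64(size_t size)
{
    assert(size_fits_int64(size));
    return (int64_t)size;
}
```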
2021-11-02 | linux-aio: add `dev_max_batch` parameter to laio_io_unplug() | Stefano Garzarella | 2 | -4/+6
Between the submission of a request and the unplug, other devices with larger limits may have queued new requests without flushing the batch. Using the new `dev_max_batch` parameter, laio_io_unplug() can check whether the batch exceeds the device limit and, if so, flush the current batch. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20211026162346.253081-4-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | linux-aio: add `dev_max_batch` parameter to laio_co_submit() | Stefano Garzarella | 2 | -9/+24
This new parameter can be used by block devices to limit the Linux AIO batch size further than the limit set by the AIO context. The file-posix backend supports this by passing its previously added `aio-max-batch` option. Add a helper function to calculate the maximum batch size. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20211026162346.253081-3-sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
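One way to combine a context-wide limit with a per-device limit is to take the smaller non-zero value. The sketch below is an assumption about how such a helper could look, not the helper added by this commit: the function names, the convention that 0 means "no limit set", and the default of 32 are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fallback when the context sets no limit; illustrative value only. */
#define SKETCH_DEFAULT_MAX_BATCH 32

/* Smaller of two limits, where 0 means "not set". */
static uint64_t min_non_zero(uint64_t a, uint64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/*
 * Effective batch limit: start from the context limit (or the default)
 * and let a device lower it, but never raise it.
 */
static uint64_t effective_max_batch(uint64_t ctx_max_batch,
                                    uint64_t dev_max_batch)
{
    uint64_t max_batch = ctx_max_batch ? ctx_max_batch
                                       : SKETCH_DEFAULT_MAX_BATCH;

    max_batch = min_non_zero(dev_max_batch, max_batch);
    assert(max_batch > 0);
    return max_batch;
}
```

In this model a latency-sensitive device passing 8 gets a limit of 8 even when the shared context allows 64, while a device that sets nothing inherits the context limit.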
2021-11-02 | file-posix: add `aio-max-batch` option | Stefano Garzarella | 1 | -0/+9
Commit d7ddd0a161 ("linux-aio: limit the batch size using `aio-max-batch` parameter") added a way to limit the batch size of Linux AIO backend for the entire AIO context. The same AIO context can be shared by multiple devices, so latency-sensitive devices may want to limit the batch size even more to avoid increasing latency. For this reason we add the `aio-max-batch` option to the file backend, which will be used by the next commits to limit the size of batches including requests generated by this device. Suggested-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20211026162346.253081-2-sgarzare@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/export/fuse.c: fix musl build | Fabrice Fontaine | 1 | -0/+4
Include linux/falloc.h if CONFIG_FALLOCATE_ZERO_RANGE is defined to fix https://gitlab.com/qemu-project/qemu/-/commit/50482fda98bd62e072c30b7ea73c985c4e9d9bbb and avoid the following build failure on musl: ../block/export/fuse.c: In function 'fuse_fallocate': ../block/export/fuse.c:643:21: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function) 643 | else if (mode & FALLOC_FL_ZERO_RANGE) { | ^~~~~~~~~~~~~~~~~~~~ Fixes: - http://autobuild.buildroot.org/results/be24433a429fda681fb66698160132c1c99bc53b Fixes: 50482fda98b ("block/export/fuse.c: fix musl build") Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20211022095209.1319671-1-fontaine.fabrice@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/rbd: implement bdrv_co_block_status | Peter Lieven | 1 | -0/+112
The QEMU rbd driver currently lacks support for bdrv_co_block_status. This results mainly in incorrect progress during block operations (e.g. qemu-img convert with an rbd image as source). This patch utilizes the rbd_diff_iterate2 call from librbd to detect allocated and unallocated (all-zero) areas. To avoid querying the ceph OSDs for the answer, this is only done if the image has the fast-diff feature, which depends on the object-map and exclusive-lock features. In this case it is guaranteed that the information is present in memory in the librbd client, so the query is very fast. If fast-diff is not available, all areas are reported to be allocated, which is the current behaviour if bdrv_co_block_status is not implemented. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20211012152231.24868-1-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/file-posix: Fix return value translation for AIO discards | Ari Sundholm | 1 | -2/+2
AIO discards regressed as a result of the following commit: 0dfc7af2 block/file-posix: Optimize for macOS When trying to run blkdiscard within a Linux guest, the request would fail, with some errors in dmesg: ---- [ snip ] ---- [ 4.010070] sd 2:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [ 4.011061] sd 2:0:0:0: [sda] tag#0 Sense Key : Aborted Command [current] [ 4.011061] sd 2:0:0:0: [sda] tag#0 Add. Sense: I/O process terminated [ 4.011061] sd 2:0:0:0: [sda] tag#0 CDB: Unmap/Read sub-channel 42 00 00 00 00 00 00 00 18 00 [ 4.011061] blk_update_request: I/O error, dev sda, sector 0 ---- [ snip ] ---- This turns out to be a result of a flaw in changes to the error value translation logic in handle_aiocb_discard(). The default return value may be left untranslated in some configurations, and the wrong variable is used in one translation. Fix both issues. Fixes: 0dfc7af2b28 ("block/file-posix: Optimize for macOS") Cc: qemu-stable@nongnu.org Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Emil Karlson <jkarlson@tuxera.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20211019110954.4170931-1-ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-11-02 | block/vpc: Add a sanity check that fixed-size images have the right type | Thomas Huth | 1 | -1/+2
The code in vpc.c uses BDRVVPCState->footer.type in various places to decide whether the image is a fixed-size (VHD_FIXED) or a dynamic (VHD_DYNAMIC) image. However, we never check that this field really contains VHD_FIXED if we detected a fixed size image in vpc_open(), so a wrong value here could cause quite some trouble during runtime. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20211012082702.792259-1-thuth@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-11-02 | vmdk: allow specification of tools version | Thomas Weißschuh | 1 | -4/+20
VMDK files support an attribute that represents the version of the guest tools that are installed on the disk. This attribute is used by vSphere before a machine has been started to determine if the VM has the guest tools installed. This is important when configuring "Operating system customizations" in vSphere, as it checks for the presence of the guest tools before allowing those customizations. Thus when the VM has not yet booted normally it would be impossible to customize it, therefore preventing a customized first-boot. The attribute should not hurt on disks that do not have the guest tools installed and indeed the VMware tools also unconditionally add this attribute. (Defaulting to the value "2147483647", as is done in this patch) Signed-off-by: Thomas Weißschuh <thomas.weissschuh.ext@zeiss.com> Message-Id: <20210913130419.13241-1-thomas.weissschuh.ext@zeiss.com> [hreitz: Added missing '#' in block-core.json] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-10-15 | block-backend: drop INT_MAX restriction from blk_check_byte_request() | Vladimir Sementsov-Ogievskiy | 1 | -1/+1
blk_check_byte_request() is called from blk_co_do_preadv, blk_co_do_pwritev_part, blk_co_do_pdiscard and blk_co_copy_range before (maybe) calling throttle_group_co_io_limits_intercept() (which has an int64_t argument) and then calling the corresponding bdrv_co_ function. The bdrv_co_ functions are OK with int64_t bytes as well. So by dropping the check for INT_MAX we just get the same restrictions as in the bdrv_ layer: discard and write-zeroes go through bdrv_check_qiov_request() and are allowed to be 64-bit. Other requests go through bdrv_check_request32() and are still restricted by the INT_MAX boundary. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-13-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
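A bounds check on a 64-bit offset and length, without an INT_MAX cap, can be sketched as follows. The function name, its parameters, and the choice of -EIO are assumptions for illustration and do not reproduce the block-backend code; narrower limits are assumed to be enforced by a lower layer, as the message above describes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical request check with 64-bit offset and length. Returns 0 if
 * the request lies within a device of "len" bytes, else -EIO.
 */
static int check_byte_request(int64_t len, int64_t offset, int64_t bytes)
{
    assert(len >= 0);
    if (bytes < 0 || offset < 0) {
        return -EIO;
    }
    /* Written as a subtraction so that offset + bytes cannot overflow. */
    if (offset > len || len - offset < bytes) {
        return -EIO;
    }
    return 0;
}
```

A 4 GiB request against a 1 TiB device passes this check even though its length exceeds INT_MAX, while a request that runs past the end is rejected.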
2021-10-15 | block-backend: blk_pread, blk_pwrite: rename count parameter to bytes | Vladimir Sementsov-Ogievskiy | 1 | -8/+8
To be consistent with declarations in include/sysemu/block-backend.h. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-12-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15 | block-backend: convert blk_aio_ functions to int64_t bytes paramter | Vladimir Sementsov-Ogievskiy | 1 | -5/+8
1. Convert bytes in BlkAioEmAIOCB: aio->bytes is only passed to already int64_t interfaces, and set in blk_aio_prwv, which is updated here. 2. For all updated functions the parameter type becomes wider so callers are safe. 3. In blk_aio_prwv we only store bytes to BlkAioEmAIOCB, which is updated here. 4. Other updated functions are wrappers on blk_aio_prwv. Note that blk_aio_preadv and blk_aio_pwritev become safer: before this commit, it was theoretically possible to pass a qiov with a size exceeding INT_MAX, which was then converted to the int argument of blk_aio_prwv. Now it's converted to int64_t, which is a lot better. Still, add assertions. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-11-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak assertion and grammar] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15 | block-backend: convert blk_co_copy_range to int64_t bytes | Vladimir Sementsov-Ogievskiy | 1 | -1/+1
Function is updated so that parameter type becomes wider, so all callers should be OK with it. Look at blk_co_copy_range() itself: bytes is passed only to blk_check_byte_request() and bdrv_co_copy_range(), which already have int64_t bytes parameter, so we are OK. Note that requests exceeding INT_MAX are still restricted by blk_check_byte_request(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-10-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15 | block-backend: convert blk_foo wrappers to use int64_t bytes parameter | Vladimir Sementsov-Ogievskiy | 1 | -5/+5
Convert blk_pdiscard, blk_pwrite_compressed, blk_pwrite_zeroes. These are just wrappers for functions with int64_t argument, so allow passing int64_t as well. Parameter type becomes wider so all callers should be OK with it. Note that requests exceeding INT_MAX are still restricted by blk_check_byte_request(). Note also that we don't (and are not going to) convert blk_pwrite and blk_pread: these functions return number of bytes on success, so to update them, we should change return type to int64_t as well, which will lead to investigating and updating all callers which is too much. So, blk_pread and blk_pwrite remain unchanged. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-9-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-backend: drop blk_prw, use block-coroutine-wrapperVladimir Sementsov-Ogievskiy2-93/+88
Let's drop hand-made coroutine wrappers and use coroutine wrapper generation like in block/io.c. Now, blk_foo() functions are written in the same way as blk_co_foo() ones, but wrap blk_do_foo() instead of blk_co_do_foo(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-8-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: spelling fix] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-coroutine-wrapper.py: support BlockBackend first argumentVladimir Sementsov-Ogievskiy1-0/+3
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-7-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-backend: rename _do_ helper functions to _co_do_Vladimir Sementsov-Ogievskiy1-26/+26
This is preparation for the following commit, which uses automatic coroutine wrapper generation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-backend: convert blk_co_pdiscard to int64_t bytesVladimir Sementsov-Ogievskiy1-2/+3
We updated blk_do_pdiscard() and its wrapper blk_co_pdiscard(). Both functions are updated so that the parameter type becomes wider, so all callers should be OK with it. Look at blk_do_pdiscard(): bytes is passed only to blk_check_byte_request() and bdrv_co_pdiscard(), which already have int64_t bytes parameter, so we are OK. Note that requests exceeding INT_MAX are still restricted by blk_check_byte_request(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-backend: convert blk_co_pwritev_part to int64_t bytesVladimir Sementsov-Ogievskiy2-5/+5
We convert blk_do_pwritev_part() and some wrappers: blk_co_pwritev_part(), blk_co_pwritev(), blk_co_pwrite_zeroes(). All functions are converted so that the parameter type becomes wider, so all callers should be OK with it. Look at blk_do_pwritev_part() body: bytes is passed to: - trace_blk_co_pwritev (we update it here) - blk_check_byte_request, throttle_group_co_io_limits_intercept, bdrv_co_pwritev_part - all already have int64_t argument. Note that requests exceeding INT_MAX are still restricted by blk_check_byte_request(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-backend: make blk_co_preadv() 64bitVladimir Sementsov-Ogievskiy2-3/+3
For both updated functions, the type of bytes becomes wider, so all callers should be OK with it. blk_co_preadv() only passes its arguments to blk_do_preadv(). blk_do_preadv() passes bytes to: - trace_blk_co_preadv, which is updated too - blk_check_byte_request, throttle_group_co_io_limits_intercept, bdrv_co_preadv, which are already int64_t. Note that requests exceeding INT_MAX are still restricted by blk_check_byte_request(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2021-10-15block-backend: blk_check_byte_request(): int64_t bytesVladimir Sementsov-Ogievskiy1-3/+3
Rename size and make it int64_t to correspond to the modern block layer, which always uses int64_t for offset and bytes (not in the blk layer yet, which is a task for following commits). All callers pass int or unsigned int. So, for bytes in [0, INT_MAX] nothing is changed; for negative bytes we now fail on the "bytes < 0" check instead of the "bytes > INT_MAX" check. Note that blk_check_byte_request() still doesn't allow requests exceeding INT_MAX. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006131718.214235-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
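The change in which check rejects a negative length can be sketched in isolation. The two functions below are hypothetical stand-ins, not the real blk_check_byte_request(): they keep only the range checks and assume -EIO as the error code, and the real function performs further checks (such as medium availability and device length) that are omitted here.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" variant: the length is an unsigned size_t. */
static int check_request_old_sketch(int64_t offset, size_t size)
{
    /* A negative int converted to size_t becomes huge and trips this test. */
    if (size > INT_MAX) {
        return -EIO;
    }
    if (offset < 0) {
        return -EIO;
    }
    return 0;
}

/* Hypothetical "after" variant: the length is a signed int64_t. */
static int check_request_new_sketch(int64_t offset, int64_t bytes)
{
    /* Negative lengths are now rejected by the explicit sign check. */
    if (bytes < 0) {
        return -EIO;
    }
    /* Requests above INT_MAX are still refused, as the message notes. */
    if (bytes > INT_MAX) {
        return -EIO;
    }
    if (offset < 0) {
        return -EIO;
    }
    return 0;
}
```

Both variants accept the same set of requests; only the check that catches a negative length differs.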
2021-10-15qcow2: Silence clang -m32 compiler warningHanna Reitz1-1/+2
With -m32, size_t is generally only a uint32_t. That makes clang complain that in the assertion assert(qiov->size <= INT64_MAX); the range of the type of qiov->size (size_t) is too small for any of its values to ever exceed INT64_MAX. Cast qiov->size to uint64_t to silence clang. Fixes: f7ef38dd1310d7d9db76d0aa16899cbc5744f36d ("block: use int64_t instead of uint64_t in driver read handlers") Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20211011155031.149158-1-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
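A minimal sketch of the cast described above. The struct and function names are hypothetical; only the shape of the assertion follows the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of an I/O vector; only the size field matters. */
struct qiov_sketch {
    size_t size;
};

static int64_t qiov_size_checked_sketch(const struct qiov_sketch *qiov)
{
    /*
     * Where size_t is 32 bits wide, comparing it directly with INT64_MAX
     * can never be false, which is what clang complains about. Widening
     * the left-hand side to uint64_t keeps the check meaningful on 64-bit
     * hosts and avoids the warning on 32-bit ones.
     */
    assert((uint64_t)qiov->size <= INT64_MAX);
    return (int64_t)qiov->size;
}
```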
2021-10-14configure, meson: move libaio check to meson.buildPaolo Bonzini1-1/+1
Message-Id: <20211007130829.632254-10-pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-10-07Merge remote-tracking branch 'remotes/vsementsov/tags/pull-jobs-2021-10-07-v2' into stagingRichard Henderson4-29/+40
mirror: Handle errors after READY cancel v2: add small fix by Stefano, Hanna's series fixed # gpg: Signature made Thu 07 Oct 2021 08:25:07 AM PDT # gpg: using RSA key 8B9C26CDB2FD147C880E86A1561F24C1F19F79FB # gpg: Good signature from "Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 8B9C 26CD B2FD 147C 880E 86A1 561F 24C1 F19F 79FB * remotes/vsementsov/tags/pull-jobs-2021-10-07-v2: iotests: Add mirror-ready-cancel-error test mirror: Do not clear .cancelled mirror: Stop active mirroring after force-cancel mirror: Check job_is_cancelled() earlier mirror: Use job_is_cancelled() job: Add job_cancel_requested() job: Do not soft-cancel after a job is done jobs: Give Job.force_cancel more meaning job: @force parameter for job_cancel_sync() job: Force-cancel jobs in a failed transaction mirror: Drop s->synced mirror: Keep s->synced on error job: Context changes in job_completed_txn_abort() block/aio_task: assert `max_busy_tasks` is greater than 0 block/backup: avoid integer overflow of `max-workers` Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-10-07mirror: Do not clear .cancelledHanna Reitz1-2/+0
Clearing .cancelled before leaving the main loop when the job has been soft-cancelled is no longer necessary since job_is_cancelled() only returns true for jobs that have been force-cancelled. Therefore, this only makes a difference in places that call job_cancel_requested(). In block/mirror.c, this is done only before .cancelled was cleared. In job.c, there are two callers: - job_completed_txn_abort() asserts that .cancelled is true, so keeping it true will not affect this place. - job_complete() refuses to let a job complete that has .cancelled set. It is correct to refuse to let the user invoke job-complete on mirror jobs that have already been soft-cancelled. With this change, there are no places that reset .cancelled to false and so we can be sure that .force_cancel can only be true if .cancelled is true as well. Assert this in job_is_cancelled(). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-13-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Stop active mirroring after force-cancelHanna Reitz1-0/+2
Once the mirror job is force-cancelled (job_is_cancelled() is true), we should not generate new I/O requests. This applies to active mirroring, too, so stop it once the job is cancelled. (We must still forward all I/O requests to the source, though, of course, but those are not really I/O requests generated by the job, so this is fine.) Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-12-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07mirror: Check job_is_cancelled() earlierHanna Reitz1-5/+5
We must check whether the job is force-cancelled early in our main loop, most importantly before any `continue` statement. For example, we used to have `continue`s before our current checking location that are triggered by `mirror_flush()` failing. So, if `mirror_flush()` kept failing, force-cancelling the job would not terminate it. Jobs can be cancelled while they yield, and once they are (force-cancelled), they should not generate new I/O requests. Therefore, we should put the check after the last yield before mirror_iteration() is invoked. Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-11-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
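The ordering problem can be illustrated with a toy loop. Everything below is a hypothetical model, not the code of mirror_run(): the "yield" is a plain function that flips a flag after a chosen number of iterations, and the "flush" always fails so that the `continue` path is taken every time.

```c
#include <stdbool.h>

/* Hypothetical job state for the sketch. */
struct loop_job_sketch {
    bool force_cancelled;
    int iterations;   /* how many times the loop has yielded */
    int cancel_at;    /* iteration at which a cancel request arrives */
};

/* Stand-in for a yield point: cancellation can arrive while yielding. */
static void yield_sketch(struct loop_job_sketch *job)
{
    job->iterations++;
    if (job->iterations >= job->cancel_at) {
        job->force_cancelled = true;
    }
}

/* Stand-in for a flush that keeps failing. */
static int failing_flush_sketch(void)
{
    return -1;
}

/* Returns 0 if the loop ended because of the cancel, -1 if it gave up. */
static int run_loop_sketch(struct loop_job_sketch *job, int max_iterations)
{
    while (job->iterations < max_iterations) {
        yield_sketch(job);

        /* Check right after the yield and before any `continue`. */
        if (job->force_cancelled) {
            return 0;
        }

        if (failing_flush_sketch() < 0) {
            continue;   /* a check placed below this line would be skipped */
        }

        /* New I/O would be issued here; never reached in this sketch. */
    }
    return -1;
}
```

If the cancellation check sat after the failing flush instead, the `continue` would skip it on every pass and the loop would only stop at `max_iterations`.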
2021-10-07mirror: Use job_is_cancelled()Hanna Reitz1-1/+1
mirror_drained_poll() returns true whenever the job is cancelled, because "we [can] be sure that it won't issue more requests". However, this is only true for force-cancelled jobs, so use job_is_cancelled(). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-10-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2021-10-07job: Add job_cancel_requested()Hanna Reitz1-6/+4
Most callers of job_is_cancelled() actually want to know whether the job is on its way to immediate termination. For example, we refuse to pause jobs that are cancelled; but this only makes sense for jobs that are really actually cancelled. A mirror job that is cancelled during READY with force=false should absolutely be allowed to pause. This "cancellation" (which is actually a kind of completion) may take an indefinite amount of time, and so should behave like any job during normal operation. For example, with on-target-error=stop, the job should stop on write errors. (In contrast, force-cancelled jobs should not get write errors, as they should just terminate and not do further I/O.) Therefore, redefine job_is_cancelled() to only return true for jobs that are force-cancelled (which as of HEAD^ means any job that interprets the cancellation request as a request for immediate termination), and add job_cancel_requested() as the general variant, which returns true for any jobs which have been requested to be cancelled, whether it be immediately or after an arbitrarily long completion phase. Finally, here is a justification for how different job_is_cancelled() invocations are treated by this patch: - block/mirror.c (mirror_run()): - The first invocation is a while loop that should loop until the job has been cancelled or scheduled for completion. What kind of cancel does not matter, only the fact that the job is supposed to end. - The second invocation wants to know whether the job has been soft-cancelled. Calling job_cancel_requested() is a bit too broad, but if the job were force-cancelled, we should leave the main loop as soon as possible anyway, so this should not matter here. - The last two invocations already check force_cancel, so they should continue to use job_is_cancelled(). - block/backup.c, block/commit.c, block/stream.c, anything in tests/: These jobs know only force-cancel, so there is no difference between job_is_cancelled() and job_cancel_requested(). 
We can continue using job_is_cancelled(). - job.c: - job_pause_point(), job_yield(), job_sleep_ns(): Only force-cancelled jobs should be prevented from being paused. Continue using job_is_cancelled(). - job_update_rc(), job_finalize_single(), job_finish_sync(): These functions are all called after the job has left its main loop. The mirror job (the only job that can be soft-cancelled) will clear .cancelled before leaving the main loop if it has been soft-cancelled. Therefore, these functions will observe .cancelled to be true only if the job has been force-cancelled. We can continue to use job_is_cancelled(). (Furthermore, conceptually, a soft-cancelled mirror job should not report to have been cancelled. It should report completion (see also the block-job-cancel QAPI documentation). Therefore, it makes sense for these functions not to distinguish between a soft-cancelled mirror job and a job that has completed as normal.) - job_completed_txn_abort(): All jobs other than @job have been force-cancelled. job_is_cancelled() must be true for them. Regarding @job itself: job_completed_txn_abort() is mostly called when the job's return value is not 0. A soft-cancelled mirror has a return value of 0, and so will not end up here then. However, job_cancel() invokes job_completed_txn_abort() if the job has been deferred to the main loop, which is mostly the case for completed jobs (which skip the assertion), but not for sure. To be safe, use job_cancel_requested() in this assertion. - job_complete(): This is a function eventually invoked by the user (through qmp_block_job_complete() or qmp_job_complete(), or job_complete_sync(), which comes from qemu-img). The intention here is to prevent a user from invoking job-complete after the job has been cancelled. This should also apply to soft cancelling: After a mirror job has been soft-cancelled, the user should not be able to decide otherwise and have it complete as normal (i.e. pivoting to the target). 
- job_cancel(): Both functions are equivalent (see comment there), but we want to use job_is_cancelled(), because this shows that we call job_completed_txn_abort() only for force-cancelled jobs. (As explained for job_update_rc(), soft-cancelled jobs should be treated as if they have completed as normal.) Buglink: https://gitlab.com/qemu-project/qemu/-/issues/462 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20211006151940.214590-9-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
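The distinction between the two predicates can be sketched with a two-flag model. The struct and function names are hypothetical; the sketch assumes, following the description above, that the narrow predicate is true only for force-cancelled jobs and the general one is true for any cancel request.

```c
#include <stdbool.h>

/* Hypothetical two-flag model of a job's cancellation state. */
struct cancel_state_sketch {
    bool cancelled;     /* some cancel request has been made */
    bool force_cancel;  /* the job will terminate as soon as possible */
};

/* General variant: true for any cancel request, soft or forced. */
static bool cancel_requested_sketch(const struct cancel_state_sketch *job)
{
    return job->cancelled;
}

/* Narrow variant: true only when the job is on its way to termination. */
static bool is_cancelled_sketch(const struct cancel_state_sketch *job)
{
    return job->cancelled && job->force_cancel;
}
```

Under this model a soft-cancelled mirror job in READY has `cancelled` set and `force_cancel` clear, so it still counts as running for pause and error-handling purposes while a request to complete it can be refused.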
2021-10-07jobs: Give Job.force_cancel more meaningHanna Reitz2-7/+20
We largely have two cancel modes for jobs: First, there is actual cancelling. The job is terminated as soon as possible, without trying to reach a consistent result. Second, we have mirror in the READY state. Technically, the job is not really cancelled, but it just is a different completion mode. The job can still run for an indefinite amount of time while it tries to reach a consistent result. We want to be able to clearly distinguish which cancel mode a job is in (when it has been cancelled). We can use Job.force_cancel for this, but right now it only reflects cancel requests from the user with force=true, but clearly, jobs that do not even distinguish between force=false and force=true are effectively always force-cancelled. So this patch has Job.force_cancel signify whether the job will terminate as soon as possible (force_cancel=true) or whether it will effectively remain running despite being "cancelled" (force_cancel=false). To this end, we let jobs that provide JobDriver.cancel() tell the generic job code whether they will terminate as soon as possible or not, and for jobs that do not provide that method we assume they will. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20211006151940.214590-7-hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
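A rough sketch of how a driver callback could report the effective cancel mode to generic code. All names and the struct layout are hypothetical, and the mirror-like callback is only one plausible policy derived from the text: a non-forced cancel of a job that is already READY is treated as a completion mode rather than a termination.

```c
#include <stdbool.h>
#include <stddef.h>

struct driver_job_sketch;

/* Hypothetical driver table: cancel() reports whether the job will stop soon. */
struct cancel_driver_sketch {
    bool (*cancel)(struct driver_job_sketch *job, bool force);
};

struct driver_job_sketch {
    const struct cancel_driver_sketch *driver;
    bool ready;         /* models a mirror job in the READY state */
    bool cancelled;
    bool force_cancel;
};

/* Mirror-like policy: only a soft cancel during READY keeps the job running. */
static bool mirror_like_cancel_sketch(struct driver_job_sketch *job, bool force)
{
    return force || !job->ready;
}

/* Generic side: record the request and the mode the driver reported. */
static void cancel_async_sketch(struct driver_job_sketch *job, bool force)
{
    if (job->driver->cancel) {
        force = job->driver->cancel(job, force);
    } else {
        /* No callback: assume the job terminates as soon as possible. */
        force = true;
    }
    job->cancelled = true;
    if (force) {
        job->force_cancel = true;
    }
}
```

With this shape, a job whose driver has no callback ends up with `force_cancel` set even when the user passed force=false, which matches the observation that such jobs are effectively always force-cancelled.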