aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2020-06-23block/nvme: support nested aio_poll()Stefan Hajnoczi2-7/+58
QEMU block drivers are supposed to support aio_poll() from I/O completion callback functions. This means completion processing must be re-entrant. The standard approach is to schedule a BH during completion processing and cancel it at the end of processing. If aio_poll() is invoked by a callback function then the BH will run. The BH continues the suspended completion processing. All of this means that request A's cb() can synchronously wait for request B to complete. Previously the nvme block driver would hang because it didn't process completions from nested aio_poll(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-8-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23block/nvme: keep BDRVNVMeState pointer in NVMeQueuePairStefan Hajnoczi1-32/+38
Passing around both BDRVNVMeState and NVMeQueuePair is unwieldy. Reduce the number of function arguments by keeping the BDRVNVMeState pointer in NVMeQueuePair. This will come in handly when a BH is introduced in a later patch and only one argument can be passed to it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23block/nvme: clarify that free_req_queue is protected by q->lockStefan Hajnoczi1-1/+1
Existing users access free_req_queue under q->lock. Document this. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23block/nvme: switch to a NVMeRequest freelistStefan Hajnoczi1-27/+54
There are three issues with the current NVMeRequest->busy field: 1. The busy field is accidentally accessed outside q->lock when request submission fails. 2. Waiters on free_req_queue are not woken when a request is returned early due to submission failure. 2. Finding a free request involves scanning all requests. This makes request submission O(n^2). Switch to an O(1) freelist that is always accessed under the lock. Also differentiate between NVME_QUEUE_SIZE, the actual SQ/CQ size, and NVME_NUM_REQS, the number of usable requests. This makes the code simpler than using NVME_QUEUE_SIZE everywhere and having to keep in mind that one slot is reserved. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-5-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23block/nvme: don't access CQE after moving cq.headStefan Hajnoczi1-1/+4
Do not access a CQE after incrementing q->cq.head and releasing q->lock. It is unlikely that this causes problems in practice but it's a latent bug. The reason why it should be safe at the moment is that completion processing is not re-entrant and the CQ doorbell isn't written until the end of nvme_process_completion(). Make this change now because QEMU expects completion processing to be re-entrant and later patches will do that. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-4-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23block/nvme: drop tautologous assertionStefan Hajnoczi1-1/+0
nvme_process_completion() explicitly checks cid so the assertion that follows is always true: if (cid == 0 || cid > NVME_QUEUE_SIZE) { ... continue; } assert(cid <= NVME_QUEUE_SIZE); Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20200617132201.1832152-3-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-23block/nvme: poll queues without q->lockStefan Hajnoczi1-0/+12
A lot of CPU time is spent simply locking/unlocking q->lock during polling. Check for completion outside the lock to make q->lock disappear from the profile. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Sergio Lopez <slp@redhat.com> Message-id: 20200617132201.1832152-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-17qcow2: Tweak comments on qcow2_get_persistent_dirty_bitmap_sizeEric Blake1-4/+5
For now, we don't have persistent bitmaps in any other formats, but that might not be true in the future. Make it obvious that our incoming parameter is not necessarily a qcow2 image, and therefore is limited to just the bdrv_dirty_bitmap_* API calls (rather than probing into qcow2 internals). Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200608190821.3293867-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-17block: Refactor subdirectory recursion during makeEric Blake1-0/+1
Rather than listing block/monitor from the top-level Makefile.objs, we should instead list monitor from block/Makefile.objs. Suggested-by: Kevin Wolf <kwolf@redhat.com> Fixes: bb4e58c613 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200608173339.3244211-1-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-06-10block: Call attention to truncation of long NBD exportsEric Blake1-8/+13
Commit 93676c88 relaxed our NBD client code to request export names up to the NBD protocol maximum of 4096 bytes without NUL terminator, even though the block layer can't store anything longer than 4096 bytes including NUL terminator for display to the user. Since this means there are some export names where we have to truncate things, we can at least try to make the truncation a bit more obvious for the user. Note that in spite of the truncated display name, we can still communicate with an NBD server using such a long export name; this was deemed nicer than refusing to even connect to such a server (since the server may not be under our control, and since determining our actual length limits gets tricky when nbd://host:port/export and nbd+unix:///export?socket=/path are themselves variable-length expansions beyond the export name but count towards the block layer name length). Reported-by: Xueqiang Wei <xuwei@redhat.com> Fixes: https://bugzilla.redhat.com/1843684 Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200610163741.3745251-3-eblake@redhat.com>
2020-06-05block: Factor out bdrv_run_co()Vladimir Sementsov-Ogievskiy1-121/+72
We have a few bdrv_*() functions that can either spawn a new coroutine and wait for it with BDRV_POLL_WHILE() or use a fastpath if they are alreeady running in a coroutine. All of them duplicate basically the same code. Factor the common code into a new function bdrv_run_co(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20200520144901.16589-1-vsementsov@virtuozzo.com [Factor out bdrv_run_co_entry too] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-05io_uring: use io_uring_cq_ready() to check for ready cqesStefano Garzarella1-6/+3
In qemu_luring_poll_cb() we are not using the cqe peeked from the CQ ring. We are using io_uring_peek_cqe() only to see if there are cqes ready, so we can replace it with io_uring_cq_ready(). Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200519134942.118178-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-06-05io_uring: retry io_uring_submit() if it fails with errno=EINTRStefano Garzarella1-1/+1
As recently documented [1], io_uring_enter(2) syscall can return an error (errno=EINTR) if the operation was interrupted by a delivery of a signal before it could complete. This should happen when IORING_ENTER_GETEVENTS flag is used, for example during io_uring_submit_and_wait() or during io_uring_submit() when IORING_SETUP_IOPOLL is enabled. We shouldn't have this problem for now, but it's better to prevent it. [1] https://github.com/axboe/liburing/commit/344355ec6619de8f4e64584c9736530b5346e4f4 Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20200519133041.112138-1-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-05-28qcow2: Expose bitmaps' size during measureEric Blake5-5/+51
It's useful to know how much space can be occupied by qcow2 persistent bitmaps, even though such metadata is unrelated to the guest-visible data. Report this value as an additional QMP field, present when measuring an existing image and output format that both support bitmaps. Update iotest 178 and 190 to updated output, as well as new coverage in 190 demonstrating non-zero values made possible with the recently-added qemu-img bitmap command (see 3b51ab4b). The new 'bitmaps size:' field is displayed automatically as part of 'qemu-img measure' any time it is present in QMP (that is, any time both the source image being measured and destination format support bitmaps, even if the measurement is 0 because there are no bitmaps present). If the field is absent, it means that no bitmaps can be copied (source, destination, or both lack bitmaps, including when measuring based on size rather than on a source image). This behavior is compatible with an upcoming patch adding 'qemu-img convert --bitmaps': that command will fail in the same situations where this patch omits the field. The addition of a new field demonstrates why we should always zero-initialize qapi C structs; while the qcow2 driver still fully populates all fields, the raw and crypto drivers had to be tweaked to avoid uninitialized data. Consideration was also given towards having a 'qemu-img measure --bitmaps' which errors out when bitmaps are not possible, and otherwise sums the bitmaps into the existing allocation totals rather than displaying as a separate field, as a potential convenience factor. But this was ultimately decided to be more complexity than necessary when the QMP interface was sufficient enough with bitmaps remaining a separate field. See also: https://bugzilla.redhat.com/1779904 Reported-by: Nir Soffer <nsoffer@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521192137.1120211-3-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-28block/dirty-bitmap: add bdrv_has_named_bitmaps helperVladimir Sementsov-Ogievskiy1-0/+13
To be used for bitmap migration in further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Andrey Shinkevich <andrey.shinkevich@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200521220648.3255-3-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-05-19blockdev: Split off basic bitmap operations for qemu-imgEric Blake2-0/+322
Upcoming patches want to add some basic bitmap manipulation abilities to qemu-img. But blockdev.o is too heavyweight to link into qemu-img (among other things, it would drag in block jobs and transaction support - qemu-img does offline manipulation, where atomicity is less important because there are no concurrent modifications to compete with), so it's time to split off the bare bones of what we will need into a new file block/monitor/bitmap-qmp-cmds.o. This is sufficient to expose 6 QMP commands for use by qemu-img (add, remove, clear, enable, disable, merge), as well as move the three helper functions touched in the previous patch. Regarding MAINTAINERS, the new file is automatically part of block core, but also makes sense as related to other dirty bitmap files. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513011648.166876-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-19block: Make it easier to learn which BDS support bitmapsEric Blake4-0/+19
Upcoming patches will enhance bitmap support in qemu-img, but in doing so, it turns out to be nice to suppress output when persistent bitmaps make no sense (such as on a qcow2 v2 image). Add a hook to make this easier to query. This patch adds a new callback .bdrv_supports_persistent_dirty_bitmap, rather than trying to shoehorn the answer in via existing callbacks. In particular, while it might have been possible to overload .bdrv_co_can_store_new_dirty_bitmap to special-case a NULL input to answer whether any persistent bitmaps are supported, that is at odds with whether a particular bitmap can be stored (for example, even on an image that supports persistent bitmaps but has currently filled up the maximum number of bitmaps, attempts to store another one should fail); and the new functionality doesn't require coroutine safety. Similarly, we could have added one more piece of information to .bdrv_get_info, but then again, most callers to that function tend to already discard extraneous information, and making it a catch-all rather than a series of dedicated scalar queries hasn't really simplified life. In the future, when we improve the ability to look up bitmaps through a filter, we will probably also want to teach the block layer to automatically let filters pass this request on through. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513011648.166876-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2020-05-18block/block-copy: Simplify block_copy_do_copy()Philippe Mathieu-Daudé1-9/+3
block_copy_do_copy() is static, only used in block_copy_task_entry with the error_is_read argument set. No need to check for it, simplify. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200507121129.29760-3-philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block/block-copy: Fix uninitialized variable in block_copy_task_entryPhilippe Mathieu-Daudé1-1/+1
Fix when building with -Os: CC block/block-copy.o block/block-copy.c: In function ‘block_copy_task_entry’: block/block-copy.c:428:38: error: ‘error_is_read’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 428 | t->call_state->error_is_read = error_is_read; | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200507121129.29760-2-philmd@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Drop @child_class from bdrv_child_perm()Max Reitz9-14/+4
Implementations should decide the necessary permissions based on @role. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-35-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Pass BdrvChildRole in remaining casesMax Reitz2-5/+10
These calls have no real use for the child role yet, but it will not harm to give one. Notably, the bdrv_root_attach_child() call in blockjob.c is left unmodified because there is not much the generic BlockJob object wants from its children. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-34-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Use bdrv_default_perms()Max Reitz20-28/+23
bdrv_default_perms() can decide which permission profile to use based on the BdrvChildRole, so block drivers do not need to select it explicitly. The blkverify driver now no longer shares the WRITE permission for the image to verify. We thus have to adjust two places in test-block-iothread not to take it. (Note that in theory, blkverify should behave like quorum in this regard and share neither WRITE nor RESIZE for both of its children. In practice, it does not really matter, because blkverify is used only for debugging, so we might as well keep its permissions rather liberal.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-30-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Use child_of_bds in remaining placesMax Reitz3-6/+17
Replace child_file by child_of_bds in all remaining places (excluding tests). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-28-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Make filter drivers use child_of_bdsMax Reitz8-12/+22
Note that some filters have secondary children, namely blkverify (the image to be verified) and blklogwrites (the log). This patch does not touch those children. Note that for blkverify, the filtered child should not be format-probed. While there is nothing enforcing this here, in practice, it will not be: blkverify implements .bdrv_file_open. The block layer ensures (and in fact, asserts) that BDRV_O_PROTOCOL is set for every BDS whose driver implements .bdrv_file_open. This flag will then be bequeathed to blkverify's children, and they will thus (by default) not be probed either. ("By default" refers to the fact that blkverify's other child (the non-filtered one) will have BDRV_O_PROTOCOL force-unset, because that is what happens for all non-filtered children of non-format drivers.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-27-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Make format drivers use child_of_bdsMax Reitz12-29/+50
Commonly, they need to pass the BDRV_CHILD_IMAGE set as the BdrvChildRole; but there are exceptions for drivers with external data files (qcow2 and vmdk). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-26-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Make backing files child_of_bds childrenMax Reitz2-2/+3
Make all parents of backing files pass the appropriate BdrvChildRole. By doing so, we can switch their BdrvChildClass over to the generic child_of_bds, which will do the right thing when given a correct BdrvChildRole. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-24-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Switch child_format users to child_of_bdsMax Reitz2-4/+6
Both users (quorum and blkverify) use child_format for not-really-filtered children, so the appropriate BdrvChildRole in both cases is DATA. (Note that this will cause bdrv_inherited_options() to force-allow format probing.) Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-22-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18raw-format: Split raw_read_options()Max Reitz1-45/+65
Split raw_read_options() into one function that actually just reads the options, and another that applies them. This will allow us to detect whether the user has specified any options before attaching the file child (so we can decide on its role based on the options). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-21-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Pass parent_is_format to .inherit_options()Max Reitz2-2/+2
We plan to unify the generic .inherit_options() functions. The resulting common function will need to decide whether to force-enable format probing, force-disable it, or leave it as-is. To make this decision, it will need to know whether the parent node is a format node or not (because we never want format probing if the parent is a format node already (except for the backing chain)). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Pass BdrvChildRole to .inherit_options()Max Reitz2-2/+4
For now, all callers (effectively) pass 0 and no callee evaluates thie value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Pass BdrvChildRole to bdrv_child_perm()Max Reitz9-7/+16
For now, all callers pass 0 and no callee evaluates this value. Later patches will change both. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Add BdrvChildRole to BdrvChildMax Reitz25-33/+34
For now, it is always set to 0. Later patches in this series will ensure that all callers pass an appropriate combination of flags. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200513110544.176672-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Rename BdrvChildRole to BdrvChildClassMax Reitz11-31/+33
This structure nearly only contains parent callbacks for child state changes. It cannot really reflect a child's role, because different roles may overlap (as we will see when real roles are introduced), and because parents can have custom callbacks even when the child fulfills a standard role. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Add BlockDriver.is_formatMax Reitz13-0/+14
We want to unify child_format and child_file at some point. One of the important things that set format drivers apart from other drivers is that they do not expect other format nodes under them (except in the backing chain), i.e. we must not probe formats inside of formats. That means we need something on which to distinguish format drivers from others, and hence this flag. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200513110544.176672-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Mark commit, mirror, blkreplay as filtersMax Reitz3-0/+5
The commit, mirror, and blkreplay block nodes are filters, so they should be marked as such. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200513110544.176672-2-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Use bdrv_make_empty() where possibleMax Reitz2-5/+2
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200429141126.85159-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18replication: Avoid blk_make_empty() on read-only childKevin Wolf1-2/+11
This is just a bandaid to keep tests/test-replication working after bdrv_make_empty() starts to assert that we're not trying to call it on a read-only child. For the real solution in the future, replication should not steal the BdrvChild from its backing file (this is never correct to do!), but instead have its own child node references, with the appropriate permissions. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Use blk_make_empty() after commitsMax Reitz1-7/+9
bdrv_commit() already has a BlockBackend pointing to the BDS that we want to empty, it just has the wrong permissions. qemu-img commit has no BlockBackend pointing to the old backing file yet, but introducing one is simple. After this commit, bdrv_make_empty() is the only remaining caller of BlockDriver.bdrv_make_empty(). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-5-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [kwolf: Fixed up reference output for 098] Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block: Add blk_make_empty()Max Reitz1-0/+10
Two callers of BlockDriver.bdrv_make_empty() remain that should not call this method directly. Both do not have access to a BdrvChild, but they can use a BlockBackend, so we add this function that lets them use it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200429141126.85159-4-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18block/replication.c: Avoid cancelling the job twiceLukas Straub1-0/+2
If qemu in colo secondary mode is stopped, it crashes because s->backup_job is canceled twice: First with job_cancel_sync_all() in qemu_cleanup() and then in replication_stop(). Fix this by assigning NULL to s->backup_job when the job completes so replication_stop() and replication_do_checkpoint() won't touch the job. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Message-Id: <20200511090801.7ed5d8f3@luklap> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-18mirror: Make sure that source and target size matchKevin Wolf1-9/+12
If the target is shorter than the source, mirror would copy data until it reaches the end of the target and then fail with an I/O error when trying to write past the end. If the target is longer than the source, the mirror job would complete successfully, but the target wouldn't actually be an accurate copy of the source image (it would contain some additional garbage at the end). Fix this by checking that both images have the same size when the job starts. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200511135825.219437-4-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-15qom: Drop parameter @errp of object_property_add() & friendsMarkus Armbruster1-4/+2
The only way object_property_add() can fail is when a property with the same name already exists. Since our property names are all hardcoded, failure is a programming error, and the appropriate way to handle it is passing &error_abort. Same for its variants, except for object_property_add_child(), which additionally fails when the child already has a parent. Parentage is also under program control, so this is a programming error, too. We have a bit over 500 callers. Almost half of them pass &error_abort, slightly fewer ignore errors, one test case handles errors, and the remaining few callers pass them to their own callers. The previous few commits demonstrated once again that ignoring programming errors is a bad idea. Of the few ones that pass on errors, several violate the Error API. The Error ** argument must be NULL, &error_abort, &error_fatal, or a pointer to a variable containing NULL. Passing an argument of the latter kind twice without clearing it in between is wrong: if the first call sets an error, it no longer points to NULL for the second call. ich9_pm_add_properties(), sparc32_ledma_realize(), sparc32_dma_realize(), xilinx_axidma_realize(), xilinx_enet_realize() are wrong that way. When the one appropriate choice of argument is &error_abort, letting users pick the argument is a bad idea. Drop parameter @errp and assert the preconditions instead. There's one exception to "duplicate property name is a programming error": the way object_property_add() implements the magic (and undocumented) "automatic arrayification". Don't drop @errp there. Instead, rename object_property_add() to object_property_try_add(), and add the obvious wrapper object_property_add(). Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200505152926.18877-15-armbru@redhat.com> [Two semantic rebase conflicts resolved]
2020-05-13block/block-copy: fix use-after-free of task pointerVladimir Sementsov-Ogievskiy1-1/+1
Obviously, we should g_free the task after trace point and offset update. Reported-by: Coverity (CID 1428756) Fixes: 4ce5dd3e9b5ee0fac18625860eb3727399ee965e Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200507183800.22626-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13qcow2: add zstd cluster compressionDenis Plotnikov2-0/+176
zstd significantly reduces cluster compression time. It provides better compression performance maintaining the same level of the compression ratio in comparison with zlib, which, at the moment, is the only compression method available. The performance test results: Test compresses and decompresses qemu qcow2 image with just installed rhel-7.6 guest. Image cluster size: 64K. Image on disk size: 2.2G The test was conducted with brd disk to reduce the influence of disk subsystem to the test results. The results is given in seconds. compress cmd: time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd] src.img [zlib|zstd]_compressed.img decompress cmd time ./qemu-img convert -O qcow2 [zlib|zstd]_compressed.img uncompressed.img compression decompression zlib zstd zlib zstd ------------------------------------------------------------ real 65.5 16.3 (-75 %) 1.9 1.6 (-16 %) user 65.0 15.8 5.3 2.5 sys 3.3 0.2 2.0 2.0 Both ZLIB and ZSTD gave the same compression ratio: 1.57 compressed image size in both cases: 1.4G Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-4-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13qcow2: rework the cluster compression routineDenis Plotnikov1-11/+60
The patch enables processing the image compression type defined for the image and chooses an appropriate method for image clusters (de)compression. Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200507082521.29210-3-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-13qcow2: introduce compression type featureDenis Plotnikov2-1/+132
The patch adds some preparation parts for incompatible compression type feature to qcow2 allowing the use different compression methods for image clusters (de)compressing. It is implied that the compression type is set on the image creation and can be changed only later by image conversion, thus compression type defines the only compression algorithm used for the image, and thus, for all image clusters. The goal of the feature is to add support of other compression methods to qcow2. For example, ZSTD which is more effective on compression than ZLIB. The default compression is ZLIB. Images created with ZLIB compression type are backward compatible with older qemu versions. Adding of the compression type breaks a number of tests because now the compression type is reported on image creation and there are some changes in the qcow2 header in size and offsets. The tests are fixed in the following ways: * filter out compression_type for many tests * fix header size, feature table size and backing file offset affected tests: 031, 036, 061, 080 header_size +=8: 1 byte compression type 7 bytes padding feature_table += 48: incompatible feature compression type backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change) * add "compression type" for test output matching when it isn't filtered affected tests: 049, 060, 061, 065, 082, 085, 144, 182, 185, 198, 206, 242, 255, 274, 280 Signed-off-by: Denis Plotnikov <dplotnikov@virtuozzo.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> QAPI part: Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20200507082521.29210-2-dplotnikov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-05-08block: Drop unused .bdrv_has_zero_init_truncateEric Blake9-16/+0
Now that there are no clients of bdrv_has_zero_init_truncate, none of the drivers need to worry about providing it. What's more, this eliminates a source of some confusion: a literal reading of the documentation as written in ceaca56f and implemented in commit 1dcaf527 claims that a driver which returns 0 for bdrv_has_zero_init_truncate() must not return 1 for bdrv_has_zero_init(); this condition was violated for parallels, qcow, and sometimes for vdi, although in practice it did not matter since those drivers also lacked .bdrv_co_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-10-eblake@redhat.com> Acked-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08vhdx: Rework truncation logicEric Blake1-38/+51
The vhdx driver uses truncation for image growth, with a special case for blocks that already read as zero but which are only being partially written. But with a bit of rearranging, it's just as easy to defer the decision on whether truncation resulted in zeroes to the actual allocation attempt, reducing the number of places that still use bdrv_has_zero_init_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-9-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08parallels: Rework truncation logicEric Blake1-9/+16
The parallels driver tries to use truncation for image growth, but can only do so when reads are guaranteed as zero. Now that we have a way to request zero contents from truncation, we can defer the decision to actual allocation attempts rather than up front, reducing the number of places that still use bdrv_has_zero_init_truncate. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-8-eblake@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-05-08ssh: Support BDRV_REQ_ZERO_WRITE for truncateEric Blake1-0/+4
Our .bdrv_has_zero_init_truncate can detect when the remote side always zero fills; we can reuse that same knowledge to implement BDRV_REQ_ZERO_WRITE by ignoring it when the server gives it to us for free. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200428202905.770727-7-eblake@redhat.com> Reviewed-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>