aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2020-02-26block/nbd: fix memory leak in nbd_open()Pan Nengyuan1-0/+1
In currently implementation there will be a memory leak when nbd_client_connect() returns error status. Here is an easy way to reproduce: 1. run qemu-iotests as follow and check the result with asan: ./check -raw 143 Following is the asan output backtrack: Direct leak of 40 byte(s) in 1 object(s) allocated from: #0 0x7f629688a560 in calloc (/usr/lib64/libasan.so.3+0xc7560) #1 0x7f6295e7e015 in g_malloc0 (/usr/lib64/libglib-2.0.so.0+0x50015) #2 0x56281dab4642 in qobject_input_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295 #3 0x56281dab1a04 in visit_start_struct /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49 #4 0x56281dad1827 in visit_type_SocketAddress qapi/qapi-visit-sockets.c:386 #5 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 #6 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 #7 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Direct leak of 15 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) #3 0x56281da804ac in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1834 #4 0x56281da804ac in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Indirect leak of 24 byte(s) in 1 object(s) allocated from: #0 0x7f629688a3a0 in malloc (/usr/lib64/libasan.so.3+0xc73a0) #1 0x7f6295e7dfbd in g_malloc (/usr/lib64/libglib-2.0.so.0+0x4ffbd) #2 0x7f6295e96ace in g_strdup (/usr/lib64/libglib-2.0.so.0+0x68ace) #3 0x56281dab41a3 in qobject_input_type_str_keyval /mnt/sdb/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:536 #4 0x56281dab2ee9 in visit_type_str /mnt/sdb/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:297 #5 0x56281dad0fa1 in visit_type_UnixSocketAddress_members qapi/qapi-visit-sockets.c:141 #6 0x56281dad17b6 in visit_type_SocketAddress_members qapi/qapi-visit-sockets.c:366 #7 0x56281dad186a in visit_type_SocketAddress qapi/qapi-visit-sockets.c:393 #8 0x56281da8062f in nbd_config /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1716 #9 0x56281da8062f in nbd_process_options /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1829 #10 0x56281da8062f in nbd_open /mnt/sdb/qemu-4.2.0-rc0/block/nbd.c:1873 Fixes: 8f071c9db506e03ab Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Cc: qemu-stable <qemu-stable@nongnu.org> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1575517528-44312-3-git-send-email-pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-02-26block/nbd: extract the common cleanup codePan Nengyuan1-11/+15
The BDRVNBDState cleanup code is common in two places, add nbd_clear_bdrvstate() function to do these cleanups. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1575517528-44312-2-git-send-email-pannengyuan@huawei.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix compilation error and commit message] Signed-off-by: Eric Blake <eblake@redhat.com>
2020-02-26nbd-client: Support leading / in NBD URIEric Blake1-2/+4
The NBD URI specification [1] states that only one leading slash at the beginning of the URI path component is stripped, not all such slashes. This becomes important to a patch I just proposed to nbdkit [2], which would allow the exportname to select a file embedded within an ext2 image: ext2fs demands an absolute pathname beginning with '/', and because qemu was inadvertantly stripping it, my nbdkit patch had to work around the behavior. [1] https://github.com/NetworkBlockDevice/nbd/blob/master/doc/uri.md [2] https://www.redhat.com/archives/libguestfs/2020-February/msg00109.html Note that the qemu bug only affects handling of URIs such as nbd://host:port//abs/path (where '/abs/path' should be the export name); it is still possible to use --image-opts and pass the desired export name with a leading slash directly through JSON even without this patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20200212023101.1162686-1-eblake@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com>
2020-02-20block: Fix VM size field width in snapshot dumpMax Reitz1-2/+2
When printing the snapshot list (e.g. with qemu-img snapshot -l), the VM size field is only seven characters wide. As of de38b5005e9, this is not necessarily sufficient: We generally print three digits, and this may require a decimal point. Also, the unit field grew from something as plain as "M" to " MiB". This means that number and unit may take up eight characters in total; but we also want spaces in front. Considering previously the maximum width was four characters and the field width was chosen to be three characters wider, let us adjust the field width to be eleven now. Fixes: de38b5005e946aa3714963ea4c501e279e7d3666 Buglink: https://bugs.launchpad.net/qemu/+bug/1859989 Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200117105859.241818-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20iscsi: Drop iscsi_co_create_opts()Max Reitz1-56/+0
The generic fallback implementation effectively does the same. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-5-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20file-posix: Drop hdev_co_create_opts()Max Reitz1-67/+0
The generic fallback implementation effectively does the same. Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-4-mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20block/nbd: Fix hang in .bdrv_close()Max Reitz1-1/+13
When nbd_close() is called from a coroutine, the connection_co never gets to run, and thus nbd_teardown_connection() hangs. This is because aio_co_enter() only puts the connection_co into the main coroutine's wake-up queue, so this main coroutine needs to yield and wait for connection_co to terminate. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200122164532.178040-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20block/backup-top: fix flags handlingVladimir Sementsov-Ogievskiy1-11/+20
backup-top "supports" write-unchanged, by skipping CBW operation in backup_top_co_pwritev. But it forgets to do the same in backup_top_co_pwrite_zeroes, as well as declare support for BDRV_REQ_WRITE_UNCHANGED. Fix this, and, while being here, declare also support for flags supported by source child. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200207161231.32707-1-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20block: always fill entire LUKS header space with zerosDaniel P. Berrangé1-4/+7
When initializing the LUKS header the size with default encryption parameters will currently be 2068480 bytes. This is rounded up to a multiple of the cluster size, 2081792, with 64k sectors. If the end of the header is not the same as the end of the cluster we fill the extra space with zeros. This was forgetting that not even the space allocated for the header will be fully initialized, as we only write key material for the first key slot. The space left for the other 7 slots is never written to. An optimization to the ref count checking code: commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 (refs/bisect/bad) Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Wed Feb 27 16:14:30 2019 +0300 qcow2-refcount: avoid eating RAM made the assumption that every cluster which was allocated would have at least some data written to it. This was violated by way the LUKS header is only partially written, with much space simply reserved for future use. Depending on the cluster size this problem was masked by the logic which wrote zeros between the end of the LUKS header and the end of the cluster. $ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \ -f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\ encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \ cluster_size_check.qcow2 100M Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600 encrypt.format=luks encrypt.key-secret=cluster_encrypt0 encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16 $ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \ 'json:{"driver": "qcow2", "encrypt.format": "luks", \ "encrypt.key-secret": "cluster_encrypt0", \ "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}' ERROR: counting reference for region exceeding the end of the file by one cluster or more: offset 0x2000 size 0x1f9000 Leaked cluster 4 refcount=1 reference=0 ...snip... Leaked cluster 130 refcount=1 reference=0 1 errors were found on the image. Data may be corrupted, or further writes to the image may corrupt it. 127 leaked clusters were found on the image. This means waste of disk space, but no harm to data. Image end offset: 268288 The problem only exists when the disk image is entirely empty. Writing data to the disk image payload will solve the problem by causing the end of the file to be extended further. The change fixes it by ensuring that the entire allocated LUKS header region is fully initialized with zeros. The qemu-img check will still fail for any pre-existing disk images created prior to this change, unless at least 1 byte of the payload is written to. Fully writing zeros to the entire LUKS header is a good idea regardless as it ensures that space has been allocated on the host filesystem (or whatever block storage backend is used). Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20200207135520.2669430-1-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-20qapi: Allow getting flat output from 'query-named-block-nodes'Peter Krempa1-2/+9
When a management application manages node names there's no reason to recurse into backing images in the output of query-named-block-nodes. Add a parameter to the command which will return just the top level structs. Signed-off-by: Peter Krempa <pkrempa@redhat.com> Message-Id: <4470f8c779abc404dcf65e375db195cd91a80651.1579509782.git.pkrempa@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Fixed coding style] Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-18quorum: Stop marking it as a filterMax Reitz1-1/+0
Quorum is not a filter, for example because it cannot guarantee which of its children will serve the next request. Thus, any of its children may differ from the data visible to quorum's parents. We have other filters with multiple children, but they differ in this aspect: - blkverify quits the whole qemu process if its children differ. As such, we can always skip it when we want to skip it (as a filter node) by going to any of its children. Both have the same data. - replication generally serves requests from bs->file, so this is its only actually filtered child. - Block job filters currently only have one child, but they will probably get more children in the future. Still, they will always have only one actually filtered child. Having "filters" as a dedicated node category only makes sense if you can skip them by going to a one fixed child that always shows the same data as the filter node. Quorum cannot fulfill this, so it is not a filter. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-13-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18mirror: Double-check immediately before replacingMax Reitz1-1/+13
There is no guarantee that we can still replace the node we want to replace at the end of the mirror job. Double-check by calling bdrv_recurse_can_replace(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-12-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block: Remove bdrv_recurse_is_first_non_filter()Max Reitz6-66/+0
It no longer has any users. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-11-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18quorum: Implement .bdrv_recurse_can_replace()Max Reitz1-0/+54
Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200218103454.296704-9-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18blkverify: Implement .bdrv_recurse_can_replace()Max Reitz1-0/+15
Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-8-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18quorum: Fix child permissionsMax Reitz1-1/+18
Quorum cannot share WRITE or RESIZE on its children. Presumably, it only does so because as a filter, it seemed intuitively correct to point its .bdrv_child_perm to bdrv_filter_default_perm(). However, it is not really a filter, and bdrv_filter_default_perm() does not work for it, so we have to provide a custom .bdrv_child_perm implementation. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200218103454.296704-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block/io_uring: Remove superfluous semicolonPhilippe Mathieu-Daudé1-1/+1
Fixes: 6663a0a3376 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200218094402.26625-5-philmd@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18commit: Fix is_read for block_job_error_action()Kevin Wolf1-1/+6
block_job_error_action() needs to know if reading from the top node or writing to the base node failed so that it can set the right 'operation' in the BLOCK_JOB_ERROR QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-6-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18commit: Inline commit_populate()Kevin Wolf1-22/+6
commit_populate() is a very short function and only called in a single place. Its return value doesn't tell us whether an error happened while reading or writing, which would be necessary for sending the right data in the BLOCK_JOB_ERROR QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-5-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18commit: Fix argument order for block_job_error_action()Kevin Wolf1-1/+1
The block_job_error_action() error call in the commit job gives the on_err and is_read arguments in the wrong order. Fix this. (Of course, hard-coded is_read = false is wrong, too, but that's a separate problem for a separate patch.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-4-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18commit: Remove unused bytes_writtenKevin Wolf1-2/+0
The bytes_written variable is only ever written to, it serves no purpose. This has actually been the case since the commit job was first introduced in commit 747ff602636. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200214200812.28180-3-kwolf@redhat.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block/qcow2-bitmap: Remove unneeded variable assignmentPhilippe Mathieu-Daudé1-1/+0
Fix warning reported by Clang static code analyzer: CC block/qcow2-bitmap.o block/qcow2-bitmap.c:650:5: warning: Value stored to 'ret' is never read ret = -EINVAL; ^ ~~~~~~~ Fixes: 88ddffae8 Reported-by: Clang Static Analyzer Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20200215161557.4077-2-philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Ján Tomko <jtomko@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18qcow2: Fix qcow2_alloc_cluster_abort() for external data fileKevin Wolf1-2/+5
For external data file, cluster allocations return an offset in the data file and are not refcounted. In this case, there is nothing to do for qcow2_alloc_cluster_abort(). Freeing the same offset in the qcow2 file is wrong and causes crashes in the better case or image corruption in the worse case. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200211094900.17315-3-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18qcow2: update_refcount(): Reset old_table_index after qcow2_cache_put()Kevin Wolf1-0/+1
In the case that update_refcount() frees a refcount block, it evicts it from the metadata cache. Before doing so, however, it returns the currently used refcount block to the cache because it might be the same. Returning the refcount block early means that we need to reset old_table_index so that we reload the refcount block in the next iteration if it is actually still in use. Fixes: f71c08ea8e60f035485a512fd2af8908567592f0 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200211094900.17315-2-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18block/vvfat: Do not unref qcow on closing backing bdrvHikaru Nishida1-7/+0
Before this commit, BDRVVVFATState.qcow is unrefed in write_target_close on closing backing bdrv of vvfat. However, qcow bdrv is opend as a child of vvfat in enable_write_target() so it will be also unrefed on closing vvfat itself. This causes use-after-free of qcow on freeing vvfat which has backing bdrv and qcow bdrv as children in this order because bdrv_close(vvfat) tries to free qcow bdrv after freeing backing bdrv as QLIST_FOREACH_SAFE() loop keeps next pointer, but BdrvChild of qcow is already freed in bdrv_close(backing bdrv). Signed-off-by: Hikaru Nishida <hikarupsp@gmail.com> Message-Id: <20200209175156.85748-1-hikarupsp@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18qcow2: Fix alignment checks in encrypted imagesAlberto Garcia2-6/+8
I/O requests to encrypted media should be aligned to the sector size used by the underlying encryption method, not to BDRV_SECTOR_SIZE. Fortunately this doesn't break anything at the moment because both existing QCRYPTO_BLOCK_*_SECTOR_SIZE have the same value as BDRV_SECTOR_SIZE. The checks in qcow2_co_preadv_encrypted() are also unnecessary because they are repeated immediately afterwards in qcow2_co_encdec(). Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200213171646.15876-1-berto@igalia.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-02-18mirror: Don't let an operation wait for itselfKevin Wolf1-9/+12
mirror_wait_for_free_in_flight_slot() just picks a random operation to wait for. However, when mirror_co_read() waits for free slots, its MirrorOp is already in s->ops_in_flight, so if not enough slots are immediately available, an operation can end up waiting for itself to complete, which results in a hang. Fix this by passing the current MirrorOp and skipping this operation when picking an operation to wait for. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1794692 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2020-02-18mirror: Store MirrorOp.co for debuggabilityKevin Wolf1-0/+2
If a coroutine is launched, but the coroutine pointer isn't stored anywhere, debugging any problems inside the coroutine is quite hard. Let's store the coroutine pointer of a mirror operation in MirrorOp to have it available in the debugger. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2020-02-07block: fix crash on zero-length unaligned write and readVladimir Sementsov-Ogievskiy1-1/+27
Commit 7a3f542fbd "block/io: refactor padding" occasionally dropped aligning for zero-length request: bdrv_init_padding() blindly return false if bytes == 0, like there is nothing to align. This leads the following command to crash: ./qemu-io --image-opts -c 'write 1 0' \ driver=blkdebug,align=512,image.driver=null-co,image.size=512 >> qemu-io: block/io.c:1955: bdrv_aligned_pwritev: Assertion `(offset & (align - 1)) == 0' failed. >> Aborted (core dumped) Prior to 7a3f542fbd we does aligning of such zero requests. Instead of recovering this behavior let's just do nothing on such requests as it is useless. Note that driver may have special meaning of zero-length reqeusts, like qcow2_co_pwritev_compressed_part, so we can't skip any zero-length operation. But for unaligned ones, we can't pass it to driver anyway. This commit also fixes crash in iotest 80 running with -nocache: ./check -nocache -qcow2 80 which crashes on same assertion due to trying to read empty extra data in qcow2_do_read_snapshots(). Cc: qemu-stable@nongnu.org # v4.2 Fixes: 7a3f542fbd Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20200206164245.17781-1-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-02-06block/backup-top: fix failure pathVladimir Sementsov-Ogievskiy1-9/+12
We can't access top after call bdrv_backup_top_drop, as it is already freed at this time. Also, no needs to unref target child by hand, it will be unrefed on bdrv_close() automatically. So, just do bdrv_backup_top_drop if append succeed and one bdrv_unref otherwise. Note, that in !appended case bdrv_unref(top) moved into drained section on source. It doesn't really matter, but just for code simplicity. Fixes: 7df7868b96404 Cc: qemu-stable@nongnu.org # v4.2.0 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 20200121142802.21467-2-vsementsov@virtuozzo.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Use BDRV_SECTOR_SIZE instead of the hardcoded valueAlberto Garcia1-3/+5
This replaces all remaining instances in the qcow2 code. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: b5f74b606c2d9873b12d29acdb7fd498029c4025.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Don't require aligned offsets in qcow2_co_copy_range_from()Alberto Garcia1-4/+0
qemu-img's convert_co_copy_range() operates at the sector level and block_copy() operates at the cluster level so this condition is always true, but it is not necessary to restrict this here, so let's leave it to the driver implementation return an error if there is any. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: a4264aaee656910c84161a2965f7a501437379ca.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Use bs->bl.request_alignment when updating an L1 entryAlberto Garcia1-10/+15
When updating an L1 entry the qcow2 driver writes a (512-byte) sector worth of data to avoid a read-modify-write cycle. Instead of always writing 512 bytes we should follow the alignment requirements of the storage backend. (the only exception is when the alignment is larger than the cluster size because then we could be overwriting data after the L1 table) Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: 71f34d4ae4b367b32fb36134acbf4f4f7ee681f4.1579374329.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Tighten cluster_offset alignment assertionsAlberto Garcia1-6/+3
qcow2_alloc_cluster_offset() and qcow2_get_cluster_offset() always return offsets that are cluster-aligned so don't just check that they are sector-aligned. The check in qcow2_co_preadv_task() is also replaced by an assertion for the same reason. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: 558ba339965f858bede4c73ce3f50f0c0493597d.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Don't round the L1 table allocation up to the sector sizeAlberto Garcia4-7/+5
The L1 table is read from disk using the byte-based bdrv_pread() and is never accessed beyond its last element, so there's no need to allocate more memory than that. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-id: b2e27214ec7b03a585931bcf383ee1ac3a641a10.1579374329.git.berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Use a GString in report_unsupported_feature()Alberto Garcia1-12/+11
This is a bit more efficient than having to allocate and free memory for each item. The default size (60) is enough for all the existing incompatible features or the "Unknown incompatible feature" message. Suggested-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Message-id: 20200115135626.19442-1-berto@igalia.com Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-02-06qcow2: Assert that host cluster offsets fit in L2 table entriesAlberto Garcia1-2/+12
The standard cluster descriptor in L2 table entries has a field to store the host cluster offset. When we need to get that offset from an entry we use L2E_OFFSET_MASK to ensure that we only use the bits that belong to that field. But while that mask is used every time we read from an L2 entry, it is never used when we write to it. Due to the QCOW_MAX_CLUSTER_OFFSET limit set in the cluster allocation code QEMU can never produce offsets that don't fit in that field so any such offset would indicate a bug in QEMU. Compressed cluster descriptors contain two fields (host cluster offset and size of the compressed data) and the situation with them is similar. In this case the masks are not constant but are stored in the csize_mask and cluster_offset_mask fields of BDRVQcow2State. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200113161146.20099-1-berto@igalia.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-01-30block/io_uring: adds userspace completion pollingAarushi Mehta1-1/+16
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-11-stefanha@redhat.com Message-Id: <20200120141858.587874-11-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block: add trace events for io_uringAarushi Mehta2-3/+32
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-10-stefanha@redhat.com Message-Id: <20200120141858.587874-10-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block/file-posix.c: extend to use io_uringAarushi Mehta1-19/+79
Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Reviewed-by: Maxim Levitsky <maximlevitsky@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-9-stefanha@redhat.com Message-Id: <20200120141858.587874-9-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block/io_uring: implements interfaces for io_uringAarushi Mehta2-0/+404
Aborts when sqe fails to be set as sqes cannot be returned to the ring. Adds slow path for short reads for older kernels Signed-off-by: Aarushi Mehta <mehta.aaru20@gmail.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200120141858.587874-5-stefanha@redhat.com Message-Id: <20200120141858.587874-5-stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block/io: take bs->reqs_lock in bdrv_mark_request_serialisingPaolo Bonzini1-49/+63
bdrv_mark_request_serialising is writing the overlap_offset and overlap_bytes fields of BdrvTrackedRequest. Take bs->reqs_lock for the whole duration of it, and not just when waiting for serialising requests, so that tracked_request_overlaps does not look at a half-updated request. The new code does not unlock/relock around retries. This is unnecessary because a retry is always preceded by a CoQueue wait, which already releases and reacquires bs->reqs_lock. Reported-by: Peter Lieven <pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-4-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-4-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block/io: wait for serialising requests when a request becomes serialisingPaolo Bonzini2-24/+17
Marking without waiting would not result in actual serialising behavior. Thus, make a call bdrv_mark_request_serialising sufficient for serialisation to happen. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-3-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-3-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-30block: eliminate BDRV_REQ_NO_SERIALISINGPaolo Bonzini1-14/+4
It is unused since commit 00e30f0 ("block/backup: use backup-top instead of write notifiers", 2019-10-01), drop it to simplify the code. While at it, drop redundant assertions on flags. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1578495356-46219-2-git-send-email-pbonzini@redhat.com Message-Id: <1578495356-46219-2-git-send-email-pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-27iscsi: Don't access non-existent scsi_lba_status_descriptorKevin Wolf1-1/+1
In iscsi_co_block_status(), we may have received num_descriptors == 0 from the iscsi server. Therefore, we can't unconditionally access lbas->descriptors[0]. Add the missing check. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Felipe Franciosi <felipe@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Reviewed-by: Peter Lieven <pl@kamp.de>
2020-01-27iscsi: Cap block count from GET LBA STATUS (CVE-2020-1711)Felipe Franciosi1-2/+3
When querying an iSCSI server for the provisioning status of blocks (via GET LBA STATUS), Qemu only validates that the response descriptor zero's LBA matches the one requested. Given the SCSI spec allows servers to respond with the status of blocks beyond the end of the LUN, Qemu may have its heap corrupted by clearing/setting too many bits at the end of its allocmap for the LUN. A malicious guest in control of the iSCSI server could carefully program Qemu's heap (by selectively setting the bitmap) and then smash it. This limits the number of bits that iscsi_co_block_status() will try to update in the allocmap so it can't overflow the bitmap. Fixes: CVE-2020-1711 Cc: qemu-stable@nongnu.org Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Signed-off-by: Peter Turschmid <peter.turschm@nutanix.com> Signed-off-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-01-27block/backup: fix memory leak in bdrv_backup_top_append()Eiichi Tsukata1-1/+1
bdrv_open_driver() allocates bs->opaque according to drv->instance_size. There is no need to allocate it and overwrite opaque in bdrv_backup_top_append(). Reproducer: $ QTEST_QEMU_BINARY=./x86_64-softmmu/qemu-system-x86_64 valgrind -q --leak-check=full tests/test-replication -p /replication/secondary/start ==29792== 24 bytes in 1 blocks are definitely lost in loss record 52 of 226 ==29792== at 0x483AB1A: calloc (vg_replace_malloc.c:762) ==29792== by 0x4B07CE0: g_malloc0 (in /usr/lib64/libglib-2.0.so.0.6000.7) ==29792== by 0x12BAB9: bdrv_open_driver (block.c:1289) ==29792== by 0x12BEA9: bdrv_new_open_driver (block.c:1359) ==29792== by 0x1D15CB: bdrv_backup_top_append (backup-top.c:190) ==29792== by 0x1CC11A: backup_job_create (backup.c:439) ==29792== by 0x1CD542: replication_start (replication.c:544) ==29792== by 0x1401B9: replication_start_all (replication.c:52) ==29792== by 0x128B50: test_secondary_start (test-replication.c:427) ... Fixes: 7df7868b9640 ("block: introduce backup-top filter driver") Signed-off-by: Eiichi Tsukata <devel@etsukata.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-01-27block/backup-top: Don't acquire context while dropping topSergio Lopez2-5/+3
All paths that lead to bdrv_backup_top_drop(), except for the call from backup_clean(), imply that the BDS AioContext has already been acquired, so doing it there too can potentially lead to QEMU hanging on AIO_WAIT_WHILE(). An easy way to trigger this situation is by issuing a two actions transaction, with a proper and a bogus blockdev-backup, so the second one will trigger a rollback. This will trigger a hang with an stack trace like this one: #0 0x00007fb680c75016 in __GI_ppoll (fds=0x55e74580f7c0, nfds=1, timeout=<optimized out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:39 #1 0x000055e743386e09 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77 #2 0x000055e743386e09 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:336 #3 0x000055e743388dc4 in aio_poll (ctx=0x55e7458925d0, blocking=blocking@entry=true) at util/aio-posix.c:669 #4 0x000055e743305dea in bdrv_flush (bs=bs@entry=0x55e74593c0d0) at block/io.c:2878 #5 0x000055e7432be58e in bdrv_close (bs=0x55e74593c0d0) at block.c:4017 #6 0x000055e7432be58e in bdrv_delete (bs=<optimized out>) at block.c:4262 #7 0x000055e7432be58e in bdrv_unref (bs=bs@entry=0x55e74593c0d0) at block.c:5644 #8 0x000055e743316b9b in bdrv_backup_top_drop (bs=bs@entry=0x55e74593c0d0) at block/backup-top.c:273 #9 0x000055e74331461f in backup_job_create (job_id=0x0, bs=bs@entry=0x55e7458d5820, target=target@entry=0x55e74589f640, speed=0, sync_mode=MIRROR_SYNC_MODE_FULL, sync_bitmap=sync_bitmap@entry=0x0, bitmap_mode=BITMAP_SYNC_MODE_ON_SUCCESS, compress=false, filter_node_name=0x0, on_source_error=BLOCKDEV_ON_ERROR_REPORT, on_target_error=BLOCKDEV_ON_ERROR_REPORT, creation_flags=0, cb=0x0, opaque=0x0, txn=0x0, errp=0x7ffddfd1efb0) at block/backup.c:478 #10 0x000055e74315bc52 in do_backup_common (backup=backup@entry=0x55e746c066d0, bs=bs@entry=0x55e7458d5820, target_bs=target_bs@entry=0x55e74589f640, aio_context=aio_context@entry=0x55e7458a91e0, txn=txn@entry=0x0, errp=errp@entry=0x7ffddfd1efb0) at blockdev.c:3580 #11 0x000055e74315c37c in do_blockdev_backup (backup=backup@entry=0x55e746c066d0, txn=0x0, errp=errp@entry=0x7ffddfd1efb0) at /usr/src/debug/qemu-kvm-4.2.0-2.module+el8.2.0+5135+ed3b2489.x86_64/./qapi/qapi-types-block-core.h:1492 #12 0x000055e74315c449 in blockdev_backup_prepare (common=0x55e746a8de90, errp=0x7ffddfd1f018) at blockdev.c:1885 #13 0x000055e743160152 in qmp_transaction (dev_list=<optimized out>, has_props=<optimized out>, props=0x55e7467fe2c0, errp=errp@entry=0x7ffddfd1f088) at blockdev.c:2340 #14 0x000055e743287ff5 in qmp_marshal_transaction (args=<optimized out>, ret=<optimized out>, errp=0x7ffddfd1f0f8) at qapi/qapi-commands-transaction.c:44 #15 0x000055e74333de6c in do_qmp_dispatch (errp=0x7ffddfd1f0f0, allow_oob=<optimized out>, request=<optimized out>, cmds=0x55e743c28d60 <qmp_commands>) at qapi/qmp-dispatch.c:132 #16 0x000055e74333de6c in qmp_dispatch (cmds=0x55e743c28d60 <qmp_commands>, request=<optimized out>, allow_oob=<optimized out>) at qapi/qmp-dispatch.c:175 #17 0x000055e74325c061 in monitor_qmp_dispatch (mon=0x55e745908030, req=<optimized out>) at monitor/qmp.c:145 #18 0x000055e74325c6fa in monitor_qmp_bh_dispatcher (data=<optimized out>) at monitor/qmp.c:234 #19 0x000055e743385866 in aio_bh_call (bh=0x55e745807ae0) at util/async.c:117 #20 0x000055e743385866 in aio_bh_poll (ctx=ctx@entry=0x55e7458067a0) at util/async.c:117 #21 0x000055e743388c54 in aio_dispatch (ctx=0x55e7458067a0) at util/aio-posix.c:459 #22 0x000055e743385742 in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:260 #23 0x00007fb68543e67d in g_main_dispatch (context=0x55e745893a40) at gmain.c:3176 #24 0x00007fb68543e67d in g_main_context_dispatch (context=context@entry=0x55e745893a40) at gmain.c:3829 #25 0x000055e743387d08 in glib_pollfds_poll () at util/main-loop.c:219 #26 0x000055e743387d08 in os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 #27 0x000055e743387d08 in main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 #28 0x000055e74316a3c1 in main_loop () at vl.c:1828 #29 0x000055e743016a72 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4504 Fix this by not acquiring the AioContext there, and ensuring all paths leading to it have it already acquired (backup_clean()). RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1782111 Signed-off-by: Sergio Lopez <slp@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-01-13linux-aio: increasing MAX_EVENTS to a larger hardcoded valueWangyong1-1/+1
Since commit 6040aedddb5f474a9c2304b6a432a652d82b3d3c "virtio-blk: make queue size configurable",if the user set the queue size to more than 128 ,it will not take effect. That's because linux aio's maximum outstanding requests at a time is always less than or equal to 128. This patch simply increase MAX_EVENTS to a larger hardcoded value of 1024 as a shortterm fix. Signed-off-by: wangyong <wang.yongD@h3c.com> Message-id: faa5781afd354a96a0be152b288f636f@h3c.com Message-Id: <faa5781afd354a96a0be152b288f636f@h3c.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-01-06backup-top: Begin drain earlierMax Reitz1-2/+2
When dropping backup-top, we need to drain the node before freeing the BlockCopyState. Otherwise, requests may still be in flight and then the assertion in shres_destroy() will fail. (This becomes visible in intermittent failure of 056.) Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20191219182638.104621-1-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>