path: root/block
Age Commit message Author Files Lines
2021-01-13Merge remote-tracking branch 'remotes/armbru/tags/pull-yank-2021-01-13' into stagingPeter Maydell1-61/+92
Yank patches for 2021-01-13 # gpg: Signature made Wed 13 Jan 2021 09:25:46 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-yank-2021-01-13: tests/test-char.c: Wait for the chardev to connect in char_socket_client_dupid_test io: Document qmp oob suitability of qio_channel_shutdown and io_shutdown io/channel-tls.c: make qio_channel_tls_shutdown thread-safe migration: Add yank feature chardev/char-socket.c: Add yank feature block/nbd.c: Add yank feature Introduce yank feature Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2021-01-13block/nbd.c: Add yank featureLukas Straub1-61/+92
Register a yank function which shuts down the socket and sets s->state = NBD_CLIENT_QUIT. This is the same behaviour as if an error occurred. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <b73eb07db6d1fcd00667beb13ae6117260f002c3.1609167865.git.lukasstraub2@web.de> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2021-01-12meson: Propagate gnutls dependencyRoman Bolshakov1-1/+1
crypto/tlscreds.h includes GnuTLS headers if CONFIG_GNUTLS is set, but GNUTLS_CFLAGS, that describe include path, are not propagated transitively to all users of crypto and build fails if GnuTLS headers reside in non-standard directory (which is a case for homebrew on Apple Silicon). Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Message-Id: <20210102125213.41279-1-r.bolshakov@yadro.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-08Remove superfluous timer_del() callsPeter Maydell3-4/+0
This commit is the result of running the timer-del-timer-free.cocci script on the whole source tree. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Acked-by: Corey Minyard <cminyard@mvista.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20201215154107.3255-4-peter.maydell@linaro.org
2021-01-02libiscsi: convert to mesonPaolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02curl: remove compatibility code, require 7.29.0Paolo Bonzini1-28/+0
cURL 7.16.0 was released in October 2006. Just remove code that is in all likelihood not being used anywhere, and require the oldest version found in currently supported distros, which is 7.29.0 from CentOS 7. pkg-config is enough for QEMU, since it does not need extra information such as the path for certificate authorities. All platforms supported today have pkg-config for curl, so we can drop curl-config. Suggested-by: Daniel Berrangé <berrange@redhat.com> Reviewed-by: Daniel Berrangé <berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-02meson: use dependency to gate block modulesPaolo Bonzini1-10/+10
This allows converting the dependencies to meson options one by one. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-01-01Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2020-12-19' into stagingPeter Maydell3-8/+5
QAPI patches for 2020-12-19 # gpg: Signature made Sat 19 Dec 2020 09:40:05 GMT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2020-12-19: (33 commits) qobject: Make QString immutable block: Use GString instead of QString to build filenames keyval: Use GString to accumulate value strings json: Use GString instead of QString to accumulate strings migration: Replace migration's JSON writer by the general one qobject: Factor JSON writer out of qobject_to_json() qobject: Factor quoted_str() out of to_json() qobject: Drop qstring_get_try_str() qobject: Drop qobject_get_try_str() Revert "qobject: let object_property_get_str() use new API" block: Avoid qobject_get_try_str() qmp: Fix tracing of non-string command IDs qobject: Move internals to qobject-internal.h hw/rdma: Replace QList by GQueue Revert "qstring: add qstring_free()" qobject: Change qobject_to_json()'s value to GString qobject: Use GString instead of QString to accumulate JSON qobject: Make qobject_to_json_pretty() take a pretty argument monitor: Use GString instead of QString for output buffer hmp: Simplify how qmp_human_monitor_command() gets output ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-12-31Merge remote-tracking branch 'remotes/maxreitz/tags/pull-block-2020-12-18' into stagingPeter Maydell6-57/+747
Block patches: - New block filter: preallocate (which, on writes beyond an image file's end, allocates big chunks of data so that such post-EOF writes will occur less frequently) - write-zeroes and block-status support for Quorum - Implementation of truncate for the nvme block driver similarly to the existing implementations for host block devices and iscsi devices - Block layer refactoring: Drop the tighten_restrictions concept in the block permission functions - iotest fixes # gpg: Signature made Fri 18 Dec 2020 14:45:30 GMT # gpg: using RSA key 91BEB60A30DB3E8857D11829F407DB0061D5CF40 # gpg: issuer "mreitz@redhat.com" # gpg: Good signature from "Max Reitz <mreitz@redhat.com>" [full] # Primary key fingerprint: 91BE B60A 30DB 3E88 57D1 1829 F407 DB00 61D5 CF40 * remotes/maxreitz/tags/pull-block-2020-12-18: (30 commits) iotests: Fix _send_qemu_cmd with bash 5.1 iotests/102: Pass $QEMU_HANDLE to _send_qemu_cmd block/nvme: Implement fake truncate() coroutine quorum: Implement bdrv_co_pwrite_zeroes() quorum: Implement bdrv_co_block_status() scripts/simplebench: add bench_prealloc.py simplebench/results_to_text: make executable simplebench/results_to_text: add difference line to the table simplebench/results_to_text: improve view of the table simplebench: move results_to_text() into separate file simplebench: rename ascii() to results_to_text() scripts/simplebench: use standard deviation for +- error scripts/simplebench: support iops scripts/simplebench: fix grammar: s/successed/succeeded/ iotests: add 298 to test new preallocate filter driver iotests.py: execute_setup_common(): add required_fmts argument iotests: qemu_io_silent: support --image-opts qemu-io: add preallocate mode parameter for truncate command block: introduce preallocate filter block: bdrv_check_perm(): process children anyway ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-12-19qobject: Change qobject_to_json()'s value to GStringMarkus Armbruster1-1/+1
qobject_to_json() and qobject_to_json_pretty() build a GString, then convert it to QString. Just one of the callers actually needs a QString: qemu_rbd_parse_filename(). A few others need a string they can modify: qmp_send_response(), qga's send_response(), to_json_str(), and qmp_fd_vsend_fds(). The remainder just need a string. Change qobject_to_json() and qobject_to_json_pretty() to return the GString. qemu_rbd_parse_filename() now has to convert to QString. All others save a QString temporary. to_json_str() actually becomes a bit simpler, because GString provides more convenient modification functions. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201211171152.146877-6-armbru@redhat.com>
2020-12-19qapi: Use QAPI_LIST_PREPEND() where possibleEric Blake2-7/+4
Anywhere we create a list of just one item or by prepending items (typically because order doesn't matter), we can use QAPI_LIST_PREPEND(). But places where we must keep the list in order by appending remain open-coded until later patches. Note that as a side effect, this also performs a cleanup of two minor issues in qga/commands-posix.c: the old code was performing new = g_malloc0(sizeof(*ret)); which 1) is confusing because you have to verify whether 'new' and 'ret' are variables with the same type, and 2) would conflict with C++ compilation (not an actual problem for this file, but makes copy-and-paste harder). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201113011340.463563-5-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> [Straightforward conflicts due to commit a8aa94b5f8 "qga: update schema for guest-get-disks 'dependents' field" and commit a10b453a52 "target/mips: Move mips_cpu_add_definition() from helper.c to cpu.c" resolved. Commit message tweaked.] Signed-off-by: Markus Armbruster <armbru@redhat.com>
2020-12-18block/vpc: Use sizeof() instead of HEADER_SIZE for footer sizeMarkus Armbruster1-15/+14
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-10-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Pass footer buffers as VHDFooter * instead of uint8_t *Markus Armbruster1-7/+7
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-9-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Pad VHDFooter, replace uint8_t[] buffersMarkus Armbruster1-40/+37
Pad VHDFooter as specified in the "Virtual Hard Disk Image Format Specification" version 1.0[*]. Change footer buffers from uint8_t[HEADER_SIZE] to VHDFooter. Their size remains the same. The VHDFooter * variables pointing to a VHDFooter variable right next to it are now silly. Eliminate them, and shorten the remaining variables' names. Most variables pointing to s->footer are now also silly. Eliminate them, too. [*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-8-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Use sizeof() instead of 1024 for dynamic header sizeMarkus Armbruster1-4/+5
Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-7-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Pad VHDDynDiskHeader, replace uint8_t[] buffersMarkus Armbruster1-22/+19
Pad VHDDynDiskHeader as specified in the "Virtual Hard Disk Image Format Specification" version 1.0[*]. Change dynamic disk header buffers from uint8_t[1024] to VHDDynDiskHeader. Their size remains the same. The VHDDynDiskHeader * variables pointing to a VHDDynDiskHeader variable right next to it are now silly. Eliminate them. [*] http://download.microsoft.com/download/f/f/e/ffef50a5-07dd-4cf8-aaa3-442c0673a029/Virtual%20Hard%20Disk%20Format%20Spec_10_18_06.doc Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-6-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Make vpc_checksum() take void *Markus Armbruster1-1/+2
Some of the next commits will checksum structs. Change vpc_checksum() to take void * instead of uint8_t, to save us pointless casts to uint8_t *. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-5-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
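A minimal sketch of why a void * parameter saves casts when checksumming structs. It assumes the checksum is the one's complement of the byte sum, as the VHD specification describes for the footer; DemoFooter and demo_checksum() are hypothetical names for illustration, not the code in block/vpc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an on-disk structure; not the real VHDFooter. */
typedef struct DemoFooter {
    char     creator[8];
    uint32_t checksum;
    uint8_t  padding[500];
} DemoFooter;

/*
 * One's complement of the sum of all bytes in the buffer.
 * Taking void * lets callers pass a struct pointer without a cast.
 */
static uint32_t demo_checksum(void *p, size_t size)
{
    uint8_t *buf = p;
    uint32_t sum = 0;

    assert(p != NULL);
    for (size_t i = 0; i < size; i++) {
        sum += buf[i];
    }
    return ~sum;
}
```

A caller can then write demo_checksum(&footer, sizeof(footer)) directly, where a uint8_t * parameter would have required (uint8_t *)&footer.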
2020-12-18block/vpc: Don't abuse the footer buffer for dynamic headerMarkus Armbruster1-10/+12
create_dynamic_disk() takes a buffer holding the footer as first argument. It writes out the footer (512 bytes), then reuses the buffer to initialize and write out the dynamic header (1024 bytes). Works, because the caller passes a buffer that is large enough for both purposes. I hate that. Use a separate buffer for the dynamic header, and adjust the caller's buffer. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-4-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Don't abuse the footer buffer as BAT sector bufferMarkus Armbruster1-2/+3
create_dynamic_disk() takes a buffer holding the footer as first argument. It writes out the footer (512 bytes), then reuses the buffer to initialize and write out the dynamic header (1024 bytes), then reuses it again to initialize and write out BAT sectors (512). Works, because the caller passes a buffer that is large enough for all three purposes. I hate that. Use a separate buffer for writing out BAT sectors. The next commit will do the same for the dynamic header. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-3-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/vpc: Make vpc_open() read the full dynamic headerMarkus Armbruster1-4/+4
The dynamic header's size is 1024 bytes. vpc_open() reads only the 512 bytes of the dynamic header into buf[]. Works, because it doesn't actually access the second half. However, a colleague told me that GCC 11 warns: ../block/vpc.c:358:51: error: array subscript 'struct VHDDynDiskHeader[0]' is partly outside array bounds of 'uint8_t[512]' [-Werror=array-bounds] Clean up to read the full header. Rename buf[] to dyndisk_header_buf[] while there. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201217162003.1102738-2-armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-18block/nvme: Implement fake truncate() coroutinePhilippe Mathieu-Daudé1-0/+24
An NVMe drive cannot be shrunk. Since commit c80d8b06cfa we can use the @exact parameter (set to false) to return success if the block device is larger than the requested offset (even if it cannot be shrunk). Use this parameter to implement the NVMe truncate() coroutine, similarly to how it is done for the iscsi and file-posix drivers (see commit 82325ae5f2f "Evaluate @exact in protocol drivers"). Reported-by: Xueqiang Wei <xuwei@redhat.com> Suggested-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201210125202.858656-1-philmd@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
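The @exact semantics described above can be sketched for any fixed-size device as follows. This is an illustration under assumptions: demo_fixed_size_truncate() is a hypothetical helper, and the specific error codes are chosen for the example rather than taken from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * "Fake" truncate for a device whose size cannot change.
 * With exact == false, any offset that still fits in the device succeeds.
 * With exact == true, only the current length is accepted.
 */
static int demo_fixed_size_truncate(int64_t cur_length, int64_t offset,
                                    bool exact)
{
    assert(cur_length >= 0 && offset >= 0);

    if (exact && offset != cur_length) {
        return -ENOTSUP;    /* the device cannot be resized at all */
    }
    if (offset > cur_length) {
        return -EINVAL;     /* the device cannot grow */
    }
    return 0;               /* device is at least as large as requested */
}
```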
2020-12-18quorum: Implement bdrv_co_pwrite_zeroes()Alberto Garcia1-2/+34
This simply calls bdrv_co_pwrite_zeroes() in all children. bs->supported_zero_flags is also set to the flags that are supported by all children. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <2f09c842781fe336b4c2e40036bba577b7430190.1605286097.git.berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
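"The flags that are supported by all children" amounts to a bitwise AND over the children's flag sets. A small sketch, with hypothetical flag names and a hypothetical helper (the real flags are the BDRV_REQ_* values):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only. */
#define DEMO_REQ_FUA          (1u << 0)
#define DEMO_REQ_MAY_UNMAP    (1u << 1)
#define DEMO_REQ_NO_FALLBACK  (1u << 2)

/* Return the flags that every child supports (set intersection). */
static uint32_t demo_common_zero_flags(const uint32_t *child_flags, size_t n)
{
    uint32_t common = ~0u;

    assert(child_flags != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        common &= child_flags[i];
    }
    return common;
}
```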
2020-12-18quorum: Implement bdrv_co_block_status()Alberto Garcia1-0/+52
The quorum driver does not implement bdrv_co_block_status() and because of that it always reports to contain data even if all its children are known to be empty. One consequence of this is that if we for example create a quorum with a size of 10GB and we mirror it to a new image the operation will write 10GB of actual zeroes to the destination image wasting a lot of time and disk space. Since a quorum has an arbitrary number of children of potentially different formats there is no way to report all possible allocation status flags in a way that makes sense, so this implementation only reports when a given region is known to contain zeroes (BDRV_BLOCK_ZERO) or not (BDRV_BLOCK_DATA). If all children agree that a region contains zeroes then we can return BDRV_BLOCK_ZERO using the smallest size reported by the children (because all agree that a region of at least that size contains zeroes). If at least one child disagrees we have to return BDRV_BLOCK_DATA. In this case we use the largest of the sizes reported by the children that didn't return BDRV_BLOCK_ZERO (because we know that there won't be an agreement for at least that size). Signed-off-by: Alberto Garcia <berto@igalia.com> Tested-by: Tao Xu <tao3.xu@intel.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <db83149afcf0f793effc8878089d29af4c46ffe1.1605286097.git.berto@igalia.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
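The combining rule above can be sketched as a small standalone function. DemoExtent, DemoStatus and demo_quorum_status() are hypothetical names; the real driver queries its children through the block layer and handles errors, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_BLOCK_DATA, DEMO_BLOCK_ZERO } DemoStatus;

/* What one child reports for the region starting at the queried offset. */
typedef struct DemoExtent {
    DemoStatus status;
    int64_t    bytes;
} DemoExtent;

static DemoExtent demo_quorum_status(const DemoExtent *children, size_t n)
{
    bool all_zero = true;
    int64_t min_zero = INT64_MAX;   /* smallest size among ZERO children */
    int64_t max_data = 0;           /* largest size among non-ZERO children */

    assert(children != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (children[i].status == DEMO_BLOCK_ZERO) {
            if (children[i].bytes < min_zero) {
                min_zero = children[i].bytes;
            }
        } else {
            all_zero = false;
            if (children[i].bytes > max_data) {
                max_data = children[i].bytes;
            }
        }
    }

    if (all_zero) {
        /* Every child agrees on zeroes for at least min_zero bytes. */
        return (DemoExtent){ DEMO_BLOCK_ZERO, min_zero };
    }
    /* No agreement is possible for at least max_data bytes. */
    return (DemoExtent){ DEMO_BLOCK_DATA, max_data };
}
```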
2020-12-18block: introduce preallocate filterVladimir Sementsov-Ogievskiy2-0/+560
It's intended to be inserted between format and protocol nodes to preallocate additional space (expanding protocol file) on writes crossing EOF. It improves performance for file-systems with slow allocation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201021145859.11201-9-vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> [mreitz: Two comment fixes, and bumped the version from 5.2 to 6.0] Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18block: introduce BDRV_REQ_NO_WAIT flagVladimir Sementsov-Ogievskiy1-1/+10
Add a flag that makes a serialising request not wait: if there are conflicting requests, just return an error immediately. It will be used in the upcoming preallocate filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-7-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18block: bdrv_mark_request_serialising: split non-waiting functionVladimir Sementsov-Ogievskiy2-13/+24
We'll need a separate function, which will only "mark" request serialising with specified align but not wait for conflicting requests. So, it will be like old bdrv_mark_request_serialising(), before merging bdrv_wait_serialising_requests_locked() into it. To reduce the possible mess, let's do the following: Public function that does both marking and waiting will be called bdrv_make_request_serialising, and private function which will only "mark" will be called tracked_request_set_serialising(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201021145859.11201-6-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18block/io: bdrv_wait_serialising_requests_locked: drop extra bs argVladimir Sementsov-Ogievskiy1-5/+5
bs is linked in req, so there is no need to pass it separately. Most of the tracked-requests API doesn't have a bs argument. Actually, after this patch only tracked_request_begin has it, but that is on purpose. While here, also add a comment about what "_locked" means. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-5-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18block/io: split out bdrv_find_conflicting_requestVladimir Sementsov-Ogievskiy1-30/+41
To be reused separately. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201021145859.11201-4-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18block/io.c: drop assertion on double waiting for request serialisationVladimir Sementsov-Ogievskiy1-10/+1
The comment states that on a misaligned request we should already have been waiting. But for bdrv_padding_rmw_read, we called bdrv_mark_request_serialising with align = request_alignment, and now we serialise with align = cluster_size. So we may have to wait again with larger alignment. Note that the only user of BDRV_REQ_SERIALISING is backup, which issues cluster-aligned requests, so it seems the assertion should not fire for now. But it's wrong anyway. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201021145859.11201-3-vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2020-12-18block/nfs: fix int overflow in nfs_client_open_qdictPeter Lieven1-1/+1
nfs_client_open returns the file size in sectors. This effectively makes it impossible to open files larger than 1TB. Fixes: c22a03454544c2a08f1107c5cc8481a5574533d5 Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20201209121735.16437-1-pl@kamp.de> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
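The 1TB figure follows from the sector arithmetic: with 512-byte sectors, a sector count held in a 32-bit int tops out just below 2^31 sectors, i.e. just below 1 TiB. A sketch of that calculation, with hypothetical helper names and assuming a 32-bit int:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_SECTOR_SIZE 512

/* File size in sectors, rounded up, computed in 64 bits. */
static int64_t demo_size_in_sectors(int64_t size_bytes)
{
    assert(size_bytes >= 0 && size_bytes <= INT64_MAX - (DEMO_SECTOR_SIZE - 1));
    return (size_bytes + DEMO_SECTOR_SIZE - 1) / DEMO_SECTOR_SIZE;
}

/* Would the sector count survive being stored in a plain int? */
static bool demo_fits_in_int(int64_t sectors)
{
    return sectors <= INT_MAX;
}
```

Keeping the return value in an int64_t variable, as the sector count is computed, avoids the truncation.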
2020-12-13block/file-posix: fix a possible undefined behaviorPan Nengyuan1-1/+1
local_err is not initialized to NULL, which causes an assertion failure as shown below: qemu/util/error.c:59: error_setv: Assertion `*errp == NULL' failed. Fixes: c6447510690 Reported-by: Euler Robot <euler.robot@huawei.com> Signed-off-by: Pan Nengyuan <pannengyuan@huawei.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Chen Qun <kuhn.chenqun@huawei.com> Message-Id: <20201023061218.2080844-8-kuhn.chenqun@huawei.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-12-11block: Fix deadlock in bdrv_co_yield_to_drain()Kevin Wolf1-17/+24
If bdrv_co_yield_to_drain() is called for draining a block node that runs in a different AioContext, it keeps that AioContext locked while it yields and schedules a BH in the AioContext to do the actual drain. As long as executing the BH is the very next thing that the event loop of the node's AioContext does, this actually happens to work, but when it tries to execute something else that wants to take the AioContext lock, it will deadlock. (In the bug report, this other thing is a virtio-scsi device running virtio_scsi_data_plane_handle_cmd().) Instead, always drop the AioContext lock across the yield and reacquire it only when the coroutine is reentered. The BH needs to unconditionally take the lock for itself now. This fixes the 'block_resize' QMP command on a block node that runs in an iothread. Cc: qemu-stable@nongnu.org Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201203172311.68232-4-kwolf@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block: introduce BDRV_MAX_LENGTHVladimir Sementsov-Ogievskiy2-12/+45
We are going to modify block layer to work with 64bit requests. And first step is moving to int64_t type for both offset and bytes arguments in all block request related functions. It's mostly safe (when widening signed or unsigned int to int64_t), but switching from uint64_t is questionable. So, let's first establish the set of requests we want to work with. First signed int64_t should be enough, as off_t is signed anyway. Then, obviously offset + bytes should not overflow. And most interesting: (offset + bytes) being aligned up should not overflow as well. Aligned to what alignment? First thing that comes in mind is bs->bl.request_alignment, as we align up request to this alignment. But there is another thing: look at bdrv_mark_request_serialising(). It aligns request up to some given alignment. And this parameter may be bdrv_get_cluster_size(), which is often a lot greater than bs->bl.request_alignment. Note also, that bdrv_mark_request_serialising() uses signed int64_t for calculations. So, actually, we already depend on some restrictions. Happily, bdrv_get_cluster_size() returns int and bs->bl.request_alignment has 32bit unsigned type, but defined to be a power of 2 less than INT_MAX. So, we may establish, that INT_MAX is absolute maximum for any kind of alignment that may occur with the request. Note, that bdrv_get_cluster_size() is not documented to return power of 2, still bdrv_mark_request_serialising() behaves like it is. Also, backup uses bdi.cluster_size and is not prepared to it not being power of 2. So, let's establish that Qemu supports only power-of-2 clusters and alignments. So, alignment can't be greater than 2^30. Finally to be safe with calculations, to not calculate different maximums for different nodes (depending on cluster size and request_alignment), let's simply set QEMU_ALIGN_DOWN(INT64_MAX, 2^30) as absolute maximum bytes length for Qemu. Actually, it's not much less than INT64_MAX. OK, then, let's apply it to block/io. 
Let's consider all block/io entry points of offset/bytes: 4 bytes/offset interface functions: bdrv_co_preadv_part(), bdrv_co_pwritev_part(), bdrv_co_copy_range_internal() and bdrv_co_pdiscard() and we check them all with bdrv_check_request(). We also have one entry point with only offset: bdrv_co_truncate(). Check the offset. And one public structure: BdrvTrackedRequest. Happily, it has only three external users: file-posix.c (adopted by this patch), write-threshold.c (only reads fields) and test-write-threshold.c (sets obviously small constant values). It would be better to make the structure private and add corresponding interfaces. Still, it's not obvious what kind of interface is needed for file-posix.c. Let's keep it public but add corresponding assertions. After this patch we'll convert functions in block/io.c to int64_t bytes and offset parameters. We can assume that an offset/bytes pair always satisfies the new restrictions, and make corresponding assertions where needed. If we reach some offset/bytes point in block/io.c missing bdrv_check_request(), it is considered a bug. Likewise, if block/io.c modifies an offset/bytes request, expanding it more than aligning up to request_alignment, it's a bug too. For all io requests except for discard we keep for now the old restriction of 32bit request length. iotest 206 output error message changed, as now the test disk size is larger than the new limit. Add one more test case with the new maximum disk size to cover the too-big-L1 case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-5-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
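The limit and the request check described in this commit message can be sketched as below. The DEMO_ names are hypothetical stand-ins, and the choice of -EIO as the error code is an assumption for the example; the point is that offset, bytes, and offset + bytes aligned up to any power-of-2 alignment of at most 2^30 all stay within int64_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ALIGNMENT     ((int64_t)1 << 30)
#define DEMO_ALIGN_DOWN(n, m)  ((n) / (m) * (m))
/* INT64_MAX rounded down to a multiple of 2^30. */
#define DEMO_MAX_LENGTH        DEMO_ALIGN_DOWN(INT64_MAX, DEMO_MAX_ALIGNMENT)

/* Reject requests whose offset, length or end exceeds the limit. */
static int demo_check_request(int64_t offset, int64_t bytes)
{
    if (offset < 0 || bytes < 0) {
        return -EIO;
    }
    if (bytes > DEMO_MAX_LENGTH) {
        return -EIO;
    }
    if (offset > DEMO_MAX_LENGTH - bytes) {   /* offset + bytes too large */
        return -EIO;
    }
    return 0;
}

/* Align up; cannot overflow for any value that passed the check above. */
static int64_t demo_align_up(int64_t n, int64_t align)
{
    assert(align > 0 && align <= DEMO_MAX_ALIGNMENT);
    assert((align & (align - 1)) == 0);       /* power of 2 only */
    assert(n >= 0 && n <= DEMO_MAX_LENGTH);
    return (n + align - 1) & ~(align - 1);
}
```

Because DEMO_MAX_LENGTH is itself a multiple of every permitted alignment, aligning up never produces a value above it.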
2020-12-11block/io: bdrv_check_byte_request(): drop bdrv_is_inserted()Vladimir Sementsov-Ogievskiy1-13/+12
Move bdrv_is_inserted() calls into callers. We are going to make bdrv_check_byte_request() a clean thing. bdrv_is_inserted() is not about checking the request, it's about checking the bs. So, it should be separate. With this patch we probably change the error path for some failure scenarios. But depending on the fact that querying a too big request on an empty cdrom (or a corrupted qcow2 node with no drv) will result in EIO and not ENOMEDIUM would be very strange. Moreover, we are going to move to 64bit requests, so larger requests will be allowed anyway. Moreover, keeping in mind that cdrom is the only driver that has a .bdrv_is_inserted() handler, it's strange that we should care so much about it in the generic block layer; intuitively we should just do read and write, and the cdrom driver should return correct errors if it is not inserted. But that is work for another series. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-4-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block/io: bdrv_refresh_limits(): use ERRP_GUARDVladimir Sementsov-Ogievskiy1-4/+3
This simplifies following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-3-vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block/file-posix: fix workaround in raw_do_pwrite_zeroes()Vladimir Sementsov-Ogievskiy1-1/+0
We should not set overlap_bytes: 1. Don't worry: it is calculated by bdrv_mark_request_serialising() and will be equal to or greater than bytes anyway. 2. If the request was already aligned up to some greater alignment, then we may break things: we reduce overlap_bytes, and a further bdrv_mark_request_serialising() may not help, as it will not restore the old bigger alignment. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201203222713.13507-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11file-posix: check the use_lock before setting the file lockLi Feng1-1/+1
The scenario is that when accessing a volume on an NFS filesystem without supporting the file lock, Qemu will complain "Failed to lock byte 100", even when setting the file.locking = off. We should do file lock related operations only when the file.locking is enabled, otherwise, the syscall of 'fcntl' will return non-zero. Signed-off-by: Li Feng <fengli@smartx.com> Message-Id: <1607341446-85506-1-git-send-email-fengli@smartx.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11fuse: Implement hole detection through lseekMax Reitz1-0/+77
This is a relatively new feature in libfuse (available since 3.8.0, which was released in November 2019), so we have to add a dedicated check whether it is available before making use of it. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-7-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11fuse: (Partially) implement fallocate()Max Reitz1-0/+84
This allows allocating areas after the (old) EOF as part of a growing resize, writing zeroes, and discarding. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-6-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11fuse: Allow growable exportsMax Reitz1-8/+36
These will behave more like normal files in that writes beyond the EOF will automatically grow the export size. As an optimization, keep the RESIZE permission for growable exports so we do not have to take it for every post-EOF write. (This permission is not released when the export is destroyed, because at that point the BlockBackend is destroyed altogether anyway.) Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-5-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11fuse: Implement standard FUSE operationsMax Reitz1-0/+242
This makes the export actually useful instead of only producing errors whenever it is accessed. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-4-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11fuse: Allow exporting BDSs via FUSEMax Reitz3-0/+301
block-export-add type=fuse allows mounting block graph nodes via FUSE on some existing regular file. That file should then appear like a raw disk image, and accesses to it result in accesses to the exported BDS. Right now, we only implement the necessary block export functions to set it up and shut it down. We do not implement any access functions, so accessing the mount point only results in errors. This will be addressed by a followup patch. We keep a hash table of exported mount points, because we want to be able to detect when users try to use a mount point twice. This is because we invoke stat() to check whether the given mount point is a regular file, but if that file is served by ourselves (because it is already used as a mount point), then this stat() would have to be served by ourselves, too, which is impossible to do while we (as the caller) are waiting for it to settle. Therefore, keep track of mount point paths to at least catch the most obvious instances of that problem. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027190600.192171-3-mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block/iscsi: Use lock guard macrosGan Qixin1-26/+24
Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/iscsi. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-5-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block/throttle-groups: Use lock guard macrosGan Qixin1-25/+23
Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/throttle-groups. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Message-Id: <20201203075055.127773-4-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block/curl: Use lock guard macrosGan Qixin1-14/+14
Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/curl. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-3-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-12-11block/accounting: Use lock guard macrosGan Qixin1-17/+15
Replace manual lock()/unlock() calls with lock guard macros (QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD) in block/accounting. Signed-off-by: Gan Qixin <ganqixin@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20201203075055.127773-2-ganqixin@huawei.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
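For readers unfamiliar with the pattern behind the four lock guard conversions above, here is a standalone sketch of the general technique using the GCC/Clang cleanup attribute. It is not the definition of QEMU_LOCK_GUARD or WITH_QEMU_LOCK_GUARD; ToyLock, TOY_LOCK_GUARD and the other names are hypothetical, and a toy lock replaces a real mutex so the example has no threading dependency.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock standing in for a real mutex (hypothetical). */
typedef struct ToyLock {
    bool held;
} ToyLock;

static void toy_lock(ToyLock *l)   { assert(!l->held); l->held = true; }
static void toy_unlock(ToyLock *l) { assert(l->held);  l->held = false; }

static ToyLock *toy_guard_acquire(ToyLock *l)
{
    toy_lock(l);
    return l;
}

/* Cleanup handler: called with a pointer to the guard variable. */
static void toy_guard_release(ToyLock **guard)
{
    if (*guard) {
        toy_unlock(*guard);
    }
}

/* Takes the lock now and releases it when the enclosing scope ends. */
#define TOY_LOCK_GUARD(l) \
    ToyLock *toy_guard_ __attribute__((cleanup(toy_guard_release), unused)) = \
        toy_guard_acquire(l)

static ToyLock stats_lock;
static int stats_total;

/* Manual style: every return path needs its own unlock. */
static int add_sample_manual(int v)
{
    toy_lock(&stats_lock);
    if (v < 0) {
        toy_unlock(&stats_lock);    /* easy to forget */
        return -1;
    }
    stats_total += v;
    toy_unlock(&stats_lock);
    return 0;
}

/* Guard style: the early return releases the lock automatically. */
static int add_sample_guarded(int v)
{
    TOY_LOCK_GUARD(&stats_lock);
    if (v < 0) {
        return -1;
    }
    stats_total += v;
    return 0;
}
```

The guarded version is shorter and cannot leak the lock on an early return, which is consistent with the small net line reductions these commits report.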
2020-12-10Tweak a few "Parameter 'NAME' expects THING" error messageMarkus Armbruster1-1/+1
Change to "expects a THING" where that's an obvious improvement. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201113082626.2725812-11-armbru@redhat.com>
2020-12-09block/export: avoid g_return_val_if() input validationStefan Hajnoczi1-1/+3
Do not validate input with g_return_val_if(). This API is intended for checking programming errors and is compiled out with -DG_DISABLE_CHECKS. Use an explicit if statement for input validation so it cannot accidentally be compiled out. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20201118091644.199527-5-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
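The hazard described above can be shown with a small standalone sketch. It does not use GLib: toy_return_val_if_fail() is a hypothetical stand-in that, like the GLib precondition macros under -DG_DISABLE_CHECKS, expands to nothing when TOY_DISABLE_CHECKS is defined. QUEUE_MAX and the function names are likewise made up for illustration.

```c
#include <assert.h>

/* Hypothetical precondition macro that a build flag can compile out. */
#ifdef TOY_DISABLE_CHECKS
#define toy_return_val_if_fail(expr, val) do { } while (0)
#else
#define toy_return_val_if_fail(expr, val) \
    do { if (!(expr)) { return (val); } } while (0)
#endif

#define QUEUE_MAX 4     /* hypothetical limit */

/*
 * Fragile: with TOY_DISABLE_CHECKS defined, the range check vanishes and
 * an out-of-range index from untrusted input is passed through.
 */
static int pick_queue_fragile(unsigned idx)
{
    toy_return_val_if_fail(idx < QUEUE_MAX, -1);
    return (int)idx;
}

/* Robust: an explicit if statement survives every build configuration. */
static int pick_queue(unsigned idx)
{
    if (idx >= QUEUE_MAX) {
        return -1;
    }
    return (int)idx;
}
```

Building the same file with -DTOY_DISABLE_CHECKS leaves pick_queue() unchanged but makes pick_queue_fragile() return out-of-range indices, which is the failure mode the commit avoids.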
2020-12-08libvhost-user: make it a meson subprojectMarc-André Lureau1-1/+1
By making libvhost-user a subproject, check it builds standalone (without the global QEMU cflags etc). Note that the library still relies on QEMU include/qemu/atomic.h and linux_headers/. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20201125100640.366523-6-marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2020-11-24qcow2: Fix corruption on write_zeroes with MAY_UNMAPMaxim Levitsky1-3/+6
Commit 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()") introduced a subtle change to code in zero_in_l2_slice: It swapped the order of

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);
3. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);

To

1. qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_slice);
2. qcow2_free_any_clusters(bs, old_offset, 1, QCOW2_DISCARD_REQUEST);
3. set_l2_entry(s, l2_slice, l2_index + i, QCOW_OFLAG_ZERO);

It seems harmless, however the call to qcow2_free_any_clusters can trigger a cache flush which can mark the L2 table as clean, and assuming that this was the last write to it, a stale version of it will remain on the disk. Now we have a valid L2 entry pointing to a freed cluster. Oops.

Fixes: 205fa50750 ("qcow2: Add subcluster support to zero_in_l2_slice()") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> [ kwolf: Fixed to restore the correct original order from before 205fa50750; added comments like in discard_in_l2_slice(). ] Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20201124092815.39056-1-kwolf@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
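The ordering problem can be reproduced in a toy model. This is a sketch under simplifying assumptions, not qcow2 code: ToyL2 holds a single cached entry with a dirty flag, toy_free_cluster() always flushes (in the commit's description the flush is only a possibility), and all names and constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one cached L2 entry and its on-disk copy. */
typedef struct ToyL2 {
    uint64_t mem;    /* value in the cache */
    uint64_t disk;   /* value on disk */
    bool dirty;      /* cache differs from disk and must be written */
} ToyL2;

#define TOY_OLD_CLUSTER 0x50000u    /* entry pointing at the old cluster */
#define TOY_ZERO_FLAG   0x1u        /* entry meaning "reads as zero" */

static void toy_mark_dirty(ToyL2 *t) { t->dirty = true; }

static void toy_flush(ToyL2 *t)
{
    if (t->dirty) {
        t->disk = t->mem;
        t->dirty = false;           /* the table is now considered clean */
    }
}

static void toy_set_entry(ToyL2 *t, uint64_t v) { t->mem = v; }

/* Assumption for the sketch: freeing a cluster always flushes the cache. */
static void toy_free_cluster(ToyL2 *t) { toy_flush(t); }

/* Order introduced by the regression: free before updating the entry. */
static uint64_t zero_buggy(void)
{
    ToyL2 t = { TOY_OLD_CLUSTER, TOY_OLD_CLUSTER, false };
    toy_mark_dirty(&t);
    toy_free_cluster(&t);           /* flush writes the OLD entry, clears dirty */
    toy_set_entry(&t, TOY_ZERO_FLAG);
    toy_flush(&t);                  /* no-op: table is marked clean */
    return t.disk;
}

/* Restored order: update the entry, then free the cluster. */
static uint64_t zero_fixed(void)
{
    ToyL2 t = { TOY_OLD_CLUSTER, TOY_OLD_CLUSTER, false };
    toy_mark_dirty(&t);
    toy_set_entry(&t, TOY_ZERO_FLAG);
    toy_free_cluster(&t);           /* flush writes the NEW entry */
    return t.disk;
}
```

In the buggy order the on-disk entry still references the freed cluster; in the restored order the zero entry reaches the disk before the cluster is released.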