aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2020-10-27block: End quiescent sections when a BDS is deletedGreg Kurz1-0/+13
If a BDS gets deleted during blk_drain_all(), it might miss a call to bdrv_do_drained_end(). This means missing a call to aio_enable_external() and the AIO context remains disabled for ever. This can cause a device to become irresponsive and to disrupt the guest execution, ie. hang, loop forever or worse. This scenario is quite easy to encounter with virtio-scsi on POWER when punching multiple blockdev-create QMP commands while the guest is booting and it is still running the SLOF firmware. This happens because SLOF disables/re-enables PCI devices multiple times via IO/MEM/MASTER bits of PCI_COMMAND register after the initial probe/feature negotiation, as it tends to work with a single device at a time at various stages like probing and running block/network bootloaders without doing a full reset in-between. This naturally generates many dataplane stops and starts, and thus many drain sections that can race with blockdev_create_run(). In the end, SLOF bails out. It is somehow reproducible on x86 but it requires to generate articial dataplane start/stop activity with stop/cont QMP commands. In this case, seabios ends up looping for ever, waiting for the virtio-scsi device to send a response to a command it never received. Add a helper that pairs all previously called bdrv_do_drained_begin() with a bdrv_do_drained_end() and call it from bdrv_close(). While at it, update the "/bdrv-drain/graph-change/drain_all" test in test-bdrv-drain so that it can catch the issue. BugId: https://bugzilla.redhat.com/show_bug.cgi?id=1874441 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160346526998.272601.9045392804399803158.stgit@bahia.lan> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27qcow2: Skip copy-on-write when allocating a zero clusterAlberto Garcia2-16/+46
Since commit c8bb23cbdbe32f5c326365e0a82e1b0e68cdcd8a when a write request results in a new allocation QEMU first tries to see if the rest of the cluster outside the written area contains only zeroes. In that case, instead of doing a normal copy-on-write operation and writing explicit zero buffers to disk, the code zeroes the whole cluster efficiently using pwrite_zeroes() with BDRV_REQ_NO_FALLBACK. This improves performance very significantly but it only happens when we are writing to an area that was completely unallocated before. Zero clusters (QCOW2_CLUSTER_ZERO_*) are treated like normal clusters and are therefore slower to allocate. This happens because the code uses bdrv_is_allocated_above() rather bdrv_block_status_above(). The former is not as accurate for this purpose but it is faster. However in the case of qcow2 the underlying call does already report zero clusters just fine so there is no reason why we cannot use that information. After testing 4KB writes on an image that only contains zero clusters this patch results in almost five times more IOPS. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <6d77cab968c501c44d6e1089b9bc91b04170b49e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-27qcow2: Report BDRV_BLOCK_ZERO more accurately in bdrv_co_block_status()Alberto Garcia1-4/+4
If a BlockDriverState supports backing files but has none then any unallocated area reads back as zeroes. bdrv_co_block_status() is only reporting this is if want_zero is true, but this is an inexpensive test and there is no reason not to do it in all cases. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <66fa0914a0e2b727ab6d1b63ca773d7cd29a9a9e.1603731354.git.berto@igalia.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-26Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell1-1/+2
staging * fix --disable-tcg builds (Claudio) * Fixes for macOS --enable-modules build and OpenBSD curses/iconv detection (myself) * Start preparing for meson 0.56 (myself) * Move directory configuration to meson (myself) * Start untangling qemu_init (myself) * Windows fixes (Sunil) * Remove -no-kbm (Thomas) # gpg: Signature made Mon 26 Oct 2020 11:12:17 GMT # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: machine: move SMP initialization from vl.c machine: move UP defaults to class_base_init machine: remove deprecated -machine enforce-config-section option win32: boot broken when bind & data dir are the same WHPX: Fix WHPX build break configure: move install_blobs from configure to meson configure: remove unused variable from config-host.mak configure: move directory options from config-host.mak to meson configure: allow configuring localedir Makefile: separate meson rerun from the rest of the ninja invocation Remove deprecated -no-kvm option replay: do not build if TCG is not available qtest: unbreak non-TCG builds in bios-tables-test hw/core/qdev-clock: add a reference on aliased clocks do not use colons in test names meson: rewrite curses/iconv test build: fix macOS --enable-modules build Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-23block/io: fix bdrv_is_allocated_aboveVladimir Sementsov-Ogievskiy1-38/+5
bdrv_is_allocated_above wrongly handles short backing files: it reports after-EOF space as UNALLOCATED which is wrong, as on read the data is generated on the level of short backing file (if all overlays have unallocated areas at that place). Reusing bdrv_common_block_status_above fixes the issue and unifies code path. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-5-vsementsov@virtuozzo.com [Fix s/has/have/ as suggested by Eric Blake. Fix s/area/areas/. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/io: bdrv_common_block_status_above: support bs == baseVladimir Sementsov-Ogievskiy1-1/+5
We are going to reuse bdrv_common_block_status_above in bdrv_is_allocated_above. bdrv_is_allocated_above may be called with include_base == false and still bs == base (for ex. from img_rebase()). So, support this corner case. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20200924194003.22080-4-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/io: bdrv_common_block_status_above: support include_baseVladimir Sementsov-Ogievskiy2-7/+16
In order to reuse bdrv_common_block_status_above in bdrv_is_allocated_above, let's support include_base parameter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-3-vsementsov@virtuozzo.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/io: fix bdrv_co_block_status_aboveVladimir Sementsov-Ogievskiy2-16/+68
bdrv_co_block_status_above has several design problems with handling short backing files: 1. With want_zeros=true, it may return ret with BDRV_BLOCK_ZERO but without BDRV_BLOCK_ALLOCATED flag, when actually short backing file which produces these after-EOF zeros is inside requested backing sequence. 2. With want_zero=false, it may return pnum=0 prior to actual EOF, because of EOF of short backing file. Fix these things, making logic about short backing files clearer. With fixed bdrv_block_status_above we also have to improve is_zero in qcow2 code, otherwise iotest 154 will fail, because with this patch we stop to merge zeros of different types (produced by fully unallocated in the whole backing chain regions vs produced by short backing files). Note also, that this patch leaves for another day the general problem around block-status: misuse of BDRV_BLOCK_ALLOCATED as is-fs-allocated vs go-to-backing. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200924194003.22080-2-vsementsov@virtuozzo.com [Fix s/comes/come/ as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/export: add vhost-user-blk multi-queue supportStefan Hajnoczi1-6/+18
Allow the number of queues to be configured using --export vhost-user-blk,num-queues=N. This setting should match the QEMU --device vhost-user-blk-pci,num-queues=N setting but QEMU vhost-user-blk.c lowers its own value if the vhost-user-blk backend offers fewer queues than QEMU. The vhost-user-blk-server.c code is already capable of multi-queue. All virtqueue processing runs in the same AioContext. No new locking is needed. Add the num-queues=N option and set the VIRTIO_BLK_F_MQ feature bit. Note that the feature bit only announces the presence of the num_queues configuration space field. It does not promise that there is more than 1 virtqueue, so we can set it unconditionally. I tested multi-queue by running a random read fio test with numjobs=4 on an -smp 4 guest. After the benchmark finished the guest /proc/interrupts file showed activity on all 4 virtio-blk MSI-X. The /sys/block/vda/mq/ directory shows that Linux blk-mq has 4 queues configured. An automated test is included in the next commit. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001144604.559733-2-stefanha@redhat.com [Fixed accidental tab characters as suggested by Markus Armbruster --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/export: add iothread and fixed-iothread optionsStefan Hajnoczi2-2/+34
Make it possible to specify the iothread where the export will run. By default the block node can be moved to other AioContexts later and the export will follow. The fixed-iothread option forces strict behavior that prevents changing AioContext while the export is active. See the QAPI docs for details. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200929125516.186715-5-stefanha@redhat.com [Fix stray '#' character in block-export.json and add missing "(since: 5.2)" as suggested by Eric Blake. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block: move block exports to libblockdevStefan Hajnoczi1-2/+2
Block exports are used by softmmu, qemu-storage-daemon, and qemu-nbd. They are not used by other programs and are not otherwise needed in libblock. Undo the recent move of blockdev-nbd.c from blockdev_ss into block_ss. Since bdrv_close_all() (libblock) calls blk_exp_close_all() (libblockdev) a stub function is required.. Make qemu-nbd.c use signal handling utility functions instead of duplicating the code. This helps because os-posix.c is in libblockdev and it depends on a qemu_system_killed() symbol that qemu-nbd.c lacks. Once we use the signal handling utility functions we also end up providing the necessary symbol. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20200929125516.186715-4-stefanha@redhat.com [Fixed s/ndb/nbd/ typo in commit description as suggested by Eric Blake --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23util/vhost-user-server: use static library in meson.buildStefan Hajnoczi2-5/+5
Don't compile contrib/libvhost-user/libvhost-user.c again. Instead build the static library once and then reuse it throughout QEMU. Also switch from CONFIG_LINUX to CONFIG_VHOST_USER, which is what the vhost-user tools (vhost-user-gpu, etc) do. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-14-stefanha@redhat.com [Added CONFIG_LINUX again because libvhost-user doesn't build on macOS. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23util/vhost-user-server: move header to include/Stefan Hajnoczi1-1/+1
Headers used by other subsystems are located in include/. Also add the vhost-user-server and vhost-user-blk-server headers to MAINTAINERS. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-13-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/export: convert vhost-user-blk server to block export APIStefan Hajnoczi5-353/+126
Use the new QAPI block exports API instead of defining our own QOM objects. This is a large change because the lifecycle of VuBlockDev needs to follow BlockExportDriver. QOM properties are replaced by QAPI options objects. VuBlockDev is renamed VuBlkExport and contains a BlockExport field. Several fields can be dropped since BlockExport already has equivalents. The file names and meson build integration will be adjusted in a future patch. libvhost-user should probably be built as a static library that is linked into QEMU instead of as a .c file that results in duplicate compilation. The new command-line syntax is: $ qemu-storage-daemon \ --blockdev file,node-name=drive0,filename=test.img \ --export vhost-user-blk,node-name=drive0,id=export0,unix-socket=/tmp/vhost-user-blk.sock Note that unix-socket is optional because we may wish to accept chardevs too in the future. Markus noted that supported address families are not explicit in the QAPI schema. It is unlikely that support for more address families will be added since file descriptor passing is required and few address families support it. If a new address family needs to be added, then the QAPI 'features' syntax can be used to advertize them. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20200924151549.913737-12-stefanha@redhat.com [Skip test on big-endian host architectures because this device doesn't support them yet (as already mentioned in a code comment). --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/export: report flush errorsStefan Hajnoczi1-4/+7
Propagate the flush return value since errors are possible. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-11-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23util/vhost-user-server: rework vu_client_trip() coroutine lifecycleStefan Hajnoczi1-7/+2
The vu_client_trip() coroutine is leaked during AioContext switching. It is also unsafe to destroy the vu_dev in panic_cb() since its callers still access it in some cases. Rework the lifecycle to solve these safety issues. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-10-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23util/vhost-user-server: drop unused DevicePanicNotifierStefan Hajnoczi1-2/+1
The device panic notifier callback is not used. Drop it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-7-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/export: consolidate request structs into VuBlockReqStefan Hajnoczi1-47/+21
Only one struct is needed per request. Drop req_data and the separate VuBlockReq instance. Instead let vu_queue_pop() allocate everything at once. This fixes the req_data memory leak in vu_block_virtio_process_req(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20200924151549.913737-6-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/export: vhost-user block device backend serverCoiby Xu3-0/+698
By making use of libvhost-user, block device drive can be shared to the connected vhost-user client. Only one client can connect to the server one time. Since vhost-user-server needs a block drive to be created first, delay the creation of this object. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Coiby Xu <coiby.xu@gmail.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-id: 20200918080912.321299-6-coiby.xu@gmail.com [Shorten "vhost_user_blk_server" string to "vhost_user_blk" to avoid the following compiler warning: ../block/export/vhost-user-blk-server.c:178:50: error: ‘%s’ directive output truncated writing 21 bytes into a region of size 20 [-Werror=format-truncation=] and fix "Invalid size %ld ..." ssize_t format string arguments for 32-bit hosts. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-23block/nvme: Add driver statistics for access alignment and hw errorsPhilippe Mathieu-Daudé1-0/+27
Keep statistics of some hardware errors, and number of aligned/unaligned I/O accesses. QMP example booting a full RHEL 8.3 aarch64 guest: { "execute": "query-blockstats" } { "return": [ { "device": "", "node-name": "drive0", "stats": { "flush_total_time_ns": 6026948, "wr_highest_offset": 3383991230464, "wr_total_time_ns": 807450995, "failed_wr_operations": 0, "failed_rd_operations": 0, "wr_merged": 3, "wr_bytes": 50133504, "failed_unmap_operations": 0, "failed_flush_operations": 0, "account_invalid": false, "rd_total_time_ns": 1846979900, "flush_operations": 130, "wr_operations": 659, "rd_merged": 1192, "rd_bytes": 218244096, "account_failed": false, "idle_time_ns": 2678641497, "rd_operations": 7406, }, "driver-specific": { "driver": "nvme", "completion-errors": 0, "unaligned-accesses": 2959, "aligned-accesses": 4477 }, "qdev": "/machine/peripheral-anon/device[0]/virtio-backend" } ] } Suggested-by: Stefan Hajnoczi <stefanha@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20201001162939.1567915-1-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-22replay: do not build if TCG is not availableClaudio Fontana1-1/+2
this fixes non-TCG builds broken recently by replay reverse debugging. Stub the needed functions in stub/, splitting roughly between functions needed only by system emulation, by system emulation and tools, and by everyone. This includes duplicating some code in replay/, and puts the logic for non-replay related events in the replay/ module (+ the stubs), so this should be revisited in the future. Surprisingly, only _one_ qtest was affected by this, ide-test.c, which resulted in a buzz as the bh events were never delivered, and the bh never executed. Many other subsystems _should_ have been affected. This fixes the immediate issue, however a better way to group replay functionality to TCG-only code could be developed in the long term. Signed-off-by: Claudio Fontana <cfontana@suse.de> Message-Id: <20201013192123.22632-4-cfontana@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-15block: deprecate the sheepdog block driverDaniel P. Berrangé1-0/+14
This thread from a little over a year ago: http://lists.wpkg.org/pipermail/sheepdog/2019-March/thread.html states that sheepdog is no longer actively developed. The only mentioned users are some companies who are said to have it for legacy reasons with plans to replace it by Ceph. There is talk about cutting out existing features to turn it into a simple demo of how to write a distributed block service. There is no evidence of anyone working on that idea: https://github.com/sheepdog/sheepdog/commits/master No real commits to git since Jan 2018, and before then just some minor technical debt cleanup. There is essentially no activity on the mailing list aside from patches to QEMU that get CC'd due to our MAINTAINERS entry. Fedora packages for sheepdog failed to build from upstream source because of the more strict linker that no longer merges duplicate global symbols. Fedora patches it to add the missing "extern" annotations and presumably other distros do to, but upstream source remains broken. There is only basic compile testing, no functional testing of the driver. Since there are no build pre-requisites the sheepdog driver is currently enabled unconditionally. This would result in configure issuing a deprecation warning by default for all users. Thus the configure default is changed to disable it, requiring users to pass --enable-sheepdog to build the driver. Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20201002113243.2347710-3-berrange@redhat.com> Reviewed-by: Neal Gompa <ngompa13@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-13block/blkdebug: fix memory leakElena Afanasova1-0/+1
Spotted by PVS-Studio Signed-off-by: Elena Afanasova <eafanasova@gmail.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <1e903f928eb3da332cc95e2a6f87243bd9fe66e4.camel@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-10-13vmdk: fix maybe uninitialized warningsChristian Borntraeger1-4/+4
Fedora 32 gcc 10 seems to give false positives: Compiling C object libblock.fa.p/block_vmdk.c.o ../block/vmdk.c: In function ‘vmdk_parse_extents’: ../block/vmdk.c:587:5: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 587 | g_free(extent->l1_table); | ^~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:754:17: note: ‘extent’ was declared here 754 | VmdkExtent *extent; | ^~~~~~ ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 | ret = vmdk_init_tables(bs, extent, errp); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 | VmdkExtent *extent; | ^~~~~~ ../block/vmdk.c:1178:39: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 1178 | extent->flat_start_offset = flat_offset << 9; | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~ ../block/vmdk.c: In function ‘vmdk_open_vmdk4’: ../block/vmdk.c:581:22: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 581 | extent->l2_cache = | ~~~~~~~~~~~~~~~~~^ 582 | g_malloc(extent->entry_size * extent->l2_size * L2_CACHE_SIZE); | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:872:17: note: ‘extent’ was declared here 872 | VmdkExtent *extent; | ^~~~~~ ../block/vmdk.c: In function ‘vmdk_open’: ../block/vmdk.c:620:11: error: ‘extent’ may be used uninitialized in this function [-Werror=maybe-uninitialized] 620 | ret = vmdk_init_tables(bs, extent, errp); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../block/vmdk.c:598:17: note: ‘extent’ was declared here 598 | VmdkExtent *extent; | ^~~~~~ cc1: all warnings being treated as errors make: *** [Makefile.ninja:884: libblock.fa.p/block_vmdk.c.o] Error 1 fix them by assigning a default value. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Fam Zheng <fam@euphon.net> Message-Id: <20200930155859.303148-2-borntraeger@de.ibm.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2020-10-09block/nbd: nbd_co_reconnect_loop(): don't connect if drainedVladimir Sementsov-Ogievskiy1-0/+3
In a recent commit 12c75e20a269ac we've improved nbd_co_reconnect_loop() to not make drain wait for additional sleep. Similarly, we shouldn't try to connect, if previous sleep was interrupted by drain begin, otherwise drain_begin will have to wait for the whole connection attempt. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09block/nbd: fix reconnect-delayVladimir Sementsov-Ogievskiy1-9/+50
reconnect-delay has a design flaw: we handle it in the same loop where we do connection attempt. So, reconnect-delay may be exceeded by unpredictable time of connection attempt. Let's instead use separate timer. How to reproduce the bug: 1. Create an image on node1: qemu-img create -f qcow2 xx 100M 2. Start NBD server on node1: qemu-nbd xx 3. On node2 start qemu-io: ./build/qemu-io --image-opts \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=15 4. Type 'read 0 512' in qemu-io interface to check that connection works Be careful: you should make steps 5-7 in a short time, less than 15 seconds. 5. Kill nbd server on node1 6. Run 'read 0 512' in qemu-io interface again, to be sure that nbd client goes to reconnect loop. 7. On node1 run the following command sudo iptables -A INPUT -p tcp --dport 10809 -j DROP This will make the connect() call of qemu-io at node2 take a long time. And you'll see that read command in qemu-io will hang for a long time, more than 15 seconds specified by reconnect-delay parameter. It's the bug. 8. Don't forget to drop iptables rule on node1: sudo iptables -D INPUT -p tcp --dport 10809 -j DROP Important note: Step [5] is necessary to reproduce _this_ bug. If we miss step [5], the read command (step 6) will hang for a long time and this commit doesn't help, because there will be not long connect() to unreachable host, but long sendmsg() to unreachable host, which should be fixed by enabling and adjusting keep-alive on the socket, which is a thing for further patch set. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09block/nbd: correctly use qio_channel_detach_aio_context when neededVladimir Sementsov-Ogievskiy1-2/+2
Don't use nbd_client_detach_aio_context() driver handler where we want to finalize the connection. We should directly use qio_channel_detach_aio_context() in such cases. Driver handler may (and will) contain another things, unrelated to the qio channel. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-09block/nbd: fix drain dead-lock because of nbd reconnect-delayVladimir Sementsov-Ogievskiy1-0/+5
We pause reconnect process during drained section. So, if we have some requests, waiting for reconnect we should cancel them, otherwise they deadlock the drained section. How to reproduce: 1. Create an image: qemu-img create -f qcow2 xx 100M 2. Start NBD server: qemu-nbd xx 3. Start vm with second nbd disk on node2, like this: ./build/x86_64-softmmu/qemu-system-x86_64 -nodefaults -drive \ file=/work/images/cent7.qcow2 -drive \ driver=nbd,server.type=inet,server.host=192.168.100.5,server.port=10809,reconnect-delay=60 \ -vnc :0 -m 2G -enable-kvm -vga std 4. Access the vm through vnc (or some other way?), and check that NBD drive works: dd if=/dev/sdb of=/dev/null bs=1M count=10 - the command should succeed. 5. Now, kill the nbd server, and run dd in the guest again: dd if=/dev/sdb of=/dev/null bs=1M count=10 Now Qemu is trying to reconnect, and dd-generated requests are waiting for the connection (they will wait up to 60 seconds (see reconnect-delay option above) and than fail). But suddenly, vm may totally hang in the deadlock. You may need to increase reconnect-delay period to catch the dead-lock. VM doesn't respond because drain dead-lock happens in cpu thread with global mutex taken. That's not good thing by itself and is not fixed by this commit (true way is using iothreads). Still this commit fixes drain dead-lock itself. Note: probably, we can instead continue to reconnect during drained section. To achieve this, we may move negotiation to the connect thread to make it independent of bs aio context. But expanding drained section doesn't seem good anyway. So, let's now fix the bug the simplest way. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20200903190301.367620-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2020-10-06Merge remote-tracking branch 'remotes/bonzini-gitlab/tags/for-upstream' into ↵Peter Maydell3-4/+26
staging * Reverse debugging (Pavel) * CFLAGS cleanup (Paolo) * ASLR fix (Mark) * cpus.c refactoring (Claudio) # gpg: Signature made Tue 06 Oct 2020 07:35:09 BST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini-gitlab/tags/for-upstream: (37 commits) tests/acceptance: add reverse debugging test replay: create temporary snapshot at debugger connection replay: describe reverse debugging in docs/replay.txt gdbstub: add reverse continue support in replay mode gdbstub: add reverse step support in replay mode replay: flush rr queue before loading the vmstate replay: implement replay-seek command replay: introduce breakpoint at the specified step replay: introduce info hmp/qmp command qapi: introduce replay.json for record/replay-related stuff migration: introduce icount field for snapshots qcow2: introduce icount field for snapshots replay: provide an accessor for rr filename replay: don't record interrupt poll configure: don't enable ASLR for --enable-debug Windows builds configure: consistently pass CFLAGS/CXXFLAGS/LDFLAGS to meson configure: do not clobber environment CFLAGS/CXXFLAGS/LDFLAGS dtc: Convert Makefile bits to meson bits slirp: Convert Makefile bits to meson bits accel/tcg: use current_machine as it is always set for softmmu ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-10-06migration: introduce icount field for snapshotsPavel Dovgalyuk2-4/+16
Saving icount as a parameters of the snapshot allows navigation between them in the execution replay scenario. This information can be used for finding a specific snapshot for proceeding the recorded execution to the specific moment of the time. E.g., 'reverse step' action (introduced in one of the following patches) needs to load the nearest snapshot which is prior to the current moment of time. This patch also updates snapshot test which verifies qemu monitor output. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v4 changes: - squashed format update with test output update v7 changes: - introduced the spaces between the fields in snapshot info output - updated the test to match new field widths Message-Id: <160174518865.12451.14327573383978752463.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-06qcow2: introduce icount field for snapshotsPavel Dovgalyuk2-0/+10
This patch introduces the icount field for saving within the snapshot. It is required for navigation between the snapshots in record/replay mode. Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru> Acked-by: Kevin Wolf <kwolf@redhat.com> -- v7 changes: - also fix the test which checks qcow2 snapshot extra data Message-Id: <160174518284.12451.2301137308458777398.stgit@pasha-ThinkPad-X280> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-10-05block/io: refactor save/load vmstateVladimir Sementsov-Ogievskiy2-39/+39
Like for read/write in a previous commit, drop extra indirection layer, generate directly bdrv_readv_vmstate() and bdrv_writev_vmstate(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-8-vsementsov@virtuozzo.com>
2020-10-05block: drop bdrv_prwvVladimir Sementsov-Ogievskiy2-45/+14
Now that we are not maintaining boilerplate code for coroutine wrappers, there is no more sense in keeping the extra indirection layer of bdrv_prwv(). Let's drop it and instead generate pure bdrv_preadv() and bdrv_pwritev(). Currently, bdrv_pwritev() and bdrv_preadv() are returning bytes on success, auto generated functions will instead return zero, as their _co_ prototype. Still, it's simple to make the conversion safe: the only external user of bdrv_pwritev() is test-bdrv-drain, and it is comfortable enough with bdrv_co_pwritev() instead. So prototypes are moved to local block/coroutines.h. Next, the only internal use is bdrv_pread() and bdrv_pwrite(), which are modified to return bytes on success. Of course, it would be great to convert bdrv_pread() and bdrv_pwrite() to return 0 on success. But this requires audit (and probably conversion) of all their users, let's leave it for another day refactoring. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-7-vsementsov@virtuozzo.com>
2020-10-05block: generate coroutine-wrapper codeVladimir Sementsov-Ogievskiy2-215/+3
Use code generation implemented in previous commit to generated coroutine wrappers in block.c and block/io.c Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-6-vsementsov@virtuozzo.com>
2020-10-05scripts: add block-coroutine-wrapper.pyVladimir Sementsov-Ogievskiy2-0/+57
We have a very frequent pattern of creating a coroutine from a function with several arguments: - create a structure to pack parameters - create _entry function to call original function taking parameters from struct - do different magic to handle completion: set ret to NOT_DONE or EINPROGRESS or use separate bool field - fill the struct and create coroutine from _entry function with this struct as a parameter - do coroutine enter and BDRV_POLL_WHILE loop Let's reduce code duplication by generating coroutine wrappers. This patch adds scripts/block-coroutine-wrapper.py together with some friends, which will generate functions with declared prototypes marked by the 'generated_co_wrapper' specifier. The usage of new code generation is as follows: 1. define the coroutine function somewhere int coroutine_fn bdrv_co_NAME(...) {...} 2. declare in some header file int generated_co_wrapper bdrv_NAME(...); with same list of parameters (generated_co_wrapper is defined in "include/block/block.h"). 3. Make sure the block_gen_c declaration in block/meson.build mentions the file with your marker function. Still, no function is now marked, this work is for the following commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20200924185414.28642-5-vsementsov@virtuozzo.com> [Added encoding='utf-8' to open() calls as requested by Vladimir. Fixed typo and grammar issues pointed out by Eric Blake. Removed clang-format dependency that caused build test issues. --Stefan] Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2020-10-05block: declare some coroutine functions in block/coroutines.hVladimir Sementsov-Ogievskiy2-17/+84
We are going to keep coroutine-wrappers code (structure-packing parameters, BDRV_POLL wrapper functions) in separate auto-generated files. So, we'll need a header with declaration of original _co_ functions, for those which are static now. As well, we'll need declarations for wrapper functions. Do these declarations now, as a preparation step. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-4-vsementsov@virtuozzo.com>
2020-10-05block/io: refactor coroutine wrappersVladimir Sementsov-Ogievskiy1-28/+32
Most of our coroutine wrappers already follow this convention: We have 'coroutine_fn bdrv_co_<something>(<normal argument list>)' as the core function, and a wrapper 'bdrv_<something>(<same argument list>)' which does parameter packing and calls bdrv_run_co(). The only outsiders are the bdrv_prwv_co and bdrv_common_block_status_above wrappers. Let's refactor them to behave as the others, it simplifies further conversion of coroutine wrappers. This patch adds an indirection layer, but it will be compensated by a further commit, which will drop bdrv_co_prwv together with the is_write logic, to keep the read and write paths separate. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200924185414.28642-3-vsementsov@virtuozzo.com>
2020-10-05block/nvme: Replace magic value by SCALE_MS definitionPhilippe Mathieu-Daudé1-1/+1
Use self-explicit SCALE_MS definition instead of magic value (missed in similar commit e4f310fe7f5). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-7-philmd@redhat.com>
2020-10-05block/nvme: Use register definitions from 'block/nvme.h'Philippe Mathieu-Daudé1-10/+11
Use the NVMe register definitions from "block/nvme.h" which ease a bit reviewing the code while matching the datasheet. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-6-philmd@redhat.com>
2020-10-05block/nvme: Drop NVMeRegs structure, directly use NvmeBarPhilippe Mathieu-Daudé1-14/+9
NVMeRegs only contains NvmeBar. Simplify the code by using NvmeBar directly. This triggers a checkpatch.pl error: ERROR: Use of volatile is usually wrong, please add a comment #30: FILE: block/nvme.c:691: + volatile NvmeBar *regs; This is a false positive as in our case we are using I/O registers, so the 'volatile' use is justified. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-5-philmd@redhat.com>
2020-10-05block/nvme: Reduce I/O registers scopePhilippe Mathieu-Daudé1-13/+16
We only access the I/O register in nvme_init(). Remove the reference in BDRVNVMeState and reduce its scope. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-4-philmd@redhat.com>
2020-10-05block/nvme: Map doorbells pages write-onlyPhilippe Mathieu-Daudé1-10/+19
Per the datasheet sections 3.1.13/3.1.14: "The host should not read the doorbell registers." As we don't need read access, map the doorbells with write-only permission. We keep a reference to this mapped address in the BDRVNVMeState structure. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-3-philmd@redhat.com>
2020-10-05util/vfio-helpers: Pass page protections to qemu_vfio_pci_map_bar()Philippe Mathieu-Daudé1-1/+2
Pages are currently mapped READ/WRITE. To be able to use different protections, add a new argument to qemu_vfio_pci_map_bar(). Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20200922083821.578519-2-philmd@redhat.com>
2020-10-02qcow2: Use L1E_SIZE in qcow2_write_l1_entry()Alberto Garcia1-2/+2
We overlooked these in 02b1ecfa100e7ecc2306560cd27a4a2622bfeb04 Signed-off-by: Alberto Garcia <berto@igalia.com> Message-Id: <20200928162333.14998-1-berto@igalia.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02block/export: Move writable to BlockExportOptionsKevin Wolf1-1/+15
The 'writable' option is a basic option that will probably be applicable to most if not all export types that we will implement. Move it from NBD to the generic BlockExport layer. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-26-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02block/export: Add query-block-exportsKevin Wolf1-0/+23
This adds a simple QMP command to query the list of block exports. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-25-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02block/export: Create BlockBackend in blk_exp_add()Kevin Wolf1-4/+45
Every export type will need a BlockBackend, so creating it centrally in blk_exp_add() instead of the .create driver callback avoids duplication. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-24-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02block/export: Move blk to BlockExportKevin Wolf1-0/+3
Every block export has a BlockBackend representing the disk that is exported. It should live in BlockExport therefore. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-23-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02block/export: Add BLOCK_EXPORT_DELETED eventKevin Wolf1-0/+2
Clients may want to know when an export has finally disappeard (block-export-del returns earlier than that in the general case), so add a QAPI event for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20200924152717.287415-22-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2020-10-02block/export: Add block-export-delKevin Wolf2-3/+44
Implement a new QMP command block-export-del and make nbd-server-remove a wrapper around it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20200924152717.287415-21-kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>