aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2021-09-15block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()Stefano Garzarella1-9/+16
In mirror_iteration() we call mirror_wait_on_conflicts() with `self` parameter set to NULL. Starting from commit d44dae1a7c we dereference `self` pointer in mirror_wait_on_conflicts() without checks if it is not NULL. Backtrace: Program terminated with signal SIGSEGV, Segmentation fault. #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 172 self->waiting_for_op = op; [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))] (gdb) bt #0 mirror_wait_on_conflicts (self=0x0, s=<optimized out>, offset=<optimized out>, bytes=<optimized out>) at ../block/mirror.c:172 #1 0x00005610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=<optimized out>) at ../block/mirror.c:491 #2 0x00005610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at ../job.c:917 #3 0x00005610c5f046c6 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:173 #4 0x00007f0909975820 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 from /usr/lib64/libc.so.6 Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404 Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in mirror_wait_on_conflicts") Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20210910124533.288318-1-sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15block/iscsi: Do not force-cap *pnumHanna Reitz1-3/+0
bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-7-hreitz@redhat.com>
2021-09-15block/gluster: Do not force-cap *pnumHanna Reitz1-3/+4
bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-6-hreitz@redhat.com>
2021-09-15block/file-posix: Do not force-cap *pnumHanna Reitz1-3/+4
bdrv_co_block_status() does it for us, we do not need to do it here. The advantage of not capping *pnum is that bdrv_co_block_status() can cache larger data regions than requested by its caller. Signed-off-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210812084148.14458-5-hreitz@redhat.com>
2021-09-15block: block-status cache for data regionsHanna Reitz1-3/+65
As we have attempted before (https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg06451.html, "file-posix: Cache lseek result for data regions"; https://lists.nongnu.org/archive/html/qemu-block/2021-02/msg00934.html, "file-posix: Cache next hole"), this patch seeks to reduce the number of SEEK_DATA/HOLE operations the file-posix driver has to perform. The main difference is that this time it is implemented as part of the general block layer code. The problem we face is that on some filesystems or in some circumstances, SEEK_DATA/HOLE is unreasonably slow. Given the implementation is outside of qemu, there is little we can do about its performance. We have already introduced the want_zero parameter to bdrv_co_block_status() to reduce the number of SEEK_DATA/HOLE calls unless we really want zero information; but sometimes we do want that information, because for files that consist largely of zero areas, special-casing those areas can give large performance boosts. So the real problem is with files that consist largely of data, so that inquiring the block status does not gain us much performance, but where such an inquiry itself takes a lot of time. To address this, we want to cache data regions. Most of the time, when bad performance is reported, it is in places where the image is iterated over from start to end (qemu-img convert or the mirror job), so a simple yet effective solution is to cache only the current data region. (Note that only caching data regions but not zero regions means that returning false information from the cache is not catastrophic: Treating zeroes as data is fine. While we try to invalidate the cache on zero writes and discards, such incongruences may still occur when there are other processes writing to the image.) We only use the cache for nodes without children (i.e. protocol nodes), because that is where the problem is: Drivers that rely on block-status implementations outside of qemu (e.g. SEEK_DATA/HOLE). Resolves: https://gitlab.com/qemu-project/qemu/-/issues/307 Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210812084148.14458-3-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [hreitz: Added `local_file == bs` assertion, as suggested by Vladimir] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15gluster: Align block-status tailMax Reitz1-0/+16
gluster's block-status implementation is basically a copy of that in block/file-posix.c, there is only one thing missing, and that is aligning trailing data extents to the request alignment (as added by commit 9c3db310ff0). Note that 9c3db310ff0 mentions that "there seems to be no other block driver that sets request_alignment and [...]", but while block/gluster.c does indeed not set request_alignment, block/io.c's bdrv_refresh_limits() will still default to an alignment of 512 because block/gluster.c does not provide a byte-aligned read function. Therefore, unaligned tails can conceivably occur, and so we should apply the change from 9c3db310ff0 to gluster's block-status implementation. Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210805143603.59503-1-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-15spelling: sytem => systemMichael Tokarev1-1/+1
Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <fefb5f5c-82bc-05e2-b4c1-665e9d6896ff@msgid.tls.msk.ru> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2021-09-07block/nvme: Only report VFIO error on failed retryPhilippe Mathieu-Daudé1-1/+7
We expect the first qemu_vfio_dma_map() to fail (indicating DMA mappings exhaustion, see commit 15a730e7a3a). Do not report the first failure as error, since we are going to flush the mappings and retry. This removes spurious error message displayed on the monitor: (qemu) c (qemu) qemu-kvm: VFIO_MAP_DMA failed: No space left on device (qemu) info status VM status: running Reported-by: Tingting Mao <timao@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-12-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-09-07util/vfio-helpers: Pass Error handle to qemu_vfio_dma_map()Philippe Mathieu-Daudé1-11/+11
Currently qemu_vfio_dma_map() displays errors on stderr. When using management interface, this information is simply lost. Pass qemu_vfio_dma_map() an Error** handle so it can propagate the error to callers. Reviewed-by: Fam Zheng <fam@euphon.net> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-7-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-09-07block/nvme: Have nvme_create_queue_pair() report errors consistentlyPhilippe Mathieu-Daudé1-0/+3
nvme_create_queue_pair() does not return a boolean value (indicating eventual error) but a pointer, and is inconsistent in how it fills the error handler. To fulfill callers expectations, always set an error message on failure. Reported-by: Auger Eric <eric.auger@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-6-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-09-07block/nvme: Use safer trace format stringPhilippe Mathieu-Daudé1-1/+1
Fix when building with -Wshorten-64-to-32: warning: implicit conversion loses integer precision: 'unsigned long' to 'int' [-Wshorten-64-to-32] Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210902070025.197072-2-philmd@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-09-01block/file-win32: add reopen handlersViktor Prutyanov1-1/+100
Make 'qemu-img commit' work on Windows. Command 'commit' requires reopening backing file in RW mode. So, add reopen prepare/commit/abort handlers and change dwShareMode for CreateFile call in order to allow further read/write reopening. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/418 Suggested-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Viktor Prutyanov <viktor.prutyanov@phystech.edu> Tested-by: Helge Konetzka <hk@zapateado.de> Message-Id: <20210825173625.19415-1-viktor.prutyanov@phystech.edu> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/export/fuse.c: fix fuse-lseek on uclibc or muslFabrice Fontaine1-0/+3
Include linux/fs.h to avoid the following build failure on uclibc or musl raised since version 6.0.0: ../block/export/fuse.c: In function 'fuse_lseek': ../block/export/fuse.c:641:19: error: 'SEEK_HOLE' undeclared (first use in this function) 641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) { | ^~~~~~~~~ ../block/export/fuse.c:641:19: note: each undeclared identifier is reported only once for each function it appears in ../block/export/fuse.c:641:42: error: 'SEEK_DATA' undeclared (first use in this function); did you mean 'SEEK_SET'? 641 | if (whence != SEEK_HOLE && whence != SEEK_DATA) { | ^~~~~~~~~ | SEEK_SET Fixes: - http://autobuild.buildroot.org/results/33c90ebf04997f4d3557cfa66abc9cf9a3076137 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20210827220301.272887-1-fontaine.fabrice@gmail.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/block-copy: block_copy_state_new(): drop extra argumentsVladimir Sementsov-Ogievskiy2-4/+3
The only caller pass copy_range and compress both false. Let's just drop these arguments. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-35-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: make public block driverVladimir Sementsov-Ogievskiy1-32/+28
Finally, copy-before-write gets own .bdrv_open and .bdrv_close handlers, block_init() call and becomes available through bdrv_open(). To achieve this: - cbw_init gets unused flags argument and becomes cbw_open - block_copy_state_free() call moved to new cbw_close() - in bdrv_cbw_append: - options are completed with driver and node-name, and we can simply use bdrv_insert_node() to do both open and drained replacing - in bdrv_cbw_drop: - cbw_close() is now responsible for freeing s->bcs, so don't do it here Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-22-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/block-copy: make setting progress optionalVladimir Sementsov-Ogievskiy1-7/+11
Now block-copy will crash if user don't set progress meter by block_copy_set_progress_meter(). copy-before-write filter will be used in separate of backup job, and it doesn't want any progress meter (for now). So, allow not setting it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-21-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: initialize block-copy bitmapVladimir Sementsov-Ogievskiy2-9/+11
We are going to publish copy-before-write filter to be used in separate of backup. Future step would support bitmap for the filter. But let's start from full set bitmap. We have to modify backup, as bitmap is first initialized by copy-before-write filter, and then backup modifies it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-20-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: cbw_init(): use optionsVladimir Sementsov-Ogievskiy1-14/+15
One more step closer to .bdrv_open(): use options instead of plain arguments. Move to bdrv_open_child() calls, native for drive open handlers. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-19-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: bdrv_cbw_append(): drop unused compress argVladimir Sementsov-Ogievskiy3-6/+4
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-18-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: cbw_init(): use file child after attachingVladimir Sementsov-Ogievskiy1-7/+7
In the next commit we'll get rid of source argument of cbw_init(). Prepare to it now, to make next commit simpler: move the code block that uses source below attaching the child and use bs->file->bs instead of source variable. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-17-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: cbw_init(): rename variablesVladimir Sementsov-Ogievskiy1-15/+14
One more step closer to real .bdrv_open() handler: use more usual names for bs being initialized and its state. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-16-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: introduce cbw_init()Vladimir Sementsov-Ogievskiy1-28/+41
Move part of bdrv_cbw_append() to new function cbw_open(). It's an intermediate step for adding normal .bdrv_open() handler to the filter. With this commit no logic is changed, but we have a function which will be turned into .bdrv_open() handler in future commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-15-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: bdrv_cbw_append(): replace child at lastVladimir Sementsov-Ogievskiy1-22/+11
Refactor the function to replace child at last. Thus we don't need to revert it and code is simplified. block-copy state initialization being done before replacing the child doesn't need any drained section. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-14-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: use file child instead of backingVladimir Sementsov-Ogievskiy1-17/+22
We are going to publish copy-before-write filter, and there no public backing-child-based filter in Qemu. No reason to create a precedent, so let's refactor copy-before-write filter instead. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-13-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: drop extra bdrv_unref on failure pathVladimir Sementsov-Ogievskiy1-1/+0
bdrv_attach_child() do bdrv_unref() on failure, so we shouldn't do it by hand here. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-12-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/copy-before-write: relax permission requirements when no parentsVladimir Sementsov-Ogievskiy1-3/+5
We are going to publish copy-before-write filter. So, user should be able to create it with blockdev-add first, specifying both filtered and target children. And then do blockdev-reopen, to actually insert the filter where needed. Currently, filter unshares write permission unconditionally on source node. It's good, but it will not allow to do blockdev-add. So, let's relax restrictions when filter doesn't have any parent. Test output is modified, as now permission conflict happens only when job creates a blk parent for filter node. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-11-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/backup: move cluster size calculation to block-copyVladimir Sementsov-Ogievskiy4-61/+64
The main consumer of cluster-size is block-copy. Let's calculate it here instead of passing through backup-top. We are going to publish copy-before-write filter soon, so it will be created through options. But we don't want for now to make explicit option for cluster-size, let's continue to calculate it automatically. So, now is the time to get rid of cluster_size argument for bdrv_cbw_append(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-10-vsementsov@virtuozzo.com> [hreitz: Add qemu/error-report.h include to block/block-copy.c] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/backup: set copy_range and compress after filter insertionVladimir Sementsov-Ogievskiy3-5/+3
We are going to publish copy-before-write filter, so it would be initialized through options. Still we don't want to publish compress and copy-range options, as 1. Modern way to enable compression is to use compress filter. 2. For copy-range it's unclean how to make proper interface: - it's has experimental prefix for backup job anyway - the whole BackupPerf structure doesn't make sense for the filter So, let's just add copy-range possibility to the filter later if needed. Still, we are going to continue support for compression and experimental copy-range in backup job. So, set these options after filter creation. Note, that we can drop "compress" argument of bdrv_cbw_append() now, as well as "perf". The only reason not doing so is that now, when I prepare this patch the big series around it is already reviewed and I want to avoid extra rebase conflicts to simplify review of the following version. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-9-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/block-copy: introduce block_copy_set_copy_opts()Vladimir Sementsov-Ogievskiy1-20/+29
We'll need a possibility to set compress and use_copy_range options after initialization of the state. So make corresponding part of block_copy_state_new() separate and public. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210824083856.17408-8-vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block-copy: move detecting fleecing scheme to block-copyVladimir Sementsov-Ogievskiy4-26/+25
We want to simplify initialization interface of copy-before-write filter as we are going to make it public. So, let's detect fleecing scheme exactly in block-copy code, to not pass this information through extra levels. Why not just set BDRV_REQ_SERIALISING unconditionally: because we are going to implement new more efficient fleecing scheme which will not rely on backing feature. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20210824083856.17408-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block: rename backup-top to copy-before-writeVladimir Sementsov-Ogievskiy4-76/+76
We are going to convert backup_top to full featured public filter, which can be used in separate of backup job. Start from renaming from "how it used" to "what it does". While updating comments in 283 iotest, drop and rephrase also things about ".active", as this field is now dropped, and filter doesn't have "inactive" mode. Note that this change may be considered as incompatible interface change, as backup-top filter format name was visible through query-block and query-named-block-nodes. Still, consider the following reasoning: 1. backup-top was never documented, so if someone depends on format name (for driver that can't be used other than it is automatically inserted on backup job start), it's a kind of "undocumented feature use". So I think we are free to change it. 2. There is a hope, that there is no such users: it's a lot more native to give a good node-name to backup-top filter if need to operate with it somehow, and don't touch format name. 3. Another "incompatible" change in further commit would be moving copy-before-write filter from using backing child to file child. And this is even more reasonable than renaming: for now all public filters are file-child based. So, it's a risky change, but risk seems small and good interface worth it. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-6-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block: introduce blk_replace_bsVladimir Sementsov-Ogievskiy1-0/+8
Add function to change bs inside blk. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210824083856.17408-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01raw-format: drop WRITE and RESIZE child perms when possibleStefan Hajnoczi1-1/+20
The following command-line fails due to a permissions conflict: $ qemu-storage-daemon \ --blockdev driver=nvme,node-name=nvme0,device=0000:08:00.0,namespace=1 \ --blockdev driver=raw,node-name=l1-1,file=nvme0,offset=0,size=1073741824 \ --blockdev driver=raw,node-name=l1-2,file=nvme0,offset=1073741824,size=1073741824 \ --nbd-server addr.type=unix,addr.path=/tmp/nbd.sock,max-connections=2 \ --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on \ --export type=nbd,id=nbd-l1-2,node-name=l1-2,name=l1-2,writable=on qemu-storage-daemon: --export type=nbd,id=nbd-l1-1,node-name=l1-1,name=l1-1,writable=on: Permission conflict on node 'nvme0': permissions 'resize' are both required by node 'l1-1' (uses node 'nvme0' as 'file' child) and unshared by node 'l1-2' (uses node 'nvme0' as 'file' child). The problem is that block/raw-format.c relies on bdrv_default_perms() to set permissions on the nvme node. The default permissions add RESIZE in anticipation of a format driver like qcow2 that needs to grow the image file. This fails because RESIZE is unshared, so we cannot get the RESIZE permission. Max Reitz pointed out that block/crypto.c already handles this case by implementing a custom ->bdrv_child_perm() function that adjusts the result of bdrv_default_perms(). This patch takes the same approach in block/raw-format.c so that RESIZE is only required if it's actually necessary (e.g. the parent is qcow2). Cc: Max Reitz <mreitz@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20210726122839.822900-1-stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-09-01block/monitor: Consolidate hmp_handle_error calls to reduce redundant codeMao Zhongyi1-6/+6
Signed-off-by: Mao Zhongyi <maozhongyi@cmss.chinamobile.com> Message-Id: <20210802062507.347555-1-maozhongyi@cmss.chinamobile.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-08-09block/export/fuse.c: fix musl buildFabrice Fontaine1-2/+6
Fix the following build failure on musl raised since version 6.0.0 and https://gitlab.com/qemu-project/qemu/-/commit/4ca37a96a75aafe7a37ba51ab1912b09b7190a6b because musl does not define FALLOC_FL_ZERO_RANGE: ../block/export/fuse.c: In function 'fuse_fallocate': ../block/export/fuse.c:563:23: error: 'FALLOC_FL_ZERO_RANGE' undeclared (first use in this function) 563 | } else if (mode & FALLOC_FL_ZERO_RANGE) { | ^~~~~~~~~~~~~~~~~~~~ Fixes: - http://autobuild.buildroot.org/results/b96e3d364fd1f8bbfb18904a742e73327d308f64 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Message-Id: <20210809095101.1101336-1-fontaine.fabrice@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2021-08-03block: Fix in_flight leak in request padding error pathKevin Wolf1-3/+4
When bdrv_pad_request() fails in bdrv_co_preadv_part(), bs->in_flight has been increased, but is never decreased again. This leads to a hang when trying to drain the block node. This bug was observed with Windows guests which issue a request that fully uses IOV_MAX during installation, so that when padding is necessary (O_DIRECT with a 4k sector size block device on the host), adding another entry causes failure. Call bdrv_dec_in_flight() to fix this. There is a larger problem to solve here because this request shouldn't even fail, but Windows doesn't seem to care and with this minimal fix the installation succeeds. So given that we're already in freeze, let's take this minimal fix for 6.1. Fixes: 98ca45494fcd6bf0336ecd559e440b6de6ea4cd3 Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1972079 Reported-by: Qing Wang <qinwang@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20210727154923.91067-1-kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-29block/io_uring: resubmit when result is -EAGAINFabian Ebner1-1/+15
Linux SCSI can throw spurious -EAGAIN in some corner cases in its completion path, which will end up being the result in the completed io_uring request. Resubmitting such requests should allow block jobs to complete, even if such spurious errors are encountered. Co-authored-by: Stefan Hajnoczi <stefanha@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Fabian Ebner <f.ebner@proxmox.com> Message-id: 20210729091029.65369-1-f.ebner@proxmox.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-26block/nvme: Fix VFIO_MAP_DMA failed: No space left on devicePhilippe Mathieu-Daudé1-0/+22
When the NVMe block driver was introduced (see commit bdd6a90a9e5, January 2018), Linux VFIO_IOMMU_MAP_DMA ioctl was only returning -ENOMEM in case of error. The driver was correctly handling the error path to recycle its volatile IOVA mappings. To fix CVE-2019-3882, Linux commit 492855939bdb ("vfio/type1: Limit DMA mappings per container", April 2019) added the -ENOSPC error to signal the user exhausted the DMA mappings available for a container. The block driver started to mis-behave: qemu-system-x86_64: VFIO_MAP_DMA failed: No space left on device (qemu) (qemu) info status VM status: paused (io-error) (qemu) c VFIO_MAP_DMA failed: No space left on device (qemu) c VFIO_MAP_DMA failed: No space left on device (The VM is not resumable from here, hence stuck.) Fix by handling the new -ENOSPC error (when DMA mappings are exhausted) without any distinction to the current -ENOMEM error, so we don't change the behavior on old kernels where the CVE-2019-3882 fix is not present. An easy way to reproduce this bug is to restrict the DMA mapping limit (65535 by default) when loading the VFIO IOMMU module: # modprobe vfio_iommu_type1 dma_entry_limit=666 Cc: qemu-stable@nongnu.org Cc: Fam Zheng <fam@euphon.net> Cc: Maxim Levitsky <mlevitsk@redhat.com> Cc: Alex Williamson <alex.williamson@redhat.com> Reported-by: Michal Prívozník <mprivozn@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20210723195843.1032825-1-philmd@redhat.com Fixes: bdd6a90a9e5 ("block: Add VFIO based NVMe driver") Buglink: https://bugs.launchpad.net/qemu/+bug/1863333 Resolves: https://gitlab.com/qemu-project/qemu/-/issues/65 Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-21iotests: Improve and rename test 291 to qemu-img-bitmapEric Blake1-1/+1
Enhance the test to demonstrate existing less-than-stellar behavior of qemu-img with a qcow2 image containing an inconsistent bitmap: we don't diagnose the problem until after copying the entire image (a potentially long time), and when we do diagnose the failure, we still end up leaving an empty bitmap in the destination. This mess will be cleaned up in the next patch. While at it, rename the test now that we support useful iotest names, and fix a missing newline in the error message thus exposed. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20210709153951.2801666-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nir Soffer <nsoffer@redhat.com>
2021-07-21linux-aio: limit the batch size using `aio-max-batch` parameterStefano Garzarella1-1/+8
When there are multiple queues attached to the same AIO context, some requests may experience high latency, since in the worst case the AIO engine queue is only flushed when it is full (MAX_EVENTS) or there are no more queues plugged. Commit 2558cb8dd4 ("linux-aio: increasing MAX_EVENTS to a larger hardcoded value") changed MAX_EVENTS from 128 to 1024, to increase the number of in-flight requests. But this change also increased the potential maximum batch to 1024 elements. When there is a single queue attached to the AIO context, the issue is mitigated from laio_io_unplug() that will flush the queue every time is invoked since there can't be others queue plugged. Let's use the new `aio-max-batch` IOThread parameter to mitigate this issue, limiting the number of requests in a batch. We also define a default value (32): this value is obtained running some benchmarks and it represents a good tradeoff between the latency increase while a request is queued and the cost of the io_submit(2) system call. Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-id: 20210721094211.69853-4-sgarzare@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-07-20block/export: Conditionally ignore set-context errorMax Reitz1-1/+4
When invoking block-export-add with some iothread and fixed-iothread=false, and changing the node's iothread fails, the error is supposed to be ignored. However, it is still stored in *errp, which is wrong. If a second error occurs, the "*errp must be NULL" assertion in error_setv() fails: qemu-system-x86_64: ../util/error.c:59: error_setv: Assertion `*errp == NULL' failed. So if fixed-iothread=false, we should ignore the error by passing NULL to bdrv_try_set_aio_context(). Fixes: f51d23c80af73c95e0ce703ad06a300f1b3d63ef ("block/export: add iothread and fixed-iothread options") Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20210624083825.29224-2-mreitz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/vvfat: fix: drop backingVladimir Sementsov-Ogievskiy1-39/+4
Most probably this fake backing child doesn't work anyway (see notes about it in a8a4d15c1c34d). Still, since 25f78d9e2de528473d52 drivers are required to set .supports_backing if they want to call bdrv_set_backing_hd, so now vvfat just doesn't work because of this check. Let's finally drop this fake backing file. Fixes: 25f78d9e2de528473d52acfcf7acdfb64e3453d4 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210715124853.13335-1-vsementsov@virtuozzo.com> Tested-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Remove workaroundLukas Straub1-11/+1
Remove the workaround introduced in commit 6ecbc6c52672db5c13805735ca02784879ce8285 "replication: Avoid blk_make_empty() on read-only child". It is not needed anymore since s->hidden_disk is guaranteed to be writable when secondary_do_checkpoint() runs. Because replication_start(), _do_checkpoint() and _stop() are only called by COLO migration code and COLO-migration activates all disks via bdrv_invalidate_cache_all() before it calls these functions. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <d3acfad43879e9f376bffa7dd797ae74d0a7c81a.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Properly attach childrenLukas Straub1-3/+27
The replication driver needs access to the children block-nodes of it's child so it can issue bdrv_make_empty() and bdrv_co_pwritev() to manage the replication. However, it does this by directly copying the BdrvChilds, which is wrong. Fix this by properly attaching the block-nodes with bdrv_attach_child() and requesting the required permissions. This ultimatively fixes a potential crash in replication_co_writev(), because it may write to s->secondary_disk if it is in state BLOCK_REPLICATION_FAILOVER_FAILED, without requesting write permissions first. And now the workaround in secondary_do_checkpoint() can be removed. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <5d0539d729afb8072d0d7cde977c5066285591b4.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Reduce usage of s->hidden_disk and s->secondary_diskLukas Straub1-17/+28
In preparation for the next patch, initialize s->hidden_disk and s->secondary_disk later and replace access to them with local variables in the places where they aren't initialized yet. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1eb9dc179267207d9c7eccaeb30761758e32e9ab.1626619393.git.lukasstraub2@web.de> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20replication: Remove s->active_diskLukas Straub1-17/+17
s->active_disk is bs->file. Remove it and use local variables instead. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <2534f867ea9be5b666dfce19744b7d4e2b96c976.1626619393.git.lukasstraub2@web.de> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/mirror: fix active mirror dead-lock in mirror_wait_on_conflictsVladimir Sementsov-Ogievskiy1-0/+12
It's possible that requests start to wait each other in mirror_wait_on_conflicts(). To avoid it let's use same technique as in block/io.c in bdrv_wait_serialising_requests_locked() / bdrv_find_conflicting_request(): don't wait on intersecting request if it is already waiting for some other request. For details of the dead-lock look at testIntersectingActiveIO() test-case which we actually fixing now. Fixes: d06107ade0ce74dc39739bac80de84b51ec18546 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-4-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-20block/mirror: set .co for active-write MirrorOp objectsVladimir Sementsov-Ogievskiy1-0/+1
This field is unused, but it very helpful for debugging. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20210702211636.228981-2-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2021-07-19blkdebug: protect rules and suspended_reqs with a lockEmanuele Giuseppe Esposito1-10/+39
First, categorize the structure fields to identify what needs to be protected and what doesn't. We essentially need to protect only .state, and the 3 lists in BDRVBlkdebugState. Then, add the lock and mark the functions accordingly. Co-developed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-7-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2021-07-19block/blkdebug: remove new_state field and instead use a local variableEmanuele Giuseppe Esposito1-7/+6
There seems to be no benefit in using a field. Replace it with a local variable, and move the state update before the yields. The state update has do be done before the yields because now using a local variable does not allow the new updated state to be visible by the other yields. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20210614082931.24925-6-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Max Reitz <mreitz@redhat.com>