path: root/block
Age    Commit message    Author    Files    Lines
2016-09-13Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into stagingPeter Maydell7-51/+870
Pull request v2:

 * Fixed qcow2 sanitizer warnings [Peter]
 * Renamed get_error test cases to get_error_all to avoid tripping "error:" grep scripts [Peter]
 * Added Fam's iothread stop patch

# gpg: Signature made Tue 13 Sep 2016 11:02:30 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  iothread: Stop threads before main() quits
  tests: fix qvirtqueue_kick
  MAINTAINERS: add maintainer for replication support
  replication driver in blockdev-add
  tests: add unit test case for replication
  replication: Implement new driver for block replication
  replication: Introduce new APIs to do replication operation
  configure: support replication
  mirror: auto complete active commit
  docs: block replication's description
  block: Link backup into block core
  Backup: export interfaces for extra serialization
  Backup: clear all bitmap when doing block checkpoint
  block: unblock backup operations in backing file
  virtio-blk: rename virtio_device_info to virtio_blk_info
  linux-aio: process completions from ioq_submit()
  linux-aio: split processing events function
  linux-aio: consume events in userspace instead of calling io_getevents
  qcow2: avoid memcpy(dst, NULL, len)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-09-13replication: Implement new driver for block replicationWen Congyang2-0/+660
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Message-id: 1469602913-20979-10-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13mirror: auto complete active commitWen Congyang1-4/+9
Auto-complete the mirror job in the background to prevent it from blocking synchronously. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13block: Link backup into block coreWen Congyang1-1/+1
Some programs that add a dependency on it will use the block layer directly. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1469602913-20979-5-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13Backup: export interfaces for extra serializationChanglong Xie1-7/+34
Normal backup (sync='none') workflow:

step 1. NBD performs an I/O write from client to server
    qcow2_co_writev
     bdrv_co_writev
      ...
        bdrv_aligned_pwritev
         notifier_with_return_list_notify -> backup_do_cow
          bdrv_driver_pwritev // write new contents

step 2. drive-backup sync=none
    backup_do_cow
    {
        wait_for_overlapping_requests
        cow_request_begin
        for (; start < end; start++) {
            bdrv_co_readv_no_serialising // read old contents from Secondary disk
            bdrv_co_writev               // write old contents to hidden-disk
        }
        cow_request_end
    }

step 3. Then roll back to "step 1" to write new contents to the Secondary disk.

For replication, we must make sure that we only read the old contents from the
Secondary disk in order to keep contents consistent.

1) Replication workflow of Secondary
                                                          virtio-blk
                                                               ^
  ------->  1 NBD                                              |
     ||     server                                  3 replication
     ||        ^                                               ^
     ||        |          backing                 backing      |
     ||  Secondary disk 6<-------- hidden-disk 5 <-------- active-disk 4
     ||        |                        ^
     ||        '------------------------'
     ||          drive-backup sync=none 2

Hence, we need these interfaces to implement coarse-grained serialization
between COW of the Secondary disk and the read operation of replication.

Example code showing how to use them:

    #include "block/block_backup.h"

    static coroutine_fn int xxx_co_readv()
    {
        CowRequest req;
        BlockJob *job = secondary_disk->bs->job;

        if (job) {
            backup_wait_for_overlapping_requests(job, start, end);
            backup_cow_request_begin(&req, job, start, end);
            ret = bdrv_co_readv();
            backup_cow_request_end(&req);
            goto out;
        }
        ret = bdrv_co_readv();
    out:
        return ret;
    }

Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1469602913-20979-4-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13Backup: clear all bitmap when doing block checkpointWen Congyang1-0/+18
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Signed-off-by: Changlong Xie <xiecl.fnst@cn.fujitsu.com> Signed-off-by: Wang WeiWei <wangww.fnst@cn.fujitsu.com> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1469602913-20979-3-git-send-email-xiecl.fnst@cn.fujitsu.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13linux-aio: process completions from ioq_submit()Roman Pen1-2/+22
In order to reduce completion latency it makes sense to harvest completed
requests ASAP. A very fast backend device can complete requests just after
submission, so it is worth checking the ring buffer in order to peek at
completed requests directly after io_submit() has been called.

Indeed, this patch reduces the completion latencies and increases the overall
throughput; e.g. the following are the percentiles of the number of completed
requests at once:

        1th  10th  20th  30th  40th  50th  60th  70th  80th  90th  99.99th
Before    2     4    42   112   128   128   128   128   128   128      128
 After    1     1     4    14    33    45    47    48    50    51      108

That means that before this patch the ring buffer is observed as full
(128 requests consumed at once) in 60% of calls. After the patch is applied
the distribution of the number of completed requests is "smoother" and the
queue (requests in flight) is almost never full.

The fio read results are the following (write results are almost the same and
are not shown here):

Before
------
job: (groupid=0, jobs=8): err= 0: pid=2227: Tue Jul 19 11:29:50 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=54681MB, bw=1822.7MB/s, iops=179779, runt= 30001msec
    slat (usec): min=172, max=16883, avg=338.35, stdev=109.66
    clat (usec): min=1, max=21977, avg=1051.45, stdev=299.29
     lat (usec): min=317, max=22521, avg=1389.83, stdev=300.73
    clat percentiles (usec):
     |  1.00th=[  346],  5.00th=[  596], 10.00th=[  708], 20.00th=[  852],
     | 30.00th=[  932], 40.00th=[  996], 50.00th=[ 1048], 60.00th=[ 1112],
     | 70.00th=[ 1176], 80.00th=[ 1256], 90.00th=[ 1384], 95.00th=[ 1496],
     | 99.00th=[ 1800], 99.50th=[ 1928], 99.90th=[ 2320], 99.95th=[ 2672],
     | 99.99th=[ 4704]
    bw (KB  /s): min=205229, max=553181, per=12.50%, avg=233278.26, stdev=18383.51

After
------
job: (groupid=0, jobs=8): err= 0: pid=2220: Tue Jul 19 11:31:51 2016
  Description  : [Emulation of Storage Server Access Pattern]
  read : io=57637MB, bw=1921.2MB/s, iops=189529, runt= 30002msec
    slat (usec): min=169, max=20636, avg=329.61, stdev=124.18
    clat (usec): min=2, max=19592, avg=988.78, stdev=251.04
     lat (usec): min=381, max=21067, avg=1318.42, stdev=243.58
    clat percentiles (usec):
     |  1.00th=[  310],  5.00th=[  580], 10.00th=[  748], 20.00th=[  876],
     | 30.00th=[  908], 40.00th=[  948], 50.00th=[ 1012], 60.00th=[ 1064],
     | 70.00th=[ 1080], 80.00th=[ 1128], 90.00th=[ 1224], 95.00th=[ 1288],
     | 99.00th=[ 1496], 99.50th=[ 1608], 99.90th=[ 1960], 99.95th=[ 2256],
     | 99.99th=[ 5408]
    bw (KB  /s): min=212149, max=390160, per=12.49%, avg=245746.04, stdev=11606.75

Throughput increased from 1822MB/s to 1921MB/s, and average completion latency
decreased from 1051us to 988us.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Message-id: 1468931263-32667-4-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
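For illustration, a minimal self-contained sketch of the idea (not the QEMU hunk; QEMU actually reads the completion ring in userspace, see the commit below): poll for completions with a zero timeout right after io_submit() instead of waiting for the next event-loop iteration.

    #include <libaio.h>

    static int submit_and_harvest(io_context_t ctx, long nr, struct iocb **iocbs)
    {
        struct io_event events[128];
        struct timespec zero = { 0, 0 };
        int ret = io_submit(ctx, nr, iocbs);

        if (ret >= 0) {
            /* A fast backend may already have completed some requests;
             * min_nr = 0 plus a zero timeout makes this call non-blocking. */
            int done = io_getevents(ctx, 0, 128, events, &zero);
            for (int i = 0; i < done; i++) {
                /* ... complete events[i] here ... */
            }
        }
        return ret;
    }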
2016-09-13linux-aio: split processing events functionRoman Pen1-10/+21
Prepare the event-processing function to be called from ioq_submit() by splitting it into two parts: the first harvests completed I/O requests, the second submits pending requests. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Message-id: 1468931263-32667-3-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13linux-aio: consume events in userspace instead of calling io_geteventsRoman Pen1-26/+99
The AIO context in userspace is represented as a simple ring buffer, which can be consumed directly without entering the kernel, which obviously can bring some performance gain. QEMU does not use a timeout value when waiting for event completions, so we can consume all events in userspace. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Message-id: 1468931263-32667-2-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
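A sketch of the userspace ring access, assuming the io_context_t handle points at a kernel-mapped ring whose layout mirrors the kernel's struct aio_ring (the real patch verifies a magic/size field and falls back to io_getevents() when that assumption does not hold):

    #include <libaio.h>

    struct aio_ring {
        unsigned id, nr, head, tail;
        unsigned magic, compat_features, incompat_features, header_length;
        struct io_event io_events[];
    };

    /* How many events are ready, without entering the kernel. */
    static unsigned ring_peek(io_context_t ctx, struct io_event **events)
    {
        struct aio_ring *ring = (struct aio_ring *)ctx;
        unsigned head = ring->head, tail = ring->tail;

        if (head == tail) {
            return 0;
        }
        __sync_synchronize();               /* read barrier before the events */
        *events = ring->io_events + head;
        return tail > head ? tail - head : ring->nr - head;
    }

    /* Tell the kernel we consumed 'nr' events by advancing the head. */
    static void ring_commit(io_context_t ctx, unsigned nr)
    {
        struct aio_ring *ring = (struct aio_ring *)ctx;

        __sync_synchronize();               /* write barrier */
        ring->head = (ring->head + nr) % ring->nr;
    }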
2016-09-13qcow2: avoid memcpy(dst, NULL, len)Stefan Hajnoczi2-2/+7
Section "7.1.4 Use of library functions" in the C99 standard says: If an argument to a function has an invalid value (such as [...] a null pointer [...]) [...] the behavior is undefined. Additionally the "searching and sorting" functions are specified as requiring valid pointer values as described in 7.1.4. This patch fixes the following sanitizer errors: block/qcow2.c:1807:41: runtime error: null pointer passed as argument 2, which is declared to never be null block/qcow2-cluster.c:86:26: runtime error: null pointer passed as argument 2, which is declared to never be null Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1473758138-19260-1-git-send-email-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-09-13block/gluster: add support to choose libgfapi logfilePrasanna Kumar Kalever1-4/+38
Currently all libgfapi logs default to '/dev/stderr', as that was hardcoded in
the call to the glfs logging API. When the debug level is set to DEBUG/TRACE,
gfapi logs become huge and fill/overflow the console view.

This patch provides a command-line option to specify a log file path, which
allows logging to the specified file and also helps in persisting the gfapi
logs.

Usage:
-----
 * URI style:
   -drive file=gluster://hostname/volname/image.qcow2,file.debug=9,\
          file.logfile=/var/log/qemu/qemu-gfapi.log

 * JSON style:
   'json:{
      "driver":"qcow2",
      "file":{
         "driver":"gluster",
         "volume":"volname",
         "path":"image.qcow2",
         "debug":"9",
         "logfile":"/var/log/qemu/qemu-gfapi.log",
         "server":[
            { "type":"tcp", "host":"1.2.3.4", "port":24007 },
            { "type":"unix", "socket":"/var/run/glusterd.socket" }
         ]
      }
   }'

Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
2016-09-05qcow2: fix iovec size at qcow2_co_pwritev_compressedPavel Butsykin1-1/+1
Using bytes as the size is more exact than s->cluster_size, although qemu_iovec_to_buf() will not allow going beyond the qiov anyway. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05drive-backup: added support for data compressionPavel Butsykin1-1/+11
The idea is simple: backup is "write-once" data. It is written block by block and it is large, so it would be nice to save storage space by compressing it. The patch adds a flag to the qmp/hmp drive-backup command which enables block compression. Compression must be implemented in the format driver to enable this feature. There are some limitations in format drivers that allow compressed writes: we can write the data only once. For backup this is perfectly fine. These limitations are enforced by the driver, and an error will be reported if we do something wrong. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block/io: turn on dirty_bitmaps for the compressed writesPavel Butsykin1-9/+5
The assert was previously added in:

    commit 1755da16e32c15b22a521e8a38539e4b5cf367f3
    Author: Paolo Bonzini <pbonzini@redhat.com>
    Date:   Thu Oct 18 16:49:18 2012 +0200

        block: introduce new dirty bitmap functionality

Now the compressed write always runs in a coroutine and the bits are set after
the write, so we can turn dirty bitmaps back on for compressed writes.

Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Jeff Cody <jcody@redhat.com>
CC: Markus Armbruster <armbru@redhat.com>
CC: Eric Blake <eblake@redhat.com>
CC: John Snow <jsnow@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block: remove BlockDriver.bdrv_write_compressedPavel Butsykin2-37/+2
There are no block drivers left that implement the old .bdrv_write_compressed interface, so it can be removed. Also now we have no need to use the bdrv_pwrite_compressed function and we can remove it entirely. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05qcow: cleanup qcow_co_pwritev_compressed to avoid the recursionPavel Butsykin1-17/+7
Now that the function uses a vector instead of a buffer, there is no need to use recursive code. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05qcow: add qcow_co_pwritev_compressedPavel Butsykin1-67/+42
Added an implementation of the qcow_co_pwritev_compressed function that will allow us to safely use compressed writes for qcow images from running VMs. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05vmdk: add vmdk_co_pwritev_compressedPavel Butsykin1-50/+5
Added an implementation of the vmdk_co_pwritev_compressed function that will allow us to safely use compressed writes for vmdk images from running VMs. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05qcow2: cleanup qcow2_co_pwritev_compressed to avoid the recursionPavel Butsykin1-17/+7
Now that the function uses a vector instead of a buffer, there is no need to use recursive code. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05qcow2: add qcow2_co_pwritev_compressedPavel Butsykin1-74/+50
Added an implementation of the qcow2_co_pwritev_compressed function that will allow us to safely use compressed writes for qcow2 images from running VMs. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block/io: reuse bdrv_co_pwritev() for write compressedPavel Butsykin1-16/+40
For bdrv_pwrite_compressed(), most of the coroutine-creation code is duplicated from bdrv_prwv_co(), so we can just add a flag (BDRV_REQ_WRITE_COMPRESSED) and use bdrv_prwv_co() as the generic path. In the end we get a coroutine-oriented compressed-write function by calling bdrv_co_pwritev()/blk_co_pwritev() with the BDRV_REQ_WRITE_COMPRESSED flag. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
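The resulting dispatch looks roughly like this (a sketch; the callback name follows the rest of this series and is otherwise an assumption):

    /* One coroutine write path; the compressed case is selected by flag. */
    if (flags & BDRV_REQ_WRITE_COMPRESSED) {
        ret = bdrv_driver_pwritev_compressed(bs, offset, bytes, qiov);
    } else {
        ret = bdrv_driver_pwritev(bs, offset, bytes, qiov, flags);
    }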
2016-09-05block: Convert bdrv_pwrite_compressed() to BdrvChildPavel Butsykin2-2/+3
Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Eric Blake <eblake@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block: switch blk_write_compressed() to byte-based interfacePavel Butsykin2-34/+11
This is a preparatory patch, which continues the general trend of the transition to the byte-based interfaces. bdrv_check_request() and blk_check_request() are no longer used, thus we can remove them. Signed-off-by: Pavel Butsykin <pbutsykin@virtuozzo.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Jeff Cody <jcody@redhat.com> CC: Markus Armbruster <armbru@redhat.com> CC: Eric Blake <eblake@redhat.com> CC: John Snow <jsnow@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-09-05block: Accept node-name for block-streamKevin Wolf1-0/+16
In order to remove the necessity to use BlockBackend names in the external API, we want to allow node-names everywhere. This converts block-stream to accept a node-name without lifting the restriction that we're operating at a root node. In case of an invalid device name, the command returns the GenericError error class now instead of DeviceNotFound, because this is what qmp_get_root_bs() returns. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
2016-08-18block: fix possible reorder of flush operationsDenis V. Lunev1-1/+2
This patch reduces the CPU usage of flush operations a bit. When one flush completes we should kick only the next operation; we should not start all pending operations in the hope that they will go back to waiting on the wait queue. There was also a technical possibility that requests would get reordered with the previous approach: after wakeup, all requests are removed from the wait queue, become active and are processed one by one, re-adding themselves to the wait queue in the same order, but a new flush can arrive before all requests have been re-queued. Signed-off-by: Denis V. Lunev <den@openvz.org> Tested-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Message-id: 1471457214-3994-3-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
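In terms of QEMU's coroutine queue API the change boils down to waking a single waiter instead of all of them (a sketch, not the exact hunk):

    /* After this flush completes, hand the turn to the next waiter only;
     * restarting the whole queue lets a newly arrived flush get reordered. */
    qemu_co_queue_next(&bs->flush_queue);
    /* previously: qemu_co_queue_restart_all(&bs->flush_queue); */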
2016-08-18block: fix deadlock in bdrv_co_flushEvgeny Yakovlev1-2/+3
The following commit

    commit 3ff2f67a7c24183fcbcfe1332e5223ac6f96438c
    Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
    Date:   Mon Jul 18 22:39:52 2016 +0300

        block: ignore flush requests when storage is clean

has introduced a regression. It is still possible for two requests to execute
in a non-sequential fashion, and sometimes this results in a deadlock when
bdrv_drain_one/all are called for a BDS with such stalled requests:

1. Current flushed_gen and flush_started_gen is 1.
2. Request 1 enters bdrv_co_flush with write_gen 1 (i.e. the same as
   flushed_gen). It gets past the flushed_gen != flush_started_gen check and
   sets flush_started_gen to 1 (again, the same as it was before).
3. Request 1 yields somewhere before exiting bdrv_co_flush.
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past the
   flushed_gen != flush_started_gen check and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2.
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
   However flush_started_gen is now 2.

From here on out flushed_gen is always != flush_started_gen and all further
requests will wait on flush_queue.

This change replaces flush_started_gen with an explicitly tracked active
flush request.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Message-id: 1471457214-3994-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
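A minimal sketch of the replacement scheme (field names follow the description above, not necessarily the final code):

    /* In bdrv_co_flush(): only one flush may be active per BDS at a time. */
    while (bs->active_flush_req) {
        qemu_co_queue_wait(&bs->flush_queue);   /* wait for the active flush */
    }
    bs->active_flush_req = true;

    /* ... issue the flush for write generation 'current_gen' ... */

    bs->flushed_gen = current_gen;              /* out-of-order completions can
                                                 * no longer wedge the state  */
    bs->active_flush_req = false;
    qemu_co_queue_next(&bs->flush_queue);       /* wake the next waiting flush */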
2016-08-17curl: Cast fd to int for DPRINTFFam Zheng1-1/+1
Currently "make docker-test-mingw@fedora" has a warning like: /tmp/qemu-test/src/block/curl.c: In function 'curl_sock_cb': /tmp/qemu-test/src/block/curl.c:172:6: warning: format '%d' expects argument of type 'int', but argument 4 has type 'curl_socket_t {aka long long unsigned int}' DPRINTF("CURL (AIO): Sock action %d on fd %d\n", action, fd); ^ cc1: all warnings being treated as errors Cast to int to suppress it. Signed-off-by: Fam Zheng <famz@redhat.com> Message-Id: <1470027888-24381-1-git-send-email-famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com>
2016-08-16Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into stagingPeter Maydell3-83/+172
Block layer patches for 2.7.0-rc3

# gpg: Signature made Mon 15 Aug 2016 14:55:46 BST
# gpg:                using RSA key 0x7F09B272C88F2FD6
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>"
# Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6

* remotes/kevin/tags/for-upstream:
  iotests: Test case for wrong runtime option types
  block/nbd: Store runtime option values
  block/blkdebug: Store config filename
  block/nbd: Use QemuOpts for runtime options
  block/ssh: Use QemuOpts for runtime options

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-08-15block/nbd: Store runtime option valuesMax Reitz1-44/+61
Store the runtime option values in the BDRVNBDState so they can later be used in nbd_refresh_filename() without having to directly access the options QDict which may contain values of non-string types. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15block/blkdebug: Store config filenameMax Reitz1-5/+12
Store the configuration file's filename so it can later be used in bdrv_refresh_filename() without having to directly access the options QDict which may contain a value of a non-string type. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15block/nbd: Use QemuOpts for runtime optionsMax Reitz1-20/+54
Using QemuOpts will prevent qemu from crashing if the input options have not been validated (which is the case when they are specified on the command line or in a json: filename) and some have the wrong type. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-15block/ssh: Use QemuOpts for runtime optionsMax Reitz1-24/+55
Using QemuOpts will prevent qemu from crashing if the input options have not been validated (which is the case when they are specified on the command line or in a json: filename) and some have the wrong type. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-12trace-events: fix first line comment in trace-eventsLaurent Vivier1-1/+1
Documentation is docs/tracing.txt instead of docs/trace-events.txt.

    find . -name trace-events -exec \
      sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \
      {} \;

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-08-11linux-aio: Handle io_submit() failure gracefullyKevin Wolf1-1/+7
It is generally not expected that io_submit() fails other than with -EAGAIN, but corner cases like SELinux refusing I/O when permissions are revoked are still possible. In this case, we shouldn't abort, but just return an I/O error for the request. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1470741619-23231-1-git-send-email-kwolf@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
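A sketch of the error path in ioq_submit() (the helper name is hypothetical; the actual patch fails the queued requests with the returned error):

    ret = io_submit(s->ctx, len, iocbs);
    if (ret == -EAGAIN) {
        return 0;                      /* keep requests queued, retry later */
    }
    if (ret < 0) {
        /* Unexpected failure (e.g. SELinux revoking I/O permission):
         * report an I/O error for the requests instead of abort()ing. */
        fail_queued_requests(s, ret);  /* hypothetical helper */
        return 0;
    }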
2016-08-08mirror: finish earlier on errorVladimir Sementsov-Ogievskiy1-0/+4
Stop producing new async copy requests from mirror_iteration if a critical error (error action = BLOCK_ERROR_ACTION_REPORT) is detected. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-08-05block/parallels: check new image sizeKlim Kireev1-0/+5
Before this patch an incorrect image could be created via qemu-img (example: qemu-img create -f parallels -o size=4096T hack.img); such images cannot be used due to an overflow in the main image structure. This patch adds a size check to image creation: after reading the size, it is compared with UINT32_MAX * cluster_size. Signed-off-by: Klim Kireev <proffk@virtuozzo.mipt.ru> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1469639300-12155-1-git-send-email-den@openvz.org CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
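The check amounts to the following sketch (the exact error code and message are illustrative):

    /* Sizes above UINT32_MAX * cluster_size overflow fields in the main
     * image structure, so refuse to create such an image. */
    if (size > (uint64_t)UINT32_MAX * cluster_size) {
        error_setg(errp, "Image size is too large for the parallels format");
        return -E2BIG;
    }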
2016-08-03block: Cater to iscsi with non-power-of-2 discardEric Blake1-6/+9
Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

    $ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
    wsnz:0
    maximum compare and write length:1
    optimal transfer length granularity:0
    maximum transfer length:0
    optimal transfer length:0
    maximum prefetch xdread xdwrite transfer length:0
    maximum unmap lba count:30720
    maximum unmap block descriptor count:2
    optimal unmap granularity:30720
    ugavalid:1
    unmap granularity alignment:0
    maximum write same length:30720

which says that both the maximum and the optimal discard size is 15M. It is
not immediately apparent if the device allows discard requests not aligned to
the optimal size, nor if it allows discards at a finer granularity than the
optimal size. I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are permitted, but
while that document mentions a "Block Limits VPD Page", I couldn't actually
find documentation of that page or what values it would have, or whether a
SCSI device advertises its minimal unmap granularity. So it is not obvious to
me whether the Dell Equallogic device is compliant with the SCSI
specification.

Fortunately, it is easy enough to support non-power-of-2 sizing, even if it
means we are less efficient than truly possible when targeting that device
(for example, it means that we refuse to unmap anything that is not a
multiple of 15M and aligned to a 15M boundary, even if the device truly does
support a smaller granularity where unmapping actually works).

Reported-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
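A sketch of what honoring a non-power-of-2 granularity looks like when clamping a discard request (variable names are illustrative); the point is to use modulo arithmetic rather than bit masks:

    /* Keep only whole multiples of the advertised granularity 'g' (e.g.
     * 15 MiB); discards are advisory, so dropping the head/tail is safe. */
    uint64_t head = offset % g;          /* '%' instead of 'offset & (g - 1)' */

    if (head) {
        uint64_t skip = MIN(bytes, g - head);
        offset += skip;
        bytes  -= skip;
    }
    bytes -= bytes % g;                  /* drop the unaligned tail */
    if (bytes == 0) {
        return 0;                        /* nothing left that we may unmap */
    }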
2016-08-03nbd: Limit nbdflags to 16 bitsEric Blake1-1/+1
Rather than asserting that nbdflags is within range, just give it the correct type to begin with :) nbdflags corresponds to the per-export portion of NBD Protocol "transmission flags", which is 16 bits in response to NBD_OPT_EXPORT_NAME and NBD_OPT_GO. Furthermore, upstream NBD has never passed the global flags to the kernel via ioctl(NBD_SET_FLAGS) (the ioctl was first introduced in NBD 2.9.22; then a latent bug in NBD 3.1 actually tried to OR the global flags with the transmission flags, with the disaster that the addition of NBD_FLAG_NO_ZEROES in 3.9 caused all earlier NBD 3.x clients to treat every export as read-only; NBD 3.10 and later intentionally clip things to 16 bits to pass only transmission flags). Qemu should follow suit, since the current two global flags (NBD_FLAG_FIXED_NEWSTYLE and NBD_FLAG_NO_ZEROES) have no impact on the kernel's behavior during transmission. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1469129688-22848-3-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-27Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into stagingPeter Maydell1-2/+8
# gpg: Signature made Tue 26 Jul 2016 21:51:38 BST
# gpg:                using RSA key 0xBDBE7B27C0DE3057
# gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>"
# gpg:                 aka "Jeffrey Cody <jeff@codyprime.org>"
# gpg:                 aka "Jeffrey Cody <codyprime@gmail.com>"
# Primary key fingerprint: 9957 4B4D 3474 90E7 9D98 D624 BDBE 7B27 C0DE 3057

* remotes/cody/tags/block-pull-request:
  mirror: double performance of the bulk stage if the disc is full
  block/gluster: fix doc in the qapi schema and member name

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-26mirror: double performance of the bulk stage if the disc is fullVladimir Sementsov-Ogievskiy1-2/+8
Mirror can do up to 16 in-flight requests, but on a full copy (the whole source disk is non-zero) the in-flight count is actually always 1. This happens because a request is not limited in size: the data occupies the maximum available capacity of s->buf. The patch limits the size of a request to an artificial constant (1 MB here), which is neither too big nor too small. This effectively re-enables the parallelism the mirror code was designed for. The result is significant: the time to migrate a 10 GB disk is reduced from ~350 sec to 170 sec. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Message-id: 1468516741-82174-1-git-send-email-vsementsov@virtuozzo.com CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Jeff Cody <jcody@redhat.com> CC: Eric Blake <eblake@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>
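The essence of the change as a sketch (the constant and names are illustrative, not the exact hunk):

    /* Cap a single copy request at ~1 MiB so that the 16 in-flight slots
     * are actually filled in parallel instead of one request consuming
     * the whole s->buf. */
    #define MAX_IO_BYTES (1 << 20)

    io_bytes = MIN(io_bytes, MAX_IO_BYTES);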
2016-07-26block: export LUKS specific data to qemu-img infoDaniel P. Berrange1-0/+49
The qemu-img info command has the ability to expose format specific metadata
about volumes. Wire up this facility for the LUKS driver to report on cipher
configuration and key slot usage.

    $ qemu-img info ~/VirtualMachines/demo.luks
    image: /home/berrange/VirtualMachines/demo.luks
    file format: luks
    virtual size: 98M (102760448 bytes)
    disk size: 100M
    encrypted: yes
    Format specific information:
        ivgen alg: plain64
        hash alg: sha1
        cipher alg: aes-128
        uuid: 6ddee74b-3a22-408c-8909-6789d4fa2594
        cipher mode: xts
        slots:
            [0]:
                active: true
                iters: 572706
                key offset: 4096
                stripes: 4000
            [1]:
                active: false
                key offset: 135168
            [2]:
                active: false
                key offset: 266240
            [3]:
                active: false
                key offset: 397312
            [4]:
                active: false
                key offset: 528384
            [5]:
                active: false
                key offset: 659456
            [6]:
                active: false
                key offset: 790528
            [7]:
                active: false
                key offset: 921600
        payload offset: 2097152
        master key iters: 142375

One somewhat undesirable artifact is that the data fields are printed out in
(apparently) random order. This will be addressed later by changing the way
the block layer pretty-prints the image specific data.

Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Message-id: 1469192015-16487-3-git-send-email-berrange@redhat.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-26qcow2: do not allocate extra memoryVladimir Sementsov-Ogievskiy2-2/+2
There is no need to allocate more than one cluster, as we set avail_out for
deflate to one cluster. The zlib docs (http://www.zlib.net/manual.html) say:

    "deflate compresses as much data as possible, and stops when the input
    buffer becomes empty or the output buffer becomes full."

So deflate will not write more than avail_out to the output buffer. If there
is not enough space in the output buffer for the compressed data (it may be
larger than the input data), deflate just returns Z_OK. (If all data is
compressed and written to the output buffer, deflate returns Z_STREAM_END.)

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-id: 1468515565-81313-1-git-send-email-vsementsov@virtuozzo.com
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
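A self-contained sketch of that zlib contract (qcow2's actual code differs in details such as window bits and error handling):

    #include <stdint.h>
    #include <sys/types.h>
    #include <zlib.h>

    /* Compress into exactly one cluster; a negative return tells the caller
     * to write the data uncompressed instead. */
    static ssize_t compress_cluster(uint8_t *out, size_t cluster_size,
                                    const uint8_t *in, size_t in_len)
    {
        z_stream strm = {
            .next_in  = (Bytef *)in,  .avail_in  = in_len,
            .next_out = out,          .avail_out = cluster_size,
        };
        ssize_t ret = -1;

        if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                         -12, 9, Z_DEFAULT_STRATEGY) != Z_OK) {
            return -1;
        }
        /* Z_STREAM_END: everything fit in one cluster.  Z_OK: the output
         * buffer filled up first, i.e. the data did not compress enough. */
        if (deflate(&strm, Z_FINISH) == Z_STREAM_END) {
            ret = strm.total_out;
        }
        deflateEnd(&strm);
        return ret;
    }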
2016-07-21Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell1-9/+0
pc, pci, virtio: new features, cleanups, fixes

- interrupt remapping for intel iommus
- a bunch of virtio cleanups
- fixes all over the place

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

# gpg: Signature made Thu 21 Jul 2016 18:49:30 BST
# gpg:                using RSA key 0x281F0DB8D28D5469
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>"
# gpg:                 aka "Michael S. Tsirkin <mst@redhat.com>"
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
#      Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

* remotes/mst/tags/for_upstream: (57 commits)
  intel_iommu: avoid unnamed fields
  virtio: Update migration docs
  virtio-gpu: Wrap in vmstate
  virtio-gpu: Use migrate_add_blocker for virgl migration blocking
  virtio-input: Wrap in vmstate
  9pfs: Wrap in vmstate
  virtio-serial: Wrap in vmstate
  virtio-net: Wrap in vmstate
  virtio-balloon: Wrap in vmstate
  virtio-rng: Wrap in vmstate
  virtio-blk: Wrap in vmstate
  virtio-scsi: Wrap in vmstate
  virtio: Migration helper function and macro
  virtio-serial: Remove old migration version support
  virtio-net: Remove old migration version support
  virtio-scsi: Replace HandleOutput typedef
  Revert "mirror: Workaround for unexpected iohandler events during completion"
  virtio-scsi: Call virtio_add_queue_aio
  virtio-blk: Call virtio_add_queue_aio
  virtio: Introduce virtio_add_queue_aio
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-21Revert "mirror: Workaround for unexpected iohandler events during completion"Fam Zheng1-9/+0
This reverts commit ab27c3b5e7408693dde0b565f050aa55c4a1bcef. The virtio storage device host notifiers now work with bdrv_drained_begin/end, so we don't need this hack any more. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-21Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into stagingPeter Maydell17-319/+319
Pull request v2:

 * Resolved merge conflict with block/iscsi.c [Peter]

# gpg: Signature made Wed 20 Jul 2016 17:20:52 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35 775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request: (25 commits)
  raw_bsd: Convert to byte-based interface
  nbd: Convert to byte-based interface
  block: Kill .bdrv_co_discard()
  sheepdog: Switch .bdrv_co_discard() to byte-based
  raw_bsd: Switch .bdrv_co_discard() to byte-based
  qcow2: Switch .bdrv_co_discard() to byte-based
  nbd: Switch .bdrv_co_discard() to byte-based
  iscsi: Switch .bdrv_co_discard() to byte-based
  gluster: Switch .bdrv_co_discard() to byte-based
  blkreplay: Switch .bdrv_co_discard() to byte-based
  block: Add .bdrv_co_pdiscard() driver callback
  block: Convert .bdrv_aio_discard() to byte-based
  rbd: Switch rbd_start_aio() to byte-based
  raw-posix: Switch paio_submit() to byte-based
  block: Convert BB interface to byte-based discards
  block: Convert bdrv_aio_discard() to byte-based
  block: Switch BlockRequest to byte-based
  block: Convert bdrv_discard() to byte-based
  block: Convert bdrv_co_discard() to byte-based
  iscsi: Rely on block layer to break up large requests
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

Conflicts:
	block/gluster.c
2016-07-20raw_bsd: Convert to byte-based interfaceEric Blake1-19/+26
Since the raw format driver is just passing things through, we can do byte-based read and write if the underlying protocol does likewise. There's one tricky part - if we probed the image format, we document that we restrict operations on the initial sector. It's easiest to keep this guarantee by enforcing read-modify-write on sub-sector operations (yes, this partially reverts commit ad82be2f). Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1468624988-423-20-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20nbd: Convert to byte-based interfaceEric Blake3-23/+27
The NBD protocol doesn't have any notion of sectors, so it is a fairly easy conversion to use byte-based read and write. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468624988-423-19-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20block: Kill .bdrv_co_discard()Eric Blake1-7/+2
Now that all drivers have a byte-based .bdrv_co_pdiscard(), we no longer need to worry about the sector-based version. We can also relax our minimum alignment to 1 for drivers that support it. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20sheepdog: Switch .bdrv_co_discard() to byte-basedEric Blake1-7/+10
Another step towards killing off sector-based block APIs. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-17-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20raw_bsd: Switch .bdrv_co_discard() to byte-basedEric Blake1-5/+4
Another step towards killing off sector-based block APIs. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468624988-423-16-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>