aboutsummaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)AuthorFilesLines
2016-07-20raw_bsd: Don't advertise flags not supported by protocol layerEric Blake1-2/+4
The raw format layer supports all flags via passthrough - but it only makes sense to pass through flags that the lower layer actually supports. The next patch gives stronger reasoning for why this is correct. At the moment, the raw format layer ignores the max_transfer limit of its protocol layer, and an attempt to do the qemu-io 'w -f 0 40m' to an NBD server that lacks FUA will pass the entire 40m request to the NBD driver, which then fragments the request itself into a 32m write, 8m write, and flush. But once the block layer starts honoring limits and fragmenting packets, the raw driver will hand the NBD driver two separate requests; if both requests have BDRV_REQ_FUA set, then this would result in a 32m write, flush, 8m write, and second flush. By having the raw layer no longer advertise FUA support when the protocol layer lacks it, we are back to a single flush at the block layer for the overall 40m request. Note that 'w -f -z 0 40m' does not currently exhibit the same problem, because there, the fragmentation does not occur until at the NBD layer (the raw layer has .bdrv_co_pwrite_zeroes, and the NBD layer doesn't advertise max_pwrite_zeroes to constrain things at the raw layer) - but the problem is latent and we would again have too many flushes without this patch once the NBD layer implements support for the new NBD_CMD_WRITE_ZEROES command, if it sets max_pwrite_zeroes to the same 32m limit as recommended by the NBD protocol. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468607524-19021-3-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-20block: Fragment reads to max transfer lengthEric Blake1-20/+33
Drivers should be able to rely on the block layer honoring the max transfer length, rather than needing to return -EINVAL (iscsi) or manually fragment things (nbd). This patch adds the fragmentation in the block layer, after requests have been aligned (fragmenting before alignment would lead to multiple unaligned requests, rather than just the head and tail). The return value was previously nebulous on success on whether it was zero or the length read; and fragmenting may introduce yet other non-zero values if we use the last length read. But as at least some callers are sloppy and expect only zero on success, it is easiest to just guarantee 0. [Fix uninitialized ret local variable in bdrv_aligned_preadv(). --Stefan] Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1468607524-19021-2-git-send-email-eblake@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-19Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into stagingPeter Maydell1-56/+192
* two old patches from prospective GSoC students * i386 -kernel device tree support * Coverity fix * memory usage improvement from Peter * checkpatch fix * g_path_get_dirname cleanup * caching of block status for iSCSI # gpg: Signature made Tue 19 Jul 2016 07:43:41 BST # gpg: using RSA key 0xBFFBD25F78C7AE83 # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * remotes/bonzini/tags/for-upstream: target-i386: Remove redundant HF_SOFTMMU_MASK block/iscsi: allow caching of the allocation map block/iscsi: fix rounding in iscsi_allocationmap_set Move README to markdown cpu-exec: Move down some declarations in cpu_exec() exec: avoid realloc in phys_map_node_reserve checkpatch: consider git extended headers valid patches megasas: remove useless check for cmd->frame compiler: never omit assertions if using a static analysis tool hw/i386: add device tree support Changed malloc to g_malloc, free to g_free in bsd-user/qemu.h use g_path_get_dirname instead of dirname Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into stagingPeter Maydell1-0/+21
# gpg: Signature made Mon 18 Jul 2016 23:53:15 BST # gpg: using RSA key 0x7DEF8106AAFC390E # gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>" # Primary key fingerprint: FAEB 9711 A12C F475 812F 18F2 88A9 064D 1835 61EB # Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76 CBD0 7DEF 8106 AAFC 390E * remotes/jnsnow/tags/ide-pull-request: block: ignore flush requests when storage is clean tests: in IDE and AHCI tests perform DMA write before flushing ide: set retry_unit for PIO and FLUSH requests ide: refactor retry_unit set and clear into separate function Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19block/iscsi: allow caching of the allocation mapPeter Lieven1-58/+192
until now the allocation map was used only as a hint if a cluster is allocated or not. If a block was not allocated (or Qemu had no info about the allocation status) a get_block_status call was issued to check the allocation status and possibly avoid a subsequent read of unallocated sectors. If a block known to be allocated the get_block_status call was omitted. In the other case a get_block_status call was issued before every read to avoid the necessity for a consistent allocation map. To avoid the potential overhead of calling get_block_status for each and every read request this took only place for the bigger requests. This patch enhances this mechanism to cache the allocation status and avoid calling get_block_status for blocks where the allocation status has been queried before. This allows for bypassing the read request even for smaller requests and additionally omits calling get_block_status for known to be unallocated blocks. Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1468831940-15556-3-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-19block/iscsi: fix rounding in iscsi_allocationmap_setPeter Lieven1-3/+5
when setting clusters as alloacted the boundaries have to be expanded. As Paolo pointed out the calculation of the number of clusters is wrong: Suppose cluster_sectors is 2, sector_num = 1, nb_sectors = 6: In the "mark allocated" case, you want to set 0..8, i.e. cluster_num=0, nb_clusters=4. 0--.--2--.--4--.--6--.--8 <--|_________________|--> (<--> = expanded) Instead you are setting nb_clusters=3, so that 6..8 is not marked. 0--.--2--.--4--.--6--.--8 <--|______________|!!! (! = wrong) Cc: qemu-stable@nongnu.org Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <1468831940-15556-2-git-send-email-pl@kamp.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2016-07-18block: ignore flush requests when storage is cleanEvgeny Yakovlev1-0/+21
Some guests (win2008 server for example) do a lot of unnecessary flushing when underlying media has not changed. This adds additional overhead on host when calling fsync/fdatasync. This change introduces a write generation scheme in BlockDriverState. Current write generation is checked against last flushed generation to avoid unnessesary flushes. The problem with excessive flushing was found by a performance test which does parallel directory tree creation (from 2 processes). Results improved from 0.424 loops/sec to 0.432 loops/sec. Each loop creates 10^3 directories with 10 files in each. This affected some blkdebug testcases that were expecting error logs from failure-injected flushes which are now skipped entirely (tests 026 071 089). This also affects the performance of block jobs and thus BLOCK_JOB_READY events for driver-mirror and active block-commit commands now arrives faster, before QMP send successfully returns to caller (tests 141 144). Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1468870792-7411-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> CC: Fam Zheng <famz@redhat.com> CC: John Snow <jsnow@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18linux-aio: prevent submitting more than MAX_EVENTSRoman Pen1-10/+16
Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us with specified number of events. But kernel ring buffer allocation logic is a bit tricky (ring buffer is page size aligned + some percpu allocation are required) so eventually more than requested events number is allocated. From a userspace side we have to follow the convention and should not try to io_submit() more or logic, which consumes completed events, should be changed accordingly. The pitfall is in the following sequence: MAX_EVENTS = 128 io_setup(MAX_EVENTS) io_submit(MAX_EVENTS) io_submit(MAX_EVENTS) /* now 256 events are in-flight */ io_getevents(MAX_EVENTS) = 128 /* we can handle only 128 events at once, to be sure * that nothing is pended the io_getevents(MAX_EVENTS) * call must be invoked once more or hang will happen. */ To prevent the hang or reiteration of io_getevents() call this patch restricts the number of in-flights, which is now limited to MAX_EVENTS. Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com Cc: Stefan Hajnoczi <stefanha@redhat.com> Cc: qemu-devel@nongnu.org Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18linux-aio: share one LinuxAioState within an AioContextPaolo Bonzini5-179/+22
This has better performance because it executes fewer system calls and does not use a bottom half per disk. Originally proposed by Ming Lei. [Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix build error as reported by Peter Maydell <peter.maydell@linaro.org>. --Stefan] Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> squash! linux-aio: share one LinuxAioState within an AioContext
2016-07-13vvfat: Fix qcow write target driver specificationMax Reitz1-1/+2
First, bdrv_open_child() expects all options for the child to be prefixed by the child's name (and a separating dot). Second, bdrv_open_child() does not take ownership of the QDict passed to it but only extracts all options for the child, so if a QDict is created for the sole purpose of passing it to bdrv_open_child(), it needs to be freed afterwards. This patch makes vvfat adhere to both of these rules. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160711135452.11304-1-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13vmdk: fix metadata write regressionReda Sallahi1-7/+7
Commit "cdeaf1f vmdk: add bdrv_co_write_zeroes" causes a regression on writes. It writes metadata after every write instead of doing it only once for each cluster. vmdk_pwritev() writes metadata whenever m_data is set as valid so this patch sets m_data as valid only when we have a new cluster which hasn't been allocated before or a zero grain. Signed-off-by: Reda Sallahi <fullmanet@gmail.com> Message-id: 20160707084249.29084-1-fullmanet@gmail.com Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13Improve block job rate limiting for small bandwidth valuesSascha Silbe3-17/+12
ratelimit_calculate_delay() previously reset the accounting every time slice, no matter how much data had been processed before. This had (at least) two consequences: 1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream. Not sure if there are real-world use cases where this would be a problem. Mirroring and backup over a slow link (e.g. DSL) would come to mind, though. 2. Tests for block job operations (e.g. cancel) were rather racy All block jobs currently use a time slice of 100ms. That's a reasonable value to get smooth output during regular operation. However this also meant that the state of block jobs changed every 100ms, no matter how low the configured limit was. On busy hosts, qemu often transferred additional chunks until the test case had a chance to cancel the job. Fix the block job rate limit code to delay for more than one time slice to address the above issues. To make it easier to handle oversized chunks we switch the semantics from returning a delay _before_ the current request to a delay _after_ the current request. If necessary, this delay consists of multiple time slice units. Since the mirror job sends multiple chunks in one go even if the rate limit was exceeded in between, we need to keep track of the start of the current time slice so we can correctly re-compute the delay for the updated amount of data. The minimum bandwidth now is 1 data unit per time slice. The block jobs are currently passing the amount of data transferred in sectors and using 100ms time slices, so this translates to 5120 bytes/second. With chunk sizes usually being O(512KiB), tests have plenty of time (O(100s)) to operate on block jobs. The chance of a race condition now is fairly remote, except possibly on insanely loaded systems. Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13qcow2: Fix qcow2_get_cluster_offset()Max Reitz1-5/+11
Recently, qcow2_get_cluster_offset() has been changed to work with bytes instead of sectors. This invalidated some assertions and introduced a possible integer multiplication overflow. This could be reproduced using e.g. $ qemu-img create -f qcow2 -o cluster_size=1M blub.qcow2 8G Formatting 'foo.qcow2', fmt=qcow2 size=8589934592 encryption=off cluster_size=1048576 lazy_refcounts=off refcount_bits=16 $ qemu-io -c map blub.qcow2 qemu-io: qemu/block/qcow2-cluster.c:504: qcow2_get_cluster_offset: Assertion `bytes_needed <= INT_MAX' failed. [1] 20775 abort (core dumped) qemu-io -c map foo.qcow2 This patch removes the now wrong assertion, adding comments and more assertions to prove its correctness (and fixing the overflow which would become apparent with the original assertion removed). Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160620142623.24471-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13qcow2: Avoid making the L1 table too bigMax Reitz1-1/+2
We refuse to open images whose L1 table we deem "too big". Consequently, we should not produce such images ourselves. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20160615153630.2116-3-mreitz@redhat.com Reviewed-by: Eric Blake <eblake@redhat.com> [mreitz: Added QEMU_BUILD_BUG_ON()] Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-13block/qdev: Allow configuring rerror/werror with qdev propertiesKevin Wolf1-0/+1
The rerror/werror policies are implemented in the devices, so that's where they should be configured. In comparison to the old options in -drive, the qdev properties are only added to those devices that actually support them. If the option isn't given (or "auto" is specified), the setting of the BlockBackend is used for compatibility with the old options. For block jobs, "auto" is the same as "enospc". Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-07-13commit: Fix use of error handling policyKevin Wolf1-3/+3
Commit implemented the 'enospc' policy as 'ignore' if the error was not ENOSPC. The QAPI documentation promises that it's treated as 'stop'. Using the common block job error handling function fixes this and also adds the missing QMP event. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2016-07-13coroutine: move entry argument to qemu_coroutine_createPaolo Bonzini19-62/+63
In practice the entry argument is always known at creation time, and it is confusing that sometimes qemu_coroutine_enter is used with a non-NULL argument to re-enter a coroutine (this happens in block/sheepdog.c and tests/test-coroutine.c). So pass the opaque value at creation time, for consistency with e.g. aio_bh_new. Mostly done with the following semantic patch: @ entry1 @ expression entry, arg, co; @@ - co = qemu_coroutine_create(entry); + co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry2 @ expression entry, arg; identifier co; @@ - Coroutine *co = qemu_coroutine_create(entry); + Coroutine *co = qemu_coroutine_create(entry, arg); ... - qemu_coroutine_enter(co, arg); + qemu_coroutine_enter(co); @ entry3 @ expression entry, arg; @@ - qemu_coroutine_enter(qemu_coroutine_create(entry), arg); + qemu_coroutine_enter(qemu_coroutine_create(entry, arg)); @ reentry @ expression co; @@ - qemu_coroutine_enter(co, NULL); + qemu_coroutine_enter(co); except for the aforementioned few places where the semantic patch stumbled (as expected) and for test_co_queue, which would otherwise produce an uninitialized variable warning. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13raw-posix: Use qemu_dupFam Zheng1-9/+1
Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13commit: Add 'job-id' parameter to 'block-commit'Alberto Garcia2-6/+7
This patch adds a new optional 'job-id' parameter to 'block-commit', allowing the user to specify the ID of the block job to be created. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13stream: Add 'job-id' parameter to 'block-stream'Alberto Garcia1-6/+6
This patch adds a new optional 'job-id' parameter to 'block-stream', allowing the user to specify the ID of the block job to be created. The HMP 'block_stream' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup'Alberto Garcia1-4/+4
This patch adds a new optional 'job-id' parameter to 'blockdev-backup' and 'drive-backup', allowing the user to specify the ID of the block job to be created. The HMP 'drive_backup' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror'Alberto Garcia1-7/+8
This patch adds a new optional 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror', allowing the user to specify the ID of the block job to be created. The HMP 'drive_mirror' command remains unchanged. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13blockjob: Add 'job_id' parameter to block_job_create()Alberto Garcia4-4/+5
When a new job is created, the job ID is taken from the device name of the BDS. This patch adds a new 'job_id' parameter to let the caller provide one instead. This patch also verifies that the ID is always unique and well-formed. This causes problems in a couple of places where no ID is being set, because the BDS does not have a device name. In the case of test_block_job_start() (from test-blockjob-txn.c) we can simply use this new 'job_id' parameter to set the missing ID. In the case of img_commit() (from qemu-img.c) we still don't have the API to make commit_active_start() set the job ID, so we solve it by setting a default value. We'll get rid of this as soon as we extend the API. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-13blockjob: Update description of the 'id' fieldAlberto Garcia1-1/+2
The 'id' field of the BlockJob structure will be able to hold any ID, not only a device name. This patch updates the description of that field and the error messages where it is being used. Soon we'll add the ability to set an arbitrary ID when creating a block job. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-12Use #include "..." for our own headers, <...> for othersMarkus Armbruster1-1/+0
Tracked down with an ugly, brittle and probably buggy Perl script. Also move includes converted to <...> up so they get included before ours where that's obviously okay. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Tested-by: Eric Blake <eblake@redhat.com> Reviewed-by: Richard Henderson <rth@twiddle.net>
2016-07-06Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2016-07-06' into ↵Peter Maydell2-21/+18
staging QAPI patches for 2016-07-06 # gpg: Signature made Wed 06 Jul 2016 10:00:51 BST # gpg: using RSA key 0x3870B400EB918653 # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * remotes/armbru/tags/pull-qapi-2016-07-06: replay: Use new QAPI cloning sockets: Use new QAPI cloning qapi: Add new clone visitor qapi: Add new visit_complete() function tests: Factor out common code in qapi output tests tests: Clean up test-string-output-visitor qmp-output-visitor: Favor new visit_free() function string-output-visitor: Favor new visit_free() function qmp-input-visitor: Favor new visit_free() function string-input-visitor: Favor new visit_free() function opts-visitor: Favor new visit_free() function qapi: Add new visit_free() function qapi: Add parameter to visit_end_* qemu-img: Don't leak errors when outputting JSON qapi: Improve use of qmp/types.h Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-06qapi: Add new visit_complete() functionEric Blake1-5/+4
Making each output visitor provide its own output collection function was the only remaining reason for exposing visitor sub-types to the rest of the code base. Add a polymorphic visit_complete() function which is a no-op for input visitors, and which populates an opaque pointer for output visitors. For maximum type-safety, also add a parameter to the output visitor constructors with a type-correct version of the output pointer, and assert that the two uses match. This approach was considered superior to either passing the output parameter only during construction (action at a distance during visit_free() feels awkward) or only during visit_complete() (defeating type safety makes it easier to use incorrectly). Most callers were function-local, and therefore a mechanical conversion; the testsuite was a bit trickier, but the previous cleanup patch minimized the churn here. The visit_complete() function may be called at most once; doing so lets us use transfer semantics rather than duplication or ref-count semantics to get the just-built output back to the caller, even though it means our behavior is not idempotent. Generated code is simplified as follows for events: |@@ -26,7 +26,7 @@ void qapi_event_send_acpi_device_ost(ACP | QDict *qmp; | Error *err = NULL; | QMPEventFuncEmit emit; |- QmpOutputVisitor *qov; |+ QObject *obj; | Visitor *v; | q_obj_ACPI_DEVICE_OST_arg param = { | info |@@ -39,8 +39,7 @@ void qapi_event_send_acpi_device_ost(ACP | | qmp = qmp_event_build_dict("ACPI_DEVICE_OST"); | |- qov = qmp_output_visitor_new(); |- v = qmp_output_get_visitor(qov); |+ v = qmp_output_visitor_new(&obj); | | visit_start_struct(v, "ACPI_DEVICE_OST", NULL, 0, &err); | if (err) { |@@ -55,7 +54,8 @@ void qapi_event_send_acpi_device_ost(ACP | goto out; | } | |- qdict_put_obj(qmp, "data", qmp_output_get_qobject(qov)); |+ visit_complete(v, &obj); |+ qdict_put_obj(qmp, "data", obj); | emit(QAPI_EVENT_ACPI_DEVICE_OST, qmp, &err); and for commands: | { | Error *err = NULL; |- QmpOutputVisitor *qov = qmp_output_visitor_new(); | Visitor *v; | |- v = qmp_output_get_visitor(qov); |+ v = qmp_output_visitor_new(ret_out); | visit_type_AddfdInfo(v, "unused", &ret_in, &err); |- if (err) { |- goto out; |+ if (!err) { |+ visit_complete(v, ret_out); | } |- *ret_out = qmp_output_get_qobject(qov); |- |-out: | error_propagate(errp, err); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-13-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06qmp-output-visitor: Favor new visit_free() functionEric Blake1-1/+1
Now that we have a polymorphic visit_free(), we no longer need qmp_output_visitor_cleanup(); however, we still need to expose the subtype for qmp_output_get_qobject(). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-10-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06opts-visitor: Favor new visit_free() functionEric Blake1-16/+14
Now that we have a polymorphic visit_free(), we no longer need opts_visitor_cleanup(); which in turn means we no longer need to return a subtype from opts_visitor_new() nor a public upcast function. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-6-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-06qapi: Add parameter to visit_end_*Eric Blake1-2/+2
Rather than making the dealloc visitor track of stack of pointers remembered during visit_start_* in order to free them during visit_end_*, it's a lot easier to just make all callers pass the same pointer to visit_end_*. The generated code has access to the same pointer, while all other users are doing virtual walks and can pass NULL. The dealloc visitor is then greatly simplified. All three visit_end_*() functions intentionally take a void**, even though the visit_start_*() functions differ between void**, GenericList**, and GenericAlternate**. This is done for several reasons: when doing a virtual walk, passing NULL doesn't care what the type is, but when doing a generated walk, we already have to cast the caller's specific FOO* to call visit_start, while using void** lets us use visit_end without a cast. Also, an upcoming patch will add a clone visitor that wants to use the same implementation for all three visit_end callbacks, which is made easier if all three share the same signature. For visitors with already track per-object state (the QMP visitors via a stack, and the string visitors which do not allow nesting), add an assertion that the caller is indeed passing the same pointer to paired calls. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1465490926-28625-4-git-send-email-eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2016-07-05block/qcow2: Don't use cpu_to_*w()Peter Maydell3-10/+9
Don't use the cpu_to_*w() functions, which we are trying to deprecate. Instead either just use cpu_to_*() to do the byteswap, or use st*_be_p() if we need to do the store somewhere other than to a variable that's already the correct type. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1466093177-17890-1-git-send-email-peter.maydell@linaro.org Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2016-07-05block: Convert bdrv_co_preadv/pwritev to BdrvChildKevin Wolf10-33/+34
This is the final patch for converting the common I/O path to take a BdrvChild parameter instead of BlockDriverState. The completion of this conversion means that all users that perform I/O on an image need to actually hold a reference (in the form of BdrvChild, possible as part of a BlockBackend) to that image. This also protects against inconsistent use of BlockBackend vs. BlockDriverState functions because direct use of a BlockDriverState isn't possible any more and blk->root is private for block-backends.c. In addition, we can now distinguish different users in the I/O path, and the future op blockers work is going to add assertions based on permissions stored in BdrvChild. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_prwv_co() to BdrvChildKevin Wolf1-20/+25
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_pwrite_zeroes() to BdrvChildKevin Wolf5-8/+14
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_pwrite(v/_sync) to BdrvChildKevin Wolf14-65/+62
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_pread(v) to BdrvChildKevin Wolf16-95/+102
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_write() to BdrvChildKevin Wolf7-11/+99
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_read() to BdrvChildKevin Wolf4-8/+8
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Use BlockBackend for I/O in bdrv_commit()Kevin Wolf1-12/+22
Just like block jobs, the HMP commit command should use its own BlockBackend for doing I/O on BlockDriverStates. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Move bdrv_commit() to block/commit.cKevin Wolf2-2/+112
No code changes, just moved from one file to another. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_co_do_readv/writev to BdrvChildKevin Wolf1-14/+15
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_aio_writev() to BdrvChildKevin Wolf6-12/+12
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_aio_readv() to BdrvChildKevin Wolf6-12/+12
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_co_writev() to BdrvChildKevin Wolf5-7/+7
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: Convert bdrv_co_readv() to BdrvChildKevin Wolf6-11/+10
Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05vhdx: Some more BlockBackend use in vhdx_create()Kevin Wolf1-20/+21
This does some easy conversions from bdrv_* to blk_* functions in vhdx_create(). We should avoid bypassing the BlockBackend layer whenever possible. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05blkreplay: Convert to byte-based I/OKevin Wolf1-8/+8
The blkreplay driver only forwards the requests it gets, so converting it to byte granularity is trivial. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05vvfat: Use BdrvChild for s->qcowKevin Wolf1-23/+43
vvfat uses a temporary qcow file to cache written data in read-write mode. In order to do things properly, this should show up in the BDS graph and I/O should go through BdrvChild like for every other node. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-05block: fix return code for partial write for Linux AIODenis V. Lunev1-1/+1
Partial write most likely means that there is not space rather than "something wrong happens". Thus it would be more natural to return ENOSPC rather than EINVAL. The problem actually happens with NBD server, which has reported EINVAL rather then ENOSPC on the first error using its protocol, which makes report to the user wrong. Signed-off-by: Denis V. Lunev <den@openvz.org> CC: Pavel Borzenkov <pborzenkov@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2016-07-05block: Use bool as appropriate for BDS membersEric Blake8-10/+10
Using int for values that are only used as booleans is confusing. While at it, rearrange a couple of members so that all the bools are contiguous. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>