aboutsummaryrefslogtreecommitdiff
path: root/block/quorum.c
AgeCommit message (Collapse)AuthorFilesLines
2015-09-04quorum: validate vote threshold against num_children even if read-pattern is ↵Wen Congyang1-6/+6
fifo We need to use threshold to check if too many write operation fails. If threshold is larger than num children, we always get write error event even if all write operations success. Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Message-id: 55962F72.3060003@cn.fujitsu.com Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-08-05block: don't register quorum driver if SHA256 support is unavailableSascha Silbe1-6/+4
Commit 488981a4 [block: convert quorum blockdrv to use crypto APIs] broke qemu-iotest 041 on hosts with GnuTLS < 2.10.0. It converted a compile-time check to a run-time check at device open time. The result is that we now advertise a feature (the quorum block driver) that will never work (on those hosts). There's no way (short of parsing human-readable error messages) for qemu-iotests or any other API consumer to recognise that the quorum block driver isn't _actually_ available and shouldn't be used or tested. Move the run-time check to bdrv_quorum_init() to avoid registering the quorum block driver if we know it cannot work. This way API consumers can recognise it's unavailable. Fixes: 488981a4af396551a3178d032cc2b41d9553ada2 Signed-off-by: Sascha Silbe <silbe@linux.vnet.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 1438699705-21761-1-git-send-email-silbe@linux.vnet.ibm.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-07-08block: convert quorum blockdrv to use crypto APIsDaniel P. Berrange1-19/+20
Get rid of direct use of gnutls APIs in quorum blockdrv in favour of using the crypto APIs. This avoids the need to do conditional compilation of the quorum driver. It can simply report an error at file open file instead if the required hash algorithm isn't supported by QEMU. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <1435770638-25715-8-git-send-email-berrange@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-06-22Include qapi/qmp/qerror.h exactly where neededMarkus Armbruster1-0/+1
In particular, don't include it into headers. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22qerror: Clean up QERR_ macros to expand into a single stringMarkus Armbruster1-2/+2
These macros expand into error class enumeration constant, comma, string. Unclean. Has been that way since commit 13f59ae. The error class is always ERROR_CLASS_GENERIC_ERROR since the previous commit. Clean up as follows: * Prepend every use of a QERR_ macro by ERROR_CLASS_GENERIC_ERROR, and delete it from the QERR_ macro. No change after preprocessing. * Rewrite error_set(ERROR_CLASS_GENERIC_ERROR, ...) into error_setg(...). Again, no change after preprocessing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Luiz Capitulino <lcapitulino@redhat.com>
2015-06-22qobject: Use 'bool' for qboolEric Blake1-2/+2
We require a C99 compiler, so let's use 'bool' instead of 'int' when dealing with boolean values. There are few enough clients to fix them all in one pass. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: Alberto Garcia <berto@igalia.com> Acked-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2015-06-12block: Move flag inheritance to bdrv_open_inherit()Kevin Wolf1-2/+2
Instead of letting every caller of bdrv_open() determine the right flags for its child node manually and pass them to the function, pass the parent node and the role of the newly opened child (like backing file, protocol layer, etc.). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2015-06-12quorum: Use bdrv_open_image()Kevin Wolf1-40/+11
Besides standardising on a single interface for opening child nodes, this simplifies the .bdrv_open() implementation of the quorum block driver by using block layer functionality for handling BlockdevRefs. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com>
2015-04-28block: add bdrv_get_device_or_node_name()Alberto Garcia1-4/+1
This function gets the device name associated with a BlockDriverState, or its node name if the device name is empty. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 4fa30aa8d61d9052ce266fd5429a59a14e941255.1428485266.git.berto@igalia.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Rename BlockDriverCompletionFunc to BlockCompletionFuncMarkus Armbruster1-3/+3
I'll use it with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Rename BlockDriverAIOCB* to BlockAIOCB*Markus Armbruster1-18/+18
I'll use BlockDriverAIOCB with block backends shortly, and the name is going to fit badly there. It's a block layer thing anyway, not just a block driver thing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20block: Eliminate BlockDriverState member device_name[]Markus Armbruster1-2/+2
device_name[] can become non-empty only in bdrv_new_root() and bdrv_move_feature_fields(). The latter is used only to undo damage done by bdrv_swap(). The former is called only by blk_new_with_bs(). Therefore, when a BlockDriverState's device_name[] is non-empty, then it's been created with a BlockBackend, and vice versa. Furthermore, blk_new_with_bs() keeps the two names equal. Therefore, device_name[] is redundant. Eliminate it. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-09-22block: Rename qemu_aio_release -> qemu_aio_unrefFam Zheng1-1/+1
Suggested-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22quorum: Convert quorum_aiocb_info.cancel to .cancel_asyncFam Zheng1-5/+2
Before, we cancel all the child requests with bdrv_aio_cancel, then free the acb.. Now we just kick off asynchronous cancellation of child requests and return, we know quorum_aio_cb will be called later, so in the end quorum_aio_finalize will take care of calling the caller's cb. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-09-22quorum: fix quorum_aio_cancel()Liu Yuan1-1/+3
For a fifo read pattern, we only have one running aio (possible other cases that has less number than num_children in the future), so we need to check if .acb is NULL against bdrv_aio_cancel() to avoid segfault. Cc: Eric Blake <eblake@redhat.com> Cc: Benoit Canet <benoit@irqsave.net> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29quorum: Fix leak of opts in quorum_openFam Zheng1-1/+2
Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-29block/quorum: add simple read pattern supportLiu Yuan1-48/+129
This patch adds single read pattern to quorum driver and quorum vote is default pattern. For now we do a quorum vote on all the reads, it is designed for unreliable underlying storage such as non-redundant NFS to make sure data integrity at the cost of the read performance. For some use cases as following: VM -------------- | | v v A B Both A and B has hardware raid storage to justify the data integrity on its own. So it would help performance if we do a single read instead of on all the nodes. Further, if we run VM on either of the storage node, we can make a local read request for better performance. This patch generalize the above 2 nodes case in the N nodes. That is, vm -> write to all the N nodes, read just one of them. If single read fails, we try to read next node in FIFO order specified by the startup command. The 2 nodes case is very similar to DRBD[1] though lack of auto-sync functionality in the single device/node failure for now. But compared with DRBD we still have some advantages over it: - Suppose we have 20 VMs running on one(assume A) of 2 nodes' DRBD backed storage. And if A crashes, we need to restart all the VMs on node B. But for practice case, we can't because B might not have enough resources to setup 20 VMs at once. So if we run our 20 VMs with quorum driver, and scatter the replicated images over the data center, we can very likely restart 20 VMs without any resource problem. After all, I think we can build a more powerful replicated image functionality on quorum and block jobs(block mirror) to meet various High Availibility needs. E.g, Enable single read pattern on 2 children, -drive driver=quorum,children.0.file.filename=0.qcow2,\ children.1.file.filename=1.qcow2,read-pattern=fifo,vote-threshold=1 [1] http://en.wikipedia.org/wiki/Distributed_Replicated_Block_Device [Dropped \n from an error_setg() error message --Stefan] Cc: Benoit Canet <benoit@irqsave.net> Cc: Eric Blake <eblake@redhat.com> Cc: Kevin Wolf <kwolf@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Liu Yuan <namei.unix@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-08-20quorum: Implement bdrv_refresh_filename()Max Reitz1-0/+39
Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-27quorum: Add the rewrite-corrupted parameter to quorumBenoît Canet1-6/+91
On read operations when this parameter is set and some replicas are corrupted while quorum can be reached quorum will proceed to rewrite the correct version of the data to fix the corrupted replicas. This will shine with SSD where the FTL will remap the same block at another place on rewrite. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-06-23qapi event: convert QUORUM eventsWenchao Xia1-17/+8
Signed-off-by: Wenchao Xia <wenchaoqemu@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-06-04quorum: implement .bdrv_detach/attach_aio_context()Stefan Hajnoczi1-12/+36
Implement .bdrv_detach/attach_aio_context() interfaces to propagate detach/attach to BDRVQuorumState->bs[] children. The block layer takes care of ->file and ->backing_hd but doesn't know about our ->bs[] BlockDriverStates, which is also part of the graph. Reviewed-by: Benoît Canet <benoit.canet@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-25Use error_is_set() only when necessary (again)Markus Armbruster1-2/+2
error_is_set(&var) is the same as var != NULL, but it takes whole-program analysis to figure that out. Unnecessarily hard for optimizers, static checkers, and human readers. Commit 84d18f0 dumbed it down to obvious, but a few more have crept in since, and documentation was overlooked. Dumb these down, too. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-19block: Add error handling to bdrv_invalidate_cache()Kevin Wolf1-2/+7
If it returns an error, the migrated VM will not be started, but qemu exits with an error message. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-13block: Rewrite the snapshot authorization mechanism for block filters.Benoît Canet1-2/+1
This patch keep the recursive way of doing things but simplify it by giving two responsabilities to all block filters implementors. They will need to do two things: -Set the is_filter field of their block driver to true. -Implement the bdrv_recurse_is_first_non_filter method of their block driver like it is done on the Quorum block driver. (block/quorum.c) [Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes the semantics of blkverify, which now recurses down both bs->file and s->test_file. -- Stefan] Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-28qmp: Make Quorum error events more palatable.Benoît Canet1-3/+6
Insert quorum QMP events documentation alphabetically. Also change the "ret" errno value by an optional "error" being an strerror(-ret) in the QUORUM_REPORT_BAD qmp event. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-21quorum: Simplify quorum_open()Max Reitz1-27/+39
Although it may not look like it, this patch simplifies quorum_open(). qdict_array_split() is now able to return QLists with different objects than only QDicts, therefore it will now do all the work and quorum_open() does not have to handle reference strings by itself. This allows mixing full option dicts and reference strings for specifying the child block devices of quorum; furthermore, it improves handling of malformed specifications. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Benoit Canet <benoit@irqsave.net> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_open() and quorum_close().Benoît Canet1-0/+161
Example of command line: -drive if=virtio,driver=quorum,\ children.0.file.filename=1.raw,\ children.0.node-name=1.raw,\ children.0.driver=raw,\ children.1.file.filename=2.raw,\ children.1.node-name=2.raw,\ children.1.driver=raw,\ children.2.file.filename=3.raw,\ children.2.node-name=3.raw,\ children.2.driver=raw,\ vote-threshold=2 blkverify=on with vote-threshold=2 and two files can be passed to emulate blkverify. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Implement recursive .bdrv_recurse_is_first_non_filter in quorum.Benoît Canet1-0/+19
This is used to activate quorum snapshot. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_co_flush().Benoît Canet1-0/+28
Makes a vote to select error if any. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_invalidate_cache().Benoît Canet1-0/+11
We really want that live migration works with quorum so implement quorum_invalidate_cache(). Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_getlength().Benoît Canet1-0/+26
Check that every bs file returns the same length. Otherwise, return -EIO to disable the quorum and avoid length discrepancy. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum mechanism.Benoît Canet1-1/+390
This patchset enables the core of the quorum mechanism. The num_children reads are compared to get the majority version and if this version exists more than threshold times the guest won't see the error at all. If a block is corrupted or if an error occurs during an IO or if the quorum cannot be established QMP events are used to report to the management. Use gnutls's SHA-256 to compare versions. --enable-quorum must be used to enable the feature. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_aio_readv.Benoît Canet1-1/+38
Add code to do num_children reads in parallel and cleanup the structures afterwards. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Add quorum_aio_writev and its dependencies.Benoît Canet1-0/+103
Writes are mirrored num_children times on num_children devices. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Create BDRVQuorumState and BlkDriver and do init.Benoît Canet1-0/+31
Create the structure holding the quorum settings and write the minimal block driver instanciation boilerplate. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21quorum: Create quorum.c, add QuorumChildRequest and QuorumAIOCB.Benoît Canet1-0/+53
Quorum is a block filter mirroring writes to num_children children. For reads quorum reads each children and does a vote. If more than vote_threshold versions are identical the quorum is reached and this winning version is returned to the guest. So quorum prevents bit corruption. For high availability purpose minority errors are reported via QMP but the guest does not see them. This patch creates the driver C source file and introduces the structures that will be used in asynchronous reads and writes. Signed-off-by: Benoit Canet <benoit@irqsave.net> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>