aboutsummaryrefslogtreecommitdiff
path: root/nbd/server.c
AgeCommit message (Collapse)AuthorFilesLines
2019-06-13nbd/server: Nicer spelling of max BLOCK_STATUS reply lengthEric Blake1-5/+8
Commit 3d068aff (3.0) introduced NBD_MAX_BITMAP_EXTENTS as a limit on how large we would allow a reply to NBD_CMD_BLOCK_STATUS to grow when it is visiting a qemu:dirty-bitmap: context. Later, commit fb7afc79 (3.1) reused the constant to limit base:allocation context replies, although the name is now less appropriate in that situation. Rename things, and improve the macro to use units.h for better legibility. Then reformat the comment to comply with checkpatch rules added in the meantime. No semantic change. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190510151735.29687-1-eblake@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-06-04block: Add BlockBackend.ctxKevin Wolf1-2/+3
This adds a new parameter to blk_new() which requires its callers to declare from which AioContext this BlockBackend is going to be used (or the locks of which AioContext need to be taken anyway). The given context is only stored and kept up to date when changing AioContexts. Actually applying the stored AioContext to the root node is saved for another commit. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2019-06-04nbd-server: Call blk_set_allow_aio_context_change()Kevin Wolf1-0/+1
The NBD server uses an AioContext notifier, so it can tolerate that its BlockBackend is switched to a different AioContext. Before we start actually calling bdrv_try_set_aio_context(), which checks for consistency, outside of test cases, we need to make sure that the NBD server actually allows this. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2019-04-08nbd/server: Don't fail NBD_OPT_INFO for byte-aligned sourcesEric Blake1-5/+8
In commit 0c1d50bd, I added a couple of TODO comments about whether we consult bl.request_alignment when responding to NBD_OPT_INFO. At the time, qemu as server was hard-coding an advertised alignment of 512 to clients that promised to obey constraints, and there was no function for getting at a device's preferred alignment. But in hindsight, advertising 512 when the block device prefers 1 caused other compliance problems, and commit b0245d64 changed one of the two TODO comments to advertise a more accurate alignment. Time to fix the other TODO. Doesn't really impact qemu as client (our normal client doesn't use NBD_OPT_INFO, and qemu-nbd --list promises to obey block sizes), but it might prove useful to other clients. Fixes: b0245d64 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-08nbd/server: Trace client noncompliance on unaligned requestsEric Blake1-1/+16
We've recently added traces for clients to flag server non-compliance; let's do the same for servers to flag client non-compliance. According to the spec, if the client requests NBD_INFO_BLOCK_SIZE, it is promising to send all requests aligned to those boundaries. Of course, if the client does not request NBD_INFO_BLOCK_SIZE, then it made no promises so we shouldn't flag anything; and because we are willing to handle clients that made no promises (the spec allows us to use NBD_REP_ERR_BLOCK_SIZE_REQD if we had been unwilling), we already have to handle unaligned requests (which the block layer already does on our behalf). So even though the spec allows us to return EINVAL for clients that promised to behave, it's easier to always answer unaligned requests. Still, flagging non-compliance can be useful in debugging a client that is trying to be maximally portable. Qemu as client used to have one spot where it sent non-compliant requests: if the server sends an unaligned reply to NBD_CMD_BLOCK_STATUS, and the client was iterating over the entire disk, the next request would start at that unaligned point; this was fixed in commit a39286dd when the client was taught to work around server non-compliance; but is equally fixed if the server is patched to not send unaligned replies in the first place (yes, qemu 4.0 as server still has few such bugs, although they will be patched in 4.1). Fortunately, I did not find any more spots where qemu as client was non-compliant. I was able to test the patch by using the following hack to convince qemu-io to run various unaligned commands, coupled with serving 512-byte alignment by intentionally omitting '-f raw' on the server while viewing server traces. | diff --git i/nbd/client.c w/nbd/client.c | index 427980bdd22..1858b2aac35 100644 | --- i/nbd/client.c | +++ w/nbd/client.c | @@ -449,6 +449,7 @@ static int nbd_opt_info_or_go(QIOChannel *ioc, uint32_t opt, | nbd_send_opt_abort(ioc); | return -1; | } | + info->min_block = 1;//hack | if (!is_power_of_2(info->min_block)) { | error_setg(errp, "server minimum block size %" PRIu32 | " is not a power of two", info->min_block); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-3-eblake@redhat.com> [eblake: address minor review nits] Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-08nbd/server: Fix blockstatus traceEric Blake1-6/+3
Don't increment remaining_bytes until we know that we will actually be including the current block status extent in the reply; otherwise, the value traced will include a bytes value that is oversized by the length of the next block status extent which did not get sent because it instead ended the loop. Fixes: fb7afc79 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190403030526.12258-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-04-01nbd/server: Advertise actual minimum block sizeEric Blake1-5/+8
Both NBD_CMD_BLOCK_STATUS and structured NBD_CMD_READ will split their reply according to bdrv_block_status() boundaries. If the block device has a request_alignment smaller than 512, but we advertise a block alignment of 512 to the client, then this can result in the server reply violating client expectations by reporting a smaller region of the export than what the client is permitted to address (although this is less of an issue for qemu 4.0 clients, given recent client patches to overlook our non-compliance at EOF). Since it's always better to be strict in what we send, it is worth advertising the actual minimum block limit rather than blindly rounding it up to 512. Note that this patch is not foolproof - it is still possible to provoke non-compliant server behavior using: $ qemu-nbd --image-opts driver=blkdebug,align=512,image.driver=file,image.filename=/path/to/non-aligned-file That is arguably a bug in the blkdebug driver (it should never pass back block status smaller than its alignment, even if it has to make multiple bdrv_get_status calls and determine the least-common-denominator status among the group to return). It may also be possible to observe issues with a backing layer with smaller alignment than the active layer, although so far I have been unable to write a reliable iotest for that scenario (but again, an issue like that could be argued to be a bug in the block layer, or something where we need a flag to bdrv_block_status() to state whether the result must be aligned to the current layer's limits or can be subdivided for accuracy when chasing backing files). Anyways, as blkdebug is not normally used, and as this patch makes our server more interoperable with qemu 3.1 clients, it is worth applying now, even while we still work on a larger patch series for the 4.1 timeframe to have byte-accurate file lengths. Note that the iotests output changes - for 223 and 233, we can see the server's better granularity advertisement; and for 241, the three test cases have the following effects: - natural alignment: the server's smaller alignment is now advertised, and the hole reported at EOF is now the right result; we've gotten rid of the server's non-compliance - forced server alignment: the server still advertises 512 bytes, but still sends a mid-sector hole. This is still a server compliance bug, which needs to be fixed in the block layer in a later patch; output does not change because the client is already being tolerant of the non-compliance - forced client alignment: the server's smaller alignment means that the client now sees the server's status change mid-sector without any protocol violations, but the fact that the map shows an unaligned mid-sector hole is evidence of the block layer problems with aligned block status, to be fixed in a later patch Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190329042750.14704-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to enhanced iotest 241 coverage]
2019-03-12block/dirty-bitmaps: add block_dirty_bitmap_check functionJohn Snow1-2/+1
Instead of checking against busy, inconsistent, or read only directly, use a check function with permissions bits that let us streamline the checks without reproducing them in many places. Included in this patch are permissions changes that simply add the inconsistent check to existing permissions call spots, without addressing existing bugs. In general, this means that busy+readonly checks become BDRV_BITMAP_DEFAULT, which checks against all three conditions. busy-only checks become BDRV_BITMAP_ALLOW_RO. Notably, remove allows inconsistent bitmaps, so it doesn't follow the pattern. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190301191545.8728-4-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12block/dirty-bitmaps: unify qmp_locked and user_locked callsJohn Snow1-3/+3
These mean the same thing now. Unify them and rename the merged call bdrv_dirty_bitmap_busy to indicate semantically what we are describing, as well as help disambiguate from the various _locked and _unlocked versions of bitmap helpers that refer to mutex locks. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-8-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-12nbd: change error checking order for bitmapsJohn Snow1-5/+5
Check that the bitmap is not in use prior to it checking if it is not enabled/recording guest writes. The bitmap being busy was likely at the behest of the user, so this error has a greater chance of being understood by the user. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20190223000614.13894-6-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2019-03-06qemu-nbd: add support for authorization of TLS clientsDaniel P. Berrange1-5/+5
Currently any client which can complete the TLS handshake is able to use the NBD server. The server admin can turn on the 'verify-peer' option for the x509 creds to require the client to provide a x509 certificate. This means the client will have to acquire a certificate from the CA before they are permitted to use the NBD server. This is still a fairly low bar to cross. This adds a '--tls-authz OBJECT-ID' option to the qemu-nbd command which takes the ID of a previously added 'QAuthZ' object instance. This will be used to validate the client's x509 distinguished name. Clients failing the authorization check will not be permitted to use the NBD server. For example to setup authorization that only allows connection from a client whose x509 certificate distinguished name is CN=laptop.example.com,O=Example Org,L=London,ST=London,C=GB escape the commas in the name and use: qemu-nbd --object tls-creds-x509,id=tls0,dir=/home/berrange/qemutls,\ endpoint=server,verify-peer=yes \ --object 'authz-simple,id=auth0,identity=CN=laptop.example.com,,\ O=Example Org,,L=London,,ST=London,,C=GB' \ --tls-creds tls0 \ --tls-authz authz0 \ ....other qemu-nbd args... NB: a real shell command line would not have leading whitespace after the line continuation, it is just included here for clarity. Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-Id: <20190227162035.18543-2-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: split long line in --help text, tweak 233 to show that whitespace after ,, in identity= portion is actually okay] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-11nbd/server: Kill pointless shadowed variableEric Blake1-1/+0
lgtm.com pointed out that commit 678ba275 introduced a shadowed declaration of local variable 'bs'; thankfully, the inner 'bs' obtained by 'blk_bs(blk)' matches the outer one given that we had 'blk_insert_bs(blk, bs, errp)' a few lines earlier, and there are no later uses of 'bs' beyond the scope of the 'if (bitmap)' to care if we change the value stored in 'bs' while traveling the backing chain to find a bitmap. So simply get rid of the extra declaration. Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190207191357.6665-1-eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2019-02-04nbd: generalize usage of nbd_readVladimir Sementsov-Ogievskiy1-18/+9
We generally do very similar things around nbd_read: error_prepend specifying what we have tried to read, and be_to_cpu conversion of integers. So, it seems reasonable to move common things to helper functions, which: 1. simplify code a bit 2. generalize nbd_read error descriptions, all starting with "Failed to read" 3. make it more difficult to forget to convert things from BE Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20190128165830.165170-1-vsementsov@virtuozzo.com> [eblake: rename macro to DEF_NBD_READ_N and formatting tweaks; checkpatch has false positive complaint] Signed-off-by: Eric Blake <eblake@redhat.com>
2019-01-21nbd/server: Favor [u]int64_t over off_tEric Blake1-9/+9
Although our compile-time environment is set up so that we always support long files with 64-bit off_t, we have no guarantee whether off_t is the same type as int64_t. This requires casts when printing values, and prevents us from directly using qemu_strtoi64() (which will be done in the next patch). Let's just flip to uint64_t where possible, and stick to int64_t for detecting failure of blk_getlength(); we also keep the assertions added in the previous patch that the resulting values fit in 63 bits. The overflow check in nbd_co_receive_request() was already sane (request->from is validated to fit in 63 bits, and request->len is 32 bits, so the addition can't overflow 64 bits), but rewrite it in a form easier to recognize as a typical overflow check. Rename the variable 'description' to keep line lengths reasonable. Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-21nbd/server: Hoist length check to qmp_nbd_server_addEric Blake1-7/+3
We only had two callers to nbd_export_new; qemu-nbd.c always passed a valid offset/length pair (because it already checked the file length, to ensure that offset was in bounds), while blockdev-nbd.c always passed 0/-1. Then nbd_export_new reduces the size to a multiple of BDRV_SECTOR_SIZE (can only happen when offset is not sector-aligned, since bdrv_getlength() currently rounds up) (someday, it would be nice to have byte-accurate lengths - but not today). However, I'm finding it easier to work with the code if we are consistent on having both callers pass in a valid length, and just assert that things are sane in nbd_export_new, meaning that no negative values were passed, and that offset+size does not exceed 63 bits (as that really is a fundamental limit to later operations, whether we use off_t or uint64_t). Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190117193658.16413-6-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-15dirty-bitmap: improve bdrv_dirty_bitmap_next_zeroVladimir Sementsov-Ogievskiy1-1/+1
Add bytes parameter to the function, to limit searched range. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2019-01-14nbd: Merge nbd_export_bitmap into nbd_export_newEric Blake1-47/+40
We only have one caller that wants to export a bitmap name, which it does right after creation of the export. But there is still a brief window of time where an NBD client could see the export but not the dirty bitmap, which a robust client would have to interpret as meaning the entire image should be treated as dirty. Better is to eliminate the window entirely, by inlining nbd_export_bitmap() into nbd_export_new(), and refusing to create the bitmap in the first place if the requested bitmap can't be located. We also no longer need logic for setting a different bitmap name compared to the bitmap being exported. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-8-eblake@redhat.com>
2019-01-14nbd: Merge nbd_export_set_name into nbd_export_newEric Blake1-29/+23
The existing NBD code had a weird split where nbd_export_new() created an export but did not add it to the list of exported names until a later nbd_export_set_name() came along and grabbed a second reference on the object; later, the first call to nbd_export_close() drops the second reference while removing the export from the list. This is in part because the QAPI NbdServerRemoveNode enum documents the possibility of adding a mode where we could do a soft disconnect: preventing new clients, but waiting for existing clients to gracefully quit, based on the mode used when calling nbd_export_close(). But in spite of all that, note that we never change the name of an NBD export while it is exposed, which means it is easier to just inline the process of setting the name as part of creating the export. Inline the contents of nbd_export_set_name() and nbd_export_set_description() into the two points in an export lifecycle where they matter, then adjust both callers to pass the name up front. Note that for creation, all callers pass a non-NULL name, (passing NULL at creation was for old style servers, but we removed support for that in commit 7f7dfe2a), so we can add an assert and do things unconditionally; but for cleanup, because of the dual nature of nbd_export_close(), we still have to be careful to avoid use-after-free. Along the way, add a comment reminding ourselves of the potential of adding a middle mode disconnect. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20190111194720.15671-5-eblake@redhat.com>
2019-01-14nbd: Only require disabled bitmap for read-only exportsEric Blake1-2/+5
Our initial implementation of x-nbd-server-add-bitmap put in a restriction because of incremental backups: in that usage, we are exporting one qcow2 file (the temporary overlay target of a blockdev-backup sync:none job) and a dirty bitmap owned by a second qcow2 file (the source of the blockdev-backup, which is the backing file of the temporary). While both qcow2 files are still writable (the target in order to capture copy-on-write of old contents, and the source in order to track live guest writes in the meantime), the NBD client expects to see constant data, including the dirty bitmap. An enabled bitmap in the source would be modified by guest writes, which is at odds with the NBD export being a read-only constant view, hence the initial code choice of enforcing a disabled bitmap (the intent is that the exposed bitmap was disabled in the same transaction that started the blockdev-backup job, although we don't want to track enough state to actually enforce that). However, consider the case of a bitmap contained in a read-only node (including when the bitmap is found in a backing layer of the active image). Because the node can't be modified, the bitmap won't change due to writes, regardless of whether it is still enabled. Forbidding the export unless the bitmap is disabled is awkward, paritcularly since we can't change the bitmap to be disabled (because the node is read-only). Alternatively, consider the case of live storage migration, where management directs the destination to create a writable NBD server, then performs a drive-mirror from the source to the target, prior to doing the rest of the live migration. Since storage migration can be time-consuming, it may be wise to let the destination include a dirty bitmap to track which portions it has already received, where even if the migration is interrupted and restarted, the source can query the destination block status in order to potentially minimize re-sending data that has not changed in the meantime on a second attempt. Such code has not been written, and might not be trivial (after all, a cluster being marked dirty in the bitmap does not necessarily guarantee it has the desired contents), but it makes sense that letting an active dirty bitmap be exposed and changing alongside writes may prove useful in the future. Solve both issues by gating the restriction against a disabled bitmap to only happen when the caller has requested a read-only export, and where the BDS that owns the bitmap (whether or not it is the BDS handed to nbd_export_new() or from its backing chain) is still writable. We could drop the check altogether (if management apps are prepared to deal with a changing bitmap even on a read-only image), but for now keeping a check for the read-only case still stands a chance of preventing management errors. Update iotest 223 to show the looser behavior by leaving a bitmap enabled the whole run; note that we have to tear down and re-export a node when handling an error. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20190111194720.15671-4-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-11-30nbd/server: Advertise all contexts in response to bare LISTEric Blake1-0/+1
The NBD spec, and even our code comment, says that if the client asks for NBD_OPT_LIST_META_CONTEXT with 0 queries, then we should reply with (a possibly-compressed representation of) ALL contexts that we are willing to let them try. But commit 3d068aff forgot to advertise qemu:dirty-bitmap:FOO. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20181130023232.3079982-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-11-19nbd/server: Ignore write errors when replying to NBD_OPT_ABORTEric Blake1-4/+8
Commit 37ec36f6 intentionally ignores errors when trying to reply to an NBD_OPT_ABORT request for plaintext clients, but did not make the same change for a TLS server. Since NBD_OPT_ABORT is documented as being a potential for an EPIPE when the client hangs up without waiting for our reply, we don't need to pollute the server's output with that failure. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20181117223221.2198751-1-eblake@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2018-11-19nbd: fix whitespace in server error messageDaniel P. Berrangé1-1/+1
A space was missing after the option number was printed: Option 0x8not permitted before TLS becomes Option 0x8 not permitted before TLS This fixes commit 3668328303429f3bc93ab3365c66331600b06a2d Author: Eric Blake <eblake@redhat.com> Date: Fri Oct 14 13:33:09 2016 -0500 nbd: Send message along with server NBD_REP_ERR errors Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20181116155325.22428-2-berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: move lone space to next line] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-10-29nbd: forbid use of frozen bitmapsJohn Snow1-2/+2
Whether it's "locked" or "frozen", it's in use and should not be allowed for the purposes of this operation. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-id: 20181002230218.13949-7-jsnow@redhat.com Signed-off-by: John Snow <jsnow@redhat.com>
2018-10-03nbd/server: drop old-style negotiationVladimir Sementsov-Ogievskiy1-38/+15
After the previous commit, nbd_client_new's first parameter is always NULL. Let's drop it with all corresponding old-style negotiation code path which is unreachable now. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20181003170228.95973-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: re-wrap short line] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-10-03nbd/server: fix NBD_CMD_CACHEVladimir Sementsov-Ogievskiy1-1/+2
We should not go to structured-read branch on CACHE command, fix that. Bug introduced in bc37b06a5cde24 "nbd/server: introduce NBD_CMD_CACHE" with the whole feature and affects 3.0.0 release. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> CC: qemu-stable@nongnu.org Message-Id: <20181003144738.70670-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: commit message typo fix] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-10-03nbd: Don't take address of fields in packed structsPeter Maydell1-12/+12
Taking the address of a field in a packed struct is a bad idea, because it might not be actually aligned enough for that pointer type (and thus cause a crash on dereference on some host architectures). Newer versions of clang warn about this. Avoid the bug by not using the "modify in place" byte swapping functions. This patch was produced with the following spatch script: @@ expression E; @@ -be16_to_cpus(&E); +E = be16_to_cpu(E); @@ expression E; @@ -be32_to_cpus(&E); +E = be32_to_cpu(E); @@ expression E; @@ -be64_to_cpus(&E); +E = be64_to_cpu(E); @@ expression E; @@ -cpu_to_be16s(&E); +E = cpu_to_be16(E); @@ expression E; @@ -cpu_to_be32s(&E); +E = cpu_to_be32(E); @@ expression E; @@ -cpu_to_be64s(&E); +E = cpu_to_be64(E); Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180927164200.15097-1-peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: rebase, and squash in missed changes] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-09-26nbd/server: send more than one extent of base:allocation contextVladimir Sementsov-Ogievskiy1-19/+60
This is necessary for efficient block-status export, for clients which support it. (qemu is not yet such a client, but could become one.) Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180704112302.471456-3-vsementsov@virtuozzo.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-09-26nbd/server: fix bitmap exportVladimir Sementsov-Ogievskiy1-1/+4
bitmap_to_extents function is broken: it switches dirty variable after every iteration, however it can process only part of dirty (or zero) area during one iteration in case when this area is too large for one extent. Fortunately, the bug doesn't produce wrong extent flags: it just inserts a zero-length extent between sequential extents representing large dirty (or zero) area. However, zero-length extents are forbidden by the NBD protocol. So, a careful client should consider such a reply as a server fault, while a less-careful will likely ignore zero-length extents. The bug can only be triggered by a client that requests block status for nearly 4G at once (a request of 4G and larger is impossible per the protocol, and requests smaller than 4G less the bitmap granularity cause the loop to quit iterating rather than revisit the tail of the large area); it also cannot trigger if the client used the NBD_CMD_FLAG_REQ_ONE flag. Since qemu 3.0 as client (using the x-dirty-bitmap extension) always passes the flag, it is immune; and we are not aware of other open-source clients that know how to request qemu:dirty-bitmap:FOO contexts. Clients that want to avoid the bug could cap block status requests to a smaller length, such as 2G or 3G. Fix this by more careful handling of dirty variable. Bug was introduced in 3d068aff16 "nbd/server: implement dirty bitmap export", with the whole function. and is present in v3.0.0 release. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180914165116.23182-1-vsementsov@virtuozzo.com> CC: qemu-stable@nongnu.org Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: improved commit message] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-07-07nbd/server: fix nbd_co_send_block_statusVladimir Sementsov-Ogievskiy1-2/+3
Call nbd_co_send_extents() with correct length parameter (extent.length may be smaller than original length). Also, switch length parameter type to uint32_t, to correspond with request->len and similar nbd_co_send_bitmap(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180704112302.471456-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-07-02nbd/server: Fix dirty bitmap logic regressionEric Blake1-1/+1
In my hurry to fix a build failure, I introduced a logic bug. The assertion conditional is backwards, meaning that qemu will now abort instead of reporting dirty bitmap status. The bug can only be tickled by an NBD client using an exported dirty bitmap (which is still an experimental QMP command), so it's not the end of the world for supported usage (and neither 'make check' nor qemu-iotests fails); but it also shows that we really want qemu-io support for reading dirty bitmaps if only so that I can add iotests coverage to prevent future brown-bag-of-shame events like this one. Fixes: 45eb6fb6 Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180622153509.375130-1-eblake@redhat.com>
2018-06-22nbd/server: Silence gcc false positiveEric Blake1-1/+2
The code has a while() loop that always initialized 'end', and the loop always executes at least once (as evidenced by the assert() just prior to the loop). But some versions of gcc still complain that 'end' is used uninitialized, so silence them. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20180622125814.345274-1-eblake@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-06-21nbd/server: introduce NBD_CMD_CACHEVladimir Sementsov-Ogievskiy1-4/+7
Handle nbd CACHE command. Just do read, without sending read data back. Cache mechanism should be done by exported node driver chain. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180413143156.11409-1-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: fix two missing case labels in switch statements] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21nbd/server: implement dirty bitmap exportVladimir Sementsov-Ogievskiy1-23/+255
Handle a new NBD meta namespace: "qemu", and corresponding queries: "qemu:dirty-bitmap:<export bitmap name>". With the new metadata context negotiated, BLOCK_STATUS query will reply with dirty-bitmap data, converted to extents. The new public function nbd_export_bitmap selects which bitmap to export. For now, only one bitmap may be exported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: wording tweaks, minor cleanups, additional tracing] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21nbd/server: add nbd_meta_empty_or_pattern helperVladimir Sementsov-Ogievskiy1-27/+62
Add nbd_meta_pattern() and nbd_meta_empty_or_pattern() helpers for metadata query parsing. nbd_meta_pattern() will be reused for the "qemu" namespace in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21nbd/server: refactor NBDExportMetaContextsVladimir Sementsov-Ogievskiy1-12/+11
Use NBDExport pointer instead of just export name: there is no need to store a duplicated name in the struct; moreover, NBDExport will be used further. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: commit message grammar tweak] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21nbd/server: fix traceVladimir Sementsov-Ogievskiy1-4/+10
Return code = 1 doesn't mean that we parsed base:allocation. Use correct traces in both -parsed and -skipped cases. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180609151758.17343-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: comment tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-06-21nbd/server: Reject 0-length block status requestEric Blake1-0/+4
The NBD spec says that behavior is unspecified if the client requests 0 length for block status; but since the structured reply is documenting as returning a non-zero length, it's easier to just diagnose this with an EINVAL error than to figure out what to return. CC: qemu-stable@nongnu.org Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180621124937.166549-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-04-02nbd: trace meta context negotiationEric Blake1-0/+8
Having a more detailed log of the interaction between client and server is invaluable in debugging how meta context negotiation actually works. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180330130950.1931229-1-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
2018-03-13nbd: BLOCK_STATUS for standard get_block_status function: server partVladimir Sementsov-Ogievskiy1-0/+311
Minimal realization: only one extent in server answer is supported. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: tweak whitespace, move constant from .h to .c, improve logic of check_meta_export_name, simplify nbd_negotiate_options by doing more in nbd_negotiate_meta_queries] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: add nbd_read_opt_name helperVladimir Sementsov-Ogievskiy1-10/+43
Add helper to read name in format: uint32 len (<= NBD_MAX_NAME_SIZE) len bytes string (not 0-terminated) The helper will be reused in following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180312152126.286890-3-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar fixes, actually check error] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: add nbd_opt_invalid helperVladimir Sementsov-Ogievskiy1-14/+36
NBD_REP_ERR_INVALID is often parameter to nbd_opt_drop and it would be used more in following patches. So, let's add a helper. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-Id: <20180312152126.286890-2-vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: Honor FUA request on NBD_CMD_TRIMEric Blake1-0/+3
The NBD spec states that since trim requests can affect disk contents, then they should allow for FUA semantics just like writes for ensuring the disk has settled before returning. As bdrv_[co_]pdiscard() does not support a flags argument, we can't pass FUA down the block layer stack, and must therefore emulate it with a flush at the NBD layer. Note that in all reality, generic well-behaved clients will never send TRIM+FUA (in fact, qemu as a client never does, and we have no intention to plumb flags into bdrv_pdiscard). This is because the NBD protocol states that it is unspecified to READ a trimmed area (you might read stale data, all zeroes, or even random unrelated data) without first rewriting it, and even the experimental BLOCK_STATUS extension states that TRIM need not affect reported status. Thus, in the general case, a client cannot tell the difference between an arbitrary server that ignores TRIM, a server that had a power outage without flushing to disk, and a server that actually affected the disk before returning; so waiting for the trim actions to flush to disk makes little sense. However, for a specific client and server pair, where the client knows the server treats TRIM'd areas as guaranteed reads-zero, waiting for a flush makes sense, hence why the protocol documents that FUA is valid on trim. So, even though the NBD protocol doesn't have a way for the server to advertise what effects (if any) TRIM will actually have, and thus any client that relies on specific effects is probably in error, we can at least support a client that requests TRIM+FUA. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20180307225732.155835-1-eblake@redhat.com>
2018-03-13nbd/server: refactor nbd_trip: split out nbd_handle_requestVladimir Sementsov-Ogievskiy1-62/+66
Split out request handling logic. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-6-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: touch up blank line placement] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: refactor nbd_trip: cmd_read and generic replyVladimir Sementsov-Ogievskiy1-81/+95
nbd_trip has difficult logic when sending replies: it tries to use one code path for all replies. It is ok for simple replies, but is not comfortable for structured replies. Also, two types of error (and corresponding messages in local_err) - fatal (leading to disconnect) and not-fatal (just to be sent to the client) are difficult to follow. To make things a bit clearer, the following is done: - split CMD_READ logic to separate function. It is the most difficult command for now, and it is definitely cramped inside nbd_trip. Also, it is difficult to follow CMD_READ logic, shared between "case NBD_CMD_READ" and "if"s under "reply:" label. - create separate helper function nbd_send_generic_reply() and use it both in new nbd_do_cmd_read and for other commands in nbd_trip instead of common code-path under "reply:" label in nbd_trip. The helper supports an error message, so logic with local_err in nbd_trip is simplified. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-5-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: grammar tweaks and blank line placement] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: fix: check client->closing before sending replyVladimir Sementsov-Ogievskiy1-8/+9
Since the unchanged code has just set client->recv_coroutine to NULL before calling nbd_client_receive_next_request(), we are spawning a new coroutine unconditionally, but the first thing that coroutine will do is check for client->closing, making it a no-op if we have already detected that the client is going away. Furthermore, for any error other than EIO (where we disconnect, which itself sets client->closing), if the client has already gone away, we'll probably encounter EIO later in the function and attempt disconnect at that point. Logically, as soon as we know the connection is closing, there is no need to try a likely-to-fail a response or spawn a no-op coroutine. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-4-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> [eblake: squash in further reordering: hoist check before spawning next coroutine, and document rationale in commit message] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: fix sparse readVladimir Sementsov-Ogievskiy1-3/+14
In case of io error in nbd_co_send_sparse_read we should not "goto reply:", as it was a fatal error and the common behavior is to disconnect in this case. We should not try to send the client an additional error reply, since we already hit a channel-io error on our previous attempt to send one. Fix this by handling block-status error in nbd_co_send_sparse_read, so nbd_co_send_sparse_read fails only on io error. Then just skip common "reply:" code path in nbd_trip. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-3-vsementsov@virtuozzo.com> [eblake: grammar tweaks] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-13nbd/server: move nbd_co_send_structured_error upVladimir Sementsov-Ogievskiy1-24/+24
To be reused in nbd_co_send_sparse_read() in the following patch. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180308184636.178534-2-vsementsov@virtuozzo.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2018-03-06qio: non-default context for TLS handshakePeter Xu1-0/+1
A new parameter "context" is added to qio_channel_tls_handshake() is to allow the TLS to be run on a non-default context. Still, no functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2018-03-01nbd/client: fix error messages in nbd_handle_reply_errVladimir Sementsov-Ogievskiy1-2/+2
1. NBD_REP_ERR_INVALID is not only about length, so, make message more general 2. hex format is not very good: it's hard to read something like "option a (set meta context)", so switch to dec. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <1518702707-7077-6-git-send-email-vsementsov@virtuozzo.com> [eblake: expand scope of patch: ALL uses of nbd_opt_lookup and nbd_rep_lookup are now decimal] Signed-off-by: Eric Blake <eblake@redhat.com>
2018-01-26qapi: add nbd-server-removeVladimir Sementsov-Ogievskiy1-0/+13
Add command for removing an export. It is needed for cases when we don't want to keep the export after the operation on it was completed. The other example is a temporary node, created with blockdev-add. If we want to delete it we should firstly remove any corresponding NBD export. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20180119135719.24745-3-vsementsov@virtuozzo.com> [eblake: drop dead nb_clients code] Signed-off-by: Eric Blake <eblake@redhat.com>