aboutsummaryrefslogtreecommitdiff
path: root/include/block
AgeCommit message (Collapse)AuthorFilesLines
2022-07-15nvme: Fix misleading macro when mixed with ternary operatorDarren Kenny1-22/+22
Using the Parfait source code analyser and issue was found in hw/nvme/ctrl.c where the macros NVME_CAP_SET_CMBS and NVME_CAP_SET_PMRS are called with a ternary operatore in the second parameter, resulting in a potentially unexpected expansion of the form: x ? a: b & FLAG_TEST which will result in a different result to: (x ? a: b) & FLAG_TEST. The macros should wrap each of the parameters in brackets to ensure the correct result on expansion. Signed-off-by: Darren Kenny <darren.kenny@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-07-15hw/nvme: Implement shadow doorbell buffer supportJinhao Fan1-0/+2
Implement Doorbel Buffer Config command (Section 5.7 in NVMe Spec 1.3) and Shadow Doorbel buffer & EventIdx buffer handling logic (Section 7.13 in NVMe Spec 1.3). For queues created before the Doorbell Buffer Config command, the nvme_dbbuf_config function tries to associate each existing SQ and CQ with its Shadow Doorbel buffer and EventIdx buffer address. Queues created after the Doorbell Buffer Config command will have the doorbell buffers associated with them when they are initialized. In nvme_process_sq and nvme_post_cqe, proactively check for Shadow Doorbell buffer changes instead of wait for doorbell register changes. This reduces the number of MMIOs. In nvme_process_db(), update the shadow doorbell buffer value with the doorbell register value if it is the admin queue. This is a hack since hosts like Linux NVMe driver and SPDK do not use shadow doorbell buffer for the admin queue. Copying the doorbell register value to the shadow doorbell buffer allows us to support these hosts as well as spec-compliant hosts that use shadow doorbell buffer for the admin queue. Signed-off-by: Jinhao Fan <fanjinhao21s@ict.ac.cn> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Reviewed-by: Keith Busch <kbusch@kernel.org> [k.jensen: rebased] Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-07-12block: Add bdrv_co_pwrite_sync()Alberto Faria1-2/+6
Also convert bdrv_pwrite_sync() to being implemented using generated_co_wrapper. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-9-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Implement bdrv_{pread,pwrite,pwrite_zeroes}() using generated_co_wrapperAlberto Faria1-6/+9
bdrv_{pread,pwrite}() now return -EIO instead of -EINVAL when 'bytes' is negative, making them consistent with bdrv_{preadv,pwritev}() and bdrv_co_{pread,pwrite,preadv,pwritev}(). bdrv_pwrite_zeroes() now also calls trace_bdrv_co_pwrite_zeroes() and clears the BDRV_REQ_MAY_UNMAP flag when appropriate, which it didn't previously. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-8-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Make 'bytes' param of bdrv_co_{pread,pwrite,preadv,pwritev}() an int64_tAlberto Faria1-2/+2
For consistency with other I/O functions, and in preparation to implement bdrv_{pread,pwrite}() using generated_co_wrapper. unsigned int fits in int64_t, so all callers remain correct. bdrv_check_request32() is called further down the stack and causes -EIO to be returned if 'bytes' is negative or greater than BDRV_REQUEST_MAX_BYTES, which in turns never exceeds SIZE_MAX. Signed-off-by: Alberto Faria <afaria@redhat.com> Message-Id: <20220609152744.3891847-7-afaria@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Make bdrv_co_pwrite() take a const bufferAlberto Faria1-1/+1
It does not mutate the buffer. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220609152744.3891847-6-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Change bdrv_{pread,pwrite,pwrite_sync}() param orderAlberto Faria1-5/+5
Swap 'buf' and 'bytes' around for consistency with bdrv_co_{pread,pwrite}(), and in preparation to implement these functions using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pread(child, offset, buf, bytes, flags) + bdrv_pread(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite(child, offset, buf, bytes, flags) + bdrv_pwrite(child, offset, bytes, buf, flags) @@ expression child, offset, buf, bytes, flags; @@ - bdrv_pwrite_sync(child, offset, buf, bytes, flags) + bdrv_pwrite_sync(child, offset, bytes, buf, flags) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-3-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-07-12block: Add a 'flags' param to bdrv_{pread,pwrite,pwrite_sync}()Alberto Faria1-3/+4
For consistency with other I/O functions, and in preparation to implement them using generated_co_wrapper. Callers were updated using this Coccinelle script: @@ expression child, offset, buf, bytes; @@ - bdrv_pread(child, offset, buf, bytes) + bdrv_pread(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite(child, offset, buf, bytes) + bdrv_pwrite(child, offset, buf, bytes, 0) @@ expression child, offset, buf, bytes; @@ - bdrv_pwrite_sync(child, offset, buf, bytes) + bdrv_pwrite_sync(child, offset, buf, bytes, 0) Resulting overly-long lines were then fixed by hand. Signed-off-by: Alberto Faria <afaria@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Message-Id: <20220609152744.3891847-2-afaria@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-06-29block/block-copy: block_copy(): add timeout_ns parameterVladimir Sementsov-Ogievskiy1-1/+3
Add possibility to limit block_copy() call in time. To be used in the next commit. As timed-out block_copy() call will continue in background anyway (we can't immediately cancel IO operation), it's important also give user a possibility to pass a callback, to do some additional actions on block-copy call finish. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
2022-06-24aio_wait_kick: add missing memory barrierEmanuele Giuseppe Esposito1-0/+2
It seems that aio_wait_kick always required a memory barrier or atomic operation in the caller, but nobody actually took care of doing it. Let's put the barrier in the function instead, and pair it with another one in AIO_WAIT_WHILE. Read aio_wait_kick() comment for further explanation. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220524173054.12651-1-eesposit@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24block: simplify handling of try to merge different sized bitmapsVladimir Sementsov-Ogievskiy1-1/+1
We have too much logic to simply check that bitmaps are of the same size. Let's just define that hbitmap_merge() and bdrv_dirty_bitmap_merge_internal() require their argument bitmaps be of same size, this simplifies things. Let's look through the callers: For backup_init_bcs_bitmap() we already assert that merge can't fail. In bdrv_reclaim_dirty_bitmap_locked() we gracefully handle the error that can't happen: successor always has same size as its parent, drop this logic. In bdrv_merge_dirty_bitmap() we already has assertion and separate check. Make the check explicit and improve error message. Signed-off-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220517111206.23585-4-v.sementsov-og@mail.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-24block: drop unused bdrv_co_drain() APIStefan Hajnoczi1-1/+0
bdrv_co_drain() has not been used since commit 9a0cec664eef ("mirror: use bdrv_drained_begin/bdrv_drained_end") in 2016. Remove it so there are fewer drain scenarios to worry about. Use bdrv_drained_begin()/bdrv_drained_end() instead. They are "mixed" functions that can be called from coroutine context. Unlike bdrv_co_drain(), these functions provide control of the length of the drained section, which is usually the right thing. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220521122714.3837731-1-stefanha@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Alberto Faria <afaria@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-06-23hw/nvme: Add support for the Virtualization Management commandŁukasz Gieryk1-0/+17
With the new command one can: - assign flexible resources (queues, interrupts) to primary and secondary controllers, - toggle the online/offline state of given controller. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-06-23hw/nvme: Initialize capability structures for primary/secondary controllersŁukasz Gieryk1-0/+5
With four new properties: - sriov_v{i,q}_flexible, - sriov_max_v{i,q}_per_vf, one can configure the number of available flexible resources, as well as the limits. The primary and secondary controller capability structures are initialized accordingly. Since the number of available queues (interrupts) now varies between VF/PF, BAR size calculation is also adjusted. Signed-off-by: Łukasz Gieryk <lukasz.gieryk@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-06-23hw/nvme: Add support for Secondary Controller ListLukasz Maniak1-0/+20
Introduce handling for Secondary Controller List (Identify command with CNS value of 15h). Secondary controller ids are unique in the subsystem, hence they are reserved by it upon initialization of the primary controller to the number of sriov_max_vfs. ID reservation requires the addition of an intermediate controller slot state, so the reserved controller has the address 0xFFFF. A secondary controller is in the reserved state when it has no virtual function assigned, but its primary controller is realized. Secondary controller reservations are released to NULL when its primary controller is unregistered. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-06-23hw/nvme: Add support for Primary Controller CapabilitiesLukasz Maniak1-0/+23
Implementation of Primary Controller Capabilities data structure (Identify command with CNS value of 14h). Currently, the command returns only ID of a primary controller. Handling of remaining fields are added in subsequent patches implementing virtualization enhancements. Signed-off-by: Lukasz Maniak <lukasz.maniak@linux.intel.com> Reviewed-by: Klaus Jensen <k.jensen@samsung.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-05-12nbd/server: Allow MULTI_CONN for shared writable exportsEric Blake1-1/+2
According to the NBD spec, a server that advertises NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will not see any cache inconsistencies: when properly separated by a single flush, actions performed by one client will be visible to another client, regardless of which client did the flush. We always satisfy these conditions in qemu - even when we support multiple clients, ALL clients go through a single point of reference into the block layer, with no local caching. The effect of one client is instantly visible to the next client. Even if our backend were a network device, we argue that any multi-path caching effects that would cause inconsistencies in back-to-back actions not seeing the effect of previous actions would be a bug in that backend, and not the fault of caching in qemu. As such, it is safe to unconditionally advertise CAN_MULTI_CONN for any qemu NBD server situation that supports parallel clients. Note, however, that we don't want to advertise CAN_MULTI_CONN when we know that a second client cannot connect (for historical reasons, qemu-nbd defaults to a single connection while nbd-server-add and QMP commands default to unlimited connections; but we already have existing means to let either style of NBD server creation alter those defaults). This is visible by no longer advertising MULTI_CONN for 'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation. The harder part of this patch is setting up an iotest to demonstrate behavior of multiple NBD clients to a single server. It might be possible with parallel qemu-io processes, but I found it easier to do in python with the help of libnbd, and help from Nir and Vladimir in writing the test. Signed-off-by: Eric Blake <eblake@redhat.com> Suggested-by: Nir Soffer <nsoffer@redhat.com> Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru> Message-Id: <20220512004924.417153-3-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-12qemu-nbd: Pass max connections to blockdev layerEric Blake1-1/+1
The next patch wants to adjust whether the NBD server code advertises MULTI_CONN based on whether it is known if the server limits to exactly one client. For a server started by QMP, this information is obtained through nbd_server_start (which can support more than one export); but for qemu-nbd (which supports exactly one export), it is controlled only by the command-line option -e/--shared. Since we already have a hook function used by qemu-nbd, it's easiest to just alter its signature to fit our needs. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20220512004924.417153-2-eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-05-11Clean up decorations and whitespace around header guardsMarkus Armbruster1-1/+2
Cleaned up with scripts/clean-header-guards.pl. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-5-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-11Clean up header guards that don't match their file nameMarkus Armbruster1-2/+2
Header guard symbols should match their file name to make guard collisions less likely. Cleaned up with scripts/clean-header-guards.pl, followed by some renaming of new guard symbols picked by the script to better ones. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220506134911.2856099-2-armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> [Change to generated file ebpf/rss.bpf.skeleton.h backed out]
2022-05-09util/event-loop-base: Introduce options to set the thread pool sizeNicolas Saenz Julienne2-0/+13
The thread pool regulates itself: when idle, it kills threads until empty, when in demand, it creates new threads until full. This behaviour doesn't play well with latency sensitive workloads where the price of creating a new thread is too high. For example, when paired with qemu's '-mlock', or using safety features like SafeStack, creating a new thread has been measured take multiple milliseconds. In order to mitigate this let's introduce a new 'EventLoopBase' property to set the thread pool size. The threads will be created during the pool's initialization or upon updating the property's value, remain available during its lifetime regardless of demand, and destroyed upon freeing it. A properly characterized workload will then be able to configure the pool to avoid any latency spikes. Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-id: 20220425075723.20019-4-nsaenzju@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-04block: Classify bdrv_get_flags() as I/O functionHanna Reitz2-1/+1
This function is safe to call in an I/O context, and qcow2_do_open() does so (invoked in an I/O context by qcow2_co_invalidate_cache()). Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220427114057.36651-2-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-04-26qapi: rename BlockDirtyBitmapMergeSource to BlockDirtyBitmapOrStrVladimir Sementsov-Ogievskiy1-1/+1
Rename the type to be reused. Old name is "what is it for". To be natively reused for other needs, let's name it exactly "what is it". Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@openvz.org> Message-Id: <20220314213226.362217-2-v.sementsov-og@mail.ru> [eblake: Adjust S-o-b to Vladimir's new email, with permission] Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: John Snow <jsnow@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2022-04-21include: move qdict_{crumple,flatten} declarationsMarc-André Lureau1-0/+3
Move them where they belong, since the functions are implemented in block-qdict.c. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-25-marcandre.lureau@redhat.com>
2022-03-07block: pass desired TLS hostname through from block driver clientDaniel P. Berrangé1-1/+2
In commit a71d597b989fd701b923f09b3c20ac4fcaa55e81 Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Date: Thu Jun 10 13:08:00 2021 +0300 block/nbd: reuse nbd_co_do_establish_connection() in nbd_open() the use of the 'hostname' field from the BDRVNBDState struct was lost, and 'nbd_connect' just hardcoded it to match the IP socket address. This was a harmless bug at the time since we block use with anything other than IP sockets. Shortly though, we want to allow the caller to override the hostname used in the TLS certificate checks. This is to allow for TLS when doing port forwarding or tunneling. Thus we need to reinstate the passing along of the 'hostname'. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220304193610.3293146-3-berrange@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
2022-03-07block: introduce snapshot-access block driverVladimir Sementsov-Ogievskiy1-0/+6
The new block driver simply utilizes snapshot-access API of underlying block node. In further patches we want to use it like this: [guest] [NBD export] | | | root | root v file v [copy-before-write]<------[snapshot-access] | | | file | target v v [active-disk] [temp.img] This way, NBD client will be able to read snapshotted state of active disk, when active disk is continued to be written by guest. This is known as "fleecing", and currently uses another scheme based on qcow2 temporary image which backing file is active-disk. New scheme comes with benefits - see next commit. The other possible application is exporting internal snapshots of qcow2, like this: [guest] [NBD export] | | | root | root v file v [qcow2]<---------[snapshot-access] For this, we'll need to implement snapshot-access API handlers in qcow2 driver, and improve snapshot-access block driver (and API) to make it possible to select snapshot by name. Another thing to improve is size of snapshot. Now for simplicity we just use size of bs->file, which is OK for backup, but for qcow2 snapshots export we'll need to imporve snapshot-access API to get size of snapshot. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20220303194349.2304213-12-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block/io: introduce block driver snapshot-access APIVladimir Sementsov-Ogievskiy2-0/+27
Add new block driver handlers and corresponding generic wrappers. It will be used to allow copy-before-write filter to provide reach fleecing interface in further commit. In future this approach may be used to allow reading qcow2 internal snapshots, for example to export them through NBD. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-11-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block/reqlist: add reqlist_wait_all()Vladimir Sementsov-Ogievskiy1-0/+8
Add function to wait for all intersecting requests. To be used in the further commit. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Nikita Lapshin <nikita.lapshin@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-10-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block/dirty-bitmap: introduce bdrv_dirty_bitmap_status()Vladimir Sementsov-Ogievskiy1-0/+2
Add a convenient function similar with bdrv_block_status() to get status of dirty bitmap. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-9-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block: intoduce reqlistVladimir Sementsov-Ogievskiy1-0/+67
Split intersecting-requests functionality out of block-copy to be reused in copy-before-write filter. Note: while being here, fix tiny typo in MAINTAINERS. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-7-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block/block-copy: add block_copy_reset()Vladimir Sementsov-Ogievskiy1-0/+1
Split block_copy_reset() out of block_copy_reset_unallocated() to be used separately later. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-6-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block/block-copy: block_copy_state_new(): add bitmap parameterVladimir Sementsov-Ogievskiy1-0/+1
This will be used in the following commit to bring "incremental" mode to copy-before-write filter. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-4-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block/dirty-bitmap: bdrv_merge_dirty_bitmap(): add return valueVladimir Sementsov-Ogievskiy1-1/+1
That simplifies handling failure in existing code and in further new usage of bdrv_merge_dirty_bitmap(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220303194349.2304213-3-vsementsov@virtuozzo.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-07block: fix preallocate filter: don't do unaligned preallocate requestsVladimir Sementsov-Ogievskiy1-1/+2
There is a bug in handling BDRV_REQ_NO_WAIT flag: we still may wait in wait_serialising_requests() if request is unaligned. And this is possible for the only user of this flag (preallocate filter) if underlying file is unaligned to its request_alignment on start. So, we have to fix preallocate filter to do only aligned preallocate requests. Next, we should fix generic block/io.c somehow. Keeping in mind that preallocate is the only user of BDRV_REQ_NO_WAIT and that we have to fix its behavior now, it seems more safe to just assert that we never use BDRV_REQ_NO_WAIT with unaligned requests and add corresponding comment. Let's do so. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Message-Id: <20220215121609.38570-1-vsementsov@virtuozzo.com> [hreitz: Rebased on block GS/IO split] Signed-off-by: Hanna Reitz <hreitz@redhat.com>
2022-03-04block_int-common.h: split function pointers in BdrvChildClassEmanuele Giuseppe Esposito1-34/+47
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-28-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block_int-common.h: split function pointers in BlockDriverEmanuele Giuseppe Esposito1-208/+237
Similar to the header split, also the function pointers in BlockDriver can be split in I/O and global state. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-26-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/block/snapshot: global state API + assertionsEmanuele Giuseppe Esposito1-2/+11
Snapshots run also under the BQL, so they all are in the global state API. The aiocontext lock that they hold is currently an overkill and in future could be removed. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-23-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/block/blockjob.h: global state APIEmanuele Giuseppe Esposito1-7/+22
blockjob functions run always under the BQL lock. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-19-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/block/blockjob_int.h: split header into I/O and GS APIEmanuele Giuseppe Esposito1-0/+28
Since the I/O functions are not many, keep a single file. Also split the function pointers in BlockJobDriver. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20220303151616.325444-16-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block: introduce assert_bdrv_graph_writableEmanuele Giuseppe Esposito1-0/+17
We want to be sure that the functions that write the child and parent list of a bs are under BQL and drain. BQL prevents from concurrent writings from the GS API, while drains protect from I/O. TODO: drains are missing in some functions using this assert. Therefore a proper assertion will fail. Because adding drains requires additional discussions, they will be added in future series. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-15-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04IO_CODE and IO_OR_GS_CODE for block_int I/O APIEmanuele Giuseppe Esposito1-0/+6
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-14-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/block/block_int: split header into I/O and global state APIEmanuele Giuseppe Esposito4-1486/+1674
Similarly to the previous patch, split block_int.h in block_int-io.h and block_int-global-state.h block_int-common.h contains the structures shared between the two headers, and the functions that can't be categorized as I/O or global state. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-12-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04IO_CODE and IO_OR_GS_CODE for block I/O APIEmanuele Giuseppe Esposito1-0/+1
Mark all I/O functions with IO_CODE, and all "I/O OR GS" with IO_OR_GS_CODE. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-6-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04include/block/block: split header into I/O and global state APIEmanuele Giuseppe Esposito4-856/+1061
block.h currently contains a mix of functions: some of them run under the BQL and modify the block layer graph, others are instead thread-safe and perform I/O in iothreads. Some others can only be called by either the main loop or the iothread running the AioContext (and not other iothreads), and using them in another thread would cause deadlocks, and therefore it is not ideal to define them as I/O. It is not easy to understand which function is part of which group (I/O vs GS vs "I/O or GS"), and this patch aims to clarify it. The "GS" functions need the BQL, and often use aio_context_acquire/release and/or drain to be sure they can modify the graph safely. The I/O function are instead thread safe, and can run in any AioContext. "I/O or GS" functions run instead in the main loop or in a single iothread, and use BDRV_POLL_WHILE(). By splitting the header in two files, block-io.h and block-global-state.h we have a clearer view on what needs what kind of protection. block-common.h contains common structures shared by both headers. block.h is left there for legacy and to avoid changing all includes in all c files that use the block APIs. Assertions are added in the next patch. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220303151616.325444-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block: rename bdrv_invalidate_cache_all, blk_invalidate_cache and ↵Emanuele Giuseppe Esposito1-1/+1
test_sync_op_invalidate_cache Following the bdrv_activate renaming, change also the name of the respective callers. bdrv_invalidate_cache_all -> bdrv_activate_all blk_invalidate_cache -> blk_activate test_sync_op_invalidate_cache -> test_sync_op_activate No functional change intended. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220209105452.1694545-5-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04block: introduce bdrv_activateEmanuele Giuseppe Esposito1-0/+1
This function is currently just a wrapper for bdrv_invalidate_cache(), but in future will contain the code of bdrv_co_invalidate_cache() that has to always be protected by BQL, and leave the rest in the I/O coroutine. Replace all bdrv_invalidate_cache() invokations with bdrv_activate(). Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220209105452.1694545-4-eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-04crypto: perform permission checks under BQLEmanuele Giuseppe Esposito1-0/+14
Move the permission API calls into driver-specific callbacks that always run under BQL. In this case, bdrv_crypto_luks needs to perform permission checks before and after qcrypto_block_amend_options(). The problem is that the caller, block_crypto_amend_options_generic_luks(), can also run in I/O from .bdrv_co_amend(). This does not comply with Global State-I/O API split, as permissions API must always run under BQL. Firstly, introduce .bdrv_amend_pre_run() and .bdrv_amend_clean() callbacks. These two callbacks are guaranteed to be invoked under BQL, respectively before and after .bdrv_co_amend(). They take care of performing the permission checks in the same way as they are currently done before and after qcrypto_block_amend_options(). These callbacks are in preparation for next patch, where we delete the original permission check. Right now they just add redundant control. Then, call .bdrv_amend_pre_run() before job_start in qmp_x_blockdev_amend(), so that it will be run before the job coroutine is created and stay in the main loop. As a cleanup, use JobDriver's .clean() callback to call .bdrv_amend_clean(), and run amend-specific cleanup callbacks under BQL. After this patch, permission failures occur early in the blockdev-amend job to update a LUKS volume's keys. iotest 296 must now expect them in x-blockdev-amend's QMP reply instead of waiting for the actual job to fail later. Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220209105452.1694545-2-eesposit@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220304153729.711387-6-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2022-03-03hw/nvme: 64-bit pi supportNaveen Nagar1-13/+54
This adds support for one possible new protection information format introduced in TP4068 (and integrated in NVMe 2.0): the 64-bit CRC guard and 48-bit reference tag. This version does not support storage tags. Like the CRC16 support already present, this uses a software implementation of CRC64 (so it is naturally pretty slow). But its good enough for verification purposes. This may go nicely hand-in-hand with the support that Keith submitted for the Linux kernel[1]. [1]: https://lore.kernel.org/linux-nvme/20220126165214.GA1782352@dhcp-10-100-145-180.wdc.com/T/ Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-03-03hw/nvme: add support for the lbafee hbs featureNaveen Nagar1-2/+5
Add support for up to 64 LBA formats through the LBAFEE field of the Host Behavior Support feature. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2022-03-03hw/nvme: add host behavior support featureNaveen Nagar1-0/+9
Add support for getting and setting the Host Behavior Support feature. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Naveen Nagar <naveen.n1@samsung.com> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>