|
This commit adapts the vfio-user protocol specification and the libvfio-user
implementation to v2 of the VFIO live migration interface, as used in the kernel
and QEMU.
The differences between v1 and v2 are discussed in this email thread [1], and we
slightly differ from upstream VFIO v2 in that instead of transferring data over
a new FD, we use the existing UNIX socket with new commands
VFIO_USER_MIG_DATA_READ/WRITE. We also don't yet use P2P states.
The updated spec was submitted to qemu-devel [2].
[1] https://lore.kernel.org/all/20220130160826.32449-9-yishaih@nvidia.com/
[2] https://lore.kernel.org/all/20230718094150.110183-1-william.henderson@nutanix.com/
Signed-off-by: William Henderson <william.henderson@nutanix.com>
|
|
The `log_dirty_bitmap` function in `dma.c` would output the wrong number of
dirty pages due to the `char` of the bitmap being sign-extended when implicitly
being converted to `unsigned int` for `__builtin_popcount`. By adding an
intermediate cast to `uint8_t` we avoid this incorrect behaviour.
See https://github.com/nutanix/libvfio-user/pull/746#discussion_r1297173318.
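The sign-extension problem described above can be reproduced in isolation. The following is a minimal sketch, not the code from `dma.c`: the function names are hypothetical, and it assumes a GCC or Clang compiler (for `__builtin_popcount`) on a platform where plain `char` is signed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Buggy variant: where plain char is signed, a byte such as 0xff holds -1,
 * which is sign-extended to 0xffffffff when converted to unsigned int, so
 * the byte is counted as 32 set bits instead of 8.
 */
static unsigned int
count_dirty_buggy(const char *bitmap, size_t size)
{
    unsigned int count = 0;

    for (size_t i = 0; i < size; i++) {
        count += (unsigned int)__builtin_popcount(bitmap[i]);
    }
    return count;
}

/*
 * Fixed variant: the intermediate cast to uint8_t makes the conversion to
 * unsigned int zero-extend, so at most 8 bits are counted per byte.
 */
static unsigned int
count_dirty_fixed(const char *bitmap, size_t size)
{
    unsigned int count = 0;

    for (size_t i = 0; i < size; i++) {
        count += (unsigned int)__builtin_popcount((uint8_t)bitmap[i]);
    }
    return count;
}
```

On platforms where plain `char` is unsigned, both variants give the same result, which is one reason a bug like this can go unnoticed.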
Signed-off-by: William Henderson <william.henderson@nutanix.com>
|
|
The helper function centralizes some extra checks and diligence desired
by many/most current code paths but currently inconsistently applied.
This includes bypassing the close call when the file descriptor is -1
already, resetting the file descriptor variable to -1 after closing, and
preserving errno.
All calls to close are replaced by close_safely. Some warning log output
is lost as a result, but it does not seem to have been very useful,
given that Linux always closes the file descriptor anyway.
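A helper with the three behaviours listed above might look like the sketch below. The signature (taking a pointer to the descriptor so it can be reset) is an assumption made for illustration; the library's actual helper may differ.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical sketch of a close helper; not the library's actual code. */
static void
close_safely(int *fd)
{
    int saved_errno;

    if (*fd == -1) {
        return;                 /* already closed: bypass close() */
    }

    saved_errno = errno;        /* preserve the caller's errno */
    (void)close(*fd);           /* result deliberately ignored */
    *fd = -1;                   /* guard against double close */
    errno = saved_errno;
}
```

Because the descriptor is reset to -1, calling the helper twice on the same variable is harmless, which simplifies cleanup paths.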
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reported-by: Eduardo Lima <eblima@gmail.com>
|
|
Use atomic operations to allow concurrent bitmap updates with
VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP operations.
Dirtying clients can race against each other, so we must use an atomic OR
when marking pages dirty: we do this byte-by-byte.
When reading the dirty bitmap, we must be careful to not race and lose
any set bits within the same byte. If we miss an update, we'll catch it
the next time around, presuming that before the final pass we'll have
quiesced all I/O.
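One way to get both properties is sketched below with C11 atomics. This is an illustration under assumptions, not the library's code: the function names are hypothetical, and using an atomic exchange on the read side is one possible technique for not losing bits within a byte.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Mark one page dirty with an atomic OR on the byte holding its bit. */
static void
mark_page_dirty(_Atomic uint8_t *bitmap, size_t page)
{
    atomic_fetch_or(&bitmap[page / 8], (uint8_t)(1u << (page % 8)));
}

/*
 * Copy the bitmap out and clear it, byte by byte. The atomic exchange
 * returns the old byte and stores zero in a single step, so a bit set
 * concurrently is either returned now or left in place for the next pass.
 */
static void
get_and_clear_bitmap(_Atomic uint8_t *bitmap, uint8_t *out, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        out[i] = atomic_exchange(&bitmap[i], 0);
    }
}
```

A plain load followed by a separate store of zero would not be safe here, since a bit set between the two steps would be cleared without ever being reported.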
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Harmonize and rename the vfu_*sg() APIs to better reflect their functionality:
in our case, there is no mapping happening as part of these calls, they are
merely housekeeping for range splitting, dirty tracking, and so on.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
->maps existed because, if a consumer called vfu_map_sg() and we were then
asked to enable dirty page tracking, we would otherwise not mark those pages as
dirty, and hence could potentially lose data.
Now that we require quiesce and the use of either vfu_unmap_sg() or
vfu_sg_mark_dirty(), there's no need to have this list any more.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The reference count is unused and not handled atomically; remove it.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The dma_sg_size() method is listed in libvfio-user.h but the symbol
is marked private in the ELF library.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
DMA regions not mapped by the server are not dirty tracked (the client must
track changes via handling VFIO_USER_DMA_WRITE), but we weren't correctly
enforcing this, which could segfault when ->dirty_bitmap was NULL.
Found via AFL++.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
_dma_addr_sg_split() is supposed to return the sg's when the requested
DMA address range spans multiple regions.
Also add unit tests to cover these cases.
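To illustrate what splitting a range across regions means, here is a simplified sketch. The struct layouts, the function name addr_to_sgs() and its return convention are all hypothetical and do not reflect the library's internal types; regions are assumed to be sorted by start address and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

struct region { uint64_t start; uint64_t len; };           /* hypothetical */
struct sg { int region; uint64_t offset; uint64_t len; };  /* hypothetical */

/*
 * Split [addr, addr + len) into one sg entry per region it touches.
 * Returns the number of entries written, or -1 if the range falls into a
 * gap between regions or needs more than max_sg entries.
 */
static int
addr_to_sgs(const struct region *regions, int nr_regions,
            uint64_t addr, uint64_t len, struct sg *sgs, int max_sg)
{
    int cnt = 0;

    for (int i = 0; i < nr_regions && len > 0; i++) {
        uint64_t end = regions[i].start + regions[i].len;
        uint64_t chunk;

        if (addr < regions[i].start || addr >= end) {
            continue;           /* addr is not inside this region */
        }
        if (cnt == max_sg) {
            return -1;          /* caller's array is too small */
        }

        chunk = end - addr;     /* bytes left in this region */
        if (chunk > len) {
            chunk = len;
        }

        sgs[cnt].region = i;
        sgs[cnt].offset = addr - regions[i].start;
        sgs[cnt].len = chunk;
        cnt++;

        addr += chunk;          /* continue in the next region */
        len -= chunk;
    }

    return len == 0 ? cnt : -1;
}
```

A range that starts in one region and ends in an adjacent one yields two entries; a range that runs into unmapped space yields an error.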
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
dirty (#551)
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Since the dirty bitmap in message replies is allocated based upon the maximum
size of an individual region, add a limit (somewhat arbitrarily 8TiB, which is a
bitmap size of 256MiB). Add a couple of basic tests on the two DMA limits.
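The relationship between the two figures can be checked at compile time. The sketch below assumes a 4KiB page size and one bitmap bit per page; the macro and function names are hypothetical, not the library's.

```c
#include <stdint.h>

#define KiB 1024ULL
#define MiB (1024ULL * KiB)
#define TiB (1024ULL * 1024ULL * MiB)

/* Hypothetical names; the 4KiB page size is an assumption. */
#define EXAMPLE_MAX_REGION_SIZE (8 * TiB)
#define EXAMPLE_PAGE_SIZE       (4 * KiB)

/* One bit per page, so eight pages per bitmap byte. */
#define EXAMPLE_MAX_BITMAP_SIZE \
    (EXAMPLE_MAX_REGION_SIZE / EXAMPLE_PAGE_SIZE / 8)

_Static_assert(EXAMPLE_MAX_BITMAP_SIZE == 256 * MiB,
               "8TiB of 4KiB pages needs a 256MiB bitmap");

/* Returns nonzero if a region of this size is within the limit. */
static int
region_size_ok(uint64_t size)
{
    return size <= EXAMPLE_MAX_REGION_SIZE;
}
```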
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- we should only accept one range, not multiple ones
- clearly define and implement argsz behaviour
- we need to check if migration is configured
- add proper test coverage; move existing testing to python
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
New DMA regions don't get their pages tracked if dirty page logging has
already been started; this patch fixes the bug.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
- document how to use a vfio-user device with libvirt
- document how to use SPDK's nvmf/vfio-user target with libvirt
- replace vfio_bitmap with vfio_user_bitmap and vfio_iommu_type1_dirty_bitmap_get with vfio_user_bitmap_range
- fix bug for calculating number of pages needed for dirty page bitmap
- align number of bytes for dirty page bitmap to QWORD
- add debug messages around dirty page tracking
- only support flags=0 when doing DMA unmap
- set device state to running after reset
- allow region read/write even if device is in stopped state
- allow transitioning from stopped/stop-and-copy state to running state
- fix unit tests
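The two bitmap-sizing items in the list above (counting the pages needed and aligning the byte count to a QWORD) can be sketched as follows. The function names are hypothetical and this is not the library's code; it assumes one bit per page and that a partially covered page still needs a bit.

```c
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next multiple of m (m must be non-zero). */
static uint64_t
round_up(uint64_t n, uint64_t m)
{
    return ((n + m - 1) / m) * m;
}

/* Size in bytes of a dirty bitmap covering region_size bytes. */
static uint64_t
dirty_bitmap_size(uint64_t region_size, uint64_t page_size)
{
    /* A partially covered final page still needs its own bit. */
    uint64_t pages = (region_size + page_size - 1) / page_size;
    /* Eight pages per byte, rounding up for a partial byte. */
    uint64_t bytes = (pages + 7) / 8;

    /* Align to a QWORD (8 bytes). */
    return round_up(bytes, sizeof(uint64_t));
}
```

For example, 65 pages need 9 bytes of bits, which the QWORD alignment rounds up to 16.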
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* dma: Use correct len type
vfio_iommu_type1_dirty_bitmap_get.size is of type __u64, but
dma_controller_dirty_page_get() receives it as int; it should be u64.
Also added a UT to test overflow of the length passed to dma_controller_dirty_page_get.
Fixes: #477
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
There are two issues with the unregister callback:
- we were requiring the callback to be set when removing a region, but it's
only required if a consumer wants to map regions
- when we removed all regions (for example, on a reset), we weren't triggering
the callback
Signed-off-by: John Levon <john.levon@nutanix.com>
swapnil code review
add assert
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
The first in a series excising the use of the "return -errno" idiom. This is a
non-standard usage, and in userspace, we have "errno" for delivering side-band
error values. As there have been multiple bugs from not using standard error
return methods like -1+errno or NULL+errno, let's do that.
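The difference between the two idioms can be shown with a small sketch. The function names and the length check are hypothetical, chosen only to contrast the return conventions.

```c
#include <errno.h>
#include <stddef.h>

/* Old idiom: the error code is encoded in the return value. */
static int
check_len_old(size_t len)
{
    if (len == 0) {
        return -EINVAL;
    }
    return 0;
}

/* Standard userspace idiom: return -1 and report the cause via errno. */
static int
check_len(size_t len)
{
    if (len == 0) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```

With the second form, callers test for -1 and read errno, the same way they would after a libc call; mixing the two conventions is where a caller can end up checking the wrong thing.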
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Code was expecting -errno style returns, but the DMA code didn't do this.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
vfu_log() and err() should not take newlines.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Fix up all resulting fallout.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
When we lose the client connection, the IRQ and DMA region state is no longer
valid; clean them up.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Prepare this function for re-usability by clearing the array after removal.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This fixes a number of issues with how DMA is handled, based on some changes by
Thanos Makatos:
- rename callbacks to register/unregister, as there is not necessarily
any mapping
- provide the (large) page-aligned mapped start and size, the page size used,
as well as the protection flags: some API users need these
- for convenience, provide the virtual address separately that corresponds to
the mapped region
- we should only require a DMA controller to use vfu_addr_to_sg(),
not an unregister callback
- the callbacks should return errno not -errno
- region removal was incorrectly updating the region array
- various other cleanups and clarifications
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Instead of trying to use the linker's --wrap, which just led to more problems
when we want to call the real function, we'll add two defines, MOCK_DEFINE() and
MOCK_DECLARE(), that behave differently when building the unit tests, such that
all wrapped functions are picked up from test/mocks.c instead, regardless of
compilation unit.
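The idea can be sketched as below. These macro definitions are an assumption written for illustration and may differ from the library's real ones; UNIT_TEST and region_count() are hypothetical names. In a unit-test build the real body is emitted under a prefixed name, leaving the plain name free for a mock defined elsewhere (such as test/mocks.c).

```c
/* Hypothetical definitions; the library's real macros may differ. */
#ifdef UNIT_TEST
/* Test build: the real body gets a __real_ prefix. */
#define MOCK_DEFINE(f) __real_##f
#define MOCK_DECLARE(ret, f, ...) \
    ret f(__VA_ARGS__);           \
    ret __real_##f(__VA_ARGS__)
#else
/* Normal build: the macros are transparent. */
#define MOCK_DEFINE(f) f
#define MOCK_DECLARE(ret, f, ...) ret f(__VA_ARGS__)
#endif

/* Header side: declare the function (and, in tests, its real variant). */
MOCK_DECLARE(int, region_count, const int *regions, int nr);

/* Implementation side: count the non-zero entries. */
int
MOCK_DEFINE(region_count)(const int *regions, int nr)
{
    int count = 0;

    for (int i = 0; i < nr; i++) {
        if (regions[i] != 0) {
            count++;
        }
    }
    return count;
}
```

Because callers always use the plain name, a call from any compilation unit resolves to the mock in a test build, and the mock can still forward to the prefixed real function when it wants the original behaviour.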
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This is used by SPDK, and it's generally useful. This also uncovered some issues
in the test mocking.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Plus always notify user when DMA region is removed.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The most common way we have written this is as "sizeof()"; use this form
consistently.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* Misc fixes for DMA_MAP region prot
1. Validate prot passed in vfu_addr_to_sg()
2. Let user know region prot via vfu_unmap_dma_cb_t
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* Use prot flags sent by client to map dma regions
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Remove duplicate code for initializing a DMA segment, mark whether a DMA segment is mappable, and add a basic unit test for dma_addr_to_sg.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The muser name no longer reflects the implementation, and will just serve to
confuse. Bite the bullet now, and rename ourselves to reflect the actual
implementation.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
API refactoring
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
|
|
This patch adds support for the live migration region and dirty page logging
following VFIO. Live migration is NOT yet functional as handling accesses to
the migration region is not yet implemented. Currently the live migration region
is fixed at index 9 to simplify the implementation. Dirty page
logging is simplified by requiring IOVA ranges to match exactly the entire IOVA
range.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|