|
This commit adapts the vfio-user protocol specification and the libvfio-user
implementation to v2 of the VFIO live migration interface, as used in the kernel
and QEMU.
The differences between v1 and v2 are discussed in this email thread [1], and we
slightly differ from upstream VFIO v2 in that instead of transferring data over
a new FD, we use the existing UNIX socket with new commands
VFIO_USER_MIG_DATA_READ/WRITE. We also don't yet use P2P states.
The updated spec was submitted to qemu-devel [2].
[1] https://lore.kernel.org/all/20220130160826.32449-9-yishaih@nvidia.com/
[2] https://lore.kernel.org/all/20230718094150.110183-1-william.henderson@nutanix.com/
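For orientation, the device states defined by the kernel's v2 migration
uAPI (enum vfio_device_mig_state in <linux/vfio.h>) are reproduced below
for reference only; per the above, the P2P states are not used here:

    enum vfio_device_mig_state {
        VFIO_DEVICE_STATE_ERROR = 0,
        VFIO_DEVICE_STATE_STOP = 1,
        VFIO_DEVICE_STATE_RUNNING = 2,
        VFIO_DEVICE_STATE_STOP_COPY = 3,
        VFIO_DEVICE_STATE_RESUMING = 4,
        VFIO_DEVICE_STATE_RUNNING_P2P = 5,  /* not used by vfio-user yet */
        VFIO_DEVICE_STATE_PRE_COPY = 6,     /* added by later kernels */
        VFIO_DEVICE_STATE_PRE_COPY_P2P = 7, /* not used by vfio-user yet */
    };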
Signed-off-by: William Henderson <william.henderson@nutanix.com>
|
|
Use separate socket for server->client commands
This change adds support for a separate socket to carry commands in the
server-to-client direction. It has proven problematic to send commands
in both directions over a single socket: matching replies to commands
becomes non-trivial when both sides send commands at the same time,
which adds significant complexity. See issue #279 for details.
To set up the reverse communication channel, the client indicates
support for it via a new capability flag in the version message. The
server will then create a fresh pair of sockets and pass one end to the
client in its version reply. When the server wishes to send commands to
the client at a later point, it now uses its end of the new socket pair
rather than the main socket. Corresponding replies are also passed back
over the new socket pair.
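The fd for the client's end of the new socket pair travels as ancillary
data on the existing UNIX socket. As a minimal sketch of that mechanism
(not the library's actual code), a client could extract one fd from a
message roughly like this:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Receive one byte of payload plus one fd passed via SCM_RIGHTS. */
    static int recv_one_fd(int sock)
    {
        char data;
        struct iovec iov = { .iov_base = &data, .iov_len = sizeof(data) };
        union {
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
        };
        struct cmsghdr *cmsg;
        int fd;

        if (recvmsg(sock, &msg, 0) < 0) {
            return -1;
        }
        cmsg = CMSG_FIRSTHDR(&msg);
        if (cmsg == NULL || cmsg->cmsg_type != SCM_RIGHTS) {
            return -1;
        }
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
        return fd;
    }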
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
|
|
It turns out that the bit field will not yield the desired / specified
bit layout on big-endian systems, see issue #768 for details. Thus,
replace the bit field with constants for the individual fields and use
bit masking when accessing the flags field.
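A minimal sketch of the constants-and-masking approach; names and bit
positions are illustrative, following the spec's flags layout, and may
not match the exact definitions in the tree:

    #include <stdbool.h>
    #include <stdint.h>

    #define VFIO_USER_F_TYPE_MASK    0xfU        /* bits 0-3: message type */
    #define VFIO_USER_F_TYPE_COMMAND 0x0U
    #define VFIO_USER_F_TYPE_REPLY   0x1U
    #define VFIO_USER_F_NO_REPLY     (1U << 4)   /* no reply expected */
    #define VFIO_USER_F_ERROR        (1U << 5)   /* reply carries an error */

    static inline bool
    msg_is_reply(uint32_t flags)
    {
        return (flags & VFIO_USER_F_TYPE_MASK) == VFIO_USER_F_TYPE_REPLY;
    }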
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Florian Freudiger <25648113+FlorianFreudiger@users.noreply.github.com>
|
|
Document that on vfu_sgl_write(), it's the client's responsibility to
track any dirty pages.
Signed-off-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
When an ioeventfd is written to, KVM discards the value, since it has no
memory to write it to, and simply kicks the eventfd. This is a problem for
devices such as NVMe controllers that need the value (e.g. doorbells on
BAR0). This patch allows the vfio-user server to pass a file descriptor
that can be mmap'ed, so that KVM writes the ioeventfd value to this
_shadow_ memory instead of discarding it. This shadow memory is not
exposed to the guest.
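The shadow-memory idea can be sketched as follows; this only illustrates
the mechanism and is not the libvfio-user API added by this patch. The
server backs the shadow with an anonymous file, maps it for its own
reads, and hands the fd over so the written doorbell value lands in that
buffer:

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    /* Returns the fd to pass alongside the ioeventfd, or -1 on error;
     * *valp points at the shadow buffer the server can read. */
    static int
    make_shadow(size_t size, volatile uint8_t **valp)
    {
        int fd = memfd_create("ioeventfd-shadow", 0);
        void *p;

        if (fd < 0) {
            return -1;
        }
        if (ftruncate(fd, size) < 0) {
            close(fd);
            return -1;
        }
        p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            close(fd);
            return -1;
        }
        *valp = p;
        return fd;
    }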
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Change-Id: Iad849c94076ffa5988e034c8bf7ec312d01f095f
|
|
The client masks or unmasks a device IRQ using the
VFIO_USER_DEVICE_SET_IRQS message. Inform the device of such changes to
the IRQ state.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Harmonize and rename the vfu_*sg() APIs to better reflect their functionality:
in our case, there is no mapping happening as part of these calls, they are
merely housekeeping for range splitting, dirty tracking, and so on.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Move SG dirtying to vfu_unmap_sg(): as we don't want to track SGs
ourselves, doing this in vfu_map_sg() is no longer the right place.
Note that the lack of tracking implies that any SGs must be unmapped
before the final stop and copy phase. To avoid the need for this, add
vfu_mark_sg_dirty(): this allows a consumer to mark a region as dirty
explicitly without needing to unmap it. Currently it's the same as
vfu_unmap_sg(), but that's an implementation detail.
Note this still marks current maps after a get operation; that will
change subsequently.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Rename the VFIO_DEVICE_STATE_XXXX defines to VFIO_DEVICE_STATE_V1_XXXX.
Upstream renamed these variables to the XXXX_V1_XXXX format and switched
to an enum for VFIO_DEVICE_STATE_XXXX.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Now that Meson is functional, support for building with CMake is
removed so that there is only one build system to maintain.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
|
|
The Meson build system, used by many other virt projects (QEMU, libvirt
and others), is easier to understand and to maintain rules for than
CMake, and guides towards best practice.
Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
|
|
To support fuzzing with AFL++, add a "pipe" transport that reads from stdin and
outputs to stdout: this is the most convenient way of doing fuzzing.
Add some docs on how to run a fuzzing session.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
We were incorrectly claiming we'd return EAGAIN, but now we'd return 0.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Some devices need the migration state callback to be asynchronous. The
simplest way to implement this is to require the callback to return -1
and set errno to EBUSY; the library then does not process any other new
messages (vfu_ctx_run returns -1 and sets errno to EBUSY), and the user
is given a way to complete the migration (vfu_migr_done).
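A hedged sketch of this flow; the callback and vfu_migr_done()
prototypes below are assumptions, not copied from libvfio-user.h:

    #include <errno.h>
    #include <stdbool.h>
    #include "libvfio-user.h"

    static bool quiesce_pending;   /* set while async work is outstanding */

    /* Migration state-transition callback (prototype assumed). */
    static int
    migr_transition_cb(vfu_ctx_t *ctx, int next_state)
    {
        (void)ctx;
        (void)next_state;
        /* Kick off whatever asynchronous quiesce work the device needs,
         * then tell the library the transition is still in progress. */
        quiesce_pending = true;
        errno = EBUSY;             /* vfu_ctx_run() also returns -1/EBUSY */
        return -1;
    }

    /* Called by the device once its asynchronous work has finished. */
    static void
    transition_finished(vfu_ctx_t *ctx)
    {
        quiesce_pending = false;
        vfu_migr_done(ctx, 0);     /* argument list assumed */
    }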
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Provide initial support for handling VFIO_USER_DEVICE_GET_REGION_IO_FDS, along with a new vfu_create_ioeventfd() API.
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* Add support for VFIO_DMA_UNMAP_FLAG_ALL flag
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Clarify a couple of minor things in the API documentation and README.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Multiple places in dma_map_sg() and dma_unmap_sg() were dereferencing sg[0]
instead of the correct index.
Take the opportunity to improve the doc comments at the same time.
Reported-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Complain about a region that isn't readable *or* writable, or any unknown flags.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Consumers such as SPDK would like to know if any actual work was done. Modify
the API to support this. Also, clean up some stale mocking we no longer use.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* superficially handle Device Control 2 and Link Control 2
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
dirty (#551)
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The flags field belongs to VFIO, and it's not a good idea to reuse it,
since new VFIO flags can break things. Instead, we derive whether or not
a region is mappable from whether a file descriptor is passed.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The previously specified max_msg_size had one major issue: it implied a (way too
small) limit on the size of dirty bitmaps that could be requested by a client,
and as a result a hard limit on memory region size. It seemed awkward to attempt
to split up an unmap request instead.
Instead, let most requests and replies be limited by their "natural" limits; for
example, the number of booleans in VFIO_USER_SET_IRQS is limited by MSI-X count.
For the requests that solicit or provide data - that is, VFIO_USER_DMA_READ/WRITE
and VFIO_USER_REGION_READ/WRITE - we negotiate a new max_data_xfer_size value.
These are much easier to split up into separate requests at the client side
so should not present an implementation problem. For our server, chunking is
implemented in vfu_dma_read/vfu_dma_write().
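On the client side, splitting against the negotiated limit is
straightforward; a generic sketch, where send_region_read() is a
hypothetical helper that issues one VFIO_USER_REGION_READ and fills buf:

    #include <stdint.h>

    /* Hypothetical helper: performs a single VFIO_USER_REGION_READ. */
    int send_region_read(uint32_t index, uint64_t offset, void *buf,
                         uint64_t count);

    static int
    read_region(uint32_t index, uint64_t offset, void *buf, uint64_t count,
                uint64_t max_data_xfer_size)
    {
        uint8_t *p = buf;

        while (count > 0) {
            uint64_t n = count < max_data_xfer_size ? count
                                                    : max_data_xfer_size;

            if (send_region_read(index, offset, p, n) < 0) {
                return -1;
            }
            offset += n;
            p += n;
            count -= n;
        }
        return 0;
    }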
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
We should explicitly define the expected migration register contents for API
users who aren't using the callbacks. Clean up some related lint.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Since the dirty bitmap in message replies is allocated based upon the maximum
size of an individual region, add a limit (somewhat arbitrarily 8TiB, which is a
bitmap size of 256MiB). Add a couple of basic tests on the two DMA limits.
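For reference, assuming 4 KiB pages and one bit per page, an 8 TiB region
is 2^31 pages, so its bitmap is 2^31 / 8 = 2^28 bytes = 256 MiB, matching
the figure above.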
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- we should only accept one range, not multiple ones
- clearly define and implement argsz behaviour
- we need to check if migration is configured
- add proper test coverage; move existing testing to python
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
use DMA map/unmap format similar to VFIO's
Using a DMA map/unmap format similar to VFIO's (vfio_iommu_type1_dma_map /
vfio_iommu_type1_dma_unmap) makes it easier to adapt to future changes.
Consequently, we also honor the passed argsz.
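For reference, the VFIO structure the new format is modelled on, as
defined in the kernel's <linux/vfio.h> (the vfio-user wire format is
similar but not identical):

    struct vfio_iommu_type1_dma_map {
        __u32 argsz;
        __u32 flags;
    #define VFIO_DMA_MAP_FLAG_READ  (1 << 0)  /* readable from device */
    #define VFIO_DMA_MAP_FLAG_WRITE (1 << 1)  /* writable from device */
        __u64 vaddr;   /* process virtual address */
        __u64 iova;    /* IO virtual address */
        __u64 size;    /* size of mapping (bytes) */
    };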
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* Added missing reserved bits and renamed per to rer to match the NVMe
  spec naming
* Add pxcap capability in lspci test
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
With bit fields, the compiler may use the declared data type to
determine the struct size and add additional padding. Instead, use the
data type closest to the total size of the bit fields.
This patch also uses the uint* types consistently throughout.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
We're dropping this behavior from the spec.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
update spec to v0.9.1
Changes include:
- reply message includes the command number
- split out message definitions into request/reply sections, and
skip the repeated standard header definitions
- lots of markup fixes
- re-organization for clarity
- further documentation of argsz
- remove VFIO_USER_VM_INTERRUPT until we have a working implementation
- dirty page tracking is optional
- fix implementations to match the spec
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* Handle support for the PCI FLR capability
If the device supports the FLR capability, call vfu_reset_cb_t when FLR
is initiated by the client.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The specification states that the region offset given in the region info should
be used as the "offset" when mmap()ing the region from the client side. However,
the library instead implemented a fixed offset scheme similar to that of vfio -
and no clients actually set up the file like that.
Instead, let servers define their own offsets, and pass them through to clients
as is. It's up to the server to decide how its backing file or files are
organized.
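A minimal client-side sketch, assuming the server supplied a region fd
and a vfio_region_info-style (size, offset) pair: the server-chosen
offset is used verbatim as the mmap() offset.

    #include <stdint.h>
    #include <sys/mman.h>

    static void *
    map_region(int region_fd, uint64_t info_offset, uint64_t info_size)
    {
        /* Pass the offset from the region info straight to mmap(). */
        return mmap(NULL, info_size, PROT_READ | PROT_WRITE, MAP_SHARED,
                    region_fd, (off_t)info_offset);
    }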
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- document how to use a vfio-user device with libvirt
- document how to use SPDK's nvmf/vfio-user target with libvirt
- replace vfio_bitmap with vfio_user_bitmap and vfio_iommu_type1_dirty_bitmap_get with vfio_user_bitmap_range
- fix bug for calculating number of pages needed for dirty page bitmap
- align number of bytes for dirty page bitmap to QWORD
- add debug messages around dirty page tracking
- only support flags=0 when doing DMA unmap
- set device state to running after reset
- allow region read/write even if device is in stopped state
- allow transitioning from stopped/stop-and-copy state to running state
- fix unit tests
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* spec: Fix DMA_READ/WRITE data count
The DMA region size can be as large as a uint64_t allows, so the
DMA_READ/WRITE data count is now also defined as uint64_t.
* Fix vfu_dma_read/write() as per the spec changes
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* dma: Use correct len type
vfio_iommu_type1_dirty_bitmap_get.size is of type __u64
dma_controller_dirty_page_get() receives it as an int, when it should be
a u64.
Also add a unit test for overflow of the length passed to
dma_controller_dirty_page_get().
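A tiny illustration of the bug class (illustrative only, not the
library's code): a 64-bit length passed through an int parameter is
silently truncated on common platforms.

    #include <stdint.h>
    #include <stdio.h>

    static void take_len(int len) { printf("%d\n", len); }

    int main(void)
    {
        uint64_t len = UINT64_C(1) << 32;   /* 4 GiB */
        take_len(len);   /* typically prints 0: the high bits are lost */
        return 0;
    }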
Fixes: #477
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
This reverts commit 250aedb026ba557fc4fae6ff301b3b1dfd953c7e, reversing
changes made to 71f8b30557d3635336aec06c084188370ed5e248.
|
|
Instead of keeping a local copy, use the defines from
linux-headers/linux/vfio.h, the same as QEMU does.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|