Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Leon <john.levon@nutanix.com>
|
|
We were incorrectly claiming we'd return EAGAIN, but now we'd return 0.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Some devices need the migration state callback to be asynchronous. The simplest way to implement this is to require from the callback to return -1 and set errno to EBUSY, not process any other new messages (vfu_ctx_run returns -1 and sets errno to EBUSY), and provide a way to the user to complete migration (vfu_migr_done).
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Provide initial support for handling VFIO_USER_DEVICE_GET_REGION_IO_FDS, along with a new vfu_create_ioeventfd() API.
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* Add support for VFIO_DMA_UNMAP_FLAG_ALL flag
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Clarify a couple of minor things in the API documentation and README.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Multiple places in dma_map_sg() and dma_unmap_sg() were dereferencing sg[0]
instead of the correct index.
Take the opportunity to improve the doc comments at the same time.
Reported-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Complain about a region that isn't readable *or* writable, or any unknown flags.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Consumers such as SPDK would like to know if any actual work was done. Modify
the API to support this. Also, clean up some stale mocking we no longer use.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* superficially handle Device Control 2 and Link Control 2
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
dirty (#551)
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The flags field belongs to VFIO and it's not a good idea to reuse as new
VFIO flags can break things. Instead, we derive whether or not a region
is mappable if a file descriptor is passed.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The previously specified max_msg_size had one major issue: it implied a (way too
small) limit on the size of dirty bitmaps that could be requested by a client,
and as a result a hard limit on memory region size. It seemed awkward to attempt
to split up an unmap request instead.
Instead, let most requests and replies be limited by their "natural" limits; for
example, the number of booleans in VFIO_USER_SET_IRQS is limited by MSI-X count.
For the requests that solicit or provide data - that is, VFIO_USER_DMA_READ/WRITE
and VFIO_USER_REGION_READ/WRITE - we negotiate a new max_data_xfer_size value.
These are much easier to split up into separate requests at the client side
so should not present an implementation problem. For our server, chunking is
implemented in vfu_dma_read/vfu_dma_write().
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
We should explicitly define the expected migration register contents for API
users who aren't using the callbacks. Clean up some related lint.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Since the dirty bitmap in message replies is allocated based upon the maximum
size of an individual region, add a limit (somewhat arbitrarily 8TiB, which is a
bitmap size of 256MiB). Add a couple of basic tests on the two DMA limits.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- we should only accept one range, not multiple ones
- clearly define and implement argsz behaviour
- we need to check if migration is configured
- add proper test coverage; move existing testing to python
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
use DMA map/unmap format similar to VFIO's
Using a DMA map/unmap format similar to VFIO's (vfio_iommu_type1_dma_map / vfio_iommu_type1_dma_unmap) makes it easier to adapt to future changes. Consequently we also honor the passed argsz.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanitx.com>
|
|
* Added missing reserved bits and renamed per to rer nameing as the
nvme specs
* Add pxcap capability in lspci test
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
In case of bitfields compiler may use data type to allocate struct size
and add additional paddings. Instead use data type which is closest to
the total size of the bitfields.
This patch also uses uint* version consistently throughout.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
We're dropping this behavior from the spec.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
update spec to v0.9.1
Changes include:
- reply message includes the command number
- split out message definitions into request/reply sections, and
skip the repeated standard header definitions
- lots of markup fixes
- re-organization for clarity
- further documentation of argsz
- remove VFIO_USER_VM_INTERRUPT until we have a working implementation
- dirty page tracking is optional
- fix implementations to match the spec
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* Handle support of PCI FLR capability
If device supports FLR cap then call vfu_reset_cb_t when FLR is
initiated by client.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The specification states that the region offset given in the region info should
be used as the "offset" when mmap()ing the region from the client side. However,
the library instead implemented a fixed offset scheme similar to that of vfio -
and no clients actually set up the file like that.
Instead, let servers define their own offsets, and pass them through to clients
as is. It's up to the server to decide how its backing file or files is
organized.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- document how to use a vfio-user device with libvirt
- document how to use SPDK's nvmf/vfio-user target with libvirt
- replace vfio_bitmap with vfio_user_bitmap and vfio_iommu_type1_dirty_bitmap_get with vfio_user_bitmap_range
- fix bug for calculating number of pages needed for dirty page bitmap
- align number of bytes for dirty page bitmap to QWORD
- add debug messages around dirty page tracking
- only support flags=0 when doing DMA unmap
- set device state to running after reset
- allow region read/write even if device is in stopped state
- allow transitioning from stopped/stop-and-copy state to running state
- fix unit tests
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* spec: Fixed DMA_READ/WRITE data count
DMA region size is maxed to uint64_t.
Updated DMA_READ/WRITE data count to be defined as uint64_t.
* Fix vfu_dma_read/write() as per spec changes
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* dma: Use correct len type
vfio_iommu_type1_dirty_bitmap_get.size is of type __u64
dma_controller_dirty_page_get() receives it as int, instead it should be u64
Also added UT to test overflow of length passed to dma_controller_dirty_page_get
Fixes: #477
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
This reverts commit 250aedb026ba557fc4fae6ff301b3b1dfd953c7e, reversing
changes made to 71f8b30557d3635336aec06c084188370ed5e248.
|
|
Instead of having local copy use the defines from
linux-headers/linux/vfio.h.
Same as how Qemu does.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
There are two issues with the unregister callback:
- we were requiring the callback to be set when removing a region, but it's
only required if a consumer wants to map regions
- when we removed all regions (for example, on a reset), we weren't triggering
the callback
Signed-off-by: John Levon <john.levon@nutanix.com>
swapnil code review
add assert
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
This struct from vfio.h has grown larger in newer Linux versions; this breaks
older clients, as now the server would require the larger size. Replace with our
own definition.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
the static size assert for the PMCS register was checking the wrong struct;
however, the struct was nonetheless 4 bytes long, due to uint bitfields. This
accidentally meant the containing struct pmcap was the correct size (the
alignment attribute makes no difference).
After fixing struct pmcs, we'll include the additional two bytes defined in the
PCI PM specification, Section 3.2. These are "optional", but as elsewhere, we'll
require them when adding the capability.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
In addition, return ENOTSUP for unknown device types, and add some unit tests.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
As we are now pure userspace, there is no need for us to use non-standard
integer types. This leaves the copied defines from Linux's vfio.h alone,
however.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The first in a series excising the use of the "return -errno" idiom. This is a
non-standard usage, and in userspace, we have "errno" for delivering side-band
error values. As there have been multiple bugs from not using standard error
return methods like -1+errno or NULL+errno, let's do that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
vfu_log() and err() should not take newlines.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Fix up all resulting fallout.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Give API users an opportunity to clean up when a client disconnects from the
vfio-user socket.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Report any short reads to callers as ECONNRESET, which is the closest we can
meaningfully get right now. This also fixes get_next_command(), which previously
wasn't checking for short reads at all.
When we fail to send or recv from the socket due to the client disappearing in
some manner, call into vfu_reset_ctx() to clean up the connection fd, allowing a
subsequent vfu_attach_ctx() to work.
If we get 0 bytes from recv[msg](), this is reported by the transport as ENOMSG,
and is a normal EOF condition.
We can also get ECONNRESET: this can happen when we've written unacknowledged
data to the socket, the client side socket is closed, and we try a subsequent
read.
Finally, we can get a short read or write. Our handling of these still has
issues, but for now we'll presume this means the client has gone too. It may
in fact be due to a client bug - if it failed to write enough data - but right
now, we can't easily tell that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This fixes a number of issues with how DMA is handled, based on some changes by
Thanos Makatos:
- rename callbacks to register/unregister, as there is not necessarily
any mapping
- provide the (large) page-aligned mapped start and size, the page size used,
as well as the protection flags: some API users need these
- for convenience, provide the virtual address separately that corresponds to
the mapped region
- we should only require a DMA controller to use vfu_addr_to_sg(),
not an unregister callback
- the callbacks should return errno not -errno
- region removal was incorrectly updating the region array
- various other cleanups and clarifications
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This sends a message to a vfio-user client to trigger an IRQ, instead of writing
to an eventfd. However, this isn't necessary on the cases we care about, where
eventfds *are* available. Furthermore, this isn't something an API user should
need to know about: if we ever care, the better way to do this is to make
vfu_irq_trigger() automatically use a message if an eventfd isn't available.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
RHEL kernels have some of the migration work backported
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The most common way we have written this is as "sizeof()"; use this form
consistently.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|