Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
dirty (#551)
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The flags field belongs to VFIO and it's not a good idea to reuse as new
VFIO flags can break things. Instead, we derive whether or not a region
is mappable if a file descriptor is passed.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The previously specified max_msg_size had one major issue: it implied a (way too
small) limit on the size of dirty bitmaps that could be requested by a client,
and as a result a hard limit on memory region size. It seemed awkward to attempt
to split up an unmap request instead.
Instead, let most requests and replies be limited by their "natural" limits; for
example, the number of booleans in VFIO_USER_SET_IRQS is limited by MSI-X count.
For the requests that solicit or provide data - that is, VFIO_USER_DMA_READ/WRITE
and VFIO_USER_REGION_READ/WRITE - we negotiate a new max_data_xfer_size value.
These are much easier to split up into separate requests at the client side
so should not present an implementation problem. For our server, chunking is
implemented in vfu_dma_read/vfu_dma_write().
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Since the dirty bitmap in message replies is allocated based upon the maximum
size of an individual region, add a limit (somewhat arbitrarily 8TiB, which is a
bitmap size of 256MiB). Add a couple of basic tests on the two DMA limits.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- we should only accept one range, not multiple ones
- clearly define and implement argsz behaviour
- we need to check if migration is configured
- add proper test coverage; move existing testing to python
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Default to hidden visibility to remove non-public symbols from API users (and
improve performance a little). Every public function gets an EXPORT annotation.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
use DMA map/unmap format similar to VFIO's
Using a DMA map/unmap format similar to VFIO's (vfio_iommu_type1_dma_map / vfio_iommu_type1_dma_unmap) makes it easier to adapt to future changes. Consequently we also honor the passed argsz.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanitx.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
We should require a non-empty payload for every command type except
VFIO_USER_DEVICE_RESET.
We should also reply to the caller with such failures.
Add some testing for is_valid_header(), and move the fd handling test over to it
too.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
We're dropping this behavior from the spec.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
update spec to v0.9.1
Changes include:
- reply message includes the command number
- split out message definitions into request/reply sections, and
skip the repeated standard header definitions
- lots of markup fixes
- re-organization for clarity
- further documentation of argsz
- remove VFIO_USER_VM_INTERRUPT until we have a working implementation
- dirty page tracking is optional
- fix implementations to match the spec
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The specification states that the region offset given in the region info should
be used as the "offset" when mmap()ing the region from the client side. However,
the library instead implemented a fixed offset scheme similar to that of vfio -
and no clients actually set up the file like that.
Instead, let servers define their own offsets, and pass them through to clients
as is. It's up to the server to decide how its backing file or files is
organized.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- document how to use a vfio-user device with libvirt
- document how to use SPDK's nvmf/vfio-user target with libvirt
- replace vfio_bitmap with vfio_user_bitmap and vfio_iommu_type1_dirty_bitmap_get with vfio_user_bitmap_range
- fix bug for calculating number of pages needed for dirty page bitmap
- align number of bytes for dirty page bitmap to QWORD
- add debug messages around dirty page tracking
- only support flags=0 when doing DMA unmap
- set device state to running after reset
- allow region read/write even if device is in stopped state
- allow transitioning from stopped/stop-and-copy state to running state
- fix unit tests
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* spec: Fixed DMA_READ/WRITE data count
DMA region size is maxed to uint64_t.
Updated DMA_READ/WRITE data count to be defined as uint64_t.
* Fix vfu_dma_read/write() as per spec changes
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Make a few specification updates after review by Stefan Hajnoczi.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
There are two issues with the unregister callback:
- we were requiring the callback to be set when removing a region, but it's
only required if a consumer wants to map regions
- when we removed all regions (for example, on a reset), we weren't triggering
the callback
Signed-off-by: John Levon <john.levon@nutanix.com>
swapnil code review
add assert
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
We made a silly mistake in free_msg(): the file descriptors we send out in
message replies should never be closed.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
This struct from vfio.h has grown larger in newer Linux versions; this breaks
older clients, as now the server would require the larger size. Replace with our
own definition.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Capture message handling inside a new vfu_msg_t private structure and pass that
around to the handlers. This provides no functional change, but greatly
simplifies and cleans up that path, especially around fd and iovec handling.
As part of fixing up the unit tests, start using global variables to reduce the
amount of boiler-plate.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This was an error handling message that was missed when converting from -errno
to -1 return style.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Fix check for an un-configured PCI config space region (the previous method was
not accounting for the initialized ->fd).
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This is only used internally, and not really useful.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
In addition, return ENOTSUP for unknown device types, and add some unit tests.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Now we are confident we are OK with a hard-coded VFU_PCI_DEV_MIGR_REGION_IDX
value, there's no need for us to track .migr_reg any more, either in the client
or internally.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The first in a series excising the use of the "return -errno" idiom. This is a
non-standard usage, and in userspace, we have "errno" for delivering side-band
error values. As there have been multiple bugs from not using standard error
return methods like -1+errno or NULL+errno, let's do that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Fix up all resulting fallout.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Give API users an opportunity to clean up when a client disconnects from the
vfio-user socket.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
When we lose the client connection, the IRQ and DMA region state is no longer
valid; clean them up.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Report any short reads to callers as ECONNRESET, which is the closest we can
meaningfully get right now. This also fixes get_next_command(), which previously
wasn't checking for short reads at all.
When we fail to send or recv from the socket due to the client disappearing in
some manner, call into vfu_reset_ctx() to clean up the connection fd, allowing a
subsequent vfu_attach_ctx() to work.
If we get 0 bytes from recv[msg](), this is reported by the transport as ENOMSG,
and is a normal EOF condition.
We can also get ECONNRESET: this can happen when we've written unacknowledged
data to the socket, the client side socket is closed, and we try a subsequent
read.
Finally, we can get a short read or write. Our handling of these still has
issues, but for now we'll presume this means the client has gone too. It may
in fact be due to a client bug - if it failed to write enough data - but right
now, we can't easily tell that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This fixes a number of issues with how DMA is handled, based on some changes by
Thanos Makatos:
- rename callbacks to register/unregister, as there is not necessarily
any mapping
- provide the (large) page-aligned mapped start and size, the page size used,
as well as the protection flags: some API users need these
- for convenience, provide the virtual address separately that corresponds to
the mapped region
- we should only require a DMA controller to use vfu_addr_to_sg(),
not an unregister callback
- the callbacks should return errno not -errno
- region removal was incorrectly updating the region array
- various other cleanups and clarifications
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Instead of trying to use the linker's --wrap, which just led to more problems
when we want to call the real function, we'll add two defines, MOCK_DEFINE() and
MOCK_DECLARE(), that behave differently when building the unit tests, such that
all wrapped functions are picked up from test/mocks.c instead, regardless of
compilation unit.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This avoids any issues with multiple definitions when passing CFLAGS in.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Properly fix IRQ disabling:
Allow count == 0 to mean "disable all IRQS of the given type". On our side,
disabling an IRQ means forgetting about the eventfd that was previously passed
over the socket.
Allow individual IRQs to be disabled, by means of a VFIO_IRQ_SET_DATA_EVENTFD
message with no file descriptors passed. In vfio, this is done via setting "-1"
in the fd slots; which isn't possible via auxiliary data. Thus, only one IRQ can
be disabled a a time in vfio-user.
Clean up "->type": this is never set, so wasn't having any effect. Follow up
changes will likely re-introduce this in some form.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Plus unit tests.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reported-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The most common way we have written this is as "sizeof()"; use this form
consistently.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The specification for DEVICE_GET_INFO differed from the implementation. After
some discussion, fix the spec such that the struct should be passed in with
->argsz set.
As it happened, the implementation was also wrong: we weren't actually checking
the incoming ->argsz for validation, but we should.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Library users can use this to sleep on either a newly-attached socket client, or
a new message.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
General code has no business knowing about the socket file descriptors.
vfu_attach_ctx() is changed to not return the file descriptor; we'll re-expose a
suitable file descriptor in a follow-up
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|