Age | Commit message (Collapse) | Author | Files | Lines |
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reported-by: Eduardo Lima <eblima@gmail.com>
|
|
When an ioeventfd is written to, KVM discards the value since it has no
memory to write it to, and simply kicks the eventfd. This a problem for
devices such a NVMe controllers that need the value (e.g. doorbells on
BAR0). This patch allows the vfio-user server to pass a file descriptor
that can be mmap'ed and KVM can write the ioeventfd value to this
_shadow_ memory instead of discarding it. This shadow memory is not
exposed to the guest.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Change-Id: Iad849c94076ffa5988e034c8bf7ec312d01f095f
|
|
Client masks or unmasks a device IRQ using the
VFIO_USER_DEVICE_SET_IRQS message. Inform the device of such changes to
the IRQ state.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
To support fuzzing with AFL++, add a "pipe" transport that reads from stdin and
outputs to stdout: this is the most convenient way of doing fuzzing.
Add some docs on how to run a fuzzing session.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This make it tidier and easier to pass to function the buffer and
length, instead of passing the whole msg.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Many region accesses of interest are of normal register sizes; sniff the region
access size, and report the read/written value if possible. Clean up
dump_buffer() now, as it's not of much use.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Instead of process_request() having a dual role, split into get_request() and
handle_request().
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Leon <john.levon@nutanix.com>
|
|
Some devices need the migration state callback to be asynchronous. The simplest way to implement this is to require from the callback to return -1 and set errno to EBUSY, not process any other new messages (vfu_ctx_run returns -1 and sets errno to EBUSY), and provide a way to the user to complete migration (vfu_migr_done).
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Provide initial support for handling VFIO_USER_DEVICE_GET_REGION_IO_FDS, along with a new vfu_create_ioeventfd() API.
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Consumers such as SPDK would like to know if any actual work was done. Modify
the API to support this. Also, clean up some stale mocking we no longer use.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The previously specified max_msg_size had one major issue: it implied a (way too
small) limit on the size of dirty bitmaps that could be requested by a client,
and as a result a hard limit on memory region size. It seemed awkward to attempt
to split up an unmap request instead.
Instead, let most requests and replies be limited by their "natural" limits; for
example, the number of booleans in VFIO_USER_SET_IRQS is limited by MSI-X count.
For the requests that solicit or provide data - that is, VFIO_USER_DMA_READ/WRITE
and VFIO_USER_REGION_READ/WRITE - we negotiate a new max_data_xfer_size value.
These are much easier to split up into separate requests at the client side
so should not present an implementation problem. For our server, chunking is
implemented in vfu_dma_read/vfu_dma_write().
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Since the dirty bitmap in message replies is allocated based upon the maximum
size of an individual region, add a limit (somewhat arbitrarily 8TiB, which is a
bitmap size of 256MiB). Add a couple of basic tests on the two DMA limits.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- we should only accept one range, not multiple ones
- clearly define and implement argsz behaviour
- we need to check if migration is configured
- add proper test coverage; move existing testing to python
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
use DMA map/unmap format similar to VFIO's
Using a DMA map/unmap format similar to VFIO's (vfio_iommu_type1_dma_map / vfio_iommu_type1_dma_unmap) makes it easier to adapt to future changes. Consequently we also honor the passed argsz.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanitx.com>
|
|
We should require a non-empty payload for every command type except
VFIO_USER_DEVICE_RESET.
We should also reply to the caller with such failures.
Add some testing for is_valid_header(), and move the fd handling test over to it
too.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
We're dropping this behavior from the spec.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
The specification states that the region offset given in the region info should
be used as the "offset" when mmap()ing the region from the client side. However,
the library instead implemented a fixed offset scheme similar to that of vfio -
and no clients actually set up the file like that.
Instead, let servers define their own offsets, and pass them through to clients
as is. It's up to the server to decide how its backing file or files is
organized.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
- document how to use a vfio-user device with libvirt
- document how to use SPDK's nvmf/vfio-user target with libvirt
- replace vfio_bitmap with vfio_user_bitmap and vfio_iommu_type1_dirty_bitmap_get with vfio_user_bitmap_range
- fix bug for calculating number of pages needed for dirty page bitmap
- align number of bytes for dirty page bitmap to QWORD
- add debug messages around dirty page tracking
- only support flags=0 when doing DMA unmap
- set device state to running after reset
- allow region read/write even if device is in stopped state
- allow transitioning from stopped/stop-and-copy state to running state
- fix unit tests
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Capture message handling inside a new vfu_msg_t private structure and pass that
around to the handlers. This provides no functional change, but greatly
simplifies and cleans up that path, especially around fd and iovec handling.
As part of fixing up the unit tests, start using global variables to reduce the
amount of boiler-plate.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This is only used internally, and not really useful.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Now we are confident we are OK with a hard-coded VFU_PCI_DEV_MIGR_REGION_IDX
value, there's no need for us to track .migr_reg any more, either in the client
or internally.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The first in a series excising the use of the "return -errno" idiom. This is a
non-standard usage, and in userspace, we have "errno" for delivering side-band
error values. As there have been multiple bugs from not using standard error
return methods like -1+errno or NULL+errno, let's do that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This fixes a number of issues with how DMA is handled, based on some changes by
Thanos Makatos:
- rename callbacks to register/unregister, as there is not necessarily
any mapping
- provide the (large) page-aligned mapped start and size, the page size used,
as well as the protection flags: some API users need these
- for convenience, provide the virtual address separately that corresponds to
the mapped region
- we should only require a DMA controller to use vfu_addr_to_sg(),
not an unregister callback
- the callbacks should return errno not -errno
- region removal was incorrectly updating the region array
- various other cleanups and clarifications
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Instead of trying to use the linker's --wrap, which just led to more problems
when we want to call the real function, we'll add two defines, MOCK_DEFINE() and
MOCK_DECLARE(), that behave differently when building the unit tests, such that
all wrapped functions are picked up from test/mocks.c instead, regardless of
compilation unit.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This is used by SPDK, and it's generally useful. This also uncovered some issues
in the test mocking.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Properly fix IRQ disabling:
Allow count == 0 to mean "disable all IRQS of the given type". On our side,
disabling an IRQ means forgetting about the eventfd that was previously passed
over the socket.
Allow individual IRQs to be disabled, by means of a VFIO_IRQ_SET_DATA_EVENTFD
message with no file descriptors passed. In vfio, this is done via setting "-1"
in the fd slots; which isn't possible via auxiliary data. Thus, only one IRQ can
be disabled a a time in vfio-user.
Clean up "->type": this is never set, so wasn't having any effect. Follow up
changes will likely re-introduce this in some form.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
The specification for DEVICE_GET_INFO differed from the implementation. After
some discussion, fix the spec such that the struct should be passed in with
->argsz set.
As it happened, the implementation was also wrong: we weren't actually checking
the incoming ->argsz for validation, but we should.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Library users can use this to sleep on either a newly-attached socket client, or
a new message.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
General code has no business knowing about the socket file descriptors.
vfu_attach_ctx() is changed to not return the file descriptor; we'll re-expose a
suitable file descriptor in a follow-up
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Also clean up some code surrounding this. In particular, don't play games with
modifying the message header.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This matches the tran_* namespace better.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* API error return converged to one func
Use ERROR_INT() or ERROR_PTR() to return errors from API's.
This way if we want to change the behaviour later, we will just need
to update these funcitons.
Also fixed some error return cases and comments.
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
We were forgetting to close vfu_ctx->fd, add a tran callback for this. While
we're there, clean up the tran callbacks somewhat.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Provide initial support for extended capabilities, and implement handlers for
the Device Serial Number and Vendor-Specific capabilities.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
If an fd is provided, automatically add a mmap area covering the entire region
unless an mmap_areas array is provided.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Allow to add capabilities individually, including extended capabilities, and
those to be handled via the region callback.
As a side effect, rework config space accesses to handle reads that straddle
capabilities and non-standard areas and use callbacks as needed.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Various cleanups and fixes to handling of region accesses, including:
- there should be no reason for us to split accesses into 1/2/4/8 byte accesses:
in general, the client will have already be doing that, and if not, there's no
particular reason we should be the ones to split up such larger accesses.
- use a callback for PCI config space reads and writes if one is provided (needs
more work for capabilities)
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
It's still pretty entangled, but move the bulk of the non-cap PCI code over to
pci.c.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This patch returns region capabilities the same way VFIO does: if argsz
is not large enough then it returns only region info and sets argsz to
what it should be in order to fit the capabilities, the client then
retries with a large enough argsz. The protocol specification has been
updated as well.
Plus unit tests.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Misc changes for vfu_ctx_try_attach()
* Rename to vfu_attach_ctx()
* Removed call to vfu_realize_ctx(), should be called separately
* Now vfu_attach_ctx() must also be called for blocking ctx.
Misc changes for vfu_realize_ctx()
* Made calling vfu_realize_ctx() mandatory
* vfu_ctx_drive() and vfu_poll_ctx() returns EINVAL if the device is not
realized.
* Renamed vfu_ctx->ready to vfu_ctx->realized
Added unit test for vfu_attach_ctx() and vfu_realize_ctx()
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|