Age | Commit message | Author | Files | Lines |
|
The first in a series excising the use of the "return -errno" idiom. This is a
non-standard usage, and in userspace, we have "errno" for delivering side-band
error values. As there have been multiple bugs from not using standard error
return methods like -1+errno or NULL+errno, let's do that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
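
To illustrate the difference between the two conventions, here is a minimal sketch. It is not taken from the library: find_slot_neg_errno() and find_slot() are hypothetical names invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Non-standard idiom: the error code is folded into the return value.
 * A caller that tests "ret == -1" will silently miss this failure.
 */
int
find_slot_neg_errno(const int *table, size_t n, int key)
{
    size_t i;

    assert(table != NULL);

    for (i = 0; i < n; i++) {
        if (table[i] == key) {
            return (int)i;
        }
    }
    return -ENOENT;
}

/*
 * Standard idiom: return -1 and deliver the error code side-band in errno.
 */
int
find_slot(const int *table, size_t n, int key)
{
    size_t i;

    assert(table != NULL);

    for (i = 0; i < n; i++) {
        if (table[i] == key) {
            return (int)i;
        }
    }
    errno = ENOENT;
    return -1;
}
```

With the second form, every caller can use the same check (a return of -1, then inspect errno), which is the check most userspace code already expects.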
|
|
vfu_log() and err() should not take newlines.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Fix up all resulting fallout.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Give API users an opportunity to clean up when a client disconnects from the
vfio-user socket.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Report any short reads to callers as ECONNRESET, which is the closest we can
meaningfully get right now. This also fixes get_next_command(), which previously
wasn't checking for short reads at all.
When we fail to send or recv from the socket due to the client disappearing in
some manner, call into vfu_reset_ctx() to clean up the connection fd, allowing a
subsequent vfu_attach_ctx() to work.
If we get 0 bytes from recv[msg](), this is reported by the transport as ENOMSG,
and is a normal EOF condition.
We can also get ECONNRESET: this can happen when we've written unacknowledged
data to the socket, the client side socket is closed, and we try a subsequent
read.
Finally, we can get a short read or write. Our handling of these still has
issues, but for now we'll presume this means the client has gone too. It may
in fact be due to a client bug - if it failed to write enough data - but right
now, we can't easily tell that.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
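
The three outcomes described above (EOF, connection reset, short read) can be sketched as a single receive helper. This is only an illustration under stated assumptions: recv_exact() is a hypothetical name, it assumes a blocking stream socket, and it does not retry on EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: receive exactly len bytes from a stream socket.
 * Returns 0 on success, or -1 with errno set:
 *   ENOMSG     - recv() returned 0 bytes (normal EOF)
 *   ECONNRESET - fewer bytes than requested arrived (short read)
 *   other      - whatever recv() itself reported (may also be ECONNRESET)
 */
int
recv_exact(int fd, void *buf, size_t len)
{
    ssize_t ret;

    assert(len > 0);

    ret = recv(fd, buf, len, MSG_WAITALL);

    if (ret < 0) {
        return -1;              /* errno already set by recv() */
    }
    if (ret == 0) {
        errno = ENOMSG;         /* peer closed cleanly */
        return -1;
    }
    if ((size_t)ret < len) {
        errno = ECONNRESET;     /* treat a short read as "client gone" */
        return -1;
    }
    return 0;
}
```

A caller would typically treat any of these failures as a reason to tear down the connection state so that a later attach can succeed.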
|
|
This fixes a number of issues with how DMA is handled, based on some changes by
Thanos Makatos:
- rename callbacks to register/unregister, as there is not necessarily
any mapping
- provide the (large) page-aligned mapped start and size, the page size used,
as well as the protection flags: some API users need these
- for convenience, provide the virtual address separately that corresponds to
the mapped region
- we should only require a DMA controller to use vfu_addr_to_sg(),
not an unregister callback
- the callbacks should return errno not -errno
- region removal was incorrectly updating the region array
- various other cleanups and clarifications
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
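
As an illustration of what "page-aligned mapped start and size" means in the list above, the following sketch widens an arbitrary range to page boundaries. The struct and function names are hypothetical and are not the library's API; the page size is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: the range widened to page boundaries. */
struct aligned_range {
    uintptr_t start;
    size_t    size;
};

/*
 * Round [addr, addr + len) outwards to multiples of page_size, which may be
 * a normal or a large page size.
 */
struct aligned_range
page_align_range(uintptr_t addr, size_t len, size_t page_size)
{
    struct aligned_range r;
    uintptr_t mask;
    uintptr_t end;

    /* page_size must be a non-zero power of two */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    mask = (uintptr_t)page_size - 1;
    r.start = addr & ~mask;                 /* round start down */
    end = (addr + len + mask) & ~mask;      /* round end up */
    r.size = (size_t)(end - r.start);
    return r;
}
```

The address the device actually cares about is then the aligned start plus the offset of the original address within the first page, which is why a separate virtual address is convenient for callers.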
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This sends a message to a vfio-user client to trigger an IRQ, instead of writing
to an eventfd. However, this isn't necessary on the cases we care about, where
eventfds *are* available. Furthermore, this isn't something an API user should
need to know about: if we ever care, the better way to do this is to make
vfu_irq_trigger() automatically use a message if an eventfd isn't available.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
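
For readers unfamiliar with the eventfd path mentioned above, this is a minimal Linux-only sketch of signalling and consuming an interrupt through an eventfd. irq_signal() and irq_consume() are hypothetical names; the sketch says nothing about how vfu_irq_trigger() is actually implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>

/* Signal the eventfd once; returns 0 on success, -1 with errno set. */
int
irq_signal(int efd)
{
    assert(efd >= 0);

    /* Adds 1 to the eventfd counter, waking anything polling on it. */
    return eventfd_write(efd, 1);
}

/*
 * Consume all pending signals; returns how many were pending, or -1 with
 * errno set (EAGAIN if the eventfd is non-blocking and nothing is pending).
 */
int64_t
irq_consume(int efd)
{
    eventfd_t count;

    assert(efd >= 0);

    if (eventfd_read(efd, &count) != 0) {
        return -1;
    }
    return (int64_t)count;
}
```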
|
|
RHEL kernels have some of the migration work backported.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The most common way we have written this is as "sizeof()"; use this form
consistently.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Library users can use this to sleep on either a newly-attached socket client, or
a new message.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
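
A sketch of how a library user might sleep on such a file descriptor, assuming it behaves like any other pollable fd. wait_for_event() is a hypothetical helper; how the fd is obtained from the library is deliberately left out.

```c
#include <poll.h>

/*
 * Block until fd has an event or the timeout expires.
 * Returns 1 if there is an event (readable, hang-up or error), 0 on
 * timeout, and -1 with errno set if poll() itself failed.
 * A timeout_ms of -1 waits indefinitely.
 */
int
wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, timeout_ms);

    if (ret <= 0) {
        return ret;     /* 0: timed out, -1: error */
    }
    return 1;
}
```

After a return of 1, the caller would hand control back to the library to accept the new client or process the new message.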
|
|
General code has no business knowing about the socket file descriptors.
vfu_attach_ctx() is changed to not return the file descriptor; we'll re-expose a
suitable file descriptor in a follow-up.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Regardless of what we do internally, most of our API uses standard mechanisms
for reporting errors. Fix vfu_run_ctx() to do so properly as well, and fix a
couple of other references for user-provided callbacks.
This will require a small fix to SPDK.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
* API error return converged to one func
Use ERROR_INT() or ERROR_PTR() to return errors from APIs.
This way if we want to change the behaviour later, we will just need
to update these functions.
Also fixed some error return cases and comments.
Reviewed-by: John Levon <john.levon@nutanix.com>
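
One plausible shape for such helpers is sketched below. This is an assumption for illustration: the library's real definitions may differ, and set_level() is a hypothetical API function that exists only to show a call site.

```c
#include <errno.h>
#include <stddef.h>

/* Set errno and return the conventional integer failure value. */
static inline int
ERROR_INT(int err)
{
    errno = err;
    return -1;
}

/* Set errno and return the conventional pointer failure value. */
static inline void *
ERROR_PTR(int err)
{
    errno = err;
    return NULL;
}

/* Hypothetical API function: every error path goes through the helper. */
int
set_level(int level)
{
    if (level < 0 || level > 7) {
        return ERROR_INT(EINVAL);
    }
    return 0;
}
```

Because every error return is funnelled through two small functions, a later change in convention only has to be made in one place.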
|
|
This patch exposes the fact that live migration is implemented as a
special device region. Hiding this from the user doesn't offer much
benefit since it only takes just a little bit of extra code for the user
to handle it as a region. We do keep the migration callback
functionality since this feature substantially simplifies supporting
live migration from the device implementation's perspective.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Co-authored-by: John Levon <john.levon@nutanix.com>
|
|
Define the full-size capability as defined in the specification.
Previously, we were defining the structure as in the form used by PCI Express
Integrated Endpoints. It's reasonable to assume, however, that a vfio-user
device is a normal PCI Express Endpoint connected over a Link.
We'll go further, and define the whole structure, including the slot registers
at the end that are usually only used for Ports.
The presumption here is that it can't hurt to use the larger size: the only way
a client could care is if it presumed the next capability was at a particular
offset from this one, and we must hope nothing is that silly.
This also corrects a buffer overflow: cap_size() in fact disagreed with the
original size of our struct pxcap (found via clang's address sanitizer).
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Handling data_offset and data_size internally is wrong: we can't simply
assume that the migration data should be appended to the migration
region, as devices might have their own requirements.
This also requires a way for the device to return the data_offset; we
do this by making the prepare_data callback applicable in resume state.
Also, allow migration read/write callbacks to return errors.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
* Misc fixes for DMA_MAP region prot
1. Validate prot passed in vfu_addr_to_sg()
2. Let user know region prot via vfu_unmap_dma_cb_t
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Provide initial support for extended capabilities, and implement handlers for
the Device Serial Number and Vendor-Specific capabilities.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
If an fd is provided, automatically add a mmap area covering the entire region
unless an mmap_areas array is provided.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
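
The defaulting rule can be sketched as follows. All names here (struct region, struct mmap_area, region_setup(), MAX_AREAS) are hypothetical and do not reflect the library's actual types or its region setup function.

```c
#include <errno.h>
#include <stddef.h>

#define MAX_AREAS 4

/* Hypothetical description of one mappable window inside a region. */
struct mmap_area {
    size_t offset;
    size_t len;
};

/* Hypothetical region state. */
struct region {
    size_t           size;
    int              fd;                /* -1: no backing fd */
    struct mmap_area areas[MAX_AREAS];
    size_t           nr_areas;
};

/* Returns 0 on success, or -1 with errno set to EINVAL. */
int
region_setup(struct region *r, size_t size, int fd,
             const struct mmap_area *areas, size_t nr_areas)
{
    size_t i;

    if (r == NULL || nr_areas > MAX_AREAS ||
        (areas == NULL && nr_areas != 0)) {
        errno = EINVAL;
        return -1;
    }

    r->size = size;
    r->fd = fd;
    r->nr_areas = 0;

    if (fd == -1) {
        return 0;       /* nothing can be mapped without an fd */
    }

    if (nr_areas == 0) {
        /* No explicit areas: default to one area covering the region. */
        r->areas[0].offset = 0;
        r->areas[0].len = size;
        r->nr_areas = 1;
        return 0;
    }

    for (i = 0; i < nr_areas; i++) {
        /* Each explicit area must lie entirely within the region. */
        if (areas[i].offset > size || areas[i].len > size - areas[i].offset) {
            errno = EINVAL;
            return -1;
        }
        r->areas[i] = areas[i];
    }
    r->nr_areas = nr_areas;
    return 0;
}
```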
|
|
Allow capabilities to be added individually, including extended capabilities, and
those to be handled via the region callback.
As a side effect, rework config space accesses to handle reads that straddle
capabilities and non-standard areas and use callbacks as needed.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Various cleanups and fixes to handling of region accesses, including:
- there should be no reason for us to split accesses into 1/2/4/8 byte accesses:
in general, the client will already be doing that, and if not, there's no
particular reason we should be the ones to split up such larger accesses.
- use a callback for PCI config space reads and writes if one is provided (needs
more work for capabilities)
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Explicitly mimic the Linux kernel API: the searching functions return an offset
into configuration space. Just like a driver, libvfio-user devices can then
look into config space via vfu_pci_get_config_space() to read the capability as
needed. In general, the driver itself will know exactly what the size and shape
of the capability is, so this seems like a low-friction API that is familiar to
driver writers.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
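
The offset-returning search style can be sketched against a raw 256-byte configuration space buffer. find_capability() and the CFG_* macros are hypothetical names for this example; only the register layout (Status at 0x06, Capabilities Pointer at 0x34, ID and next pointer at the start of each capability) comes from the PCI specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_SPACE_SIZE   256
#define CFG_STATUS       0x06   /* Status register (low byte) */
#define CFG_STATUS_CAPS  0x10   /* Capabilities List bit */
#define CFG_CAP_PTR      0x34   /* Capabilities Pointer */
#define CFG_FIRST_CAP    0x40   /* capabilities live past the header */

/*
 * Walk the standard capability list and return the offset of the first
 * capability with the given ID, or 0 if it is not present.
 */
size_t
find_capability(const uint8_t *cfg, uint8_t cap_id)
{
    size_t off;
    int ttl = 48;               /* guard against a looping list */

    assert(cfg != NULL);

    if (!(cfg[CFG_STATUS] & CFG_STATUS_CAPS)) {
        return 0;
    }

    off = cfg[CFG_CAP_PTR] & 0xfc;          /* low two bits are reserved */
    while (off >= CFG_FIRST_CAP && ttl-- > 0) {
        if (cfg[off] == cap_id) {           /* byte 0: capability ID */
            return off;
        }
        off = cfg[off + 1] & 0xfc;          /* byte 1: next pointer */
    }
    return 0;
}
```

The caller then reads the capability's fields directly from the buffer at the returned offset, since it already knows the layout of the capability it asked for.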
|
|
Split up vfu_pci_setup_config_hdr(): individual "helpers" like vfu_pci_set_id()
are much simpler to use than making the user specify the values in
header-formatted structs; and this way if we want to add additional helpers, we
won't need to modify the existing functions.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
It's still pretty entangled, but move the bulk of the non-cap PCI code over to
pci.c.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This is no longer useful.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
It's easy (with the new vfu_get_private()) to go from a vfu_ctx to the private
pointer, but not the reverse; pass the ctx into all the callbacks.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
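
The callback shape this describes can be sketched generically. Everything below (struct demo_ctx, demo_get_private(), demo_dispatch(), struct my_device) is hypothetical and only mirrors the pattern, not the library's real types or signatures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical library context that carries the user's private pointer. */
struct demo_ctx {
    void *pvt;
};

/* Callbacks receive the context, not the private pointer. */
typedef int (demo_cb_t)(struct demo_ctx *ctx, int value);

/* Going from context to private pointer is a trivial accessor. */
void *
demo_get_private(struct demo_ctx *ctx)
{
    assert(ctx != NULL);
    return ctx->pvt;
}

/* Hypothetical device state owned by the API user. */
struct my_device {
    int last_value;
};

/* User callback: recovers its own state from the context when needed. */
int
my_cb(struct demo_ctx *ctx, int value)
{
    struct my_device *dev = demo_get_private(ctx);

    dev->last_value = value;
    return 0;
}

/* Library side: invoke the user callback with the context. */
int
demo_dispatch(struct demo_ctx *ctx, demo_cb_t *cb, int value)
{
    assert(cb != NULL);
    return cb(ctx, value);
}
```

With this shape the callback can both reach its private state and call back into the library with the context, which would not be possible if only the private pointer were passed.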
|
|
Remove duplicate code for initializing a DMA segment, mark whether a DMA segment is mappable, and add a basic unit test for dma_addr_to_sg.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* Reorg vfu_create_ctx()
* Unconditionally call dma_controller_create() in vfu_setup_device_dma_cb().
* Added UT for vfu_setup_device_dma_cb()
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
enable ERR and REQ IRQs by default
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This is almost entirely re-ordering: first the basic lifecycle things, then
vfu_setup_*() group, then handlers and helpers, and finally PCI handling.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Remove this API as well as vfu_pci_non_std_config_space_t. It's at best
confusing to try to represent this area as if it's not just a normal part of the
overall config space, and we don't really want an additional API for the
extended space past either.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The PCI vendor-specific capability is blindly read/written by the
library. It is possible that the user might want to intercept accesses
to it, in which case we'll have to add a callback. The best way to do this
is to introduce a new function that configures callbacks for the PCI
capabilities, e.g.
    typedef ssize_t (vfu_cap_access_t) (void *pvt, uint8_t id,
        char *buf, size_t count, loff_t offset, bool is_write);
    vfu_pci_cap_set_cb(vfu_ctx_t *vfu_ctx, uint8_t cap_id,
        vfu_cap_access_t *cb);
This way the existing API won't have to change.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
* Drop vfu_ctx_drive() and use vfu_run_ctx()
Renamed vfu_ctx_poll() to vfu_run_ctx().
Updated vfu_run_ctx() to also handle blocking ctx.
Instead of having separate functions for blocking and
non-blocking ctx, it is better to have one.
This way the user can call the same set of functions in both cases.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Misc changes for vfu_ctx_try_attach()
* Rename to vfu_attach_ctx()
* Removed call to vfu_realize_ctx(), should be called separately
* Now vfu_attach_ctx() must also be called for blocking ctx.
Misc changes for vfu_realize_ctx()
* Made calling vfu_realize_ctx() mandatory
* vfu_ctx_drive() and vfu_poll_ctx() return EINVAL if the device is not
realized.
* Renamed vfu_ctx->ready to vfu_ctx->realized
Added unit test for vfu_attach_ctx() and vfu_realize_ctx()
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
We renamed other code to be "REGION" instead of "REG" so it's less ambiguous. Do
the same for VFU_REG_FLAG_*.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
|