Age | Commit message (Collapse) | Author | Files | Lines |
|
* Support 64 bits and prefetchable BARs
Add two new flags for lib user to request 64bits and/or prefetchable
BARs.
Tested with a vfio-user client patched QEMU.
Signed-off-by: Jérémy Fanguède <jfanguede@kalrayinc.com>
|
|
When performing DMA via VFIO-user commands over the socket,
vfu_dma_transfer breaks large requests into chunks according to the
client's maximum data transfer size negotiated at connection setup time.
This change fixes the calculation of the chunk size for the case where
the last chunk is less than the maximum transfer size.
Unfortunately, the existing test didn't catch this due to the request
size being a multiple of that maximum data transfer size. Adjust the
test to make the last chunk size a true remainder.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
|
|
clang-tidy static analysis identified a zero-sized allocation in the
case that no ioregionfds had been configured. Fix this issue and add a
test for it.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Fixes the following Coverity reports:
________________________________________________________________________________________________________
*** CID 417161: Memory - corruptions (ARRAY_VS_SINGLETON)
/samples/server.c: 438 in migration_write_data()
432 }
433
434 /* write to bar0, if any */
435 if (write_end > server_data->bar1_size) {
436 length_in_bar0 = write_end - write_start;
437 write_start -= server_data->bar1_size;
CID 417161: Memory - corruptions (ARRAY_VS_SINGLETON)
Using "&server_data->bar0" as an array. This might corrupt or misinterpret adjacent memory locations.
438 memcpy(&server_data->bar0 + write_start, buf + length_in_bar1,
439 length_in_bar0);
440 }
441
442 server_data->migration.bytes_transferred += bytes_written;
443
________________________________________________________________________________________________________
*** CID 417160: Memory - corruptions (ARRAY_VS_SINGLETON)
/samples/server.c: 394 in migration_read_data()
388 }
389
390 /* read bar0, if any */
391 if (read_end > server_data->bar1_size) {
392 length_in_bar0 = read_end - read_start;
393 read_start -= server_data->bar1_size;
CID 417160: Memory - corruptions (ARRAY_VS_SINGLETON)
Using "&server_data->bar0" as an array. This might corrupt or misinterpret adjacent memory locations.
394 memcpy(buf + length_in_bar1, &server_data->bar0 + read_start,
395 length_in_bar0);
396 }
397
398 server_data->migration.bytes_transferred += bytes_read;
399
________________________________________________________________________________________________________
*** CID 417159: Possible Control flow issues (DEADCODE)
/lib/libvfio-user.c: 121 in dev_get_caps()
115
116 header = (struct vfio_info_cap_header*)(vfio_reg + 1);
117
118 if (vfu_reg->mmap_areas != NULL) {
119 int i, nr_mmap_areas = vfu_reg->nr_mmap_areas;
120 if (type != NULL) {
CID 417159: Possible Control flow issues (DEADCODE)
Execution cannot reach this statement: "type->header.next = vfio_re...".
121 type->header.next = vfio_reg->cap_offset + sizeof(struct vfio_region_info_cap_type);
122 sparse = (struct vfio_region_info_cap_sparse_mmap*)(type + 1);
123 } else {
124 vfio_reg->cap_offset = sizeof(struct vfio_region_info);
125 sparse = (struct vfio_region_info_cap_sparse_mmap*)header;
126 }
Signed-off-by: William Henderson <william.henderson@nutanix.com>
|
|
This commit adapts the vfio-user protocol specification and the libvfio-user
implementation to v2 of the VFIO live migration interface, as used in the kernel
and QEMU.
The differences between v1 and v2 are discussed in this email thread [1], and we
slightly differ from upstream VFIO v2 in that instead of transferring data over
a new FD, we use the existing UNIX socket with new commands
VFIO_USER_MIG_DATA_READ/WRITE. We also don't yet use P2P states.
The updated spec was submitted to qemu-devel [2].
[1] https://lore.kernel.org/all/20220130160826.32449-9-yishaih@nvidia.com/
[2] https://lore.kernel.org/all/20230718094150.110183-1-william.henderson@nutanix.com/
Signed-off-by: William Henderson <william.henderson@nutanix.com>
|
|
It turns out that the bit field will not yield the desired / specified
bit layout on big-endian systems, see issue #768 for details. Thus,
replace the bit field with constants for the individual fields and use
bit masking when accessing the flags field.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
The helper function centralizes some extra checks and diligence desired
by many/most current code paths but currently inconsistently applied.
This includes bypassing the close call when the file descriptor is -1
already, resetting the file descriptor variable to -1 after closing, and
preserving errno.
All calls to close are replaced by close_safely. Some warning log output
is lost over this, but it doesn't seem like this was very useful anyways
given that Linux always closes the file descriptor anyways.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
|
|
The correct DMA address is formed by adding base and offset - the latter
was accidentally missing. Change the server example to read and write
blocks at non-zero offsets, such that `test-client-server.sh` exercises
offset handling.
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
A reset callback is allowed to call functions disallowed in quiescent
state. However, the FLR reset path neglected to account for this
properly, causing an incorrect assert to be triggered if, for example,
vfu_sgl_put() is called. To fix this, make sure all reset paths go
through call_reset_cb().
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Use misspell-fixer if available, and correct the small number of errors
it found. Rather than trying to install into the CI, run it directly from a
github action.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
This is out of spec.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reported-by: Eduardo Lima <eblima@gmail.com>
|
|
As vfu_addr_to_sgl() and co are on the hot path, compile out these
sanity checks for non-DEBUG builds.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
fixes #660
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
When an ioeventfd is written to, KVM discards the value since it has no
memory to write it to, and simply kicks the eventfd. This a problem for
devices such a NVMe controllers that need the value (e.g. doorbells on
BAR0). This patch allows the vfio-user server to pass a file descriptor
that can be mmap'ed and KVM can write the ioeventfd value to this
_shadow_ memory instead of discarding it. This shadow memory is not
exposed to the guest.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Change-Id: Iad849c94076ffa5988e034c8bf7ec312d01f095f
|
|
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Client masks or unmasks a device IRQ using the
VFIO_USER_DEVICE_SET_IRQS message. Inform the device of such changes to
the IRQ state.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
While libvfio-user doesn't use them all, at least SPDK was expecting to
be able to set LOG_NOTICE level, and silently failing. There's no reason
we can't support any valid syslog level.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Use atomic operations to allow concurrent bitmap updates with
VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP operations.
Dirtying clients can race against each other, so we must use atomic or
when marking dirty: we do this byte-by-byte.
When reading the dirty bitmap, we must be careful to not race and lose
any set bits within the same byte. If we miss an update, we'll catch it
the next time around, presuming that before the final pass we'll have
quiesced all I/O.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Harmonize and rename the vfu_*sg() APIs to better reflect their functionality:
in our case, there is no mapping happening as part of these calls, they are
merely housekeeping for range splitting, dirty tracking, and so on.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Move SG dirtying to vfu_unmap_sg(): as we don't want to track SGs
ourselves, doing this in vfu_map_sg() is no longer the right place.
Note that the lack of tracking implies that any SGs must be unmapped
before the final stop and copy phase. To avoid the need for this, add
vfu_mark_sg_dirty(): this allows a consumer to mark a region as dirty
explicitly without needing to unmap it. Currently it's the same as
vfu_unmap_sg(), but that's an implementation detail.
Note this still marks current maps after a get operation; that will
change subsequently.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
If we require a quiesce for these calls, we can be sure that it will not race
with any usage of vfu_*_sg() calls, as a first step towards concurrency.
This is not ideal for VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP, which can
potentially be called multiple times during pre-copy phase, but that's something
we can fix later.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Rename VFIO_DEVICE_STATE_XXXX defines as VFIO_DEVICE_STATE_V1_XXXX.
Upstream renamed these variable to be of the XXXX_V1_XXXX format and
switched an enum for VFIO_DEVICE_STATE_XXXX.
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Yet another static analyzer pass, this one is used by SPDK, and as it
did detect some minor issues, it's worth running.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
To support fuzzing with AFL++, add a "pipe" transport that reads from stdin and
outputs to stdout: this is the most convenient way of doing fuzzing.
Add some docs on how to run a fuzzing session.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
This make it tidier and easier to pass to function the buffer and
length, instead of passing the whole msg.
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Many region accesses of interest are of normal register sizes; sniff the region
access size, and report the read/written value if possible. Clean up
dump_buffer() now, as it's not of much use.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
LGTM.com reports "Comparison is always true because ret <= -1.",
and it's indeed correct (but harmless). Clean this up.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Fix three remaining low priority coverity issues; they do not represent bugs.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Instead of process_request() having a dual role, split into get_request() and
handle_request().
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Leon <john.levon@nutanix.com>
|
|
We weren't checking for a too-large ->argsz for this command.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
There were two issues with unmap request validation when the dirty bitmap flag was set:
- we weren't checking ->argsz against the maximum transfer size, allowing a client
to trigger unbounded allocations
- we needed to check for overflow when calculating the requested message out size
Found via AFL++.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
AFL++ found this, though we already knew about it, so fix it by comparing
against a saturating addition. This was the only instance of client-controlled
potential overflow I noticed.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
As clients control ->client_max_fds, we should return an error, not assert, if
we can't represent a region's mmap_areas.
Found via AFL++.
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
* Fix reply of VFIO_USER_DEVICE_GET_REGION_INFO
Set VFIO_REGION_INFO_FLAG_CAPS flag only if caps are part of the reply.
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
Some devices need the migration state callback to be asynchronous. The simplest way to implement this is to require from the callback to return -1 and set errno to EBUSY, not process any other new messages (vfu_ctx_run returns -1 and sets errno to EBUSY), and provide a way to the user to complete migration (vfu_migr_done).
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
|
|
If a region is not set up, asking for its iofds should fail with EINVAL.
Co-authored-by: John Levon <john.levon@nutanix.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
An unmappable region should still allow io fds, as they are orthogonal.
Co-authored-by: John Levon <john.levon@nutanix.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
|
|
Provide initial support for handling VFIO_USER_DEVICE_GET_REGION_IO_FDS, along with a new vfu_create_ioeventfd() API.
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* Add support for VFIO_DMA_UNMAP_FLAG_ALL flag
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|
|
* initial dma_unmap test
Signed-off-by: John Levon <john.levon@nutanix.com>
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
* Fix err path of handle_dma_unmap()
Set msg->out_size before successful return. Otherwise in case of error
reply path we may endup setting iovecs[1].iov_len with invalid
iovecs[1].iov_base in tran_sock_reply()
Signed-off-by: Swapnil Ingle <swapnil.ingle@nutanix.com>
Reviewed-by: John Levon <john.levon@nutanix.com>
|