author    John Levon <john.levon@nutanix.com>    2022-05-27 19:17:28 +0100
committer GitHub <noreply@github.com>    2022-05-27 19:17:28 +0100
commit    79e83e482d4eb0b7a07cfa207506d33edf05d04b (patch)
tree      a5be09b69f9000bddb55a1d729071398c59a62c1 /docs
parent    188cd00c520855615331d35c087a22215767b8fb (diff)
add some documentation on memory handling (#676)
Cover briefly how to access client memory.

Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com>
Reviewed-by: Thanos Makatos <thanos.makatos@nutanix.com>
Diffstat (limited to 'docs')
-rw-r--r--   docs/memory-mapping.md   | 139
1 file changed, 139 insertions, 0 deletions
diff --git a/docs/memory-mapping.md b/docs/memory-mapping.md
new file mode 100644
index 0000000..f5c5bce
--- /dev/null
+++ b/docs/memory-mapping.md
@@ -0,0 +1,139 @@
+# Accessing memory with libvfio-user
+
+A vfio-user client informs the server of its memory regions available for
+access. Each DMA region might correspond, for example, to a guest VM's memory
+region.
+
+A server that wishes to access such client-shared memory must call:
+
+```
+vfu_setup_device_dma(..., register_cb, unregister_cb);
+```
+
+during initialization. The two callbacks are invoked when client regions are
+added and removed.
+
+## Memory region callbacks
+
+For either callback, the following information is given:
+
+```
+/*
+ * Info for a guest DMA region. @iova is always valid; the other parameters
+ * will only be set if the guest DMA region is mappable.
+ *
+ * @iova: guest DMA range. This is the guest physical range (as we don't
+ * support vIOMMU) that the guest registers for DMA, via a VFIO_USER_DMA_MAP
+ * message, and is the address space used as input to vfu_addr_to_sgl().
+ * @vaddr: if the range is mapped into this process, this is the virtual address
+ * of the start of the region.
+ * @mapping: if @vaddr is non-NULL, this range represents the actual range
+ *           mmap()ed into the process. This might be (large) page aligned, and
+ *           therefore differ from the range [@vaddr, @vaddr + @iova.iov_len).
+ * @page_size: if @vaddr is non-NULL, page size of the mapping (e.g. 2MB)
+ * @prot: if @vaddr is non-NULL, protection settings of the mapping as per
+ * mmap(2)
+ *
+ * For a real example, using the gpio sample server, and a qemu configured to
+ * use huge pages and share its memory:
+ *
+ * gpio: mapped DMA region iova=[0xf0000-0x10000000) vaddr=0x2aaaab0f0000
+ * page_size=0x200000 mapping=[0x2aaaab000000-0x2aaabb000000)
+ *
+ * 0xf0000 0x10000000
+ * | |
+ * v v
+ * +-----------------------------------+
+ * | Guest IOVA (DMA) space |
+ * +--+-----------------------------------+--+
+ * | | | |
+ * | +-----------------------------------+ |
+ * | ^ libvfio-user server address space |
+ * +--|--------------------------------------+
+ * ^ vaddr=0x2aaaab0f0000 ^
+ * | |
+ * 0x2aaaab000000 0x2aaabb000000
+ *
+ * This region can be directly accessed at 0x2aaaab0f0000, but the underlying
+ * large page mapping is in the range [0x2aaaab000000-0x2aaabb000000).
+ */
+typedef struct vfu_dma_info {
+ struct iovec iova;
+ void *vaddr;
+ struct iovec mapping;
+ size_t page_size;
+ uint32_t prot;
+} vfu_dma_info_t;
+```
+
+The remove callback is expected to arrange for all usage of the memory region to
+be stopped (or to return `EBUSY`, to trigger quiescence instead), including all
+needed `vfu_sgl_put()` calls for SGLs that are within the memory region.
+
+## Accessing mapped regions
+
+As described above, `libvfio-user` may map remote client memory into the
+process's address space, allowing direct access. To access these mappings, the
+caller must first construct an SGL corresponding to the IOVA start and length:
+
+```
+dma_sg_t *sgl = calloc(2, dma_sg_size());
+
+vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2, PROT_READ | PROT_WRITE);
+```
+
+For example, the device may have received an IOVA from a write to PCI config
+space. Due to guest memory topology, certain accesses may not fit in a single
+scatter-gather entry; this API therefore allows an array of SGs to be provided
+as necessary.
+
+If `PROT_WRITE` is given, the library presumes that the user may write to the
+SGL mappings at any time; this is used for dirty page tracking.
+
+### `iovec` construction
+
+Next, a user wishing to directly access shared memory should convert the SGL
+into an array of iovecs:
+
+```
+vfu_sgl_get(vfu_ctx, sgl, iovec, cnt, 0);
+```
+
+The caller should provide an array of `struct iovec` whose size corresponds to
+the number of SGL entries. After this call, `iovec.iov_base` is the virtual
+address at which the range may be directly read (or written).
+
+### Releasing SGL access
+
+When the user has finished with a particular iovec, it can call:
+
+```
+vfu_sgl_put(vfu_ctx, sgl, iovec, cnt);
+```
+
+After this call, the SGL must not be accessed via the iovec VAs. As mentioned
+above, if the SGL was writeable, this will automatically mark all pages within
+the SGL as dirty for live migration purposes.
+
+### Dirty page handling
+
+In some cases, such as when entering stop-and-copy state in live migration, it
+can be useful to mark an SGL as dirty without releasing it. This can be done via
+the call:
+
+```
+vfu_sgl_mark_dirty(vfu_ctx, sgl, cnt);
+```
+
+## Non-mapped region access
+
+Clients are not required to share their memory mappings. When a region is not
+mapped into the server's address space, the server can only read or write the
+region the slower way:
+
+```
+...
+vfu_addr_to_sgl(ctx, iova, len, sg, 1, PROT_READ);
+
+vfu_sgl_read(ctx, sg, 1, &buf);
+```