Diffstat (limited to 'docs')
-rw-r--r-- | docs/memory-mapping.md | 139 |
1 files changed, 139 insertions, 0 deletions
diff --git a/docs/memory-mapping.md b/docs/memory-mapping.md
new file mode 100644
index 0000000..f5c5bce
--- /dev/null
+++ b/docs/memory-mapping.md
@@ -0,0 +1,139 @@

# Accessing memory with libvfio-user

A vfio-user client informs the server of the memory regions it makes available
for access. Each DMA region might correspond, for example, to a region of a
guest VM's memory.

A server that wishes to access such client-shared memory must call:

```
vfu_setup_device_dma(..., register_cb, unregister_cb);
```

during initialization. The two callbacks are invoked when client regions are
added and removed, respectively.

## Memory region callbacks

Both callbacks are given the following information:

```
/*
 * Info for a guest DMA region. @iova is always valid; the other parameters
 * will only be set if the guest DMA region is mappable.
 *
 * @iova: guest DMA range. This is the guest physical range (as we don't
 *   support vIOMMU) that the guest registers for DMA, via a VFIO_USER_DMA_MAP
 *   message, and is the address space used as input to vfu_addr_to_sgl().
 * @vaddr: if the range is mapped into this process, this is the virtual address
 *   of the start of the region.
 * @mapping: if @vaddr is non-NULL, this range represents the actual range
 *   mmap()ed into the process. This might be (large) page aligned, and
 *   therefore be different from @vaddr + @iova.iov_len.
 * @page_size: if @vaddr is non-NULL, page size of the mapping (e.g. 2MB)
 * @prot: if @vaddr is non-NULL, protection settings of the mapping as per
 *   mmap(2)
 *
 * For a real example, using the gpio sample server, and a qemu configured to
 * use huge pages and share its memory:
 *
 * gpio: mapped DMA region iova=[0xf0000-0x10000000) vaddr=0x2aaaab0f0000
 * page_size=0x200000 mapping=[0x2aaaab000000-0x2aaabb000000)
 *
 *       0xf0000                    0x10000000
 *       |                                   |
 *       v                                   v
 *       +-----------------------------------+
 *       | Guest IOVA (DMA) space            |
 *    +--+-----------------------------------+--+
 *    |  |                                   |  |
 *    |  +-----------------------------------+  |
 *    |  ^ libvfio-user server address space    |
 *    +--|--------------------------------------+
 *    ^  vaddr=0x2aaaab0f0000                   ^
 *    |                                         |
 *    0x2aaaab000000                  0x2aaabb000000
 *
 * This region can be directly accessed at 0x2aaaab0f0000, but the underlying
 * large page mapping is in the range [0x2aaaab000000-0x2aaabb000000).
 */
typedef struct vfu_dma_info {
    struct iovec iova;
    void *vaddr;
    struct iovec mapping;
    size_t page_size;
    uint32_t prot;
} vfu_dma_info_t;
```

The remove callback is expected to arrange for all usage of the memory region
to stop, including all needed `vfu_sgl_put()` calls for SGLs that fall within
the region. Alternatively, it can return `EBUSY` to trigger quiescence instead.

## Accessing mapped regions

As described above, `libvfio-user` may map remote client memory into the
process's address space, allowing direct access. To access these mappings, the
caller must first construct an SGL (scatter-gather list) corresponding to the
IOVA start and length:

```
dma_sg_t *sgl = calloc(2, dma_sg_size()); /* room for up to 2 SG entries */

/* Returns the number of SG entries used, or a negative value on failure. */
int cnt = vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2, PROT_READ | PROT_WRITE);
```

For example, the device may have received an IOVA from a write to PCI config
space. Due to guest memory topology, a single access may not fit in one
scatter-gather entry, so this API accepts an array of SG entries sized as
needed.
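
To illustrate why one IOVA range can need several SG entries, the sketch below models the lookup outside the library: each piece of the range is served by the region that contains it, and a piece's local address is the region's `vaddr` plus the piece's offset from the region's IOVA start. `struct region` and `split_iova_range()` are hypothetical illustrations, not part of the libvfio-user API; in real code `vfu_addr_to_sgl()` and `vfu_sgl_get()` do this work, along with details the sketch omits, such as permission checks and dirty tracking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical, simplified stand-in for a server-side record of one client
 * DMA region, mirroring two fields of vfu_dma_info_t.
 */
struct region {
    struct iovec iova;  /* guest IOVA range: iov_base is the start address */
    void *vaddr;        /* local address of the range start, or NULL */
};

/*
 * Hypothetical helper: split [iova, iova + len) into one piece per region
 * it touches, storing the local address and length of each piece in out[].
 * Returns the number of pieces, or -1 if any part of the range is not in a
 * mapped region or more than max_out pieces would be needed.
 */
static int
split_iova_range(const struct region *regions, size_t nr_regions,
                 uintptr_t iova, size_t len,
                 struct iovec *out, size_t max_out)
{
    size_t n = 0;

    assert(regions != NULL || nr_regions == 0);

    while (len > 0) {
        const struct region *r = NULL;

        /* Find the region containing the current IOVA. */
        for (size_t i = 0; i < nr_regions; i++) {
            uintptr_t start = (uintptr_t)regions[i].iova.iov_base;

            if (iova >= start && iova - start < regions[i].iova.iov_len) {
                r = &regions[i];
                break;
            }
        }
        if (r == NULL || r->vaddr == NULL || n == max_out) {
            return -1;
        }

        /* Take as much of the request as fits before this region ends. */
        size_t offset = iova - (uintptr_t)r->iova.iov_base;
        size_t avail = r->iova.iov_len - offset;
        size_t chunk = len < avail ? len : avail;

        out[n].iov_base = (char *)r->vaddr + offset;
        out[n].iov_len = chunk;
        n++;
        iova += chunk;
        len -= chunk;
    }
    return (int)n;
}
```

With the gpio region from the `vfu_dma_info_t` comment, for instance, IOVA `0x100000` would correspond to `0x2aaaab0f0000 + (0x100000 - 0xf0000) = 0x2aaaab100000`. A request that starts near the end of one region and continues into an adjacent one yields two pieces, one per region, which is the case an SGL array with more than one entry is meant to cover.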

If `PROT_WRITE` is given, the library assumes that the user may write to the
SGL mappings at any time; this is used for dirty page tracking.

### `iovec` construction

Next, a user wishing to access shared memory directly should convert the SGL
into an array of iovecs:

```
struct iovec iovec[2]; /* one element per SG entry */
vfu_sgl_get(vfu_ctx, sgl, iovec, cnt, 0);
```

The caller must provide an array of `struct iovec` with one element per SGL
entry (`cnt`). After this call, each element's `iov_base` is the virtual
address at which the corresponding range may be read from (or written to)
directly.

### Releasing SGL access

When the caller is finished with the iovecs, it should call:

```
vfu_sgl_put(vfu_ctx, sgl, iovec, cnt);
```

After this call, the memory must no longer be accessed via the iovecs' virtual
addresses. As mentioned above, if the SGL was writable, this call also marks
all pages within the SGL as dirty for live migration purposes.

### Dirty page handling

In some cases, such as when entering the stop-and-copy state during live
migration, it can be useful to mark an SGL as dirty without releasing it. This
can be done via:

```
vfu_sgl_mark_dirty(vfu_ctx, sgl, cnt);
```

## Non-mapped region access

Clients are not required to share their memory mappings. For a region that is
*not* shared (its `vfu_dma_info_t` has a NULL `vaddr`), the server can only
read or write the region the slower way, through the library, which exchanges
messages with the client:

```
...
vfu_addr_to_sgl(ctx, iova, len, sg, 1, PROT_READ);

/* buf must have room for at least len bytes. */
vfu_sgl_read(ctx, sg, 1, buf);
```
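
The dirty page tracking described above works at page granularity. The following self-contained sketch is a hypothetical model, not libvfio-user's implementation, of what marking every page within an SGL as dirty amounts to: each page that the range `[iova, iova + len)` touches gets its bit set, so even a two-byte write that straddles a page boundary dirties two pages. The bitmap layout (one bit per page, bit 0 for the first page of a page-aligned region, least significant bit first within each byte) and `mark_range_dirty()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of dirty page tracking: set the bit of every page that
 * [iova, iova + len) overlaps. Bit n of the bitmap stands for the page at
 * region_start + n * page_size. Assumes region_start is page aligned,
 * iova >= region_start, and the bitmap is large enough for the range.
 */
static void
mark_range_dirty(uint8_t *bitmap, uint64_t region_start, uint64_t page_size,
                 uint64_t iova, uint64_t len)
{
    /* Page sizes are powers of two (e.g. 4KB or 2MB). */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    if (len == 0) {
        return;
    }

    uint64_t first = (iova - region_start) / page_size;
    uint64_t last = (iova + len - 1 - region_start) / page_size;

    for (uint64_t pg = first; pg <= last; pg++) {
        bitmap[pg / 8] |= (uint8_t)(1u << (pg % 8));
    }
}
```

In this model, a 2-byte write at IOVA `0x10fff` in a region starting at `0x10000` with 4KB pages sets bits 0 and 1, because it touches the last byte of the first page and the first byte of the second.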