# Accessing memory with libvfio-user
A vfio-user client informs the server of its memory regions available for
access. Each DMA region might correspond, for example, to a guest VM's memory
region.
A server that wishes to access such client-shared memory must call:
```
vfu_setup_device_dma(..., register_cb, unregister_cb);
```
during initialization. The two callbacks are invoked when client regions are
added and removed.
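For context, a minimal sketch of the setup call with error checking (the callback names are placeholders, and a negative return value is assumed to indicate failure):

```
static void register_cb(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info);
static void unregister_cb(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info);
...
if (vfu_setup_device_dma(vfu_ctx, register_cb, unregister_cb) < 0) {
    err(EXIT_FAILURE, "failed to set up device DMA");
}
```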
## Memory region callbacks
For either callback, the following information is given:
```
/*
* Info for a guest DMA region. @iova is always valid; the other parameters
* will only be set if the guest DMA region is mappable.
*
* @iova: guest DMA range. This is the guest physical range (as we don't
* support vIOMMU) that the guest registers for DMA, via a VFIO_USER_DMA_MAP
* message, and is the address space used as input to vfu_addr_to_sgl().
* @vaddr: if the range is mapped into this process, this is the virtual address
* of the start of the region.
* @mapping: if @vaddr is non-NULL, this range represents the actual range
* mmap()ed into the process. This might be (large) page aligned, and
* therefore be different from @vaddr + @iova.iov_len.
* @page_size: if @vaddr is non-NULL, page size of the mapping (e.g. 2MB)
* @prot: if @vaddr is non-NULL, protection settings of the mapping as per
* mmap(2)
*
* For a real example, using the gpio sample server, and a qemu configured to
* use huge pages and share its memory:
*
* gpio: mapped DMA region iova=[0xf0000-0x10000000) vaddr=0x2aaaab0f0000
* page_size=0x200000 mapping=[0x2aaaab000000-0x2aaabb000000)
*
* 0xf0000 0x10000000
* | |
* v v
* +-----------------------------------+
* | Guest IOVA (DMA) space |
* +--+-----------------------------------+--+
* | | | |
* | +-----------------------------------+ |
* | ^ libvfio-user server address space |
* +--|--------------------------------------+
* ^ vaddr=0x2aaaab0f0000 ^
* | |
* 0x2aaaab000000 0x2aaabb000000
*
* This region can be directly accessed at 0x2aaaab0f0000, but the underlying
* large page mapping is in the range [0x2aaaab000000-0x2aaabb000000).
*/
typedef struct vfu_dma_info {
struct iovec iova;
void *vaddr;
struct iovec mapping;
size_t page_size;
uint32_t prot;
} vfu_dma_info_t;
```
The remove callback is expected to arrange for all usage of the memory region to
be stopped (or to return `EBUSY`, to trigger quiescence instead), including all
needed `vfu_sgl_put()` calls for SGLs that are within the memory region.
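As a sketch, a server's callbacks might look like the following (the exact prototypes should be checked against `libvfio-user.h`; the bodies here are illustrative only):

```
static void
dma_register_cb(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    if (info->vaddr != NULL) {
        /* Mappable region: it may be accessed directly via SGLs. */
        vfu_log(vfu_ctx, LOG_DEBUG, "DMA region mapped at %p, page size %zx",
                info->vaddr, info->page_size);
    }
}

static void
dma_unregister_cb(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
{
    /*
     * Stop all device usage of this region here, including the
     * vfu_sgl_put() calls for any SGLs that lie within it.
     */
}
```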
## Accessing mapped regions
As described above, `libvfio-user` may map remote client memory into the
process's address space, allowing direct access. To access these mappings, the
caller must first construct an SGL corresponding to the IOVA start and length:
```
dma_sg_t *sgl = calloc(2, dma_sg_size());
vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2, PROT_READ | PROT_WRITE);
```
For example, the device may have received an IOVA from a write to PCI config
space. Due to guest memory topology, certain accesses may not fit in a single
scatter-gather entry; this API therefore allows an array of SG entries to be
provided as necessary.
If `PROT_WRITE` is given, the library presumes that the user may write to the
SGL mappings at any time; this is used for dirty page tracking.
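The return value should be checked before the SGL is used; a sketch, assuming the call reports the number of entries filled on success and a negative value on failure:

```
int nr_sgs = vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 2,
                             PROT_READ | PROT_WRITE);
if (nr_sgs < 0) {
    /*
     * The range was invalid, the permissions were insufficient, or more
     * than 2 entries were needed to cover the range.
     */
}
```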
### `iovec` construction
Next, a user wishing to directly access shared memory should convert the SGL
into an array of iovecs:
```
vfu_sgl_get(vfu_ctx, sgl, iovec, cnt, 0);
```
The caller should provide an array of `struct iovec` with one element per SGL
entry. After this call, each `iov_base` is the virtual address at which the
corresponding range may be directly read from (or written to).
### Releasing SGL access
When the caller has finished with a particular iovec, it should release the
access:
```
vfu_sgl_put(vfu_ctx, sgl, iovec, cnt);
```
After this call, the SGL must not be accessed via the iovec VAs. As mentioned
above, if the SGL was writeable, this will automatically mark all pages within
the SGL as dirty for live migration purposes.
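Putting these calls together, a hedged sketch of a single direct write to client memory (error handling abbreviated; `iova` and `len` are assumed to come from device state, and `vfu_sgl_get()` is assumed to return 0 on success):

```
dma_sg_t *sgl = calloc(1, dma_sg_size());
struct iovec iov;

if (vfu_addr_to_sgl(vfu_ctx, iova, len, sgl, 1, PROT_WRITE) == 1 &&
    vfu_sgl_get(vfu_ctx, sgl, &iov, 1, 0) == 0) {
    memset(iov.iov_base, 0, iov.iov_len);   /* direct access */
    /* Releases the mapping; pages are marked dirty, as the SGL was
     * created with PROT_WRITE. */
    vfu_sgl_put(vfu_ctx, sgl, &iov, 1);
}
free(sgl);
```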
### Dirty page handling
In some cases, such as when entering stop-and-copy state in live migration, it
can be useful to mark an SGL as dirty without releasing it. This can be done via
the call:
```
vfu_sgl_mark_dirty(vfu_ctx, sgl, cnt);
```
## Non-mapped region access
Clients are not required to share mappings for their memory regions. If a
region is not mapped, the server may only read or write it the slower way:
```
...
vfu_addr_to_sgl(ctx, iova, len, sg, 1, PROT_WRITE);
vfu_sgl_write(ctx, sg, 1, &buf);
```
Note that in this case, the server is not expected to report any dirty writes
via `vfu_sgl_mark_dirty()`: as the client is actually writing to memory, it's
the client's responsibility to track any dirtying.
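The read direction works the same way; a sketch reading a region into a local buffer (assuming `vfu_sgl_read()` returns 0 on success):

```
uint8_t buf[512];
dma_sg_t *sg = calloc(1, dma_sg_size());

if (vfu_addr_to_sgl(ctx, iova, sizeof (buf), sg, 1, PROT_READ) == 1 &&
    vfu_sgl_read(ctx, sg, 1, buf) == 0) {
    /* buf now holds a copy of the client memory at iova. */
}
free(sg);
```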