libvfio-user
============

vfio-user is a framework that allows implementing PCI devices in userspace.
Clients (such as [qemu](https://qemu.org)) talk the
[vfio-user protocol](https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg02458.html)
over a UNIX socket to a server. This library, `libvfio-user`, provides an API
for implementing such servers.

![vfio-user example block diagram](docs/libvfio-user.png)

[VFIO](https://www.kernel.org/doc/Documentation/vfio.txt) is a kernel facility
for providing secure access to PCI devices in userspace (including pass-through
to a VM). With `vfio-user`, instead of talking to the kernel, all interactions
are done in userspace, without requiring any kernel component; the kernel `VFIO`
implementation is not used at all for a `vfio-user` device.

Put another way, `vfio-user` is to VFIO as
[vhost-user](https://www.qemu.org/docs/master/interop/vhost-user.html) is to
`vhost`.

The `vfio-user` protocol is intentionally modelled after the VFIO `ioctl()`
interface, and shares many of its definitions. However, there is not an exact
equivalence: for example, IOMMU groups are not represented in `vfio-user`.

There are many different purposes you might put this library to, such as
prototyping novel devices, testing frameworks, implementing alternatives to
qemu's device emulation, adapting a device class to work over a network, etc.

The library abstracts most of the complexity around representing the device.
Applications using libvfio-user provide a description of the device (e.g. region
and IRQ information) and a set of callbacks which are invoked by `libvfio-user`
when those regions are accessed.

Memory Mapping the Device
-------------------------

The device driver can allow parts of the virtual device to be memory mapped by
the virtual machine (e.g. the PCI BARs).
The business logic needs to implement the mmap callback and reply to the
request, passing the memory address whose backing pages are then used to
satisfy the original mmap call;
[more details here](./docs/memory-mapping.md).

Interrupts
----------

Interrupts are implemented via eventfds passed from the client and registered
with the library. `libvfio-user` consumers can then trigger interrupts by
writing to the eventfd.

Building libvfio-user
=====================

Build requirements:

 * `meson` (v0.53.0 or above)
 * `apt install libjson-c-dev libcmocka-dev` *or*
 * `yum install json-c-devel libcmocka-devel`

The kernel headers are necessary because VFIO structs and defines are reused.

To build:

```
meson build
ninja -C build
```

Finally build your program and link with `libvfio-user.so`.

Supported features
==================

With the client support found in
[cloud-hypervisor](https://github.com/cloud-hypervisor/cloud-hypervisor/) or the
in-development [qemu](https://gitlab.com/qemu-project/qemu) support, most guest
VM use cases will work. See below for some details on how to try this out.

However, guests with an IOMMU (vIOMMU) will not currently work: the number of
DMA regions is strictly limited, and there are also issues with some server
implementations such as SPDK's virtual NVMe controller.

Currently, `libvfio-user` has explicit support for PCI devices only. In
addition, only PCI endpoints are supported (no bridges etc.).

API
===

The API is currently documented via the
[libvfio-user header file](./include/libvfio-user.h), along with some additional
[documentation](docs/).

The library (and the protocol) are actively under development, and should not
yet be considered a stable API or interface.

The API is not thread safe, but individual `vfu_ctx_t` handles can be used
separately by each thread: that is, there is no global library state.

Mailing List & Chat
===================

libvfio-user development is discussed in libvfio-user-devel@nongnu.org.
Subscribe here: https://lists.gnu.org/mailman/listinfo/libvfio-user-devel.

We are on Slack at [libvfio-user.slack.com](https://libvfio-user.slack.com)
([invite link](https://join.slack.com/t/libvfio-user/shared_invite/zt-193oqc8jl-a2nKYFZESQMMlsiYHSsAMw));
or IRC at [#qemu on OFTC](https://oftc.net/).

Contributing
============

Contributions are welcome; please file an
[issue](https://github.com/nutanix/libvfio-user/issues/) or
[open a PR](https://github.com/nutanix/libvfio-user/pulls). Anything substantial
is worth discussing with us first.

Please make sure to mark any commits with `Signed-off-by` (`git commit -s`),
which signals agreement with the
[Developer Certificate of Origin v1.1](https://en.wikipedia.org/wiki/Developer_Certificate_of_Origin).

Running `make pre-push` runs the same checks as the GitHub CI. After merging, a
Coverity scan is also done.

See [Testing](docs/testing.md) for details on how the library is tested.

Examples
========

The [samples directory](./samples/) contains various libvfio-user examples.

lspci
-----

[lspci](./samples/lspci.c) implements an example of how to dump the PCI header
of a libvfio-user device and examine it with lspci(8):

```
# lspci -vv -F <(build/samples/lspci)
00:00.0 Non-VGA unclassified device: Device 0000:0000
        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Region 0: I/O ports at <unassigned> [disabled]
        Region 1: I/O ports at <unassigned> [disabled]
        Region 2: I/O ports at <unassigned> [disabled]
        Region 3: I/O ports at <unassigned> [disabled]
        Region 4: I/O ports at <unassigned> [disabled]
        Region 5: I/O ports at <unassigned> [disabled]
        Capabilities: [40] Power Management version 0
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
```

The above sample implements a very simple PCI device that supports the Power
Management PCI capability.
The sample can be trivially modified to change the PCI configuration space
header and add more PCI capabilities.

Client/Server Implementation
----------------------------

[Client](./samples/client.c)/[server](./samples/server.c) implements a basic
client/server model where basic tasks are performed.

The server implements a device that can be programmed to trigger interrupts
(INTx) to the client. This is done by writing the desired time in seconds since
Epoch to BAR0. The server then triggers an eventfd-based IRQ and then a
message-based one (in order to demonstrate how it's done when passing of file
descriptors isn't possible/desirable). The device also works as memory storage:
BAR1 can be freely written to/read from by the host.

Since this is a completely made up device, there's no kernel driver (yet).
[Client](./samples/client.c) implements a client that knows how to drive this
particular device (that would normally be QEMU + guest VM + kernel driver).

The client exercises all commands in the vfio-user protocol, and then proceeds
to perform live migration. The client spawns the destination server (this would
be normally done by libvirt) and then migrates the device state, before
switching entirely to the destination server. We re-use the source client
instead of spawning a destination one as this is something libvirt/QEMU would
normally do.

To spice things up, the client programs the source server to trigger an
interrupt and then migrates to the destination server; the programmed interrupt
is delivered by the destination server. Also, while the device is being live
migrated, the client spawns a thread that constantly writes to BAR1 in a tight
loop. This thread emulates the guest VM accessing the device while the main
thread (what would normally be QEMU) is driving the migration.
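
As a rough illustration of the eventfd-based IRQ mentioned above, the sketch
below uses only the Linux `eventfd(2)` system call and none of the
`libvfio-user` API. The helper names `trigger_irq` and `consume_irqs` are
hypothetical and exist only for this example; in the real samples the eventfd
is supplied by the client and registered with the library.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libvfio-user): signal one interrupt by
 * adding 1 to the eventfd counter. Returns 0 on success, -1 on failure.
 */
int trigger_irq(int efd)
{
    uint64_t one = 1;

    assert(efd >= 0);
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Hypothetical helper: fetch the number of interrupts signalled since the
 * last call and reset the counter to zero. Returns -1 if the read fails,
 * which on a non-blocking eventfd includes the "nothing pending" case.
 */
int64_t consume_irqs(int efd)
{
    uint64_t count = 0;

    assert(efd >= 0);
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return -1;
    }
    return (int64_t)count;
}
```

With a descriptor created by `eventfd(0, EFD_NONBLOCK)`, calling `trigger_irq`
twice before `consume_irqs` yields a count of 2, because eventfd coalesces
writes into a single counter. A receiver that needs one wakeup per interrupt
could create the descriptor with `EFD_SEMAPHORE` instead.
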

Start the source server as follows (pick whatever you like for
`/tmp/vfio-user.sock`):

```
rm -f /tmp/vfio-user.sock* ; build/samples/server -v /tmp/vfio-user.sock
```

And then the client:

```
build/samples/client /tmp/vfio-user.sock
```

After a couple of seconds the client will start live migration. The source
server will exit and the destination server will start; watch the client
terminal for destination server messages.

gpio
----

A [gpio](./samples/gpio-pci-idio-16.c) server implements a very simple GPIO
device that can be used with a Linux VM.

Start the `gpio` server process:

```
rm /tmp/vfio-user.sock
./build/samples/gpio-pci-idio-16 -v /tmp/vfio-user.sock &
```

Next, build `qemu` and start a VM, as described below.

Log in to your guest VM. You'll probably need to build the `gpio-pci-idio-16`
kernel module yourself - it's part of the standard Linux kernel, but not usually
built and shipped on x86.

Once built, you should be able to load the module and observe the emulated GPIO
device's pins:

```
insmod gpio-pci-idio-16.ko
cat /sys/class/gpio/gpiochip480/base > /sys/class/gpio/export
for ((i=0;i<12;i++)); do cat /sys/class/gpio/OUT0/value; done
```

shadow_ioeventfd_server
-----------------------

shadow_ioeventfd_server.c and shadow_ioeventfd_speed_test.c are used to
demonstrate the benefits of shadow ioeventfd; see
[ioregionfd](./docs/ioregionfd.md) for more information.

Other usage notes
=================

Live migration
--------------

The `master` branch of `libvfio-user` implements live migration with a protocol
based on vfio's v2 protocol. Currently, there is no support for this in any qemu
client. For current use cases that support live migration, such as SPDK, you
should refer to the
[migration-v1 branch](https://github.com/nutanix/libvfio-user/tree/migration-v1).

qemu
----

`vfio-user` client support is not yet merged into `qemu`. Instead, download and
build [this branch of qemu](https://github.com/oracle/qemu/tree/vfio-user-6.2).

Create a Linux install image, or use a pre-made one.

Then, presuming you have a `libvfio-user` server listening on the UNIX socket
`/tmp/vfio-user.sock`, you can start your guest VM with something like this:

```
./x86_64-softmmu/qemu-system-x86_64 -mem-prealloc -m 256 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/gpio,share=yes,size=256M \
-numa node,memdev=ram-node0 \
-kernel ~/vmlinuz -initrd ~/initrd -nographic \
-append "console=ttyS0 root=/dev/sda1 single" \
-hda ~/bionic-server-cloudimg-amd64-0.raw \
-device vfio-user-pci,socket=/tmp/vfio-user.sock
```

SPDK
----

SPDK uses `libvfio-user` to implement a virtual NVMe controller: see
[docs/spdk.md](docs/spdk.md) for more details.

libvirt
-------

You can configure `vfio-user` devices in a `libvirt` domain configuration:

1. Add `xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'` to the `domain`
   element.

2. Enable sharing of the guest's RAM:

```xml
```

3. Pass the vfio-user device:

```xml
```

History
=======

This project was formerly known as "muser", short for "Mediated Userspace
Device". It implemented a proof-of-concept
[VFIO mediated device](https://www.kernel.org/doc/Documentation/vfio-mediated-device.txt)
in userspace. Normally, VFIO mdev devices require a kernel module; `muser`
implemented a small kernel module that forwarded onto userspace. The old
kernel-module-based implementation can be found in the
[kmod branch](https://github.com/nutanix/muser/tree/kmod).