author Thanos Makatos <thanos.makatos@nutanix.com> 2020-07-21 05:15:44 -0400
committer Thanos Makatos <thanos.makatos@nutanix.com> 2020-07-21 05:15:44 -0400
commit 13094f1a226dc677c9fde53c5e3abfa8023da6e5
tree 2781b41938455db8396b33ed951f2e38720b9625
parent 62fe0d0f3d33eb2a1ce2d0fef86e70041ad6180b
focus README on VFIO-over-socket
Signed-off-by: Thanos Makatos <thanos.makatos@nutanix.com>
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 152
1 files changed, 71 insertions, 81 deletions
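
The "Running QEMU" and "Troubleshooting" hunks in this patch describe a directory layout that QEMU expects and note that failures usually come from that layout being wrong or stale. As a minimal sketch (not part of the commit), the script below builds the same layout under a scratch directory rather than /var/run/muser and then checks both symlinks. The variables `ROOT`, `GUEST`, `GROUP`, `DEV` and the helper `check_layout` are names introduced here for illustration only.

```shell
#!/bin/sh
set -eu

# ROOT is a scratch stand-in for /var/run/muser (assumption: avoids needing root).
ROOT=$(mktemp -d)
GUEST=foo   # guest name/UUID, as in the README
GROUP=0     # IOMMU group: an integer that must not exist under /dev/vfio

DEV="$ROOT/$GUEST/$GROUP"

# Remove anything left over from a previous run, then recreate the layout.
rm -rf "$DEV" "$ROOT/iommu_group/$GROUP"
mkdir -p "$ROOT/iommu_group" "$DEV"
ln -s "../$GROUP" "$DEV/iommu_group"      # same link as: cd .../foo/0 && ln -sf ../0 iommu_group
ln -s "$DEV" "$ROOT/iommu_group/$GROUP"   # same link as: ln -s .../foo/0 .../iommu_group/0

# Hypothetical helper: returns non-zero if the layout differs from the README's.
check_layout() {
    [ -d "$DEV" ] || { echo "missing directory $DEV"; return 1; }
    [ "$(readlink "$DEV/iommu_group")" = "../$GROUP" ] ||
        { echo "bad link $DEV/iommu_group"; return 1; }
    [ "$(readlink "$ROOT/iommu_group/$GROUP")" = "$DEV" ] ||
        { echo "bad link $ROOT/iommu_group/$GROUP"; return 1; }
    # Only a warning: the README says the group must not exist under /dev/vfio.
    [ ! -e "/dev/vfio/$GROUP" ] || echo "warning: /dev/vfio/$GROUP already exists"
    return 0
}

if check_layout; then
    echo "layout OK under $ROOT"
else
    exit 1
fi
```

To mirror the README exactly, `ROOT` would be set to /var/run/muser. Running the check before starting QEMU is a cheaper first step than `strace` when a previous run may have left stale entries behind.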
diff --git a/README.md b/README.md
index a4af5da..96223b3 100644
--- a/README.md
+++ b/README.md
@@ -13,21 +13,33 @@ This provides interesting benefits, including:
* Simplification of the initial development of kernel drivers for new devices
* Easy plumbing to hypervisors that support VFIO device pass-through
* Performance benefits as a single process can poll multiple drivers
-
-MUSER is implemented by two components: a loadable kernel module (muser.ko) and
-a userspace library (libmuser). The LKM registers itself with MDEV and relay
-VFIO requests to libmuser via a custom ioctl-based interface. The library, in
-turn, abstracts most of the complexity around representing the device.
+
+In this fork we focus on making QEMU and MUSER work without the need of the
+MUSER kernel module. This has been demonstrated as a PoC in
+https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07900.html. In the PoC
+we use a library to intercept QEMU's syscalls to VFIO (libpathtrap) and convert
+them into messages that we send to the process where device emulation is
+implemented (libvfio). Any QEMU version can be used, unpatched.
+
+The PoC is merely a step towards defining a device offloading protocol that
+will hopefully be officially supported by QEMU so we won't need to do tricks with
+intercepting syscalls etc. This protocol will be called VFIO-over-socket (or
+vfio-user) and is based on the existing VFIO framework (it reuses structs,
+defines, concepts, etc). Hopefully the protocol won't be too different from the
+one in the PoC. You can follow/participate in the discussion here:
+https://www.mail-archive.com/qemu-devel@nongnu.org/msg723773.html
+
+The library abstracts most of the complexity around representing the device.
Applications using libmuser provide a description of the device (eg. region and
irq information) and as set of callbacks which are invoked by libmuser when
those regions are accessed. See src/samples on how to build such an
application.
-Currently there is a one, single-threaded application instance per device,
+Currently there is one, single-threaded application instance per device,
however the application can employ any form of concurrency needed. In the
future we plan to make libmuser multi-threaded. The application can be
implemented in whatever way is convenient, e.g. as a Python script using
-bindings, on the cloud, etc.
+bindings, on the cloud, etc. There's also experimental support for polling.
Memory Mapping the Device
@@ -46,7 +58,7 @@ page is written to.
Interrupts
----------
-Interrupts are implemented by installing the event file descriptor in libmuser
+Interrupts are implemented by passing the event file descriptor to libmuser
and then notifying it about it. libmuser can then trigger interrupts simply by
writing to it. This can be much more expensive compared to triggering interrupts
from the kernel, however this performance penalty is perfectly acceptable when
@@ -56,38 +68,23 @@ prototyping the functional aspect of a device driver.
System Architecture
-------------------
-muser.ko and libmuser communicate via ioctl on a control device. This control
-device is create when the mediated device is created and appears as
-/dev/muser/<UUID>. libmuser opens this device and then executes a "wait
-command" ioctl. Whenever a callback of muser.ko is executed, it fills a struct
-with the command details and then completes the ioctl, unblocking libmuser. It
-then waits to receive another ioctl from libmuser with the result. Currently
-there can be only one command pending, we plan to allow multiple commands to be
-executed in parallel.
+QEMU (with the "help" of libpathtrap and libvfio) and libmuser communicate via a
+UNIX domain socket (in the future it can be anything, e.g. UDP). Whenever QEMU
+executes an ioctl to the VFIO device, libpathtrap/libvfio convert the operation
+into a message and send it to libmuser, unblocking it. libmuser executes the
+request and sends back the response. Currently there can be only one command
+pending; we plan to allow multiple commands to be executed in parallel.
Building muser
==============
-vfio/mdev needs to be patched:
-
- patch -p1 < muser/patches/vfio.diff
-
-Apply the patch and rebuild the vfio/mdev modules:
-
- make SUBDIRS=drivers/vfio/ modules
-
-Reload the relevant kernel modules:
-
- drivers/vfio/vfio_iommu_type1.ko
- drivers/vfio/vfio.ko
- drivers/vfio/mdev/mdev.ko
- drivers/vfio/mdev/vfio_mdev.ko
-
-To build and install the library run:
+Just do:
+ git submodule update --init
make && make install
+The kernel headers are necessary because VFIO structs and defines are reused.
To specify an alternative kernel directory set the KDIR environment variable
accordingly.
To enable Python bindings set the PYTHON_BINDINGS environment variable to a
@@ -98,10 +95,25 @@ Finally build your program and link it to libmuser.so.
Running QEMU
============
-To pass the device to QEMU add the following options:
+Use the following snippet to create the directory structure; this is required
+because QEMU still thinks it's talking to VFIO. "muser" can really be anything
+or even omitted. "foo" is typically the guest name/UUID. "0" is the IOMMU
+group; this must be an integer and must not exist under /dev/vfio. SELinux and
+cgroups can be tricky to set up correctly, so try and keep it simple for now
+(e.g. disable SELinux, use world-accessible paths such as /var/run etc.).
+
+ mkdir -p /var/run/muser/iommu_group /var/run/muser/foo/0
+ cd /var/run/muser/foo/0 && ln -sf ../0 iommu_group
+ ln -s /var/run/muser/foo/0 /var/run/muser/iommu_group/0
+
+Create your libmuser context setting /var/run/muser/foo/0 as the UUID.
+
+Run QEMU as follows:
- -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/<UUID>
- -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=mem,share=yes,size=1073741824 -numa node,nodeid=0,cpus=0,memdev=ram-node0
+ LD_PRELOAD=muser/build/dbg/libvfio/libvfio.so qemu-system-x86_64 \
+ ... \
+ -device vfio-pci,sysfsdev=/var/run/muser/foo/0 \
+ -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=mem,share=yes,size=1073741824 -numa node,nodeid=0,cpus=0,memdev=ram-node0
Guest RAM must be shared (share=yes) otherwise libmuser won't be able to do DMA
transfers from/to it. If you're not using QEMU then any memory that must be
@@ -121,62 +133,40 @@ Running gpio-pci-idio-16
1. First, follow the instructions to build and load muser.
2. Then, start the gpio-pci-idio-16 device emulation:
-```
-# echo 00000000-0000-0000-0000-000000000000 > /sys/class/muser/muser/mdev_supported_types/muser-1/create
-# build/dbg/samples/gpio-pci-idio-16 00000000-0000-0000-0000-000000000000
-```
+
+ # build/dbg/samples/gpio-pci-idio-16 -s /var/run/muser/foo/0
+
3. Finally, start the VM adding the command line explained earlier and then
execute:
-```
-# insmod gpio-pci-idio-16.ko
-# cat /sys/class/gpio/gpiochip480/base > /sys/class/gpio/export
-# for ((i=0;i<12;i++)); do cat /sys/class/gpio/OUT0/value; done
-0
-0
-0
-1
-1
-1
-0
-0
-0
-1
-1
-1
-```
+
+ # insmod gpio-pci-idio-16.ko
+ # cat /sys/class/gpio/gpiochip480/base > /sys/class/gpio/export
+ # for ((i=0;i<12;i++)); do cat /sys/class/gpio/OUT0/value; done
+ 0
+ 0
+ 0
+ 1
+ 1
+ 1
+ 0
+ 0
+ 0
+ 1
+ 1
+ 1
Future Work
===========
-Making libmuser Restartable
-----------------------------
-
-muser can be made restartable so that (a) it can recover from failures, and
-(b) upgrades are less disrupting. This is something we plan to implement in the
-future. To make it restarable muser needs to reconfigure eventfds and DMA
-region mmaps first thing when the device is re-opened by libmuser. After muser
-has finished reconfiguring it will send a "ready" command, after which normal
-operation will be resumed. This "ready" command will always be sent when the
-device is opened, even if this is the first time, as this way we don't need to
-differentiate between normal operation and restarted operation. libmuser will
-store the PCI BAR on /dev/shm (named after e.g. the device UUID) so that it can
-easily find them on restart.
-
-
-Making libmuser Multi-threaded
--------------------------------
-
-libmuser can be made multi-threaded in order to improve performance. To
-implement this we'll have to maintain a private context in struct file.
+See official fork for more details.
Troubleshooting
---------------
-If you get the following error when starting QEMU:
-
- qemu-system-x86_64: -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/00000000-0000-0000-0000-000000000000: vfio 00000000-0000-0000-0000-000000000000: failed to read device config space: Bad address
-
-it might mean that you haven't properly patched your kernel.
+It's easy to mess things up as this is a PoC. libvfio stores logs under
+`/tmp/libvfio`. When things fail it's usually because the directory hasn't been
+correctly set up or cleaned up from the previous run; use `strace` to check
+which syscalls fail and why.
To debug accesses to your PCI device from QEMU add the following to the QEMU
command line: