Mediated Userspace Device
=========================

Overview
--------

MUSER is a framework that allows PCI devices to be implemented in userspace. It
leverages the Linux kernel VFIO/MDEV infrastructure, so such devices can easily
be accessed via the standard VFIO interfaces and, through them, by virtual
machines. The devices can be purely virtual, not backed by any real hardware.
This approach provides several benefits, including:

* Simplification of the initial development of kernel drivers for new devices
* Easy plumbing to hypervisors that support VFIO device pass-through
* Performance benefits, as a single process can poll multiple drivers

In this fork we focus on making QEMU and MUSER work without the MUSER kernel
module. This has been demonstrated as a PoC in
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07900.html. In the PoC,
QEMU's VFIO syscalls are intercepted (libpathtrap) and converted into messages
(libvfio) that are sent to the process where the device emulation is
implemented. Any QEMU version can be used, unpatched.
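
To make the conversion step more concrete, here is a minimal sketch of turning an intercepted operation into a length-prefixed message and parsing it back. The `msg_hdr` layout, the `CMD_*` numbers and the `msg_pack()`/`msg_unpack()` helpers are hypothetical, chosen for illustration only; they are not the format used by libvfio or by the vfio-user protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header: NOT the libvfio or vfio-user format. */
struct msg_hdr {
    uint16_t msg_id;   /* lets the sender match a reply to its request */
    uint16_t cmd;      /* which intercepted operation this is */
    uint32_t size;     /* header plus payload, in bytes */
};

/* Hypothetical command numbers for a few VFIO operations. */
enum { CMD_GET_REGION_INFO = 1, CMD_REGION_READ = 2, CMD_REGION_WRITE = 3 };

/* Serialize header and payload into buf. Returns the number of bytes
 * written, or 0 if buf is too small. */
static size_t msg_pack(uint8_t *buf, size_t buflen, uint16_t id, uint16_t cmd,
                       const void *payload, uint32_t plen)
{
    struct msg_hdr h;

    assert(payload != NULL || plen == 0);
    h.msg_id = id;
    h.cmd = cmd;
    h.size = (uint32_t)sizeof h + plen;
    if (buflen < h.size)
        return 0;
    memcpy(buf, &h, sizeof h);
    if (plen > 0)
        memcpy(buf + sizeof h, payload, plen);
    return h.size;
}

/* Parse a received message. Fills *h and returns a pointer to the payload,
 * or NULL if the message is truncated or its size field is inconsistent. */
static const uint8_t *msg_unpack(const uint8_t *buf, size_t len,
                                 struct msg_hdr *h)
{
    if (len < sizeof *h)
        return NULL;
    memcpy(h, buf, sizeof *h);
    if (h->size < sizeof *h || h->size > len)
        return NULL;
    return buf + sizeof *h;
}
```

The `size` field lets the receiver reject truncated or inconsistent messages before touching the payload; a real protocol would also have to pin down byte order and structure padding.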

The PoC is merely a step towards defining a device offloading protocol that
will hopefully be officially supported by QEMU, removing the need for tricks
such as intercepting syscalls. This protocol will be called VFIO-over-socket
(or vfio-user) and is based on the existing VFIO framework (it reuses structs,
defines, concepts, etc.). Hopefully the protocol won't differ much from the one
in the PoC. You can follow or participate in the discussion here:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg723773.html

The library abstracts most of the complexity around representing the device.
Applications using libmuser provide a description of the device (e.g. region
and IRQ information) and a set of callbacks, which libmuser invokes when those
regions are accessed. See src/samples for examples of how to build such an
application.
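
As an illustration of the description-plus-callbacks model, the sketch below keeps a size and an access callback per region and dispatches each decoded access to the owning callback after a bounds check. `region_desc`, `region_access_cb`, `dispatch()` and the `scratch_access()` example device are hypothetical names and are not libmuser's actual API; consult the libmuser headers for the real structures and callback signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical access callback (not libmuser's API): copy count bytes
 * between buf and the region at offset. Returns 0 on success, -1 on error. */
typedef int (*region_access_cb)(void *priv, char *buf, size_t count,
                                uint64_t offset, bool is_write);

/* Hypothetical per-region description supplied by the application. */
struct region_desc {
    uint64_t size;         /* region size in bytes */
    region_access_cb cb;   /* invoked on every access to the region */
    void *priv;            /* device-specific state passed to cb */
};

/* Route an access decoded from a client message to the owning region,
 * rejecting unknown regions and out-of-bounds accesses. */
static int dispatch(const struct region_desc *regions, unsigned nregions,
                    unsigned idx, char *buf, size_t count, uint64_t offset,
                    bool is_write)
{
    const struct region_desc *r;

    if (idx >= nregions || regions[idx].cb == NULL)
        return -1;
    r = &regions[idx];
    if (offset > r->size || count > r->size - offset)
        return -1;
    return r->cb(r->priv, buf, count, offset, is_write);
}

/* Example device: a 16-byte scratch register file backed by an array. */
static int scratch_access(void *priv, char *buf, size_t count,
                          uint64_t offset, bool is_write)
{
    char *regs = priv;

    assert(regs != NULL);
    if (is_write)
        memcpy(regs + offset, buf, count);
    else
        memcpy(buf, regs + offset, count);
    return 0;
}
```

Doing the bounds check once in the dispatcher means each callback can assume that `offset + count` lies within its region.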

Currently there is one single-threaded application instance per device;
however, the application can employ any form of concurrency it needs. In the
future we plan to make libmuser multi-threaded. The application can be
implemented in whatever way is convenient, e.g. as a Python script using the
bindings, on the cloud, etc. There is also experimental support for polling.
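
Because a single thread can wait on many file descriptors at once, one process can serve several devices. The following is a minimal sketch using poll(2), assuming one connected socket per device and a hypothetical one-byte command whose reply is the command plus one (a stand-in for real request handling). `open_channel()` and `serve_once()` are illustrative names, not part of libmuser.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_DEVICES 16

/* Stand-in for each device's connection; a real setup would accept() on a
 * UNIX socket instead. Returns 0 on success, -1 on error. */
static int open_channel(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* Wait once for activity on any of the n device descriptors and serve every
 * descriptor that has a request pending. Returns the number handled or -1. */
static int serve_once(const int *fds, int n, int timeout_ms)
{
    struct pollfd pfds[MAX_DEVICES];
    int handled = 0;

    assert(n >= 0 && n <= MAX_DEVICES);
    for (int i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }
    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        uint8_t cmd, reply;

        if (!(pfds[i].revents & POLLIN))
            continue;
        if (read(pfds[i].fd, &cmd, 1) != 1)
            continue;                 /* peer closed or error: skip it */
        reply = (uint8_t)(cmd + 1);   /* hypothetical "work" for the request */
        if (write(pfds[i].fd, &reply, 1) == 1)
            handled++;
    }
    return handled;
}
```

A real event loop would call `serve_once()` repeatedly; serving only the descriptors flagged with `POLLIN` keeps idle devices from holding up busy ones.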


Memory Mapping the Device
-------------------------

The device driver can allow parts of the virtual device (e.g. the PCI BARs) to
be memory mapped by the virtual machine. The business logic needs to implement
the mmap callback and reply to the request with the memory address whose
backing pages are then used to satisfy the original mmap call. Currently,
reads and writes by the client to the memory-mapped region go undetected by
libmuser, so the business logic needs to poll for changes. In the future we
plan to implement a mechanism that notifies libmuser whenever a page is
written to.
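
The following sketch shows the allocate-then-poll pattern described above: the backing memory for a BAR is allocated with mmap(2) so that its pages can be handed out, and the business logic periodically compares it against a snapshot to detect writes. `alloc_bar()` and `poll_bar()` are hypothetical helpers; how the address is actually returned through libmuser's mmap callback is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Allocate page-aligned, zero-filled, shareable backing memory for a BAR.
 * Returns NULL on failure. */
static void *alloc_bar(size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Compare the live BAR with the last snapshot, refreshing the snapshot.
 * Returns the offset of the first changed byte, or -1 if nothing changed. */
static long poll_bar(const volatile uint8_t *bar, uint8_t *snapshot,
                     size_t size)
{
    long first = -1;

    assert(bar != NULL && snapshot != NULL);
    for (size_t i = 0; i < size; i++) {
        uint8_t v = bar[i];

        if (v != snapshot[i]) {
            if (first < 0)
                first = (long)i;
            snapshot[i] = v;
        }
    }
    return first;
}
```

Scanning a whole BAR on every poll costs time proportional to its size, which is one reason the planned page-level write notification would be preferable.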


Interrupts
----------

Interrupts are implemented by passing an event file descriptor to libmuser.
libmuser can then trigger an interrupt simply by writing to that descriptor.
This can be considerably more expensive than triggering interrupts from the
kernel; however, the performance penalty is perfectly acceptable when
prototyping the functional aspects of a device driver.
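
Assuming the descriptor is a Linux eventfd(2), as VFIO uses for interrupts, triggering an interrupt amounts to adding 1 to the eventfd counter with an 8-byte write; the consumer reads the accumulated count, which resets it to zero. `trigger_irq()` and `consume_irqs()` are illustrative names, and in practice the descriptor would be received from the client rather than created locally.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Signal one interrupt: add 1 to the eventfd's 64-bit counter. */
static int trigger_irq(int irq_fd)
{
    uint64_t one = 1;

    assert(irq_fd >= 0);
    return write(irq_fd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Consumer side (e.g. the hypervisor): read and reset the counter, returning
 * how many interrupts were signalled since the last read (0 if none and the
 * descriptor is non-blocking). */
static uint64_t consume_irqs(int irq_fd)
{
    uint64_t count = 0;

    if (read(irq_fd, &count, sizeof count) != (ssize_t)sizeof count)
        return 0;
    return count;
}
```

Because the counter accumulates, several interrupts triggered before the consumer reads are coalesced into a single wakeup that reports their count.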


System Architecture
-------------------

QEMU (with the "help" of libpathtrap and libvfio) and libmuser communicate via
a UNIX domain socket (in the future this could be any transport, e.g. UDP).
Whenever QEMU executes an ioctl on the VFIO device, libpathtrap/libvfio convert
the operation into a message and send it to libmuser, unblocking it. libmuser
executes the request and sends back the response. Currently only one command
can be pending at a time; we plan to allow multiple commands to be executed in
parallel.
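
A minimal sketch of the current one-command-at-a-time exchange, assuming hypothetical fixed-size `req`/`resp` structures (not the actual libvfio message format): the client refuses to send a new request until the previous reply has arrived, and the server side (libmuser's role) executes a request and writes back the response. `open_channel()` uses socketpair(2) only to keep the example self-contained.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical fixed-size messages; not the libvfio wire format. */
struct req  { uint32_t cmd; uint32_t arg; };
struct resp { uint32_t cmd; int32_t ret; };

struct client {
    int fd;         /* connected UNIX domain socket */
    bool pending;   /* true while a request awaits its reply */
};

/* Stand-in for connecting to the device's socket path. */
static int open_channel(int sv[2])
{
    return socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
}

/* Send a request; refuse if the previous one has not been answered yet. */
static int client_send(struct client *c, uint32_t cmd, uint32_t arg)
{
    struct req r = { cmd, arg };

    if (c->pending)
        return -1;
    if (write(c->fd, &r, sizeof r) != (ssize_t)sizeof r)
        return -1;
    c->pending = true;
    return 0;
}

/* Block until the reply to the pending request arrives. */
static int client_recv(struct client *c, struct resp *out)
{
    assert(c->pending);
    if (read(c->fd, out, sizeof *out) != (ssize_t)sizeof *out)
        return -1;
    c->pending = false;
    return 0;
}

/* Server side (libmuser's role): execute one request and send the response. */
static int server_handle_one(int fd)
{
    struct req r;
    struct resp s;

    if (read(fd, &r, sizeof r) != (ssize_t)sizeof r)
        return -1;
    s.cmd = r.cmd;
    s.ret = (int32_t)(r.arg * 2);   /* stand-in for executing the command */
    return write(fd, &s, sizeof s) == (ssize_t)sizeof s ? 0 : -1;
}
```

Allowing multiple commands in flight would replace the single `pending` flag with a per-request ID so that replies can be matched to requests out of order. The sketch also assumes each read or write transfers a whole structure; production code should loop on short transfers.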


Building muser
==============

Just do:

	git submodule update --init
	make && make install

The kernel headers are necessary because VFIO structs and defines are reused.
To specify an alternative kernel directory, set the KDIR environment variable
accordingly. To enable Python bindings, set the PYTHON_BINDINGS environment
variable to a non-empty string.

Finally, build your program and link it against libmuser.so.

Running QEMU
============

Use the following snippet to create the directory structure. This is required
because QEMU still thinks it's talking to VFIO. "muser" can really be anything,
or can even be omitted. "foo" is typically the guest name/UUID. "0" is the IOMMU
group; it must be an integer and must not exist under /dev/vfio. SELinux and
cgroups can be tricky to set up correctly, so try to keep things simple for now
(e.g. disable SELinux, use world-accessible paths such as /var/run).

	mkdir -p /var/run/muser/iommu_group /var/run/muser/foo/0
	cd /var/run/muser/foo/0 && ln -sf ../0 iommu_group
	ln -s /var/run/muser/foo/0 /var/run/muser/iommu_group/0
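
To catch the most common setup mistakes before starting QEMU, the sketch below checks that the `iommu_group` link in the device directory names an integer group that does not exist under /dev/vfio. It assumes, based on the snippet above, that the group number is the last path component of the link target ("../0" gives "0"); `check_device_dir()` and its helpers are hypothetical and not part of MUSER.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* True if name is a non-empty string of decimal digits. */
static bool is_valid_group_name(const char *name)
{
    if (*name == '\0')
        return false;
    for (const char *p = name; *p != '\0'; p++)
        if (!isdigit((unsigned char)*p))
            return false;
    return true;
}

/* Last path component of a link target, e.g. "../0" -> "0". */
static const char *group_from_link_target(const char *target)
{
    const char *slash = strrchr(target, '/');

    return slash != NULL ? slash + 1 : target;
}

/* True if /dev/vfio/<group> exists, i.e. the group would clash with a
 * real VFIO group. */
static bool group_in_use(const char *group)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/dev/vfio/%s", group);

    assert(n > 0 && (size_t)n < sizeof path);
    return access(path, F_OK) == 0;
}

/* Validate a device directory such as /var/run/muser/foo/0 by reading its
 * iommu_group link. Returns 0 if the setup looks usable, -1 otherwise. */
static int check_device_dir(const char *dir)
{
    char link[256], target[256];
    ssize_t len;
    const char *group;

    if (snprintf(link, sizeof link, "%s/iommu_group", dir) >= (int)sizeof link)
        return -1;
    len = readlink(link, target, sizeof target - 1);
    if (len < 0)
        return -1;
    target[len] = '\0';
    group = group_from_link_target(target);
    return is_valid_group_name(group) && !group_in_use(group) ? 0 : -1;
}
```

For example, after running the snippet above on a host without /dev/vfio/0, `check_device_dir("/var/run/muser/foo/0")` should return 0.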

Create your libmuser context, setting /var/run/muser/foo/0 as the UUID.

Run QEMU as follows:

	LD_PRELOAD=muser/build/dbg/libvfio/libvfio.so qemu-system-x86_64 \
		... \
		-device vfio-pci,sysfsdev=/var/run/muser/foo/0 \
		-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=mem,share=yes,size=1073741824 \
		-numa node,nodeid=0,cpus=0,memdev=ram-node0

Guest RAM must be shared (share=yes); otherwise libmuser won't be able to
perform DMA transfers to and from it. If you're not using QEMU, any memory that
must be accessed by libmuser must be allocated with MAP_SHARED. Registering
memory for DMA that has not been allocated with MAP_SHARED is ignored, and any
attempt to access that memory will result in an error.
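
The difference between shared and private mappings can be demonstrated with a small sketch: memory is created with memfd_create(2) (Linux-specific, glibc 2.27 or later), written through one mapping and read through a second, independent MAP_SHARED mapping of the same memory that stands in for the device process. Only a MAP_SHARED writer's changes reach the underlying memory. `create_shared_mem()` and `write_is_visible()` are illustrative helpers, not part of libmuser.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create size bytes of zero-filled, file-backed memory whose descriptor
 * could be passed to another process. Returns the fd or -1. */
static int create_shared_mem(size_t size)
{
    int fd = memfd_create("guest-ram", 0);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Write through a mapping created with flags (MAP_SHARED or MAP_PRIVATE)
 * and report whether an independent MAP_SHARED view of the same memory
 * observes the write. */
static int write_is_visible(int fd, size_t size, int flags)
{
    char *writer = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, fd, 0);
    char *reader = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    int visible;

    assert(writer != MAP_FAILED && reader != MAP_FAILED);
    writer[0] = 'x';
    visible = reader[0] == 'x';
    munmap(writer, size);
    munmap(reader, size);
    return visible;
}
```

With QEMU, `share=yes` on the memory backend is what requests a shared mapping; any other allocator must pass MAP_SHARED itself.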

Example
=======

samples/gpio-pci-idio-16.c implements a tiny part of the PCI-IDIO-16 GPIO
(https://www.accesio.com/?p=/pci/pci_idio_16.html). In this sample it is a
simple device whose input toggles every three reads.
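
The toggling behaviour can be captured in a few lines. This is a hypothetical re-creation for illustration, not the code in samples/gpio-pci-idio-16.c: the value returned by the n-th read (counting from 0) is (n / 3) mod 2, which yields the 0, 0, 0, 1, 1, 1, ... sequence shown in the guest output below.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device state; not the sample's actual structure. */
struct idio16_state {
    unsigned reads;   /* number of input reads served so far */
};

/* Return the input value for this read: 0 for reads 0-2, 1 for reads 3-5,
 * 0 for reads 6-8, and so on. */
static uint8_t idio16_read_input(struct idio16_state *s)
{
    uint8_t v = (uint8_t)((s->reads / 3) % 2);

    s->reads++;
    return v;
}
```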

Running gpio-pci-idio-16
------------------------

1. First, follow the instructions to build muser.
2. Then, start the gpio-pci-idio-16 device emulation:

	# build/dbg/samples/gpio-pci-idio-16 -s /var/run/muser/foo/0

3. Finally, start the VM with the QEMU command line explained earlier and then
   execute the following in the guest:

	# insmod gpio-pci-idio-16.ko
	# cat /sys/class/gpio/gpiochip480/base > /sys/class/gpio/export
	# for ((i=0;i<12;i++)); do cat /sys/class/gpio/OUT0/value; done
	0
	0
	0
	1
	1
	1
	0
	0
	0
	1
	1
	1

Future Work
===========

See the official MUSER repository for more details.

Troubleshooting
===============

It's easy to mess things up, as this is a PoC. libvfio stores logs under
`/tmp/libvfio`. When things fail, it's usually because the directory structure
hasn't been set up correctly or hasn't been cleaned up after the previous run;
use `strace` to check which syscalls fail and why.

To debug accesses to your PCI device from QEMU, add the following to the QEMU
command line:

    -trace enable=vfio*,file=qemu-vfio.trace