-rw-r--r-- docs/specs/vhost-user.txt 266
1 files changed, 266 insertions, 0 deletions
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
new file mode 100644
index 0000000..0ea767e
--- /dev/null
+++ b/docs/specs/vhost-user.txt
@@ -0,0 +1,266 @@
+Vhost-user Protocol
+===================
+
+Copyright (c) 2014 Virtual Open Systems Sarl.
+
+This work is licensed under the terms of the GNU GPL, version 2 or later.
+See the COPYING file in the top-level directory.
+===================
+
+This protocol aims to complement the ioctl interface used to control the
+vhost implementation in the Linux kernel. It implements the control plane needed
+to establish virtqueue sharing with a user space process on the same host. It
+uses communication over a Unix domain socket to share file descriptors in the
+ancillary data of the message.
+
+The protocol defines two sides of the communication: master and slave. Master is
+the application that shares its virtqueues, in our case QEMU. Slave is the
+consumer of the virtqueues.
+
+In the current implementation QEMU is the Master, and the Slave is intended to
+be a software Ethernet switch running in user space, such as Snabbswitch.
+
+Master and slave can be either a client (i.e. connecting) or server (listening)
+in the socket communication.
+
+Message Specification
+---------------------
+
+Note that all numbers are in the machine native byte order. A vhost-user message
+consists of 3 header fields and a payload:
+
+------------------------------------
+| request | flags | size | payload |
+------------------------------------
+
+ * Request: 32-bit type of the request
+ * Flags: 32-bit bit field:
+ - Lower 2 bits are the version (currently 0x01)
+ - Bit 2 is the reply flag - needs to be sent on each reply from the slave
+ * Size: 32-bit size of the payload
+
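+As a minimal sketch of the header layout above, the C helpers below build and
+check the flags field. The struct, macro, and function names are illustrative
+assumptions and are not taken from this patch; only the bit positions (version
+in the lower 2 bits, reply flag in bit 2) come from the list above.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit positions follow the header description. */
#define VU_VERSION_MASK 0x3u         /* lower 2 bits of flags: version */
#define VU_REPLY_MASK   (0x1u << 2)  /* bit 2: set on every slave reply */
#define VU_VERSION      0x1u         /* current protocol version */

/* The three header fields, all in machine native byte order. */
struct vu_hdr {
    uint32_t request;  /* request type */
    uint32_t flags;    /* version and reply flag */
    uint32_t size;     /* payload size in bytes */
};

/* Header for a request sent by the master: version only, no reply flag. */
static struct vu_hdr vu_make_request_hdr(uint32_t request, uint32_t payload_size)
{
    struct vu_hdr h = { request, VU_VERSION, payload_size };
    return h;
}

/* Header for a reply sent by the slave: same version, reply flag set. */
static struct vu_hdr vu_make_reply_hdr(uint32_t request, uint32_t payload_size)
{
    struct vu_hdr h = vu_make_request_hdr(request, payload_size);
    h.flags |= VU_REPLY_MASK;
    return h;
}

/* Check that a received header is a reply to the request that was sent. */
static bool vu_is_valid_reply(const struct vu_hdr *h, uint32_t expected_request)
{
    return h->request == expected_request
        && (h->flags & VU_VERSION_MASK) == VU_VERSION
        && (h->flags & VU_REPLY_MASK) != 0;
}
```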
+
+Depending on the request type, payload can be:
+
+ * A single 64-bit integer
+ -------
+ | u64 |
+ -------
+
+ u64: a 64-bit unsigned integer
+
+ * A vring state description
+ ---------------
+ | index | num |
+ ---------------
+
+ Index: a 32-bit index
+ Num: a 32-bit number
+
+ * A vring address description
+ -------------------------------------------------------
+ | index | flags | descriptor | used | available | log |
+ -------------------------------------------------------
+
+ Index: a 32-bit vring index
+ Flags: 32-bit vring flags
+ Descriptor: a 64-bit user address of the vring descriptor table
+ Used: a 64-bit user address of the vring used ring
+ Available: a 64-bit user address of the vring available ring
+ Log: a 64-bit guest address for logging
+
+ * Memory regions description
+ ---------------------------------------------------
+ | num regions | padding | region0 | ... | region7 |
+ ---------------------------------------------------
+
+ Num regions: a 32-bit number of regions
+ Padding: 32-bit
+
+ A region is:
+ ---------------------------------------
+ | guest address | size | user address |
+ ---------------------------------------
+
+ Guest address: a 64-bit guest address of the region
+ Size: a 64-bit size
+ User address: a 64-bit user address
+
+
+In QEMU the vhost-user message is implemented with the following struct:
+
+typedef struct VhostUserMsg {
+ VhostUserRequest request;
+ uint32_t flags;
+ uint32_t size;
+ union {
+ uint64_t u64;
+ struct vhost_vring_state state;
+ struct vhost_vring_addr addr;
+ VhostUserMemory memory;
+ };
+} QEMU_PACKED VhostUserMsg;
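+
+The VhostUserMemory type used in the union is not spelled out here. One
+possible layout that matches the memory regions description above is sketched
+below. VhostUserMemoryRegion, the field names, and VHOST_MEMORY_MAX_NREGIONS
+are assumptions made for illustration; the definitions in QEMU may differ.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constant: the description lists region0 ... region7. */
#define VHOST_MEMORY_MAX_NREGIONS 8

/* One region: three 64-bit fields, as in the region description. */
typedef struct VhostUserMemoryRegion {
    uint64_t guest_addr;  /* guest address of the region */
    uint64_t size;        /* size of the region in bytes */
    uint64_t user_addr;   /* user address of the region in the master */
} VhostUserMemoryRegion;

/* Memory regions payload: count, padding, then a fixed array of regions. */
typedef struct VhostUserMemory {
    uint32_t nregions;    /* number of valid entries in regions[] */
    uint32_t padding;     /* keeps regions[] 64-bit aligned */
    VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS];
} VhostUserMemory;

/* Compile-time checks that the layout has no hidden padding. */
_Static_assert(sizeof(VhostUserMemoryRegion) == 24,
               "a region is three 64-bit fields");
_Static_assert(sizeof(VhostUserMemory) == 8 + 8 * 24,
               "two 32-bit fields followed by eight regions");
```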
+
+Communication
+-------------
+
+The protocol for vhost-user is based on the existing implementation of vhost
+for the Linux Kernel. Most messages that can be sent via the Unix domain socket
+implementing vhost-user have an equivalent ioctl in the kernel implementation.
+
+The communication consists of master sending message requests and slave sending
+message replies. Most of the requests don't require replies. Here is a list of
+the ones that do:
+
+ * VHOST_USER_GET_FEATURES
+ * VHOST_USER_GET_VRING_BASE
+
+There are several messages that the master sends with file descriptors passed
+in the ancillary data:
+
+ * VHOST_USER_SET_MEM_TABLE
+ * VHOST_USER_SET_LOG_FD
+ * VHOST_USER_SET_VRING_KICK
+ * VHOST_USER_SET_VRING_CALL
+ * VHOST_USER_SET_VRING_ERR
+
+If Master is unable to send the full message or receives a wrong reply it will
+close the connection. An optional reconnection mechanism can be implemented.
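+
+On a Unix domain socket, file descriptors travel in the ancillary data as an
+SCM_RIGHTS control message. The sketch below shows one way to send and receive
+a buffer with a single descriptor attached. send_with_fd and recv_with_fd are
+hypothetical helper names, and a real master would attach several descriptors
+to one message for VHOST_USER_SET_MEM_TABLE.
+

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one file descriptor. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send len bytes of buf with fd attached. Returns bytes sent or -1. */
static ssize_t send_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0);
}

/* Receive up to len bytes into buf. *fd_out is the attached descriptor,
 * or -1 if the message carried none. Returns bytes received or -1. */
static ssize_t recv_with_fd(int sock, void *buf, size_t len, int *fd_out)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctrl, 0, sizeof(ctrl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    *fd_out = -1;

    ssize_t n = recvmsg(sock, &msg, 0);
    if (n < 0) {
        return -1;
    }
    for (struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS
            && cmsg->cmsg_len >= CMSG_LEN(sizeof(int))) {
            memcpy(fd_out, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return n;
}
```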
+
+Message types
+-------------
+
+ * VHOST_USER_GET_FEATURES
+
+ Id: 2
+ Equivalent ioctl: VHOST_GET_FEATURES
+ Master payload: N/A
+ Slave payload: u64
+
+ Get the features bitmask from the underlying vhost implementation.
+
+ * VHOST_USER_SET_FEATURES
+
+ Id: 3
+ Equivalent ioctl: VHOST_SET_FEATURES
+ Master payload: u64
+
+ Enable features in the underlying vhost implementation using a bitmask.
+
+ * VHOST_USER_SET_OWNER
+
+ Id: 4
+ Equivalent ioctl: VHOST_SET_OWNER
+ Master payload: N/A
+
+ Issued when a new connection is established. It sets the current Master
+ as an owner of the session. This can be used on the Slave as a
+ "session start" flag.
+
+ * VHOST_USER_RESET_OWNER
+
+ Id: 5
+ Equivalent ioctl: VHOST_RESET_OWNER
+ Master payload: N/A
+
+ Issued when a connection is about to be closed. The Master will no
+ longer own this connection (and will usually close it).
+
+ * VHOST_USER_SET_MEM_TABLE
+
+ Id: 6
+ Equivalent ioctl: VHOST_SET_MEM_TABLE
+ Master payload: memory regions description
+
+ Sets the memory map regions on the slave so it can translate the vring
+ addresses. The ancillary data carries an array of file descriptors,
+ one for each memory mapped region. The number and ordering of the fds
+ match the number and ordering of memory regions.
+
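+ As a sketch of the translation that the memory table makes possible, the
+ helpers below map a master user address (as found in the vring address
+ description) or a guest address to a pointer in the slave. The struct
+ mapped_region type and both function names are hypothetical, and the code
+ assumes the slave mmap()ed each file descriptor so that local_base
+ corresponds to the start of the region.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One region from the memory table plus where the slave mapped its fd. */
struct mapped_region {
    uint64_t guest_addr;  /* guest address of the region */
    uint64_t size;        /* size of the region in bytes */
    uint64_t user_addr;   /* user address of the region in the master */
    void *local_base;     /* assumed: slave mapping of the region start */
};

/* Translate a master user address to a slave pointer, or NULL if the
 * address is not covered by any region. */
static void *user_addr_to_local(const struct mapped_region *r, size_t n,
                                uint64_t user_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (user_addr >= r[i].user_addr
            && user_addr - r[i].user_addr < r[i].size) {
            return (uint8_t *)r[i].local_base + (user_addr - r[i].user_addr);
        }
    }
    return NULL;
}

/* Translate a guest address to a slave pointer, or NULL if the address
 * is not covered by any region. */
static void *guest_addr_to_local(const struct mapped_region *r, size_t n,
                                 uint64_t guest_addr)
{
    for (size_t i = 0; i < n; i++) {
        if (guest_addr >= r[i].guest_addr
            && guest_addr - r[i].guest_addr < r[i].size) {
            return (uint8_t *)r[i].local_base + (guest_addr - r[i].guest_addr);
        }
    }
    return NULL;
}
```
+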
+ * VHOST_USER_SET_LOG_BASE
+
+ Id: 7
+ Equivalent ioctl: VHOST_SET_LOG_BASE
+ Master payload: u64
+
+ Sets the logging base address.
+
+ * VHOST_USER_SET_LOG_FD
+
+ Id: 8
+ Equivalent ioctl: VHOST_SET_LOG_FD
+ Master payload: N/A
+
+ Sets the logging file descriptor, which is passed as ancillary data.
+
+ * VHOST_USER_SET_VRING_NUM
+
+ Id: 9
+ Equivalent ioctl: VHOST_SET_VRING_NUM
+ Master payload: vring state description
+
+ Sets the size of the vring, i.e. the number of descriptors it holds.
+
+ * VHOST_USER_SET_VRING_ADDR
+
+ Id: 10
+ Equivalent ioctl: VHOST_SET_VRING_ADDR
+ Master payload: vring address description
+ Slave payload: N/A
+
+ Sets the addresses of the different parts of the vring.
+
+ * VHOST_USER_SET_VRING_BASE
+
+ Id: 11
+ Equivalent ioctl: VHOST_SET_VRING_BASE
+ Master payload: vring state description
+
+ Sets the base offset in the available vring.
+
+ * VHOST_USER_GET_VRING_BASE
+
+ Id: 12
+ Equivalent ioctl: VHOST_GET_VRING_BASE
+ Master payload: vring state description
+ Slave payload: vring state description
+
+ Get the available vring base offset.
+
+ * VHOST_USER_SET_VRING_KICK
+
+ Id: 13
+ Equivalent ioctl: VHOST_SET_VRING_KICK
+ Master payload: u64
+
+ Set the event file descriptor for adding buffers to the vring. It
+ is passed in the ancillary data.
+ Bits (0-7) of the payload contain the vring index. Bit 8 is the
+ invalid FD flag. This flag is set when there is no file descriptor
+ in the ancillary data. This signals that polling should be used
+ instead of waiting for a kick.
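+
+ The u64 payload layout described above can be sketched as follows. The
+ macro and function names are illustrative assumptions; only the bit
+ positions (index in bits 0-7, invalid FD flag in bit 8) come from the
+ text.
+

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two fields packed into the u64 payload. */
#define VU_VRING_IDX_MASK  0xffu        /* bits 0-7: vring index */
#define VU_VRING_NOFD_MASK (0x1u << 8)  /* bit 8: invalid FD flag */

/* Build the payload: the flag is set when no fd is attached. */
static uint64_t vu_vring_fd_payload(uint8_t index, bool has_fd)
{
    uint64_t v = index;
    if (!has_fd) {
        v |= VU_VRING_NOFD_MASK;
    }
    return v;
}

/* Extract the vring index from a received payload. */
static unsigned vu_vring_fd_index(uint64_t u64)
{
    return (unsigned)(u64 & VU_VRING_IDX_MASK);
}

/* True when the message should carry a descriptor in the ancillary data. */
static bool vu_vring_fd_present(uint64_t u64)
{
    return (u64 & VU_VRING_NOFD_MASK) == 0;
}
```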
+
+ * VHOST_USER_SET_VRING_CALL
+
+ Id: 14
+ Equivalent ioctl: VHOST_SET_VRING_CALL
+ Master payload: u64
+
+ Set the event file descriptor to signal when buffers are used. It
+ is passed in the ancillary data.
+ Bits (0-7) of the payload contain the vring index. Bit 8 is the
+ invalid FD flag. This flag is set when there is no file descriptor
+ in the ancillary data. This signals that polling will be used
+ instead of waiting for the call.
+
+ * VHOST_USER_SET_VRING_ERR
+
+ Id: 15
+ Equivalent ioctl: VHOST_SET_VRING_ERR
+ Master payload: u64
+
+ Set the event file descriptor to signal when an error occurs. It
+ is passed in the ancillary data.
+ Bits (0-7) of the payload contain the vring index. Bit 8 is the
+ invalid FD flag. This flag is set when there is no file descriptor
+ in the ancillary data.