author | Peter Maydell <peter.maydell@linaro.org> | 2022-03-08 22:27:34 +0000 |
---|---|---|
committer | Peter Maydell <peter.maydell@linaro.org> | 2022-03-08 22:27:34 +0000 |
commit | 9f0369efb0f2a200f18b1aacd2ef493e22da5351 (patch) | |
tree | 8243df5bf223f9b5f57d08429bdd6873b22394e8 /docs | |
parent | 2ad76249000dc35f0a588bd55bd9264f567b4abc (diff) | |
parent | 128e050d41794e61e5849c6c507160da5556ea61 (diff) | |
Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into staging
virtio,pc,pci: features, cleanups, fixes
vhost-user enabled on non-linux systems
beginning of nvme sriov support
bigger tx queue for vdpa
virtio iommu bypass
FADT flag to detect legacy keyboards
Fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
# gpg: Signature made Mon 07 Mar 2022 22:43:31 GMT
# gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
# gpg: issuer "mst@redhat.com"
# gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
# gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
# Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
# Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469
* remotes/mst/tags/for_upstream: (47 commits)
hw/acpi/microvm: turn on 8042 bit in FADT boot architecture flags if present
tests/acpi: i386: update FACP table differences
hw/acpi: add indication for i8042 in IA-PC boot flags of the FADT table
tests/acpi: i386: allow FACP acpi table changes
docs: vhost-user: add subsection for non-Linux platforms
configure, meson: allow enabling vhost-user on all POSIX systems
vhost: use wfd on functions setting vring call fd
event_notifier: add event_notifier_get_wfd()
pci: drop COMPAT_PROP_PCP for 2.0 machine types
hw/smbios: Add table 4 parameter, "processor-id"
x86: cleanup unused compat_apic_id_mode
vhost-vsock: detach the virqueue element in case of error
pc: add option to disable PS/2 mouse/keyboard
acpi: pcihp: pcie: set power on cap on parent slot
pci: expose TYPE_XIO3130_DOWNSTREAM name
pci: show id info when pci BDF conflict
hw/misc/pvpanic: Use standard headers instead
headers: Add pvpanic.h
pci-bridge/xio3130_downstream: Fix error handling
pci-bridge/xio3130_upstream: Fix error handling
...
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
# Conflicts:
# docs/specs/index.rst
Diffstat (limited to 'docs')

mode | file | lines changed |
---|---|---|
-rw-r--r-- | docs/about/deprecated.rst | 8 |
-rw-r--r-- | docs/interop/vhost-user.rst | 20 |
-rw-r--r-- | docs/pcie_sriov.txt | 115 |
-rw-r--r-- | docs/specs/acpi_erst.rst | 200 |
-rw-r--r-- | docs/specs/index.rst | 1 |
-rw-r--r-- | docs/specs/pci-ids.txt | 1 |

6 files changed, 345 insertions, 0 deletions
```diff
diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 85773db..cf02ef6 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -324,6 +324,14 @@ machine is hardly emulated at all (e.g. neither the LCD nor the USB part had
 been implemented), so there is not much value added by this board. Use the
 ``ref405ep`` machine instead.
 
+``pc-i440fx-1.4`` up to ``pc-i440fx-1.7`` (since 7.0)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+These old machine types are quite neglected nowadays and thus might have
+various pitfalls with regard to live migration. Use a newer machine type
+instead.
+
+
 Backend options
 ---------------
 
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index edc3ad8..4dbc84f 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -38,6 +38,26 @@ conventions <backend_conventions>`.
 *Master* and *slave* can be either a client (i.e. connecting) or server
 (listening) in the socket communication.
 
+Support for platforms other than Linux
+--------------------------------------
+
+While vhost-user was initially developed targeting Linux, nowadays it
+is supported on any platform that provides the following features:
+
+- A way of requesting shared memory represented by a file descriptor
+  so it can be passed over a UNIX domain socket and then mapped by the
+  other process.
+
+- AF_UNIX sockets with SCM_RIGHTS, so QEMU and the other process can
+  exchange messages through them, including ancillary data when needed.
+
+- Either eventfd or pipe/pipe2. On platforms where eventfd is not
+  available, QEMU will automatically fall back to pipe2 or, as a last
+  resort, pipe. Each file descriptor will be used for receiving or
+  sending events by reading or writing (respectively) an 8-byte value
+  from or to it. The 8-byte value itself has no meaning and
+  should not be interpreted.
+
 Message Specification
 =====================
 
diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
new file mode 100644
index 0000000..f5e891e
--- /dev/null
+++ b/docs/pcie_sriov.txt
@@ -0,0 +1,115 @@
+PCI SR/IOV EMULATION SUPPORT
+============================
+
+Description
+===========
+SR/IOV (Single Root I/O Virtualization) is an optional extended capability
+of a PCI Express device. It allows a single physical function (PF) to appear as multiple
+virtual functions (VFs) for the main purpose of eliminating software
+overhead in I/O from virtual machines.
+
+Qemu now implements the basic common functionality to enable an emulated device
+to support SR/IOV. Yet no fully implemented devices exist in Qemu, but a
+proof-of-concept hack of the Intel igb can be found here:
+
+git://github.com/knuto/qemu.git sriov_patches_v5
+
+Implementation
+==============
+Implementing emulation of an SR/IOV capable device typically consists of
+implementing support for two types of device classes: the "normal" physical device
+(PF) and the virtual device (VF). From Qemu's perspective, the VFs are just
+like other devices, except that some of their properties are derived from
+the PF.
+
+A virtual function is different from a physical function in that the BAR
+space for all VFs is defined by the BAR registers in the PF's SR/IOV
+capability. All VFs have the same BARs and BAR sizes.
+
+Accesses to these virtual BARs are then computed as
+
+   <VF BAR start> + <VF number> * <BAR sz> + <offset>
+
+From our emulation perspective this means that there is a separate call for
+setting up a BAR for a VF.
+
+1) To enable SR/IOV support in the PF, it must be a PCI Express device so
+   you would need to add a PCI Express capability in the normal PCI
+   capability list. You might also want to add an ARI (Alternative
+   Routing-ID Interpretation) capability to indicate that your device
+   supports functions beyond its "own" function space (0-7),
+   which is necessary to support more than 7 functions, or
+   if functions extend beyond offset 7 because they are placed at an
+   offset > 1 or have stride > 1.
+
+   ...
+   #include "hw/pci/pcie.h"
+   #include "hw/pci/pcie_sriov.h"
+
+   pci_your_pf_dev_realize( ... )
+   {
+      ...
+      int ret = pcie_endpoint_cap_init(d, 0x70);
+      ...
+      pcie_ari_init(d, 0x100, 1);
+      ...
+
+      /* Add and initialize the SR/IOV capability */
+      pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+                         vf_devid, initial_vfs, total_vfs,
+                         fun_offset, stride);
+
+      /* Set up individual VF BARs (parameters as for normal BARs) */
+      pcie_sriov_pf_init_vf_bar( ... )
+      ...
+   }
+
+   For cleanup, you simply call:
+
+      pcie_sriov_pf_exit(device);
+
+   which will delete all the virtual functions and associated resources.
+
+2) Similarly in the implementation of the virtual function, you need to
+   make it a PCI Express device and add a similar set of capabilities
+   except for the SR/IOV capability. Then you need to set up the VF BARs as
+   subregions of the PF's SR/IOV VF BARs by calling
+   pcie_sriov_vf_register_bar() instead of the normal pci_register_bar() call:
+
+   pci_your_vf_dev_realize( ... )
+   {
+      ...
+      int ret = pcie_endpoint_cap_init(d, 0x60);
+      ...
+      pcie_ari_init(d, 0x100, 1);
+      ...
+      memory_region_init(mr, ... )
+      pcie_sriov_vf_register_bar(d, bar_nr, mr);
+      ...
+   }
+
+Testing on Linux guest
+======================
+It is easiest if your device driver supports sysfs-based SR/IOV
+enabling. Support for this was added in kernel v3.8, so not all drivers
+support it yet.
+
+To enable 4 VFs for a device at 01:00.0:
+
+	modprobe yourdriver
+	echo 4 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+
+You should now see 4 VFs with lspci.
+To turn SR/IOV off again (the standard requires you to turn it off before you can enable
+another VF count, and the emulation enforces this):
+
+	echo 0 > /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
+
+Older drivers typically provide a max_vfs module parameter
+to enable it at load time:
+
+	modprobe yourdriver max_vfs=4
+
+To disable the VFs again, simply unload the driver:
+
+	rmmod yourdriver
diff --git a/docs/specs/acpi_erst.rst b/docs/specs/acpi_erst.rst
new file mode 100644
index 0000000..a8a9d22
--- /dev/null
+++ b/docs/specs/acpi_erst.rst
@@ -0,0 +1,200 @@
+ACPI ERST DEVICE
+================
+
+The ACPI ERST device is utilized to support the ACPI Error Record
+Serialization Table (ERST) functionality. This feature is designed for
+storing error records in persistent storage for future reference
+and/or debugging.
+
+The ACPI specification[1], in Chapter "ACPI Platform Error Interfaces
+(APEI)", and specifically subsection "Error Serialization", outlines a
+method for storing error records into persistent storage.
+
+The format of error records is described in the UEFI specification[2],
+in Appendix N "Common Platform Error Record".
+
+While the ACPI specification allows for an NVRAM "mode" (see
+GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES) where non-volatile RAM is
+directly exposed for access by the OS/guest, this device
+implements the non-NVRAM "mode". This non-NVRAM "mode" is what is
+implemented by most BIOS (since flash memory requires programming
+operations in order to update its contents). Furthermore, as of the
+time of this writing, Linux only supports the non-NVRAM "mode".
+
+
+Background/Motivation
+---------------------
+
+Linux uses the persistent storage filesystem, pstore, to record
+information (e.g. dmesg tail) upon panics and shutdowns. Pstore is
+independent of, and runs before, kdump. In certain scenarios (e.g.
+hosts/guests with root filesystems on NFS/iSCSI where networking
+software and/or hardware fails, and thus kdump fails), pstore may
+contain information available for post-mortem debugging.
+
+Two common storage backends for the pstore filesystem are ACPI ERST
+and UEFI. Most BIOS implement ACPI ERST. UEFI is not utilized in all
+guests. With QEMU supporting ACPI ERST, it becomes a viable pstore
+storage backend for virtual machines (as it is now for bare metal
+machines).
+
+Enabling support for ACPI ERST facilitates a consistent method to
+capture kernel panic information in a wide range of guests: from
+resource-constrained microvms to very large guests, and in particular,
+in direct-boot environments (which would lack UEFI run-time services).
+
+Note that Microsoft Windows also utilizes the ACPI ERST for certain
+crash information, if available[3].
+
+
+Configuration|Usage
+-------------------
+
+To use ACPI ERST, a memory-backend-file object and acpi-erst device
+can be created, for example:
+
+ qemu ... \
+ -object memory-backend-file,id=erstnvram,mem-path=acpi-erst.backing,size=0x10000,share=on \
+ -device acpi-erst,memdev=erstnvram
+
+For proper operation, the ACPI ERST device needs a memory-backend-file
+object with the following parameters:
+
+ - id: The id of the memory-backend-file object is used to associate
+   this memory with the acpi-erst device.
+ - size: The size of the ACPI ERST backing storage. This parameter is
+   required.
+ - mem-path: The location of the ACPI ERST backing storage file. This
+   parameter is also required.
+ - share: The share=on parameter is required so that updates to the
+   ERST backing store are written to the file.
+
+and the acpi-erst device with the following parameters:
+
+ - memdev: The object id of the memory-backend-file.
+ - record_size: Specifies the size of the records (or slots) in the
+   backend storage. Must be a power of two value greater than or
+   equal to 4096 (PAGE_SIZE).
+
+
+PCI Interface
+-------------
+
+The ERST device is a PCI device with two BARs, one for accessing the
+programming registers, and the other for accessing the record exchange
+buffer.
+
+BAR0 contains the programming interface consisting of ACTION and VALUE
+64-bit registers. All ERST actions/operations/side effects happen on
+the write to the ACTION, by design. Any data needed by the action must
+be placed into VALUE prior to writing ACTION. Reading the VALUE
+simply returns the register contents, which can be updated by a
+previous ACTION.
+
+BAR1 contains the 8KiB record exchange buffer, which is the
+implemented maximum record size.
+
+
+Backend Storage Format
+----------------------
+
+The backend storage is divided into fixed size "slots", 8KiB in
+length, with each slot storing a single record. Not all slots need to
+be occupied, and they need not be occupied in a contiguous fashion.
+The ability to clear/erase specific records allows for the formation
+of unoccupied slots.
+
+Slot 0 contains a backend storage header that identifies the contents
+as ERST and also facilitates efficient access to the records.
+Depending upon the size of the backend storage, additional slots will
+be designated to be a part of the slot 0 header. For example, at 8KiB,
+the slot 0 header can accommodate 1021 records. Thus a storage size
+of 8MiB (8KiB * 1024) requires an additional slot for use by the
+header. In this scenario, slot 0 and slot 1 form the backend storage
+header, and records can be stored starting at slot 2.
+
+Below is an example layout of the backend storage format (for storage
+size less than 8MiB). The size of the storage is a multiple of 8KiB,
+and contains N number of slots to store records. The example below
+shows two records (in CPER format) in the backend storage, while the
+remaining slots are empty/available.
+
+::
+
+ Slot   Record
+        <------------------ 8KiB -------------------->
+        +--------------------------------------------+
+    0   | storage header                             |
+        +--------------------------------------------+
+    1   | empty/available                            |
+        +--------------------------------------------+
+    2   | CPER                                       |
+        +--------------------------------------------+
+    3   | CPER                                       |
+        +--------------------------------------------+
+  ...   |                                            |
+        +--------------------------------------------+
+    N   | empty/available                            |
+        +--------------------------------------------+
+
+The storage header consists of some basic information and an array
+of CPER record_id's to efficiently access records in the backend
+storage.
+
+All fields in the header are stored in little endian format.
+
+::
+
+  +--------------------------------------------+
+  | magic                                      | 0x0000
+  +--------------------------------------------+
+  | record_offset        | record_size         | 0x0008
+  +--------------------------------------------+
+  | record_count | reserved     | version      | 0x0010
+  +--------------------------------------------+
+  | record_id[0]                               | 0x0018
+  +--------------------------------------------+
+  | record_id[1]                               | 0x0020
+  +--------------------------------------------+
+  | record_id[...]                             |
+  +--------------------------------------------+
+  | record_id[N]                               | 0x1FF8
+  +--------------------------------------------+
+
+The 'magic' field contains the value 0x524F545354535245.
+
+The 'record_size' field contains the value 0x2000, 8KiB.
+
+The 'record_offset' field points to the first record_id in the array,
+0x0018.
+
+The 'version' field contains 0x0100, the first version.
+
+The 'record_count' field contains the number of valid records in the
+backend storage.
+
+The 'record_id' array fields are the 64-bit record identifiers of the
+CPER record in the corresponding slot. Stated differently, the
+location of a CPER record_id in the record_id[] array provides the
+slot index for the corresponding record in the backend storage.
+
+Note that, for example, with a backend storage less than 8MiB, slot 0
+contains the header, so the record_id[0] will never contain a valid
+CPER record_id. Instead slot 1 is the first available slot and thus
+record_id[1] may contain a valid CPER record_id.
+
+A 'record_id' of all 0s or all 1s indicates an invalid record (i.e. the
+slot is available).
+
+
+References
+----------
+
+[1] "Advanced Configuration and Power Interface Specification",
+    version 4.0, June 2009.
+
+[2] "Unified Extensible Firmware Interface Specification",
+    version 2.1, October 2008.
+
+[3] "Windows Hardware Error Architecture", specifically
+    "Error Record Persistence Mechanism".
diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 2a35700..e10684b 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -18,4 +18,5 @@ guest hardware that is specific to QEMU.
    acpi_mem_hotplug
    acpi_pci_hotplug
    acpi_nvdimm
+   acpi_erst
    sev-guest-firmware
diff --git a/docs/specs/pci-ids.txt b/docs/specs/pci-ids.txt
index 5e407a6..dd6859d 100644
--- a/docs/specs/pci-ids.txt
+++ b/docs/specs/pci-ids.txt
@@ -65,6 +65,7 @@ PCI devices (other than virtio):
 1b36:000f  mdpy (mdev sample device), linux/samples/vfio-mdev/mdpy.c
 1b36:0010  PCIe NVMe device (-device nvme)
 1b36:0011  PCI PVPanic device (-device pvpanic-pci)
+1b36:0012  PCI ACPI ERST device (-device acpi-erst)
 
 All these devices are documented in docs/specs.
 
```
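The vhost-user text describes notification descriptors that fall back to pipe2 or pipe when eventfd is unavailable, with each event carried as an 8-byte value that has no meaning of its own. Below is a minimal sketch of that pattern over a POSIX pipe. The helper names `notifier_signal` and `notifier_consume` are hypothetical and are not QEMU's `EventNotifier` API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helpers (not QEMU's EventNotifier API). */

/* Signal one event: write a single 8-byte value to the pipe's write end. */
static int notifier_signal(int wfd)
{
    const uint64_t value = 1; /* arbitrary: the receiver must not interpret it */
    ssize_t n;

    assert(wfd >= 0);
    do {
        n = write(wfd, &value, sizeof(value)); /* 8 bytes < PIPE_BUF: atomic */
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof(value) ? 0 : -1;
}

/* Consume one event: read a single 8-byte value (blocks until available). */
static int notifier_consume(int rfd)
{
    uint64_t value;
    ssize_t n;

    assert(rfd >= 0);
    do {
        n = read(rfd, &value, sizeof(value));
    } while (n < 0 && errno == EINTR);
    return n == (ssize_t)sizeof(value) ? 0 : -1; /* value itself is ignored */
}
```

Unlike an eventfd, where one descriptor serves both directions, a pipe has distinct read and write ends. Code that hands a "call" descriptor to the other side therefore has to pass the write end, which appears to be the motivation for the `event_notifier_get_wfd()` and "use wfd" commits in the shortlog.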
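The SR/IOV text gives VF BAR accesses as `<VF BAR start> + <VF number> * <BAR sz> + <offset>`. The sketch below implements that arithmetic in both directions. It assumes 0-based VF indices, and, as the text states, that all VFs share the same BAR size. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Address of 'offset' within the BAR of VF 'vf_index' (0-based, assumed). */
static uint64_t vf_bar_address(uint64_t vf_bar_start, uint32_t vf_index,
                               uint64_t bar_size, uint64_t offset)
{
    assert(offset < bar_size); /* the access must stay inside one VF's BAR */
    return vf_bar_start + (uint64_t)vf_index * bar_size + offset;
}

/* Inverse: split an address in the VF BAR window into VF index and offset. */
static void vf_bar_decode(uint64_t vf_bar_start, uint64_t bar_size,
                          uint64_t addr, uint32_t *vf_index, uint64_t *offset)
{
    uint64_t rel;

    assert(bar_size != 0 && addr >= vf_bar_start);
    rel = addr - vf_bar_start;
    *vf_index = (uint32_t)(rel / bar_size);
    *offset = rel % bar_size;
}
```

For example, with a VF BAR window at 0xfe000000 and 0x4000-byte BARs, offset 0x10 of VF 3 lands at 0xfe000000 + 3 * 0x4000 + 0x10 = 0xfe00c010.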
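The ERST storage-format text says one 8KiB header slot holds 1021 record_ids and that 8MiB of storage needs a second header slot. Both figures follow from 0x18 bytes of fixed header fields plus one 8-byte record_id per slot, rounded up to whole slots. The sketch below reproduces that arithmetic together with the documented record_size constraint. It is derived from the text, not taken from QEMU's acpi-erst implementation, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ERST_HEADER_FIXED_BYTES 0x18u /* magic, record_offset/size, count/version */
#define ERST_RECORD_ID_BYTES    8u    /* one 64-bit record_id per slot */

/* record_size must be a power of two and at least 4096 (PAGE_SIZE). */
static bool erst_record_size_ok(uint64_t record_size)
{
    return record_size >= 4096 && (record_size & (record_size - 1)) == 0;
}

/*
 * Number of slots occupied by the storage header, assuming it holds the
 * fixed fields followed by a record_id entry for every slot in storage.
 * Records can then start at the slot index equal to the returned value.
 */
static uint64_t erst_header_slots(uint64_t storage_size, uint64_t record_size)
{
    uint64_t nslots, header_bytes;

    assert(erst_record_size_ok(record_size));
    nslots = storage_size / record_size;
    header_bytes = ERST_HEADER_FIXED_BYTES + nslots * ERST_RECORD_ID_BYTES;
    return (header_bytes + record_size - 1) / record_size; /* round up */
}
```

With the 0x10000-byte backing file from the example command (8 slots), a single header slot suffices. With 8MiB (1024 slots), the header needs 0x18 + 1024 * 8 = 8216 bytes, so it spills into slot 1 and records start at slot 2, matching the text.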
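The header description states that all fields are little endian, that 'magic' is 0x524F545354535245 (its bytes in storage order spell "ERSTSTOR"), and that record_id[] starts at the record_offset of 0x0018. The following is a minimal sketch of reading the magic and a slot's record_id from a raw header buffer, decoding byte by byte so it does not depend on host endianness. It uses only offsets given in the text, the caller's buffer is assumed to span every header slot, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ERST_MAGIC            0x524F545354535245ULL /* documented 'magic' */
#define ERST_RECORD_ID_OFFSET 0x18u                 /* documented record_offset */

/* Decode a 64-bit little-endian value byte by byte. */
static uint64_t le64_at(const uint8_t *p)
{
    uint64_t v = 0;

    for (size_t i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* True if the header begins with the documented magic value. */
static bool erst_header_magic_ok(const uint8_t *hdr)
{
    return le64_at(hdr) == ERST_MAGIC;
}

/* record_id[slot] lives at record_offset + 8 * slot within the header. */
static uint64_t erst_slot_record_id(const uint8_t *hdr, size_t slot)
{
    return le64_at(hdr + ERST_RECORD_ID_OFFSET + 8 * slot);
}

/* All 0s or all 1s marks an invalid record, i.e. an available slot. */
static bool erst_record_id_valid(uint64_t record_id)
{
    return record_id != 0 && record_id != UINT64_MAX;
}
```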
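The PCI Interface section says BAR0 exposes ACTION and VALUE registers, that all side effects happen on the write to ACTION, and that reading VALUE simply returns its contents. The toy model below illustrates the resulting access order: place the operand in VALUE, write ACTION, then read any result back from VALUE. The action codes, struct, and device state are hypothetical placeholders for this model only. The real action set is defined by the ACPI ERST specification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical action codes for this model only (not the ACPI values). */
enum { MODEL_ACTION_SET_RECORD_OFFSET = 1, MODEL_ACTION_GET_RECORD_COUNT = 2 };

struct erst_regs_model {
    uint64_t action;        /* ACTION register: writes trigger side effects */
    uint64_t value;         /* VALUE register: operand in, result out */
    uint64_t record_offset; /* device state changed by an action */
    uint64_t record_count;  /* device state reported by an action */
};

/* Writing VALUE only latches data; nothing else happens. */
static void model_write_value(struct erst_regs_model *r, uint64_t v)
{
    r->value = v;
}

/* Reading VALUE just returns the register contents. */
static uint64_t model_read_value(const struct erst_regs_model *r)
{
    return r->value;
}

/* Every side effect happens here, on the write to ACTION. */
static void model_write_action(struct erst_regs_model *r, uint64_t action)
{
    r->action = action;
    switch (action) {
    case MODEL_ACTION_SET_RECORD_OFFSET:
        r->record_offset = r->value; /* consume the operand placed in VALUE */
        break;
    case MODEL_ACTION_GET_RECORD_COUNT:
        r->value = r->record_count;  /* publish the result through VALUE */
        break;
    default:
        break;                       /* unknown actions: no effect in this model */
    }
}
```

Keeping VALUE free of side effects means a guest can write and re-read it without disturbing device state. Only the single, explicit ACTION write commits an operation.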