Diffstat (limited to 'docs/specs')
-rw-r--r-- | docs/specs/acpi_hest_ghes.rst | 6
-rw-r--r-- | docs/specs/aspeed-intc.rst | 136
-rw-r--r-- | docs/specs/fw_cfg.rst | 4
-rw-r--r-- | docs/specs/index.rst | 3
-rw-r--r-- | docs/specs/pci-ids.rst | 2
-rw-r--r-- | docs/specs/rapl-msr.rst | 25
-rw-r--r-- | docs/specs/riscv-aia.rst | 83
-rw-r--r-- | docs/specs/riscv-iommu.rst | 116
-rw-r--r-- | docs/specs/tpm.rst | 8
9 files changed, 362 insertions, 21 deletions
diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst index 68f1fbe..c3e9f8d 100644 --- a/docs/specs/acpi_hest_ghes.rst +++ b/docs/specs/acpi_hest_ghes.rst @@ -67,8 +67,10 @@ Design Details (3) The address registers table contains N Error Block Address entries and N Read Ack Register entries. The size for each entry is 8-byte. The Error Status Data Block table contains N Error Status Data Block - entries. The size for each entry is 4096(0x1000) bytes. The total size - for the "etc/hardware_errors" fw_cfg blob is (N * 8 * 2 + N * 4096) bytes. + entries. The size for each entry is defined at the source code as + ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size + for the "etc/hardware_errors" fw_cfg blob is + (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes. N is the number of the kinds of hardware error sources. (4) QEMU generates the ACPI linker/loader script for the firmware. The diff --git a/docs/specs/aspeed-intc.rst b/docs/specs/aspeed-intc.rst new file mode 100644 index 0000000..9cefd7f --- /dev/null +++ b/docs/specs/aspeed-intc.rst @@ -0,0 +1,136 @@ +=========================== +ASPEED Interrupt Controller +=========================== + +AST2700 +------- +There are a total of 480 interrupt sources in AST2700. Due to the limitation of +interrupt numbers of processors, the interrupts are merged every 32 sources for +interrupt numbers greater than 127. + +There are two levels of interrupt controllers, INTC (CPU Die) and INTCIO +(I/O Die). + +Interrupt Mapping +----------------- +- INTC: Handles interrupt sources 0 - 127 and integrates signals from INTCIO. +- INTCIO: Handles interrupt sources 128 - 319 independently. + +QEMU Support +------------ +Currently, only GIC 192 to 201 are supported, and their source interrupts are +from INTCIO and connected to INTC at input pin 0 and output pins 0 to 9 for +GIC 192-201. 
+ +Design for GICINT 196 +--------------------- +The orgate has interrupt sources ranging from 0 to 31, with its output pin +connected to INTCIO "T0 GICINT_196". The output pin is then connected to INTC +"GIC_192_201" at bit 4, and its bit 4 output pin is connected to GIC 196. + +INTC GIC_192_201 Output Pin Mapping +----------------------------------- +The design of INTC GIC_192_201 has 10 output pins, mapped as follows: + +==== ==== +Bit GIC +==== ==== +0 192 +1 193 +2 194 +3 195 +4 196 +5 197 +6 198 +7 199 +8 200 +9 201 +==== ==== + +AST2700 A0 +---------- +It has only one INTC controller, and currently, only GIC 128-136 are supported. +To support both AST2700 A1 and AST2700 A0, there are 10 OR gates in the INTC, +with gates 1 to 9 supporting GIC 128-136. + +Design for GICINT 132 +--------------------- +The orgate has interrupt sources ranging from 0 to 31, with its output pin +connected to INTC. The output pin is then connected to GIC 132. + +Block Diagram of GICINT 196 for AST2700 A1 and GICINT 132 for AST2700 A0 +------------------------------------------------------------------------ + +.. 
code-block:: + + |-------------------------------------------------------------------------------------------------------| + | AST2700 A1 Design | + | To GICINT196 | + | | + | ETH1 |-----------| |--------------------------| |--------------| | + | -------->|0 | | INTCIO | | orgates[0] | | + | ETH2 | 4| orgates[0]------>|inpin[0]-------->outpin[0]|------->| 0 | | + | -------->|1 5| orgates[1]------>|inpin[1]-------->outpin[1]|------->| 1 | | + | ETH3 | 6| orgates[2]------>|inpin[2]-------->outpin[2]|------->| 2 | | + | -------->|2 19| orgates[3]------>|inpin[3]-------->outpin[3]|------->| 3 OR[0:9] |-----| | + | UART0 | 20|-->orgates[4]------>|inpin[4]-------->outpin[4]|------->| 4 | | | + | -------->|7 21| orgates[5]------>|inpin[5]-------->outpin[5]|------->| 5 | | | + | UART1 | 22| orgates[6]------>|inpin[6]-------->outpin[6]|------->| 6 | | | + | -------->|8 23| orgates[7]------>|inpin[7]-------->outpin[7]|------->| 7 | | | + | UART2 | 24| orgates[8]------>|inpin[8]-------->outpin[8]|------->| 8 | | | + | -------->|9 25| orgates[9]------>|inpin[9]-------->outpin[9]|------->| 9 | | | + | UART3 | 26| |--------------------------| |--------------| | | + | ---------|10 27| | | + | UART5 | 28| | | + | -------->|11 29| | | + | UART6 | | | | + | -------->|12 30| |-----------------------------------------------------------------------| | + | UART7 | 31| | | + | -------->|13 | | | + | UART8 | OR[0:31] | | |------------------------------| |----------| | + | -------->|14 | | | INTC | | GIC | | + | UART9 | | | |inpin[0:0]--------->outpin[0] |---------->|192 | | + | -------->|15 | | |inpin[0:1]--------->outpin[1] |---------->|193 | | + | UART10 | | | |inpin[0:2]--------->outpin[2] |---------->|194 | | + | -------->|16 | | |inpin[0:3]--------->outpin[3] |---------->|195 | | + | UART11 | | |--------------> |inpin[0:4]--------->outpin[4] |---------->|196 | | + | -------->|17 | |inpin[0:5]--------->outpin[5] |---------->|197 | | + | UART12 | | |inpin[0:6]--------->outpin[6] 
|---------->|198 | | + | -------->|18 | |inpin[0:7]--------->outpin[7] |---------->|199 | | + | |-----------| |inpin[0:8]--------->outpin[8] |---------->|200 | | + | |inpin[0:9]--------->outpin[9] |---------->|201 | | + |-------------------------------------------------------------------------------------------------------| + |-------------------------------------------------------------------------------------------------------| + | ETH1 |-----------| orgates[1]------->|inpin[1]----------->outpin[10]|---------->|128 | | + | -------->|0 | orgates[2]------->|inpin[2]----------->outpin[11]|---------->|129 | | + | ETH2 | 4| orgates[3]------->|inpin[3]----------->outpin[12]|---------->|130 | | + | -------->|1 5| orgates[4]------->|inpin[4]----------->outpin[13]|---------->|131 | | + | ETH3 | 6|---->orgates[5]------->|inpin[5]----------->outpin[14]|---------->|132 | | + | -------->|2 19| orgates[6]------->|inpin[6]----------->outpin[15]|---------->|133 | | + | UART0 | 20| orgates[7]------->|inpin[7]----------->outpin[16]|---------->|134 | | + | -------->|7 21| orgates[8]------->|inpin[8]----------->outpin[17]|---------->|135 | | + | UART1 | 22| orgates[9]------->|inpin[9]----------->outpin[18]|---------->|136 | | + | -------->|8 23| |------------------------------| |----------| | + | UART2 | 24| | + | -------->|9 25| AST2700 A0 Design | + | UART3 | 26| | + | -------->|10 27| | + | UART5 | 28| | + | -------->|11 29| GICINT132 | + | UART6 | | | + | -------->|12 30| | + | UART7 | 31| | + | -------->|13 | | + | UART8 | OR[0:31] | | + | -------->|14 | | + | UART9 | | | + | -------->|15 | | + | UART10 | | | + | -------->|16 | | + | UART11 | | | + | -------->|17 | | + | UART12 | | | + | -------->|18 | | + | |-----------| | + | | + |-------------------------------------------------------------------------------------------------------| diff --git a/docs/specs/fw_cfg.rst b/docs/specs/fw_cfg.rst index 5ad47a9..31ae315 100644 --- a/docs/specs/fw_cfg.rst +++ b/docs/specs/fw_cfg.rst 
@@ -54,11 +54,11 @@ Data Register ------------- * Read/Write (writes ignored as of QEMU v2.4, but see the DMA interface) -* Location: platform dependent (IOport [#]_ or MMIO) +* Location: platform dependent (IOport\ [#placement]_ or MMIO) * Width: 8-bit (if IOport), 8/16/32/64-bit (if MMIO) * Endianness: string-preserving -.. [#] +.. [#placement] On platforms where the data register is exposed as an IOport, its port number will always be one greater than the port number of the selector register. In other words, the two ports overlap, and can not diff --git a/docs/specs/index.rst b/docs/specs/index.rst index 6495ed5..f19d73c 100644 --- a/docs/specs/index.rst +++ b/docs/specs/index.rst @@ -36,3 +36,6 @@ guest hardware that is specific to QEMU. vmgenid rapl-msr rocker + riscv-iommu + riscv-aia + aspeed-intc diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst index 328ab31..261b0f3 100644 --- a/docs/specs/pci-ids.rst +++ b/docs/specs/pci-ids.rst @@ -98,6 +98,8 @@ PCI devices (other than virtio): PCI ACPI ERST device (``-device acpi-erst``) 1b36:0013 PCI UFS device (``-device ufs``) +1b36:0014 + PCI RISC-V IOMMU device All these devices are documented in :doc:`index`. diff --git a/docs/specs/rapl-msr.rst b/docs/specs/rapl-msr.rst index 1202ee8..aaf0db9 100644 --- a/docs/specs/rapl-msr.rst +++ b/docs/specs/rapl-msr.rst @@ -9,11 +9,12 @@ The consumption is reported via MSRs (model specific registers) like MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are 64 bits registers that represent the accumulated energy consumption in micro Joules. -Thanks to the MSR Filtering patch [#a]_ not all MSRs are handled by KVM. Some -of them can now be handled by the userspace (QEMU). It uses a mechanism called -"MSR filtering" where a list of MSRs is given at init time of a VM to KVM so -that a callback is put in place. The design of this patch uses only this -mechanism for handling the MSRs between guest/host. 
+Thanks to KVM's `MSR filtering <msr-filter-patch_>`__ functionality, +not all MSRs are handled by KVM. Some of them can now be handled by the +userspace (QEMU); a list of MSRs is given at VM creation time to KVM, and +a userspace exit occurs when they are accessed. + +.. _msr-filter-patch: https://patchwork.kernel.org/project/kvm/patch/20200916202951.23760-7-graf@amazon.com/ At the moment the following MSRs are involved: @@ -92,9 +93,12 @@ found by the sysconf system call. A typical value of clock ticks per second is package has 4 cores, 400 ticks maximum can be scheduled on all the cores of the package for a period of 1 second. -The /proc/[pid]/stat [#b]_ is a sysfs file that can give the executed time of a -process with the [pid] as the process ID. It gives the amount of ticks the -process has been scheduled in userspace (utime) and kernel space (stime). +`/proc/[pid]/stat <stat_>`__ is a procfs file that can give the executed +time of a process with the [pid] as the process ID. It gives the amount +of ticks the process has been scheduled in userspace (utime) and kernel +space (stime). + +.. _stat: https://man7.org/linux/man-pages/man5/proc.5.html By reading those metrics for a thread, one can calculate the ratio of time the package has spent executing the thread. @@ -148,8 +152,3 @@ Current Limitations - Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at the moment. -References ----------- - -.. [#a] https://patchwork.kernel.org/project/kvm/patch/20200916202951.23760-7-graf@amazon.com/ -.. [#b] https://man7.org/linux/man-pages/man5/proc.5.html diff --git a/docs/specs/riscv-aia.rst b/docs/specs/riscv-aia.rst new file mode 100644 index 0000000..8097e2f --- /dev/null +++ b/docs/specs/riscv-aia.rst @@ -0,0 +1,83 @@ +.. _riscv-aia: + +RISC-V AIA support for RISC-V machines +====================================== + +AIA (Advanced Interrupt Architecture) support is implemented in the ``virt`` +RISC-V machine for TCG and KVM accelerators. 
+ +The support consists of two main modes: + +- "aia=aplic": adds one or more APLIC (Advanced Platform Level Interrupt Controller) + devices +- "aia=aplic-imsic": adds one or more APLIC devices and an IMSIC (Incoming MSI + Controller) device for each CPU + +From a user standpoint, these modes will behave the same regardless of the accelerator +used. From a developer standpoint the accelerator settings will change what is being +emulated in userspace versus what is being emulated by an in-kernel irqchip. + +When running TCG, all controllers are emulated in userspace, including machine mode +(m-mode) APLIC and IMSIC (when applicable). + +When running KVM: + +- no m-mode is provided, so there is no m-mode APLIC or IMSIC emulation regardless of + the AIA mode chosen +- with "aia=aplic", s-mode APLIC will be emulated by userspace +- with "aia=aplic-imsic" there are two possibilities. If no additional KVM option + is provided there will be no APLIC or IMSIC emulation in userspace, and the virtual + machine will use the provided in-kernel APLIC and IMSIC controllers. If the user + chooses to use the irqchip in split mode via "-accel kvm,kernel-irqchip=split", + s-mode APLIC will be emulated while using the s-mode IMSIC from the irqchip + +The following table summarizes how the AIA and accelerator options define what +we will emulate in userspace: + + +.. 
list-table:: How AIA and accel options changes controller emulation + :widths: 25 25 25 25 25 25 25 + :header-rows: 1 + + * - Accel + - Accel props + - AIA type + - APLIC m-mode + - IMSIC m-mode + - APLIC s-mode + - IMSIC s-mode + * - tcg + - --- + - aplic + - emul + - n/a + - emul + - n/a + * - tcg + - --- + - aplic-imsic + - emul + - emul + - emul + - emul + * - kvm + - --- + - aplic + - n/a + - n/a + - emul + - n/a + * - kvm + - none + - aplic-imsic + - n/a + - n/a + - in-kernel + - in-kernel + * - kvm + - irqchip=split + - aplic-imsic + - n/a + - n/a + - emul + - in-kernel diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst new file mode 100644 index 0000000..991d376 --- /dev/null +++ b/docs/specs/riscv-iommu.rst @@ -0,0 +1,116 @@ +.. _riscv-iommu: + +RISC-V IOMMU support for RISC-V machines +======================================== + +QEMU implements a RISC-V IOMMU emulation based on the RISC-V IOMMU spec +version 1.0 `iommu1.0.0`_. + +The emulation includes a PCI reference device (riscv-iommu-pci) and a platform +bus device (riscv-iommu-sys) that QEMU RISC-V boards can use. The 'virt' +RISC-V machine is compatible with both devices. + +riscv-iommu-pci reference device +-------------------------------- + +This device implements the RISC-V IOMMU emulation as recommended by the section +"Integrating an IOMMU as a PCIe device" of `iommu1.0.0`_: a PCI device with base +class 08h, sub-class 06h and programming interface 00h. + +As a reference device it doesn't implement anything outside of the specification, +so it uses a generic default PCI ID given by QEMU: 1b36:0014. + +To include the device in the 'virt' machine: + +.. code-block:: bash + + $ qemu-system-riscv64 -M virt -device riscv-iommu-pci,[optional_pci_opts] (...) + +This will add a RISC-V IOMMU PCI device in the board following any additional +PCI parameters (like PCI bus address). The behavior of the RISC-V IOMMU is +defined by the spec but its operation is OS dependent. 
+ +As of this writing the existing Linux kernel support `linux-v8`_, not yet merged, +does not have support for features like VFIO passthrough. The IOMMU emulation +was tested using a public Ventana Micro Systems kernel repository in +`ventana-linux`_. This kernel is based on `linux-v8`_ with additional patches that +enable features like KVM VFIO passthrough with irqbypass. Until the kernel support +is feature complete feel free to use the kernel available in the Ventana Micro Systems +mirror. + +The current Linux kernel support will use the IOMMU device to create IOMMU groups +with any eligible cards available in the system, regardless of factors such as the +order in which the devices are added in the command line. + +This means that these command lines are equivalent as far as the current +IOMMU kernel driver behaves: + +.. code-block:: bash + + $ qemu-system-riscv64 \ + -M virt,aia=aplic-imsic,aia-guests=5 \ + -device riscv-iommu-pci,addr=1.0,vendor-id=0x1efd,device-id=0xedf1 \ + -device e1000e,netdev=net1 -netdev user,id=net1,net=192.168.0.0/24 \ + -device e1000e,netdev=net2 -netdev user,id=net2,net=192.168.200.0/24 \ + (...) + + $ qemu-system-riscv64 \ + -M virt,aia=aplic-imsic,aia-guests=5 \ + -device e1000e,netdev=net1 -netdev user,id=net1,net=192.168.0.0/24 \ + -device e1000e,netdev=net2 -netdev user,id=net2,net=192.168.200.0/24 \ + -device riscv-iommu-pci,addr=1.0,vendor-id=0x1efd,device-id=0xedf1 \ + (...) + +Both will create iommu groups for the two e1000e cards. + +Another thing to notice on `linux-v8`_ and `ventana-linux`_ is that the kernel driver +considers an IOMMU identified as a Rivos device, i.e. it uses Rivos vendor ID. To +use the riscv-iommu-pci device with the existing kernel support we need to emulate +a Rivos PCI IOMMU by setting 'vendor-id' and 'device-id': + +.. code-block:: bash + + $ qemu-system-riscv64 -M virt \ + -device riscv-iommu-pci,vendor-id=0x1efd,device-id=0xedf1 (...) 
+ +Several options are available to control the capabilities of the device, namely: + +- "bus": the bus that the IOMMU device uses +- "ioatc-limit": size of the Address Translation Cache (defaults to 2Mb) +- "intremap": enable/disable MSI support +- "ats": enable ATS support +- "off": out-of-reset translation mode ('on' for DMA disabled, 'off' for 'BARE' (passthrough)) +- "s-stage": enable s-stage support +- "g-stage": enable g-stage support +- "hpm-counters": number of hardware performance counters available. Maximum value is 31. + Default value is 31. Use 0 (zero) to disable HPM support + +riscv-iommu-sys device +---------------------- + +This device implements the RISC-V IOMMU emulation as a platform bus device that +RISC-V boards can use. + +For the 'virt' board the device is disabled by default. To enable it use the +'iommu-sys' machine option: + +.. code-block:: bash + + $ qemu-system-riscv64 -M virt,iommu-sys=on (...) + +There are no options to configure the capabilities of this device in the 'virt' +board using the QEMU command line. The device is configured with the following +riscv-iommu options: + +- "ioatc-limit": default value (2Mb) +- "intremap": enabled +- "ats": enabled +- "off": on (DMA disabled) +- "s-stage": enabled +- "g-stage": enabled + +.. _iommu1.0.0: https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0.0/riscv-iommu.pdf + +.. _linux-v8: https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjeznach@rivosinc.com/ + +.. _ventana-linux: https://github.com/ventanamicro/linux/tree/dev-upstream diff --git a/docs/specs/tpm.rst b/docs/specs/tpm.rst index 1ad36ad..b630a35 100644 --- a/docs/specs/tpm.rst +++ b/docs/specs/tpm.rst @@ -205,8 +205,8 @@ to be used with the passthrough backend or the swtpm backend. 
QEMU files related to TPM backends: - ``backends/tpm.c`` - - ``include/sysemu/tpm.h`` - - ``include/sysemu/tpm_backend.h`` + - ``include/system/tpm.h`` + - ``include/system/tpm_backend.h`` The QEMU TPM passthrough device ------------------------------- @@ -240,7 +240,7 @@ PCRs. QEMU files related to the TPM passthrough device: - ``backends/tpm/tpm_passthrough.c`` - ``backends/tpm/tpm_util.c`` - - ``include/sysemu/tpm_util.h`` + - ``include/system/tpm_util.h`` Command line to start QEMU with the TPM passthrough device using the host's @@ -301,7 +301,7 @@ command. QEMU files related to the TPM emulator device: - ``backends/tpm/tpm_emulator.c`` - ``backends/tpm/tpm_util.c`` - - ``include/sysemu/tpm_util.h`` + - ``include/system/tpm_util.h`` The following commands start the swtpm with a UnixIO control channel over a socket interface. They do not need to be run as root. |
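
As a quick check of the sizing formula in the acpi_hest_ghes.rst hunk of this diff, the sketch below computes the size of the "etc/hardware_errors" fw_cfg blob for N error sources. The 8-byte entry size and the 1024-byte value of ACPI_GHES_MAX_RAW_DATA_LENGTH are taken from the patch text; the function name and the Python constants are illustrative only and are not part of QEMU.

```python
# Illustrative constants: only the numeric values come from the patch text.
ADDRESS_ENTRY_SIZE = 8                 # each Error Block Address / Read Ack Register entry
ACPI_GHES_MAX_RAW_DATA_LENGTH = 1024   # per the patch: "currently 1024 bytes"


def hardware_errors_blob_size(n_sources: int) -> int:
    """Return N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH for N error sources."""
    if n_sources < 0:
        raise ValueError("number of error sources must be non-negative")
    # N address entries plus N read-ack entries, 8 bytes each.
    address_table = n_sources * ADDRESS_ENTRY_SIZE * 2
    # N Error Status Data Block entries.
    status_blocks = n_sources * ACPI_GHES_MAX_RAW_DATA_LENGTH
    return address_table + status_blocks


if __name__ == "__main__":
    for n in (1, 2):
        print(n, hardware_errors_blob_size(n))
```

With the previous 4096-byte entry size the same formula would give a larger blob, which is why the patch ties the documentation to the constant rather than to a fixed number.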