diff options
Diffstat (limited to 'docs/system')
47 files changed, 1538 insertions, 402 deletions
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst index 97fd6a0..6317c0e 100644 --- a/docs/system/arm/aspeed.rst +++ b/docs/system/arm/aspeed.rst @@ -1,12 +1,11 @@ -Aspeed family boards (``ast2500-evb``, ``ast2600-evb``, ``ast2700-evb``, ``bletchley-bmc``, ``fuji-bmc``, ``fby35-bmc``, ``fp5280g2-bmc``, ``g220a-bmc``, ``palmetto-bmc``, ``qcom-dc-scm-v1-bmc``, ``qcom-firework-bmc``, ``quanta-q71l-bmc``, ``rainier-bmc``, ``romulus-bmc``, ``sonorapass-bmc``, ``supermicrox11-bmc``, ``supermicrox11spi-bmc``, ``tiogapass-bmc``, ``witherspoon-bmc``, ``yosemitev2-bmc``) -================================================================================================================================================================================================================================================================================================================================================================================================================== +Aspeed family boards (``ast2500-evb``, ``ast2600-evb``, ``ast2700-evb``, ``bletchley-bmc``, ``fuji-bmc``, ``gb200nvl-bmc``, ``fby35-bmc``, ``fp5280g2-bmc``, ``g220a-bmc``, ``palmetto-bmc``, ``qcom-dc-scm-v1-bmc``, ``qcom-firework-bmc``, ``quanta-q71l-bmc``, ``rainier-bmc``, ``romulus-bmc``, ``sonorapass-bmc``, ``supermicrox11-bmc``, ``supermicrox11spi-bmc``, ``tiogapass-bmc``, ``witherspoon-bmc``, ``yosemitev2-bmc``) +==================================================================================================================================================================================================================================================================================================================================================================================================================================== The QEMU Aspeed machines model BMCs of various OpenPOWER systems and Aspeed evaluation boards. They are based on different releases of the Aspeed SoC : the AST2400 integrating an ARM926EJ-S CPU (400MHz), the AST2500 with an ARM1176JZS CPU (800MHz), the AST2600 -with dual cores ARM Cortex-A7 CPUs (1.2GHz) and more recently the AST2700 -with quad cores ARM Cortex-A35 64 bits CPUs (1.6GHz) +with dual cores ARM Cortex-A7 CPUs (1.2GHz). The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C, etc. @@ -36,13 +35,10 @@ AST2600 SoC based machines : - ``fuji-bmc`` Facebook Fuji BMC - ``bletchley-bmc`` Facebook Bletchley BMC - ``fby35-bmc`` Facebook fby35 BMC +- ``gb200nvl-bmc`` Nvidia GB200nvl BMC - ``qcom-dc-scm-v1-bmc`` Qualcomm DC-SCM V1 BMC - ``qcom-firework-bmc`` Qualcomm Firework BMC -AST2700 SoC based machines : - -- ``ast2700-evb`` Aspeed AST2700 Evaluation board (Cortex-A35) - Supported devices ----------------- @@ -247,10 +243,109 @@ under Linux), use : -M ast2500-evb,bmc-console=uart3 +OTP Option +^^^^^^^^^^ + +Both the AST2600 and AST1030 chips use the same One Time Programmable +(OTP) memory module, which is utilized for configuration, key storage, +and storing user-programmable data. This OTP memory module is managed +by the Secure Boot Controller (SBC). The following options can be +specified or omitted based on your needs. + + * When the options are specified, the pre-generated configuration + file will be used as the OTP memory storage. + + * When the options are omitted, an internal memory buffer will be + used to store the OTP memory data. + +.. code-block:: bash + + -blockdev driver=file,filename=otpmem.img,node-name=otp \ + -global aspeed-otp.drive=otp \ + +The following bash command can be used to generate a default +configuration file for OTP memory: + +.. code-block:: bash + + if [ ! -f otpmem.img ]; then + for i in $(seq 1 2048); do + printf '\x00\x00\x00\x00\xff\xff\xff\xff' + done > otpmem.img + fi + +Aspeed 2700 family boards (``ast2700-evb``) +================================================================== + +The QEMU Aspeed machines model BMCs of Aspeed evaluation boards. +They are based on different releases of the Aspeed SoC : +the AST2700 with quad cores ARM Cortex-A35 64 bits CPUs (1.6GHz). + +The SoC comes with RAM, Gigabit ethernet, USB, SD/MMC, USB, SPI, I2C, +etc. + +AST2700 SoC based machines : + +- ``ast2700-evb`` Aspeed AST2700 Evaluation board (Cortex-A35) +- ``ast2700fc`` Aspeed AST2700 Evaluation board (Cortex-A35 + Cortex-M4) + +Supported devices +----------------- + * Interrupt Controller + * Timer Controller + * RTC Controller + * I2C Controller + * System Control Unit (SCU) + * SRAM mapping + * X-DMA Controller (basic interface) + * Static Memory Controller (SMC or FMC) - Only SPI Flash support + * SPI Memory Controller + * USB 2.0 Controller + * SD/MMC storage controllers + * SDRAM controller (dummy interface for basic settings and training) + * Watchdog Controller + * GPIO Controller (Master only) + * UART + * Ethernet controllers + * Front LEDs (PCA9552 on I2C bus) + * LPC Peripheral Controller (a subset of subdevices are supported) + * Hash/Crypto Engine (HACE) - Hash support only. TODO: Crypto + * ADC + * eMMC Boot Controller (dummy) + * PECI Controller (minimal) + * I3C Controller + * Internal Bridge Controller (SLI dummy) + +Missing devices +--------------- + * PWM and Fan Controller + * Slave GPIO Controller + * Super I/O Controller + * PCI-Express 1 Controller + * Graphic Display Controller + * MCTP Controller + * Mailbox Controller + * Virtual UART + * eSPI Controller + +Boot options +------------ + +Images can be downloaded from the ASPEED Forked OpenBMC GitHub release repository : + + https://github.com/AspeedTech-BMC/openbmc/releases + Booting the ast2700-evb machine ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Boot the AST2700 machine from the flash image, use an MTD drive : +Boot the AST2700 machine from the flash image. + +There are two supported methods for booting the AST2700 machine with a flash image: + +Manual boot using ``-device loader``: + +It causes all 4 CPU cores to start execution from address ``0x430000000``, which +corresponds to the BL31 image load address. .. code-block:: bash @@ -270,6 +365,89 @@ Boot the AST2700 machine from the flash image, use an MTD drive : -drive file=${IMGDIR}/image-bmc,format=raw,if=mtd \ -nographic +Boot using a virtual boot ROM (``-bios``): + +If users do not specify the ``-bios option``, QEMU will attempt to load the +default vbootrom image ``ast27x0_bootrom.bin`` from either the current working +directory or the ``pc-bios`` directory within the QEMU source tree. + +.. code-block:: bash + + $ qemu-system-aarch64 -M ast2700-evb \ + -drive file=image-bmc,format=raw,if=mtd \ + -nographic + +The ``-bios`` option allows users to specify a custom path for the vbootrom +image to be loaded during boot. This will load the vbootrom image from the +specified path in the ${HOME} directory. + +.. code-block:: bash + + -bios ${HOME}/ast27x0_bootrom.bin + +Booting the ast2700fc machine +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +AST2700 features four Cortex-A35 primary processors and two Cortex-M4 coprocessors. +**ast2700-evb** machine focuses on emulating the four Cortex-A35 primary processors, +**ast2700fc** machine extends **ast2700-evb** by adding support for the two Cortex-M4 coprocessors. + +Steps to boot the AST2700fc machine: + +1. Ensure you have the following AST2700A1 binaries available in a directory + + * u-boot-nodtb.bin + * u-boot.dtb + * bl31.bin + * optee/tee-raw.bin + * image-bmc + * zephyr-aspeed-ssp.elf (for SSP firmware, CPU 5) + * zephyr-aspeed-tsp.elf (for TSP firmware, CPU 6) + +2. Execute the following command to start ``ast2700fc`` machine: + +.. code-block:: bash + + IMGDIR=ast2700-default + UBOOT_SIZE=$(stat --format=%s -L ${IMGDIR}/u-boot-nodtb.bin) + + $ qemu-system-aarch64 -M ast2700fc \ + -device loader,force-raw=on,addr=0x400000000,file=${IMGDIR}/u-boot-nodtb.bin \ + -device loader,force-raw=on,addr=$((0x400000000 + ${UBOOT_SIZE})),file=${IMGDIR}/u-boot.dtb \ + -device loader,force-raw=on,addr=0x430000000,file=${IMGDIR}/bl31.bin \ + -device loader,force-raw=on,addr=0x430080000,file=${IMGDIR}/optee/tee-raw.bin \ + -device loader,cpu-num=0,addr=0x430000000 \ + -device loader,cpu-num=1,addr=0x430000000 \ + -device loader,cpu-num=2,addr=0x430000000 \ + -device loader,cpu-num=3,addr=0x430000000 \ + -drive file=${IMGDIR}/image-bmc,if=mtd,format=raw \ + -device loader,file=${IMGDIR}/zephyr-aspeed-ssp.elf,cpu-num=4 \ + -device loader,file=${IMGDIR}/zephyr-aspeed-tsp.elf,cpu-num=5 \ + -serial pty -serial pty -serial pty \ + -snapshot \ + -S -nographic + +After launching QEMU, serial devices will be automatically redirected. +Example output: + +.. code-block:: bash + + char device redirected to /dev/pts/55 (label serial0) + char device redirected to /dev/pts/56 (label serial1) + char device redirected to /dev/pts/57 (label serial2) + +- serial0: Console for the four Cortex-A35 primary processors. +- serial1 and serial2: Consoles for the two Cortex-M4 coprocessors. + +Use ``tio`` or another terminal emulator to connect to the consoles: + +.. code-block:: bash + + $ tio /dev/pts/55 + $ tio /dev/pts/56 + $ tio /dev/pts/57 + + Aspeed minibmc family boards (``ast1030-evb``) ================================================================== diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst index 78c2fd2..31a5878 100644 --- a/docs/system/arm/emulation.rst +++ b/docs/system/arm/emulation.rst @@ -21,15 +21,19 @@ the following architecture extensions: - FEAT_AdvSIMD (Advanced SIMD Extension) - FEAT_AES (AESD and AESE instructions) - FEAT_AFP (Alternate floating-point behavior) +- FEAT_AIE (Memory Attribute Index Enhancement) - FEAT_Armv9_Crypto (Armv9 Cryptographic Extension) - FEAT_ASID16 (16 bit ASID) +- FEAT_ATS1A (Address Translation operations that ignore stage 1 permissions) - FEAT_BBM at level 2 (Translation table break-before-make levels) - FEAT_BF16 (AArch64 BFloat16 instructions) - FEAT_BTI (Branch Target Identification) - FEAT_CCIDX (Extended cache index) +- FEAT_CHK (Check Feature Status) - FEAT_CMOW (Control for cache maintenance permission) - FEAT_CRC32 (CRC32 instructions) - FEAT_Crypto (Cryptographic Extension) +- FEAT_CSSC (Common Short Sequence Compression instructions) - FEAT_CSV2 (Cache speculation variant 2) - FEAT_CSV2_1p1 (Cache speculation variant 2, version 1.1) - FEAT_CSV2_1p2 (Cache speculation variant 2, version 1.2) @@ -70,6 +74,7 @@ the following architecture extensions: - FEAT_FRINTTS (Floating-point to integer instructions) - FEAT_FlagM (Flag manipulation instructions v2) - FEAT_FlagM2 (Enhancements to flag manipulation instructions) +- FEAT_GCS (Guarded Control Stack Extension) - FEAT_GTG (Guest translation granule size) - FEAT_HAFDBS (Hardware management of the access flag and dirty bit state) - FEAT_HBC (Hinted conditional branches) @@ -88,7 +93,11 @@ the following architecture extensions: - FEAT_LRCPC2 (Load-acquire RCpc instructions v2) - FEAT_LSE (Large System Extensions) - FEAT_LSE2 (Large System Extensions v2) +- FEAT_LSE128 (128-bit Atomics) - FEAT_LVA (Large Virtual Address space) +- FEAT_MEC (Memory Encryption Contexts) + + * This is a register-only implementation without encryption. - FEAT_MixedEnd (Mixed-endian support) - FEAT_MixedEndEL0 (Mixed-endian support at EL0) - FEAT_MOPS (Standardization of memory operations) @@ -117,10 +126,14 @@ the following architecture extensions: - FEAT_RASv1p1 (RAS Extension v1.1) - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions) - FEAT_RME (Realm Management Extension) (NB: support status in QEMU is experimental) +- FEAT_RME_GPC2 (RME Granule Protection Check 2 Extension) - FEAT_RNG (Random number generator) - FEAT_RPRES (Increased precision of FRECPE and FRSQRTE) +- FEAT_S1PIE (Stage 1 permission indirections) +- FEAT_S2PIE (Stage 2 permission indirections) - FEAT_S2FWB (Stage 2 forced Write-Back) - FEAT_SB (Speculation Barrier) +- FEAT_SCTLR2 (Extension to SCTLR_ELx) - FEAT_SEL2 (Secure EL2) - FEAT_SHA1 (SHA1 instructions) - FEAT_SHA256 (SHA256 instructions) @@ -129,19 +142,26 @@ the following architecture extensions: - FEAT_SM3 (Advanced SIMD SM3 instructions) - FEAT_SM4 (Advanced SIMD SM4 instructions) - FEAT_SME (Scalable Matrix Extension) +- FEAT_SME2 (Scalable Matrix Extension version 2) +- FEAT_SME2p1 (Scalable Matrix Extension version 2.1) +- FEAT_SME_B16B16 (Non-widening BFloat16 arithmetic for SME2) - FEAT_SME_FA64 (Full A64 instruction set in Streaming SVE mode) +- FEAT_SME_F16F16 (Non-widening half-precision FP16 arithmetic for SME2) - FEAT_SME_F64F64 (Double-precision floating-point outer product instructions) - FEAT_SME_I16I64 (16-bit to 64-bit integer widening outer product instructions) - FEAT_SVE (Scalable Vector Extension) - FEAT_SVE_AES (Scalable Vector AES instructions) +- FEAT_SVE_B16B16 (Non-widening BFloat16 arithmetic for SVE2) - FEAT_SVE_BitPerm (Scalable Vector Bit Permutes instructions) - FEAT_SVE_PMULL128 (Scalable Vector PMULL instructions) - FEAT_SVE_SHA3 (Scalable Vector SHA3 instructions) - FEAT_SVE_SM4 (Scalable Vector SM4 instructions) - FEAT_SVE2 (Scalable Vector Extension version 2) +- FEAT_SVE2p1 (Scalable Vector Extension version 2.1) - FEAT_SPECRES (Speculation restriction instructions) - FEAT_SSBS (Speculative Store Bypass Safe) - FEAT_SSBS2 (MRS and MSR instructions for SSBS version 2) +- FEAT_TCR2 (Support for TCR2_ELx) - FEAT_TGran16K (Support for 16KB memory translation granule size at stage 1) - FEAT_TGran4K (Support for 4KB memory translation granule size at stage 1) - FEAT_TGran64K (Support for 64KB memory translation granule size at stage 1) diff --git a/docs/system/arm/imx8mp-evk.rst b/docs/system/arm/imx8mp-evk.rst index 00527b0..75c8fbd 100644 --- a/docs/system/arm/imx8mp-evk.rst +++ b/docs/system/arm/imx8mp-evk.rst @@ -35,7 +35,7 @@ Direct Linux Kernel Boot Probably the easiest way to get started with a whole Linux system on the machine is to generate an image with Buildroot. Version 2024.11.1 is tested at the time -of writing and involves three steps. First run the following commands in the +of writing and involves two steps. First run the following commands in the toplevel directory of the Buildroot source tree: .. code-block:: bash @@ -50,14 +50,6 @@ it and resize the SD card image to a power of two: $ qemu-img resize sdcard.img 256M -Finally, the device tree needs to be patched with the following commands which -will remove the ``cpu-idle-states`` properties from CPU nodes: - -.. code-block:: bash - - $ dtc imx8mp-evk.dtb | sed '/cpu-idle-states/d' > imx8mp-evk-patched.dts - $ dtc imx8mp-evk-patched.dts -o imx8mp-evk-patched.dtb - Now that everything is prepared the machine can be started as follows: .. code-block:: bash @@ -65,6 +57,25 @@ Now that everything is prepared the machine can be started as follows: $ qemu-system-aarch64 -M imx8mp-evk -smp 4 -m 3G \ -display none -serial null -serial stdio \ -kernel Image \ - -dtb imx8mp-evk-patched.dtb \ + -dtb imx8mp-evk.dtb \ -append "root=/dev/mmcblk2p2" \ -drive file=sdcard.img,if=sd,bus=2,format=raw,id=mmcblk2 + + +KVM Acceleration +---------------- + +To enable hardware-assisted acceleration via KVM, append +``-accel kvm -cpu host`` to the command line. While this speeds up performance +significantly, be aware of the following limitations: + +* The ``imx8mp-evk`` machine is not included under the "virtualization use case" + of :doc:`QEMU's security policy </system/security>`. This means that you + should not trust that it can contain malicious guests, whether it is run + using TCG or KVM. If you don't trust your guests and you're relying on QEMU to + be the security boundary, you want to choose another machine such as ``virt``. +* Rather than Cortex-A53 CPUs, the same CPU type as the host's will be used. + This is a limitation of KVM and may not work with guests with a tight + dependency on Cortex-A53. +* No EL2 and EL3 exception levels are available which is also a KVM limitation. + Direct kernel boot should work but running U-Boot, TF-A, etc. won't succeed. diff --git a/docs/system/arm/max78000.rst b/docs/system/arm/max78000.rst new file mode 100644 index 0000000..3d95011 --- /dev/null +++ b/docs/system/arm/max78000.rst @@ -0,0 +1,37 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Analog Devices max78000 board (``max78000fthr``) +================================================ + +The max78000 is a Cortex-M4 based SOC with a RISC-V coprocessor. The RISC-V coprocessor is not supported. + +Supported devices +----------------- + + * Instruction Cache Controller + * UART + * Global Control Register + * True Random Number Generator + * AES + +Notable unsupported devices +--------------------------- + + * I2C + * CNN + * CRC + * SPI + +Boot options +------------ + +The max78000 can be started using the ``-kernel`` option to load a +firmware at address 0 as the ROM. As the ROM normally jumps to software loaded +from the internal flash at address 0x10000000, loading your program there is +generally advisable. If you don't have a copy of the ROM, the interrupt +vector table from user firmware will do. +Example: + +.. code-block:: bash + + $ qemu-system-arm -machine max78000fthr -kernel max78000.bin -device loader,file=max78000.bin,addr=0x10000000 diff --git a/docs/system/arm/virt.rst b/docs/system/arm/virt.rst index adf446c..e557077 100644 --- a/docs/system/arm/virt.rst +++ b/docs/system/arm/virt.rst @@ -31,12 +31,14 @@ Supported devices The virt board supports: - PCI/PCIe devices +- CXL Fixed memory windows, root bridges and devices. - Flash memory - Either one or two PL011 UARTs for the NonSecure World - An RTC - The fw_cfg device that allows a guest to obtain data from QEMU - A PL061 GPIO controller -- An optional SMMUv3 IOMMU +- An optional machine-wide SMMUv3 IOMMU +- User-creatable SMMUv3 devices (see below for example) - hotpluggable DIMMs - hotpluggable NVDIMMs - An MSI controller (GICv2M or ITS). GICv2M is selected by default along @@ -70,11 +72,11 @@ Supported guest CPU types: - ``cortex-a76`` (64-bit) - ``cortex-a710`` (64-bit) - ``a64fx`` (64-bit) -- ``host`` (with KVM only) +- ``host`` (with KVM and HVF only) - ``neoverse-n1`` (64-bit) - ``neoverse-v1`` (64-bit) - ``neoverse-n2`` (64-bit) -- ``max`` (same as ``host`` for KVM; best possible emulation with TCG) +- ``max`` (same as ``host`` for KVM and HVF; best possible emulation with TCG) Note that the default is ``cortex-a15``, so for an AArch64 guest you must specify a CPU type. @@ -175,7 +177,7 @@ iommu ``none`` Don't create an IOMMU (the default) ``smmuv3`` - Create an SMMUv3 + Create a machine-wide SMMUv3. default-bus-bypass-iommu Set ``on``/``off`` to enable/disable `bypass_iommu @@ -189,6 +191,14 @@ ras acpi Set ``on``/``off``/``auto`` to enable/disable ACPI. +cxl + Set ``on``/``off`` to enable/disable CXL. More details in + :doc:`../devices/cxl`. The default is off. + +cxl-fmw + Array of CXL fixed memory windows describing fixed address routing to + target CXL host bridges. See :doc:`../devices/cxl`. + dtb-randomness Set ``on``/``off`` to pass random seeds via the guest DTB rng-seed and kaslr-seed nodes (in both "/chosen" and @@ -210,6 +220,36 @@ x-oem-table-id Set string (up to 8 bytes) to override the default value of field OEM Table ID in ACPI table header. +SMMU configuration +"""""""""""""""""" + +Machine-wide SMMUv3 IOMMU + Setting the machine-specific option ``iommu=smmuv3`` causes QEMU to + create a single, machine-wide SMMUv3 instance that applies to all + devices in the PCIe topology. + + For information about selectively bypassing devices, refer to + ``docs/bypass-iommu.txt``. + +User-creatable SMMUv3 devices + You can use the ``-device arm-smmuv3`` option to create multiple + user-defined SMMUv3 devices, each associated with a separate PCIe + root complex. This is only permitted if the machine-wide SMMUv3 + (``iommu=smmuv3``) option is not used. Each ``arm-smmuv3`` device + uses the ``primary-bus`` sub-option to specify which PCIe root + complex it is associated with. + + This model is useful when you want to mirror a host configuration where + each NUMA node typically has its own SMMU, allowing the VM topology to + align more closely with the host’s hardware layout. + + Example:: + + -device arm-smmuv3,primary-bus=pcie.0,id=smmuv3.0 + ... + -device pxb-pcie,id=pcie.1,numa_node=1 + -device arm-smmuv3,primary-bus=pcie.1,id=smmuv3.1 + Linux guest kernel configuration """""""""""""""""""""""""""""""" diff --git a/docs/system/arm/xlnx-versal-virt.rst b/docs/system/arm/xlnx-versal-virt.rst index c5f35f2..640cc07 100644 --- a/docs/system/arm/xlnx-versal-virt.rst +++ b/docs/system/arm/xlnx-versal-virt.rst @@ -1,29 +1,37 @@ -Xilinx Versal Virt (``xlnx-versal-virt``) -========================================= +AMD Versal Virt (``amd-versal-virt``, ``amd-versal2-virt``) +=========================================================== -Xilinx Versal is a family of heterogeneous multi-core SoCs +AMD Versal is a family of heterogeneous multi-core SoCs (System on Chip) that combine traditional hardened CPUs and I/O peripherals in a Processing System (PS) with runtime programmable FPGA logic (PL) and an Artificial Intelligence Engine (AIE). +QEMU implements the following Versal SoCs variants: + +- Versal (the ``amd-versal-virt`` machine, the alias ``xlnx-versal-virt`` is + kept for backward compatibility) +- Versal Gen 2 (the ``amd-versal2-virt`` machine) + More details here: -https://www.xilinx.com/products/silicon-devices/acap/versal.html +https://www.amd.com/en/products/adaptive-socs-and-fpgas/versal.html The family of Versal SoCs share a single architecture but come in different parts with different speed grades, amounts of PL and other differences. -The Xilinx Versal Virt board in QEMU is a model of a virtual board +The AMD Versal Virt board in QEMU is a model of a virtual board (does not exist in reality) with a virtual Versal SoC without I/O limitations. Currently, we support the following cores and devices: +Versal +"""""" Implemented CPU cores: -- 2 ACPUs (ARM Cortex-A72) +- 2 ACPUs (ARM Cortex-A72) with their GICv3 and ITS +- 2 RCPUs (ARM Cortex-R5F) with their GICv2 Implemented devices: -- Interrupt controller (ARM GICv3) - 2 UARTs (ARM PL011) - An RTC (Versal built-in) - 2 GEMs (Cadence MACB Ethernet MACs) @@ -35,6 +43,31 @@ Implemented devices: - BBRAM (36 bytes of Battery-backed RAM) - eFUSE (3072 bytes of one-time field-programmable bit array) - 2 CANFDs +- USB controller +- OSPI controller +- TRNG controller + +Versal Gen 2 +"""""""""""" +Implemented CPU cores: + +- 8 ACPUs (ARM Cortex-A78AE) with their GICv3 and ITS +- 10 RCPUs (ARM Cortex-R52) with their GICv3 (one per cluster) + +Implemented devices: + +- 2 UARTs (ARM PL011) +- An RTC (Versal built-in) +- 3 GEMs (Cadence MACB Ethernet MACs) +- 8 ADMA (Xilinx zDMA) channels +- 2 SD Controllers +- OCM (256KB of On Chip Memory) +- DDR memory +- BBRAM (36 bytes of Battery-backed RAM) +- 2 CANFDs +- 2 USB controllers +- OSPI controller +- TRNG controller QEMU does not yet model any other devices, including the PL and the AI Engine. @@ -44,8 +77,8 @@ Other differences between the hardware and the QEMU model: ``-m`` argument. If a DTB is provided on the command line then QEMU will edit it to include suitable entries describing the Versal DDR memory ranges. -- QEMU provides 8 virtio-mmio virtio transports; these start at - address ``0xa0000000`` and have IRQs from 111 and upwards. +- QEMU provides 8 virtio-mmio virtio transports. They use reserved memory + regions and IRQ pins to avoid conflicts with real SoC peripherals. Running """"""" @@ -58,7 +91,13 @@ When loading an OS, QEMU generates a DTB and selects an appropriate address where it gets loaded. This DTB will be passed to the kernel in register x0. If there's no ``-kernel`` option, we generate a DTB and place it at 0x1000 -for boot-loaders or firmware to pick it up. +for boot-loaders or firmware to pick it up. To dump and observe the generated +DTB, one can use the ``dumpdtb`` machine option: + +.. code-block:: bash + + $ qemu-system-aarch64 -M amd-versal-virt,dumpdtb=example.dtb -m 2G + If users want to provide their own DTB, they can use the ``-dtb`` option. These DTBs will have their memory nodes modified to match QEMU's @@ -74,7 +113,7 @@ Direct Linux boot of a generic ARM64 upstream Linux kernel: .. code-block:: bash - $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \ + $ qemu-system-aarch64 -M amd-versal-virt -m 2G \ -serial mon:stdio -display none \ -kernel arch/arm64/boot/Image \ -nic user -nic user \ @@ -87,7 +126,7 @@ Direct Linux boot of PetaLinux 2019.2: .. code-block:: bash - $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \ + $ qemu-system-aarch64 -M amd-versal-virt -m 2G \ -serial mon:stdio -display none \ -kernel petalinux-v2019.2/Image \ -append "rdinit=/sbin/init console=ttyAMA0,115200n8 earlycon=pl011,mmio,0xFF000000,115200n8" \ @@ -100,7 +139,7 @@ version of ATF tries to configure the CCI which we don't model) and U-boot: .. code-block:: bash - $ qemu-system-aarch64 -M xlnx-versal-virt -m 2G \ + $ qemu-system-aarch64 -M amd-versal-virt -m 2G \ -serial stdio -display none \ -device loader,file=petalinux-v2018.3/bl31.elf,cpu-num=0 \ -device loader,file=petalinux-v2019.2/u-boot.elf \ @@ -125,7 +164,7 @@ Boot Linux as DOM0 on Xen via U-Boot: .. code-block:: bash - $ qemu-system-aarch64 -M xlnx-versal-virt -m 4G \ + $ qemu-system-aarch64 -M amd-versal-virt -m 4G \ -serial stdio -display none \ -device loader,file=petalinux-v2019.2/u-boot.elf,cpu-num=0 \ -device loader,addr=0x30000000,file=linux/2018-04-24/xen \ @@ -153,7 +192,7 @@ Boot Linux as Dom0 on Xen via ARM Trusted Firmware and U-Boot: .. code-block:: bash - $ qemu-system-aarch64 -M xlnx-versal-virt -m 4G \ + $ qemu-system-aarch64 -M amd-versal-virt -m 4G \ -serial stdio -display none \ -device loader,file=petalinux-v2018.3/bl31.elf,cpu-num=0 \ -device loader,file=petalinux-v2019.2/u-boot.elf \ @@ -201,6 +240,11 @@ To use a different index value, N, from default of 0, add: eFUSE File Backend """""""""""""""""" + +.. note:: + The eFUSE device is not implemented in the Versal Gen 2 QEMU model + yet. + eFUSE can have an optional file backend, which must be a seekable binary file with a size of 3072 bytes or larger. A file with all binary 0s is a 'blank'. @@ -227,7 +271,7 @@ To use a different index value, N, from default of 1, add: is highly recommended (albeit with usage complexity). Better yet, do not use actual product data when running guest image - on this Xilinx Versal Virt board. + on this AMD Versal Virt board. Using CANFDs for Versal Virt """""""""""""""""""""""""""" @@ -258,3 +302,7 @@ To connect CANFD0 and CANFD1 to host machine's CAN interface can0: -object can-bus,id=canbus -machine canbus0=canbus -machine canbus1=canbus -object can-host-socketcan,id=canhost0,if=can0,canbus=canbus + +.. note:: + Versal Gen 2 has 4 CAN controllers. ``canbus0`` to ``canbus3`` can + be specified on the command line. diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst index 0c490db..66129fb 100644 --- a/docs/system/confidential-guest-support.rst +++ b/docs/system/confidential-guest-support.rst @@ -38,6 +38,7 @@ Supported mechanisms Currently supported confidential guest mechanisms are: * AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`) +* Intel Trust Domain Extension (TDX) (see :doc:`i386/tdx`) * POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`) * s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`) diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst index a1b0d79..9713255 100644 --- a/docs/system/device-emulation.rst +++ b/docs/system/device-emulation.rst @@ -82,21 +82,19 @@ Emulated Devices .. toctree:: :maxdepth: 1 + devices/virtio/index.rst + devices/can.rst + devices/canokey.rst devices/ccid.rst devices/cxl.rst - devices/ivshmem.rst + devices/emmc.rst + devices/igb.rst devices/ivshmem-flat.rst + devices/ivshmem.rst devices/keyboard.rst devices/net.rst devices/nvme.rst - devices/usb.rst - devices/vhost-user.rst - devices/virtio-gpu.rst - devices/virtio-pmem.rst - devices/virtio-snd.rst - devices/vhost-user-input.rst - devices/vhost-user-rng.rst - devices/canokey.rst devices/usb-u2f.rst - devices/igb.rst + devices/usb.rst + devices/vfio-user.rst diff --git a/docs/system/devices/cxl.rst b/docs/system/devices/cxl.rst index 882b036..ca15a0d 100644 --- a/docs/system/devices/cxl.rst +++ b/docs/system/devices/cxl.rst @@ -308,7 +308,7 @@ A very simple setup with just one directly attached CXL Type 3 Persistent Memory -object memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa.raw,size=256M \ -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ - -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \ + -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,sn=0x1 \ -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G A very simple setup with just one directly attached CXL Type 3 Volatile Memory device:: @@ -349,13 +349,13 @@ the CXL Type3 device directly attached (no switches).:: -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -device pxb-cxl,bus_nr=222,bus=pcie.0,id=cxl.2 \ -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ - -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0 \ + -device cxl-type3,bus=root_port13,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem0,sn=0x1 \ -device cxl-rp,port=1,bus=cxl.1,id=root_port14,chassis=0,slot=3 \ - -device cxl-type3,bus=root_port14,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1 \ + -device cxl-type3,bus=root_port14,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem1,sn=0x2 \ -device cxl-rp,port=0,bus=cxl.2,id=root_port15,chassis=0,slot=5 \ - -device cxl-type3,bus=root_port15,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2 \ + -device cxl-type3,bus=root_port15,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem2,sn=0x3 \ -device cxl-rp,port=1,bus=cxl.2,id=root_port16,chassis=0,slot=6 \ - -device cxl-type3,bus=root_port16,persistent-memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3 \ + -device cxl-type3,bus=root_port16,persistent-memdev=cxl-mem4,lsa=cxl-lsa4,id=cxl-pmem3,sn=0x4 \ -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.targets.1=cxl.2,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=8k An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave:: @@ -375,15 +375,26 @@ An example of 4 devices below a switch suitable for 1, 2 or 4 way interleave:: -device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \ -device cxl-upstream,bus=root_port0,id=us0 \ -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ - -device cxl-type3,bus=swport0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ + -device cxl-type3,bus=swport0,persistent-memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0,sn=0x1 \ -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ - -device cxl-type3,bus=swport1,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ + -device cxl-type3,bus=swport1,persistent-memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1,sn=0x2 \ -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ - -device cxl-type3,bus=swport2,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ + -device cxl-type3,bus=swport2,persistent-memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2,sn=0x3 \ -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ - -device cxl-type3,bus=swport3,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ + -device cxl-type3,bus=swport3,persistent-memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3,sn=0x4 \ -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k +A simple arm/virt example featuring a single direct connected CXL Type 3 +Volatile Memory device:: + + qemu-system-aarch64 -M virt,gic-version=3,cxl=on -m 4g,maxmem=8g,slots=4 -cpu max -smp 4 \ + ... + -object memory-backend-ram,id=vmem0,share=on,size=256M \ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ + -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0 \ + -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G + Deprecations ------------ diff --git a/docs/system/devices/emmc.rst b/docs/system/devices/emmc.rst new file mode 100644 index 0000000..e62adfd --- /dev/null +++ b/docs/system/devices/emmc.rst @@ -0,0 +1,55 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +============== +eMMC Emulation +============== + +Besides SD card emulation, QEMU also offers an eMMC model as found on many +embedded boards. An eMMC, just like an SD card, is connected to the machine +via an SDHCI controller. + +Create eMMC Images +================== + +A recent eMMC consists of 4 partitions: 2 boot partitions, 1 Replay protected +Memory Block (RPMB), and the user data area. QEMU expects backing images for +the eMMC to contain those partitions concatenated in exactly that order. +However, the boot partitions as well as the RPMB might be absent if their sizes +are configured to zero. + +The eMMC specification defines alignment constraints for the partitions. The +two boot partitions must be of the same size. Furthermore, boot and RPMB +partitions must be multiples of 128 KB with a maximum of 32640 KB for each +boot partition and 16384K for the RPMB partition. + +The alignment constrain of the user data area depends on its size. Up to 2 +GByte, the size must be a power of 2. From 2 GByte onward, the size has to be +multiples of 512 byte. + +QEMU is enforcing those alignment rules before instantiating the device. +Therefore, the provided image has to strictly follow them as well. The helper +script ``scripts/mkemmc.sh`` can be used to create compliant images, with or +without pre-filled partitions. E.g., to create an eMMC image from a firmware +image and an OS image with an empty 2 MByte RPMB, use the following command: + +.. code-block:: console + + scripts/mkemmc.sh -b firmware.img -r /dev/zero:2M os.img emmc.img + +This will take care of rounding up the partition sizes to the next valid value +and will leave the RPMB and the second boot partition empty (zeroed). + +Adding eMMC Devices +=================== + +An eMMC is either automatically created by a machine model (e.g. Aspeed boards) +or can be user-created when using a PCI-attached SDHCI controller. To +instantiate the eMMC image from the example above in a machine without other +SDHCI controllers while assuming that the firmware needs a boot partitions of +1 MB, use the following options: + +.. code-block:: console + + -drive file=emmc.img,if=none,format=raw,id=emmc-img + -device sdhci-pci + -device emmc,drive=emmc-img,boot-partition-size=1048576,rpmb-partition-size=2097152 diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst index 71f31cb..50f625f 100644 --- a/docs/system/devices/igb.rst +++ b/docs/system/devices/igb.rst @@ -54,7 +54,7 @@ directory: .. code-block:: shell - meson test qtest-x86_64/qos-test + pyvenv/bin/meson test qtest-x86_64/qos-test ethtool can test register accesses, interrupts, etc. It is automated as an functional test and can be run from the build directory with the following diff --git a/docs/system/devices/net.rst b/docs/system/devices/net.rst index 2ab516d..13199a4 100644 --- a/docs/system/devices/net.rst +++ b/docs/system/devices/net.rst @@ -21,11 +21,17 @@ configure it as if it was a real ethernet card. Linux host ^^^^^^^^^^ -As an example, you can download the ``linux-test-xxx.tar.gz`` archive -and copy the script ``qemu-ifup`` in ``/etc`` and configure properly -``sudo`` so that the command ``ifconfig`` contained in ``qemu-ifup`` can -be executed as root. You must verify that your host kernel supports the -TAP network interfaces: the device ``/dev/net/tun`` must be present. +A distribution will generally provide specific helper scripts when it +packages QEMU. By default these are found at ``/etc/qemu-ifup`` and +``/etc/qemu-ifdown`` and are called appropriately when QEMU wants to +change the network state. + +If QEMU is being run as a non-privileged user you may need properly +configure ``sudo`` so that network commands in the scripts can be +executed as root. + +You must verify that your host kernel supports the TAP network +interfaces: the device ``/dev/net/tun`` must be present. See :ref:`sec_005finvocation` to have examples of command lines using the TAP network interfaces. @@ -73,10 +79,156 @@ those sockets. To allow ping for GID 100 (usually users group):: When using the built-in TFTP server, the router is also the TFTP server. -When using the ``'-netdev user,hostfwd=...'`` option, TCP or UDP +When using the ``'-netdev user,hostfwd=...'`` option, TCP, UDP or UNIX connections can be redirected from the host to the guest. It allows for example to redirect X11, telnet or SSH connections. +Using passt as the user mode network stack +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +passt_ can be used as a simple replacement for SLIRP (``-net user``). +passt doesn't require any capability or privilege. passt has +better performance than ``-net user``, full IPv6 support and better security +as it's a daemon that is not executed in QEMU context. + +passt_ can be used in the same way as the user backend (using ``-net passt``, +``-netdev passt`` or ``-nic passt``) or it can be launched manually and +connected to QEMU either by using a socket (``-netdev stream``) or by using +the vhost-user interface (``-netdev vhost-user``). + +Using ``-netdev stream`` or ``-netdev vhost-user`` will allow the user to +enable functionalities not available through the passt backend interface +(like migration). + +See `passt(1)`_ for more details on passt. + +.. _passt: https://passt.top/ +.. _passt(1): https://passt.top/builds/latest/web/passt.1.html + +To use the passt backend interface +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +There is no need to start the daemon as QEMU will do it for you. + +By default, passt will be started in the socket-based mode. + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] -nic passt + + (qemu) info network + e1000e.0: index=0,type=nic,model=e1000e,macaddr=52:54:00:12:34:56 + \ #net071: index=0,type=passt,stream,connected to pid 24846 + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] -net nic -net passt,tcp-ports=10001,udp-ports=10001 + + (qemu) info network + hub 0 + \ hub0port1: #net136: index=0,type=passt,stream,connected to pid 25204 + \ hub0port0: e1000e.0: index=0,type=nic,model=e1000e,macaddr=52:54:00:12:34:56 + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] -netdev passt,id=netdev0 -device virtio-net,mac=9a:2b:2c:2d:2e:2f,id=virtio0,netdev=netdev0 + + (qemu) info network + virtio0: index=0,type=nic,model=virtio-net-pci,macaddr=9a:2b:2c:2d:2e:2f + \ netdev0: index=0,type=passt,stream,connected to pid 25428 + +To use the vhost-based interface, add the ``vhost-user=on`` parameter and +select the virtio-net device: + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] -nic passt,model=virtio,vhost-user=on + + (qemu) info network + virtio-net-pci.0: index=0,type=nic,model=virtio-net-pci,macaddr=52:54:00:12:34:56 + \ #net006: index=0,type=passt,vhost-user,connected to pid 25731 + +To use socket based passt interface: +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Start passt as a daemon:: + + passt --socket ~/passt.socket + +If ``--socket`` is not provided, passt will print the path of the UNIX domain socket QEMU can connect to (``/tmp/passt_1.socket``, ``/tmp/passt_2.socket``, +...). Then you can connect your QEMU instance to passt: + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] -device virtio-net-pci,netdev=netdev0 -netdev stream,id=netdev0,server=off,addr.type=unix,addr.path=~/passt.socket + +Where ``~/passt.socket`` is the UNIX socket created by passt to +communicate with QEMU. + +To use vhost-based interface: +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Start passt with ``--vhost-user``:: + + passt --vhost-user --socket ~/passt.socket + +Then to connect QEMU: + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] -m $RAMSIZE -chardev socket,id=chr0,path=~/passt.socket -netdev vhost-user,id=netdev0,chardev=chr0 -device virtio-net,netdev=netdev0 -object memory-backend-memfd,id=memfd0,share=on,size=$RAMSIZE -numa node,memdev=memfd0 + +Where ``$RAMSIZE`` is the memory size of your VM ``-m`` and ``-object memory-backend-memfd,size=`` must match. + +Migration of passt: +^^^^^^^^^^^^^^^^^^^ + +When passt is connected to QEMU using the vhost-user interface it can +be migrated with QEMU and the network connections are not interrupted. + +As passt runs with no privileges, it relies on passt-repair to save and +load the TCP connections state, using the TCP_REPAIR socket option. +The passt-repair helper needs to have the CAP_NET_ADMIN capability, or run as root. If passt-repair is not available, TCP connections will not be preserved. + +Example of migration of a guest on the same host +________________________________________________ + +Before being able to run passt-repair, the CAP_NET_ADMIN capability must be set +on the file, run as root:: + + setcap cap_net_admin+eip ./passt-repair + +Start passt for the source side:: + + passt --vhost-user --socket ~/passt_src.socket --repair-path ~/passt-repair_src.socket + +Where ``~/passt-repair_src.socket`` is the UNIX socket created by passt to +communicate with passt-repair. The default value is the ``--socket`` path +appended with ``.repair``. + +Start passt-repair:: + + passt-repair ~/passt-repair_src.socket + +Start source side QEMU with a monitor to be able to send the migrate command: + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] [...VHOST USER OPTIONS...] -monitor stdio + +Start passt for the destination side:: + + passt --vhost-user --socket ~/passt_dst.socket --repair-path ~/passt-repair_dst.socket + +Start passt-repair:: + + passt-repair ~/passt-repair_dst.socket + +Start QEMU with the ``-incoming`` parameter: + +.. parsed-literal:: + |qemu_system| [...OPTIONS...] [...VHOST USER OPTIONS...] -incoming tcp:localhost:4444 + +Then in the source guest monitor the migration can be started:: + + (qemu) migrate tcp:localhost:4444 + +A separate passt-repair instance must be started for every migration. In the case of a failed migration, passt-repair also needs to be restarted before trying +again. + Hubs ~~~~ diff --git a/docs/system/devices/vfio-user.rst b/docs/system/devices/vfio-user.rst new file mode 100644 index 0000000..e10a6d0 --- /dev/null +++ b/docs/system/devices/vfio-user.rst @@ -0,0 +1,26 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +========= +vfio-user +========= + +QEMU includes a ``vfio-user`` client. The ``vfio-user`` specification allows for +implementing (PCI) devices in userspace outside of QEMU; it is similar to +``vhost-user`` in this respect (see :doc:`virtio/vhost-user`), but can emulate arbitrary +PCI devices, not just ``virtio``. Whereas ``vfio`` is handled by the host +kernel, ``vfio-user``, while similar in implementation, is handled entirely in +userspace. + +For example, SPDK includes a virtual PCI NVMe controller implementation; by +setting up a ``vfio-user`` UNIX socket between QEMU and SPDK, a VM can send NVMe +I/O to the SPDK process. + +Presuming a suitable ``vfio-user`` server has opened a socket at +``/tmp/vfio-user.sock``, a device can be configured with for example: + +.. code-block:: console + + --device '{"driver": "vfio-user-pci","socket": {"path": "/tmp/vfio-user.sock", "type": "unix"}}' + +See `libvfio-user <https://github.com/nutanix/libvfio-user/>`_ for further +information. diff --git a/docs/system/devices/vhost-user-input.rst b/docs/system/devices/vhost-user-input.rst deleted file mode 100644 index 118eb78..0000000 --- a/docs/system/devices/vhost-user-input.rst +++ /dev/null @@ -1,45 +0,0 @@ -.. _vhost_user_input: - -QEMU vhost-user-input - Input emulation -======================================= - -This document describes the setup and usage of the Virtio input device. -The Virtio input device is a paravirtualized device for input events. - -Description ------------ - -The vhost-user-input device implementation was designed to work with a daemon -polling on input devices and passes input events to the guest. - -QEMU provides a backend implementation in contrib/vhost-user-input. - -Linux kernel support --------------------- - -Virtio input requires a guest Linux kernel built with the -``CONFIG_VIRTIO_INPUT`` option. - -Examples --------- - -The backend daemon should be started first: - -:: - - host# vhost-user-input --socket-path=input.sock \ - --evdev-path=/dev/input/event17 - -The QEMU invocation needs to create a chardev socket to communicate with the -backend daemon and access the VirtIO queues with the guest over the -:ref:`shared memory <shared_memory_object>`. - -:: - - host# qemu-system \ - -chardev socket,path=/tmp/input.sock,id=mouse0 \ - -device vhost-user-input-pci,chardev=mouse0 \ - -m 4096 \ - -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \ - -numa node,memdev=mem \ - ... diff --git a/docs/system/devices/vhost-user-rng.rst b/docs/system/devices/vhost-user-rng.rst deleted file mode 100644 index ead1405..0000000 --- a/docs/system/devices/vhost-user-rng.rst +++ /dev/null @@ -1,41 +0,0 @@ -.. _vhost_user_rng: - -QEMU vhost-user-rng - RNG emulation -=================================== - -Background ----------- - -What follows builds on the material presented in vhost-user.rst - it should -be reviewed before moving forward with the content in this file. - -Description ------------ - -The vhost-user-rng device implementation was designed to work with a random -number generator daemon such as the one found in the vhost-device crate of -the rust-vmm project available on github [1]. - -[1]. https://github.com/rust-vmm/vhost-device - -Examples --------- - -The daemon should be started first: - -:: - - host# vhost-device-rng --socket-path=rng.sock -c 1 -m 512 -p 1000 - -The QEMU invocation needs to create a chardev socket the device can -use to communicate as well as share the guests memory over a memfd. - -:: - - host# qemu-system \ - -chardev socket,path=$(PATH)/rng.sock,id=rng0 \ - -device vhost-user-rng-pci,chardev=rng0 \ - -m 4096 \ - -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \ - -numa node,memdev=mem \ - ... diff --git a/docs/system/devices/virtio/index.rst b/docs/system/devices/virtio/index.rst new file mode 100644 index 0000000..c292101 --- /dev/null +++ b/docs/system/devices/virtio/index.rst @@ -0,0 +1,29 @@ +VirtIO Devices +============== + +VirtIO devices are paravirtualized devices designed to be efficient to +emulate and virtualize. Unless you are specifically trying to exercise +a driver for some particular hardware they are the recommended device +models to use for virtual machines. + +The `VirtIO specification`_ is an open standard managed by OASIS. It +describes how a *driver* in a guest operating system interacts with +the *device* model provided by QEMU. Multiple Operating Systems +support drivers for VirtIO with Linux perhaps having the widest range +of device types supported. + +The device implementation can either be provided wholly by QEMU, or in +concert with the kernel (known as *vhost*). The device implementation +can also be off-loaded to an external process via :ref:`vhost user +<vhost_user>`. + +.. toctree:: + :maxdepth: 1 + + virtio-gpu.rst + virtio-pmem.rst + virtio-snd.rst + vhost-user.rst + vhost-user-contrib.rst + +.. _VirtIO specification: https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html diff --git a/docs/system/devices/virtio/vhost-user-contrib.rst b/docs/system/devices/virtio/vhost-user-contrib.rst new file mode 100644 index 0000000..48d04d2 --- /dev/null +++ b/docs/system/devices/virtio/vhost-user-contrib.rst @@ -0,0 +1,87 @@ +vhost-user daemons in contrib +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +QEMU provides a number of :ref:`vhost_user` daemons in the contrib +directory. They were often written when vhost-user was initially added +to the code base. You should also consider if other vhost-user daemons +such as those from the rust-vmm `vhost-device repository`_ are better +suited for production use. + +.. _vhost-device repository: https://github.com/rust-vmm/vhost-device + +.. _vhost_user_block: + +vhost-user-block - block device +=============================== + +vhost-user-block is a backend for exposing block devices. It can +present a flat file or block device as a simple block device to the +guest. You almost certainly want to use the :ref:`storage-daemon` +instead which supports a wide variety of storage modes and exports a +number of interfaces including vhost-user. + +.. _vhost_user_gpu: + +vhost-user-gpu - gpu device +=========================== + +vhost-user-gpu presents a paravirtualized GPU and display controller. +You probably want to use the internal :ref:`virtio_gpu` implementation +if you want the latest features. There is also a `vhost_device_gpu`_ +daemon as part of the rust-vmm project. + +.. _vhost_device_gpu: https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-gpu + +.. _vhost_user_input: + +vhost-user-input - Input emulation +================================== + +The Virtio input device is a paravirtualized device for input events. + +Description +----------- + +The vhost-user-input device implementation was designed to work with a daemon +polling on input devices and passes input events to the guest. + +QEMU provides a backend implementation in contrib/vhost-user-input. + +Linux kernel support +-------------------- + +Virtio input requires a guest Linux kernel built with the +``CONFIG_VIRTIO_INPUT`` option. + +Examples +-------- + +The backend daemon should be started first: + +:: + + host# vhost-user-input --socket-path=input.sock \ + --evdev-path=/dev/input/event17 + +The QEMU invocation needs to create a chardev socket to communicate with the +backend daemon and access the VirtIO queues with the guest over the +:ref:`shared memory <shared_memory_object>`. + +:: + + host# qemu-system \ + -chardev socket,path=/tmp/input.sock,id=mouse0 \ + -device vhost-user-input-pci,chardev=mouse0 \ + -m 4096 \ + -object memory-backend-file,id=mem,size=4G,mem-path=/dev/shm,share=on \ + -numa node,memdev=mem \ + ... + + +.. _vhost_user_scsi: + +vhost-user-scsi - SCSI controller +================================= + +The vhost-user-scsi daemon can proxy iSCSI devices onto a virtualized +SCSI controller. diff --git a/docs/system/devices/vhost-user.rst b/docs/system/devices/virtio/vhost-user.rst index 35259d8..f556a84 100644 --- a/docs/system/devices/vhost-user.rst +++ b/docs/system/devices/virtio/vhost-user.rst @@ -27,61 +27,55 @@ platform details for what sort of virtio bus to use. - Notes * - vhost-user-blk - Block storage - - See contrib/vhost-user-blk + - :ref:`storage-daemon` * - vhost-user-fs - File based storage driver - - See https://gitlab.com/virtio-fs/virtiofsd + - `virtiofsd <https://gitlab.com/virtio-fs/virtiofsd>`_ * - vhost-user-gpio - Proxy gpio pins to host - - See https://github.com/rust-vmm/vhost-device + - `vhost-device-gpio <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-gpio>`_ * - vhost-user-gpu - GPU driver - - See contrib/vhost-user-gpu + - `vhost-device-gpu <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-gpu>`_ or :ref:`vhost_user_gpu` * - vhost-user-i2c - Proxy i2c devices to host - - See https://github.com/rust-vmm/vhost-device + - `vhost-device-i2c <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-i2c>`_ * - vhost-user-input - Generic input driver - - :ref:`vhost_user_input` + - `vhost-device-input <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-input>`_ or :ref:`vhost_user_input` * - vhost-user-rng - Entropy driver - - :ref:`vhost_user_rng` + - `vhost-device-rng <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-rng>`_ * - vhost-user-scmi - System Control and Management Interface - - See https://github.com/rust-vmm/vhost-device + - `vhost-device-scmi <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-scmi>`_ * - vhost-user-snd - Audio device - - See https://github.com/rust-vmm/vhost-device/staging + - `vhost-device-sound <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-sound>`_ * - vhost-user-scsi - SCSI based storage - - See contrib/vhost-user-scsi + - :ref:`vhost_user_scsi` * - vhost-user-vsock - Socket based communication - - See https://github.com/rust-vmm/vhost-device + - `vhost-device-vsock <https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock>`_ The referenced *daemons* are not exhaustive, any conforming backend implementing the device and using the vhost-user protocol should work. -vhost-user-device -^^^^^^^^^^^^^^^^^ +vhost-user-test-device +^^^^^^^^^^^^^^^^^^^^^^ -The vhost-user-device is a generic development device intended for -expert use while developing new backends. The user needs to specify -all the required parameters including: +The vhost-user-test-device is a generic development device intended +for expert use while developing new backends. The user needs to +specify all the required parameters including: - Device ``virtio-id`` - The ``num_vqs`` it needs and their ``vq_size`` - The ``config_size`` if needed .. note:: - To prevent user confusion you cannot currently instantiate - vhost-user-device without first patching out:: - - /* Reason: stop inexperienced users confusing themselves */ - dc->user_creatable = false; - - in ``vhost-user-device.c`` and ``vhost-user-device-pci.c`` file and - rebuilding. + While this is a useful device for development it is not recommended + for production use. vhost-user daemon ================= diff --git a/docs/system/devices/virtio-gpu.rst b/docs/system/devices/virtio/virtio-gpu.rst index b7eb0fc..0f4bb30 100644 --- a/docs/system/devices/virtio-gpu.rst +++ b/docs/system/devices/virtio/virtio-gpu.rst @@ -1,7 +1,9 @@ .. SPDX-License-Identifier: GPL-2.0-or-later -virtio-gpu +.. _virtio_gpu: + +VirtIO GPU ========== This document explains the setup and usage of the virtio-gpu device. diff --git a/docs/system/devices/virtio-pmem.rst b/docs/system/devices/virtio/virtio-pmem.rst index c82ac06..0c24de8 100644 --- a/docs/system/devices/virtio-pmem.rst +++ b/docs/system/devices/virtio/virtio-pmem.rst @@ -1,7 +1,5 @@ - -=========== -virtio pmem -=========== +VirtIO Persistent Memory +======================== This document explains the setup and usage of the virtio pmem device. The virtio pmem device is a paravirtualized persistent memory device diff --git a/docs/system/devices/virtio-snd.rst b/docs/system/devices/virtio/virtio-snd.rst index 2a9187f..3c797f6 100644 --- a/docs/system/devices/virtio-snd.rst +++ b/docs/system/devices/virtio/virtio-snd.rst @@ -1,4 +1,4 @@ -virtio sound +VirtIO Sound ============ This document explains the setup and usage of the Virtio sound device. diff --git a/docs/system/gdb.rst b/docs/system/gdb.rst index 4228cb5..d50470b 100644 --- a/docs/system/gdb.rst +++ b/docs/system/gdb.rst @@ -20,7 +20,7 @@ connection, use the ``-gdb dev`` option instead of ``-s``. See .. parsed-literal:: - |qemu_system| -s -S -kernel bzImage -hda rootdisk.img -append "root=/dev/hda" + |qemu_system| -s -S -kernel bzImage -drive file=rootdisk.img,format=raw -append "root=/dev/sda" QEMU will launch but will silently wait for gdb to connect. diff --git a/docs/system/i386/amd-memory-encryption.rst b/docs/system/i386/amd-memory-encryption.rst index 748f509..6c23f35 100644 --- a/docs/system/i386/amd-memory-encryption.rst +++ b/docs/system/i386/amd-memory-encryption.rst @@ -1,3 +1,5 @@ +.. _amd-sev: + AMD Secure Encrypted Virtualization (SEV) ========================================= diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst new file mode 100644 index 0000000..8131750 --- /dev/null +++ b/docs/system/i386/tdx.rst @@ -0,0 +1,161 @@ +Intel Trusted Domain eXtension (TDX) +==================================== + +Intel Trusted Domain eXtensions (TDX) refers to an Intel technology that extends +Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME) +with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs +in a CPU mode that is designed to protect the confidentiality of its memory +contents and its CPU state from any other software, including the hosting +Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself. + +Prerequisites +------------- + +To run TD, the physical machine needs to have TDX module loaded and initialized +while KVM hypervisor has TDX support and has TDX enabled. If those requirements +are met, the ``KVM_CAP_VM_TYPES`` will report the support of ``KVM_X86_TDX_VM``. + +Trust Domain Virtual Firmware (TDVF) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot +TD Guest OS. TDVF needs to be copied to guest private memory and measured before +the TD boots. + +KVM vcpu ioctl ``KVM_TDX_INIT_MEM_REGION`` can be used to populate the TDVF +content into its private memory. + +Since TDX doesn't support readonly memslot, TDVF cannot be mapped as pflash +device and it actually works as RAM. "-bios" option is chosen to load TDVF. + +OVMF is the opensource firmware that implements the TDVF support. Thus the +command line to specify and load TDVF is ``-bios OVMF.fd`` + +Feature Configuration +--------------------- + +Unlike non-TDX VM, the CPU features (enumerated by CPU or MSR) of a TD are not +under full control of VMM. VMM can only configure part of features of a TD on +``KVM_TDX_INIT_VM`` command of VM scope ``MEMORY_ENCRYPT_OP`` ioctl. + +The configurable features have three types: + +- Attributes: + - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD, + which determines related CPUID bit and CR4 bit; + - PERFMON (bit 63) controls whether PMU is exposed to TD. + +- XSAVE related features (XFAM): + XFAM is a 64b mask, which has the same format as XCR0 or IA32_XSS MSR. It + determines the set of extended features available for use by the guest TD. + +- CPUID features: + Only some bits of some CPUID leaves are directly configurable by VMM. + +What features can be configured is reported via TDX capabilities. + +TDX capabilities +~~~~~~~~~~~~~~~~ + +The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES`` +to get the TDX capabilities from KVM. It returns a data structure of +``struct kvm_tdx_capabilities``, which tells the supported configuration of +attributes, XFAM and CPUIDs. + +TD attributes +~~~~~~~~~~~~~ + +QEMU supports configuring raw 64-bit TD attributes directly via "attributes" +property of "tdx-guest" object. Note, it's users' responsibility to provide a +valid value because some bits may not supported by current QEMU or KVM yet. + +QEMU also supports the configuration of individual attribute bits that are +supported by it, via properties of "tdx-guest" object. +E.g., "sept-ve-disable" (bit 28). + +MSR based features +~~~~~~~~~~~~~~~~~~ + +Current KVM doesn't support MSR based feature (e.g., MSR_IA32_ARCH_CAPABILITIES) +configuration for TDX, and it's a future work to enable it in QEMU when KVM adds +support of it. + +Feature check +~~~~~~~~~~~~~ + +QEMU checks if the final (CPU) features, determined by given cpu model and +explicit feature adjustment of "+featureA/-featureB", can be supported or not. +It can produce feature not supported warning like + + "warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]" + +It can also produce warning like + + "warning: TDX forcibly sets the feature: CPUID.80000007H:EDX.invtsc [bit 8]" + +if the fixed-1 feature is requested to be disabled explicitly. This is newly +added to QEMU for TDX because TDX has fixed-1 features that are forcibly enabled +by TDX module and VMM cannot disable them. + +Launching a TD (TDX VM) +----------------------- + +To launch a TD, the necessary command line options are tdx-guest object and +split kernel-irqchip, as below: + +.. parsed-literal:: + + |qemu_system_x86| \\ + -accel kvm \\ + -cpu host \\ + -object tdx-guest,id=tdx0 \\ + -machine ...,confidential-guest-support=tdx0 \\ + -bios OVMF.fd \\ + +Restrictions +------------ + + - kernel-irqchip must be split; + + This is set by default for TDX guest if kernel-irqchip is left on its default + 'auto' setting. + + - No readonly support for private memory; + + - No SMM support: SMM support requires manipulating the guest register states + which is not allowed; + +Debugging +--------- + +Bit 0 of TD attributes, is DEBUG bit, which decides if the TD runs in off-TD +debug mode. When in off-TD debug mode, TD's VCPU state and private memory are +accessible via given SEAMCALLs. This requires KVM to expose APIs to invoke those +SEAMCALLs and corresonponding QEMU change. + +It's targeted as future work. + +TD attestation +-------------- + +In TD guest, the attestation process is used to verify the TDX guest +trustworthiness to other entities before provisioning secrets to the guest. + +TD attestation is initiated first by calling TDG.MR.REPORT inside TD to get the +REPORT. Then the REPORT data needs to be converted into a remotely verifiable +Quote by SGX Quoting Enclave (QE). + +It's a future work in QEMU to add support of TD attestation since it lacks +support in current KVM. + +Live Migration +-------------- + +Future work. + +References +---------- + +- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__ + +- `SGX QE <https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/QuoteGeneration>`__ diff --git a/docs/system/igvm.rst b/docs/system/igvm.rst new file mode 100644 index 0000000..79508d9 --- /dev/null +++ b/docs/system/igvm.rst @@ -0,0 +1,173 @@ +Independent Guest Virtual Machine (IGVM) support +================================================ + +IGVM files are designed to encapsulate all the information required to launch a +virtual machine on any given virtualization stack in a deterministic way. This +allows the cryptographic measurement of initial guest state for Confidential +Guests to be calculated when the IGVM file is built, allowing a relying party to +verify the initial state of a guest via a remote attestation. + +Although IGVM files are designed with Confidential Computing in mind, they can +also be used to configure non-confidential guests. Multiple platforms can be +defined by a single IGVM file, allowing a single IGVM file to configure a +virtual machine that can run on, for example, TDX, SEV and non-confidential +hosts. + +QEMU supports IGVM files through the user-creatable ``igvm-cfg`` object. This +object is used to define the filename of the IGVM file to process. A reference +to the object is added to the ``-machine`` to configure the virtual machine +to use the IGVM file for configuration. + +Confidential platform support is provided through the use of +the ``ConfidentialGuestSupport`` object. If the virtual machine provides an +instance of this object then this is used by the IGVM loader to configure the +isolation properties of the directives within the file. + +Further Information on IGVM +--------------------------- + +Information about the IGVM format, including links to the format specification +and documentation for the Rust and C libraries can be found at the project +repository: + +https://github.com/microsoft/igvm + + +Supported Platforms +------------------- + +Currently, IGVM files can be provided for Confidential Guests on host systems +that support AMD SEV, SEV-ES and SEV-SNP with KVM. IGVM files can also be +provided for non-confidential guests. + + +Limitations when using IGVM with AMD SEV, SEV-ES and SEV-SNP +------------------------------------------------------------ + +IGVM files configure the initial state of the guest using a set of directives. +Not every directive is supported by every Confidential Guest type. For example, +AMD SEV does not support encrypted save state regions, therefore setting the +initial CPU state using IGVM for SEV is not possible. When an IGVM file contains +directives that are not supported for the active platform, an error is generated +and the guest launch is aborted. + +The table below describes the list of directives that are supported for SEV, +SEV-ES, SEV-SNP and non-confidential platforms. + +.. list-table:: SEV, SEV-ES, SEV-SNP & non-confidential Supported Directives + :widths: 35 65 + :header-rows: 1 + + * - IGVM directive + - Notes + * - IGVM_VHT_PAGE_DATA + - ``NORMAL`` zero, measured and unmeasured page types are supported. Other + page types result in an error. + * - IGVM_VHT_PARAMETER_AREA + - + * - IGVM_VHT_PARAMETER_INSERT + - + * - IGVM_VHT_VP_COUNT_PARAMETER + - The guest parameter page is populated with the CPU count. + * - IGVM_VHT_ENVIRONMENT_INFO_PARAMETER + - The ``memory_is_shared`` parameter is set to 1 in the guest parameter + page. + +.. list-table:: Additional SEV, SEV-ES & SEV_SNP Supported Directives + :widths: 25 75 + :header-rows: 1 + + * - IGVM directive + - Notes + * - IGVM_VHT_MEMORY_MAP + - The memory map page is populated using entries from the E820 table. + * - IGVM_VHT_REQUIRED_MEMORY + - Ensures memory is available in the guest at the specified range. + +.. list-table:: Additional SEV-ES & SEV-SNP Supported Directives + :widths: 25 75 + :header-rows: 1 + + * - IGVM directive + - Notes + * - IGVM_VHT_VP_CONTEXT + - Setting of the initial CPU state for the boot CPU and additional CPUs is + supported with limitations on the fields that can be provided in the + VMSA. See below for details on which fields are supported. + +Initial CPU state with VMSA +--------------------------- + +The initial state of guest CPUs can be defined in the IGVM file for AMD SEV-ES +and SEV-SNP. The state data is provided as a VMSA structure as defined in Table +B-4 in the AMD64 Architecture Programmer's Manual, Volume 2 [1]. + +The IGVM VMSA is translated to CPU state in QEMU which is then synchronized +by KVM to the guest VMSA during the launch process where it contributes to the +launch measurement. See :ref:`amd-sev` for details on the launch process and +guest launch measurement. + +It is important that no information is lost or changed when translating the +VMSA provided by the IGVM file into the VSMA that is used to launch the guest. +Therefore, QEMU restricts the VMSA fields that can be provided in the IGVM +VMSA structure to the following registers: + +RAX, RCX, RDX, RBX, RBP, RSI, RDI, R8-R15, RSP, RIP, CS, DS, ES, FS, GS, SS, +CR0, CR3, CR4, XCR0, EFER, PAT, GDT, IDT, LDTR, TR, DR6, DR7, RFLAGS, X87_FCW, +MXCSR. + +When processing the IGVM file, QEMU will check if any fields other than the +above are non-zero and generate an error if this is the case. + +KVM uses a hardcoded GPA of 0xFFFFFFFFF000 for the VMSA. When an IGVM file +defines initial CPU state, the GPA for each VMSA must match this hardcoded +value. + +Firmware Images with IGVM +------------------------- + +When an IGVM filename is specified for a Confidential Guest Support object it +overrides the default handling of system firmware: the firmware image, such as +an OVMF binary should be contained as a payload of the IGVM file and not +provided as a flash drive or via the ``-bios`` parameter. The default QEMU +firmware is not automatically populated into the guest memory space. + +If an IGVM file is provided along with either the ``-bios`` parameter or pflash +devices then an error is displayed and the guest startup is aborted. + +Running a guest configured using IGVM +------------------------------------- + +To run a guest configured with IGVM you firstly need to generate an IGVM file +that contains a guest configuration compatible with the platform you are +targeting. + +The ``buildigvm`` tool [2] is an example of a tool that can be used to generate +IGVM files for non-confidential X86 platforms as well as for SEV, SEV-ES and +SEV-SNP confidential platforms. + +Example using this tool to generate an IGVM file for AMD SEV-SNP:: + + buildigvm --firmware /path/to/OVMF.fd --output sev-snp.igvm \ + --cpucount 4 sev-snp + +To run a guest configured with the generated IGVM you need to add an +``igvm-cfg`` object and refer to it from the ``-machine`` parameter: + +Example (for AMD SEV):: + + qemu-system-x86_64 \ + <other parameters> \ + -machine ...,confidential-guest-support=sev0,igvm-cfg=igvm0 \ + -object sev-guest,id=sev0,cbitpos=47,reduced-phys-bits=1 \ + -object igvm-cfg,id=igvm0,file=/path/to/sev-snp.igvm + +References +---------- + +[1] AMD64 Architecture Programmer's Manual, Volume 2: System Programming + Rev 3.41 + https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24593.pdf + +[2] ``buildigvm`` - A tool to build example IGVM files containing OVMF firmware + https://github.com/roy-hopkins/buildigvm
\ No newline at end of file diff --git a/docs/system/images.rst b/docs/system/images.rst index a555117..4370696 100644 --- a/docs/system/images.rst +++ b/docs/system/images.rst @@ -30,7 +30,7 @@ Snapshot mode If you use the option ``-snapshot``, all disk images are considered as read only. When sectors in written, they are written in a temporary file created in ``/tmp``. You can however force the write back to the raw -disk images by using the ``commit`` monitor command (or C-a s in the +disk images by using the ``commit`` monitor command (or :kbd:`Ctrl+a s` in the serial console). .. _vm_005fsnapshots: diff --git a/docs/system/index.rst b/docs/system/index.rst index c21065e..427b020 100644 --- a/docs/system/index.rst +++ b/docs/system/index.rst @@ -38,4 +38,6 @@ or Hypervisor.Framework. security multi-process confidential-guest-support + igvm vm-templating + sriov diff --git a/docs/system/introduction.rst b/docs/system/introduction.rst index 338d374..9c57523 100644 --- a/docs/system/introduction.rst +++ b/docs/system/introduction.rst @@ -23,6 +23,9 @@ Tiny Code Generator (TCG) capable of emulating many CPUs. * - Xen - Linux (as dom0) - Arm, x86 + * - MSHV + - Linux (as dom0) + - x86 * - Hypervisor Framework (hvf) - MacOS - x86 (64 bit only), Arm (64 bit only) @@ -81,7 +84,7 @@ may not be optimal for modern systems. For a non-x86 system where we emulate a broad range of machine types, the command lines are generally more explicit in defining the machine -and boot behaviour. You will find often find example command lines in +and boot behaviour. You will often find example command lines in the :ref:`system-targets-ref` section of the manual. While the project doesn't want to discourage users from using the diff --git a/docs/system/keys.rst.inc b/docs/system/keys.rst.inc index 59966a3..3b5307b 100644 --- a/docs/system/keys.rst.inc +++ b/docs/system/keys.rst.inc @@ -1,36 +1,37 @@ During the graphical emulation, you can use special key combinations from -the following table to change modes. By default the modifier is Ctrl-Alt +the following table to change modes. By default the modifier is :kbd:`Ctrl+Alt` (used in the table below) which can be changed with ``-display`` suboption ``mod=`` where appropriate. For example, ``-display sdl, -grab-mod=lshift-lctrl-lalt`` changes the modifier key to Ctrl-Alt-Shift, -while ``-display sdl,grab-mod=rctrl`` changes it to the right Ctrl key. +grab-mod=lshift-lctrl-lalt`` changes the modifier key to :kbd:`Ctrl+Alt+Shift`, +while ``-display sdl,grab-mod=rctrl`` changes it to the right :kbd:`Ctrl` key. -Ctrl-Alt-f - Toggle full screen +.. list-table:: Multiplexer Keys + :widths: 10 90 + :header-rows: 1 -Ctrl-Alt-+ - Enlarge the screen + * - Key Sequence + - Action -Ctrl-Alt\-- - Shrink the screen + * - :kbd:`Ctrl+Alt+f` + - Toggle full screen -Ctrl-Alt-u - Restore the screen's un-scaled dimensions + * - :kbd:`Ctrl+Alt++` + - Enlarge the screen -Ctrl-Alt-n - Switch to virtual console 'n'. Standard console mappings are: + * - :kbd:`Ctrl+Alt+-` + - Shrink the screen - *1* - Target system display + * - :kbd:`Ctrl+Alt+0` + - Restore the screen's un-scaled dimensions - *2* - Monitor + * - :kbd:`Ctrl+Alt+n` + - Switch to virtual console 'n'. Standard console mappings are: - *3* - Serial port + - *1*: Target system display + - *2*: Monitor + - *3*: Serial port + * - :kbd:`Ctrl+Alt+g` + - Toggle mouse and keyboard grab. -Ctrl-Alt-g - Toggle mouse and keyboard grab. - -In the virtual consoles, you can use Ctrl-Up, Ctrl-Down, Ctrl-PageUp and -Ctrl-PageDown to move in the back log. +In the virtual consoles, you can use :kbd:`Ctrl+Up`, :kbd:`Ctrl+Down`, :kbd:`Ctrl+PageUp` and +:kbd:`Ctrl+PageDown` to move in the back log. diff --git a/docs/system/linuxboot.rst b/docs/system/linuxboot.rst index 5db2e56..f7573ab 100644 --- a/docs/system/linuxboot.rst +++ b/docs/system/linuxboot.rst @@ -11,7 +11,7 @@ The syntax is: .. parsed-literal:: - |qemu_system| -kernel bzImage -hda rootdisk.img -append "root=/dev/hda" + |qemu_system| -kernel bzImage -drive file=rootdisk.img,format=raw -append "root=/dev/sda" Use ``-kernel`` to provide the Linux kernel image and ``-append`` to give the kernel command line arguments. The ``-initrd`` option can be @@ -23,8 +23,8 @@ virtual serial port and the QEMU monitor to the console with the .. parsed-literal:: - |qemu_system| -kernel bzImage -hda rootdisk.img \ - -append "root=/dev/hda console=ttyS0" -nographic + |qemu_system| -kernel bzImage -drive file=rootdisk.img,format=raw \ + -append "root=/dev/sda console=ttyS0" -nographic -Use Ctrl-a c to switch between the serial console and the monitor (see +Use :kbd:`Ctrl+a c` to switch between the serial console and the monitor (see :ref:`GUI_keys`). diff --git a/docs/system/loongarch/virt.rst b/docs/system/loongarch/virt.rst index 172fba0..7845878 100644 --- a/docs/system/loongarch/virt.rst +++ b/docs/system/loongarch/virt.rst @@ -12,14 +12,15 @@ Supported devices ----------------- The ``virt`` machine supports: -- Gpex host bridge -- Ls7a RTC device -- Ls7a IOAPIC device -- ACPI GED device -- Fw_cfg device -- PCI/PCIe devices -- Memory device -- CPU device. Type: la464. + +* Gpex host bridge +* Ls7a RTC device +* Ls7a IOAPIC device +* ACPI GED device +* Fw_cfg device +* PCI/PCIe devices +* Memory device +* CPU device. Type: la464. CPU and machine Type -------------------- @@ -39,13 +40,7 @@ can be accessed by following steps. .. code-block:: bash - ./configure --disable-rdma --prefix=/usr \ - --target-list="loongarch64-softmmu" \ - --disable-libiscsi --disable-libnfs --disable-libpmem \ - --disable-glusterfs --enable-libusb --enable-usb-redir \ - --disable-opengl --disable-xen --enable-spice \ - --enable-debug --disable-capstone --disable-kvm \ - --enable-profiler + ./configure --target-list="loongarch64-softmmu" make -j8 (2) Set cross tools: @@ -53,9 +48,7 @@ can be accessed by following steps. .. code-block:: bash wget https://github.com/loongson/build-tools/releases/download/2022.09.06/loongarch64-clfs-6.3-cross-tools-gcc-glibc.tar.xz - tar -vxf loongarch64-clfs-6.3-cross-tools-gcc-glibc.tar.xz -C /opt - export PATH=/opt/cross-tools/bin:$PATH export LD_LIBRARY_PATH=/opt/cross-tools/lib:$LD_LIBRARY_PATH export LD_LIBRARY_PATH=/opt/cross-tools/loongarch64-unknown-linux-gnu/lib/:$LD_LIBRARY_PATH @@ -74,13 +67,9 @@ Note: To build the release version of the bios, set --buildtarget=RELEASE, .. code-block:: bash git clone https://github.com/loongson/linux.git - cd linux - git checkout loongarch-next - make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- loongson3_defconfig - make ARCH=loongarch CROSS_COMPILE=loongarch64-unknown-linux-gnu- -j32 Note: The branch of linux source code is loongarch-next. diff --git a/docs/system/mux-chardev.rst.inc b/docs/system/mux-chardev.rst.inc index 84ea12c..c87ba31 100644 --- a/docs/system/mux-chardev.rst.inc +++ b/docs/system/mux-chardev.rst.inc @@ -1,27 +1,33 @@ During emulation, if you are using a character backend multiplexer (which is the default if you are using ``-nographic``) then several commands are available via an escape sequence. These key sequences all -start with an escape character, which is Ctrl-a by default, but can be +start with an escape character, which is :kbd:`Ctrl+a` by default, but can be changed with ``-echr``. The list below assumes you're using the default. -Ctrl-a h - Print this help +.. list-table:: Multiplexer Keys + :widths: 20 80 + :header-rows: 1 -Ctrl-a x - Exit emulator + * - Key Sequence + - Action -Ctrl-a s - Save disk data back to file (if -snapshot) + * - :kbd:`Ctrl+a h` + - Print this help -Ctrl-a t - Toggle console timestamps + * - :kbd:`Ctrl+a x` + - Exit emulator -Ctrl-a b - Send break (magic sysrq in Linux) + * - :kbd:`Ctrl+a s` + - Save disk data back to file (if -snapshot) -Ctrl-a c - Rotate between the frontends connected to the multiplexer (usually - this switches between the monitor and the console) + * - :kbd:`Ctrl+a t` + - Toggle console timestamps -Ctrl-a Ctrl-a - Send the escape character to the frontend + * - :kbd:`Ctrl+a b` + - Send break (magic sysrq in Linux) + + * - :kbd:`Ctrl+a c` + - Rotate between the frontends connected to the multiplexer (usually this switches between the monitor and the console) + + * - :kbd:`Ctrl+a Ctrl+a` + - Send the escape character to the frontend diff --git a/docs/system/ppc/amigang.rst b/docs/system/ppc/amigang.rst index 21bb14e..e290412 100644 --- a/docs/system/ppc/amigang.rst +++ b/docs/system/ppc/amigang.rst @@ -1,6 +1,6 @@ -========================================================= -AmigaNG boards (``amigaone``, ``pegasos2``, ``sam460ex``) -========================================================= +======================================================================= +AmigaNG boards (``amigaone``, ``pegasos1``, ``pegasos2``, ``sam460ex``) +======================================================================= These PowerPC machines emulate boards that are primarily used for running Amiga like OSes (AmigaOS 4, MorphOS and AROS) but these can @@ -64,18 +64,23 @@ eventually it boots and the installer becomes visible. The ``ati-vga`` RV100 emulation is not complete yet so only frame buffer works, DRM and 3D is not available. -Genesi/bPlan Pegasos II (``pegasos2``) -====================================== +Genesi/bPlan Pegasos (``pegasos1``, ``pegasos2``) +================================================= -The ``pegasos2`` machine emulates the Pegasos II sold by Genesi and -designed by bPlan. Its schematics are available at -https://www.powerdeveloper.org/platforms/pegasos/schematics. +The ``pegasos1`` machine emulates the original Pegasos (later marked I) sold by +Genesi and designed by bPlan. It uses the same Articia S north bridge as the +``amigaone`` machine, otherwise it is mostly the same as the later Pegasos II. + +The ``pegasos2`` machine emulates the Pegasos II which is a redesigned version +of Pegasos I to fix problems with its north bridge. Its schematics are available +at https://www.powerdeveloper.org/platforms/pegasos/schematics. Emulated devices ---------------- * PowerPC 7457 CPU (can also use ``-cpu g3`` or ``750cxe``) - * Marvell MV64361 Discovery II north bridge + * Articia S north bridge (for ``pegasos1``) + * Marvell MV64361 Discovery II north bridge (for ``pegasos2``) * VIA VT8231 south bridge * PCI VGA compatible card (guests may need other card instead) * PS/2 keyboard and mouse @@ -83,9 +88,9 @@ Emulated devices Firmware -------- -The Pegasos II board has an Open Firmware compliant ROM based on +The Pegasos boards have an Open Firmware compliant ROM based on SmartFirmware with some changes that are not open-sourced therefore -the ROM binary cannot be included in QEMU. An updater was available +the ROM binary cannot be included in QEMU. A Pegasos II updater was available from bPlan, it can be found in the `Internet Archive <http://web.archive.org/web/20071021223056/http://www.bplan-gmbh.de/up050404/up050404>`_. The ROM image can be extracted from it with the following command: @@ -111,7 +116,7 @@ At the firmware ``ok`` prompt enter ``boot cd install/pegasos``. Alternatively, it is possible to boot the kernel directly without firmware ROM using the QEMU built-in minimal Virtual Open Firmware -(VOF) emulation which is also supported on ``pegasos2``. For this, +(VOF) emulation which is also supported on ``pegasos1`` and ``pegasos2``. For this, extract the kernel ``install/powerpc/vmlinuz-chrp.initrd`` from the CD image, then run: diff --git a/docs/system/ppc/powernv.rst b/docs/system/ppc/powernv.rst index f3ec2cc..5154794 100644 --- a/docs/system/ppc/powernv.rst +++ b/docs/system/ppc/powernv.rst @@ -1,5 +1,5 @@ -PowerNV family boards (``powernv8``, ``powernv9``, ``powernv10``) -================================================================== +PowerNV family boards (``powernv8``, ``powernv9``, ``powernv10``, ``powernv11``) +================================================================================ PowerNV (as Non-Virtualized) is the "bare metal" platform using the OPAL firmware. It runs Linux on IBM and OpenPOWER systems and it can @@ -15,11 +15,12 @@ beyond the scope of what QEMU addresses today. Supported devices ----------------- - * Multi processor support for POWER8, POWER8NVL and POWER9. + * Multi processor support for POWER8, POWER8NVL, POWER9, Power10 and Power11. * XSCOM, serial communication sideband bus to configure chiplets. * Simple LPC Controller. * Processor Service Interface (PSI) Controller. - * Interrupt Controller, XICS (POWER8) and XIVE (POWER9) and XIVE2 (Power10). + * Interrupt Controller, XICS (POWER8) and XIVE (POWER9) and XIVE2 (Power10 & + Power11). * POWER8 PHB3 PCIe Host bridge and POWER9 PHB4 PCIe Host bridge. * Simple OCC is an on-chip micro-controller used for power management tasks. * iBT device to handle BMC communication, with the internal BMC simulator diff --git a/docs/system/qemu-block-drivers.rst.inc b/docs/system/qemu-block-drivers.rst.inc index cfe1acb..384e95b 100644 --- a/docs/system/qemu-block-drivers.rst.inc +++ b/docs/system/qemu-block-drivers.rst.inc @@ -500,8 +500,6 @@ What you should *never* do: - expect it to work when loadvm'ing - write to the FAT directory on the host system while accessing it with the guest system -.. _nbd: - NBD access ~~~~~~~~~~ diff --git a/docs/system/riscv/microchip-icicle-kit.rst b/docs/system/riscv/microchip-icicle-kit.rst index 40798b1..9809e94 100644 --- a/docs/system/riscv/microchip-icicle-kit.rst +++ b/docs/system/riscv/microchip-icicle-kit.rst @@ -5,10 +5,10 @@ Microchip PolarFire SoC Icicle Kit integrates a PolarFire SoC, with one SiFive's E51 plus four U54 cores and many on-chip peripherals and an FPGA. For more details about Microchip PolarFire SoC, please see: -https://www.microsemi.com/product-directory/soc-fpgas/5498-polarfire-soc-fpga +https://www.microchip.com/en-us/products/fpgas-and-plds/system-on-chip-fpgas/polarfire-soc-fpgas The Icicle Kit board information can be found here: -https://www.microsemi.com/existing-parts/parts/152514 +https://www.microchip.com/en-us/development-tool/mpfs-icicle-kit-es Supported devices ----------------- @@ -26,95 +26,48 @@ The ``microchip-icicle-kit`` machine supports the following devices: * 2 GEM Ethernet controllers * 1 SDHC storage controller +The memory is set to 1537 MiB by default. A sanity check on RAM size is +performed in the machine init routine to prompt user to increase the RAM size +to > 1537 MiB when less than 1537 MiB RAM is detected. + Boot options ------------ -The ``microchip-icicle-kit`` machine can start using the standard -bios -functionality for loading its BIOS image, aka Hart Software Services (HSS_). -HSS loads the second stage bootloader U-Boot from an SD card. Then a kernel -can be loaded from U-Boot. It also supports direct kernel booting via the --kernel option along with the device tree blob via -dtb. When direct kernel -boot is used, the OpenSBI fw_dynamic BIOS image is used to boot a payload -like U-Boot or OS kernel directly. - -The user provided DTB should have the following requirements: - -* The /cpus node should contain at least one subnode for E51 and the number - of subnodes should match QEMU's ``-smp`` option -* The /memory reg size should match QEMU’s selected ram_size via ``-m`` -* Should contain a node for the CLINT device with a compatible string - "riscv,clint0" - -QEMU follows below truth table to select which payload to execute: - -===== ========== ========== ======= --bios -kernel -dtb payload -===== ========== ========== ======= - N N don't care HSS - Y don't care don't care HSS - N Y Y kernel -===== ========== ========== ======= - -The memory is set to 1537 MiB by default which is the minimum required high -memory size by HSS. A sanity check on ram size is performed in the machine -init routine to prompt user to increase the RAM size to > 1537 MiB when less -than 1537 MiB ram is detected. - -Running HSS ------------ - -HSS 2020.12 release is tested at the time of writing. To build an HSS image -that can be booted by the ``microchip-icicle-kit`` machine, type the following -in the HSS source tree: - -.. code-block:: bash - - $ export CROSS_COMPILE=riscv64-linux- - $ cp boards/mpfs-icicle-kit-es/def_config .config - $ make BOARD=mpfs-icicle-kit-es - -Download the official SD card image released by Microchip and prepare it for -QEMU usage: - -.. code-block:: bash - - $ wget ftp://ftpsoc.microsemi.com/outgoing/core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz - $ gunzip core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic.gz - $ qemu-img resize core-image-minimal-dev-icicle-kit-es-sd-20201009141623.rootfs.wic 4G - -Then we can boot the machine by: - -.. code-block:: bash - - $ qemu-system-riscv64 -M microchip-icicle-kit -smp 5 \ - -bios path/to/hss.bin -sd path/to/sdcard.img \ - -nic user,model=cadence_gem \ - -nic tap,ifname=tap,model=cadence_gem,script=no \ - -display none -serial stdio \ - -chardev socket,id=serial1,path=serial1.sock,server=on,wait=on \ - -serial chardev:serial1 +The ``microchip-icicle-kit`` machine provides some options to run a firmware +(BIOS) or a kernel image. QEMU follows below truth table to select the +firmware: -With above command line, current terminal session will be used for the first -serial port. Open another terminal window, and use ``minicom`` to connect the -second serial port. +============= =========== ====================================== +-bios -kernel firmware +============= =========== ====================================== +none N this is an error +none Y the kernel image +NULL, default N hss.bin +NULL, default Y opensbi-riscv64-generic-fw_dynamic.bin +other don't care the BIOS image +============= =========== ====================================== -.. code-block:: bash +Direct Kernel Boot +------------------ - $ minicom -D unix\#serial1.sock +Use the ``-kernel`` option to directly run a kernel image. When a direct +kernel boot is requested, a device tree blob may be specified via the ``-dtb`` +option. Unlike other QEMU machines, this machine does not generate a device +tree for the kernel. It shall be provided by the user. The user provided DTB +should meet the following requirements: -HSS output is on the first serial port (stdio) and U-Boot outputs on the -second serial port. U-Boot will automatically load the Linux kernel from -the SD card image. +* The ``/cpus`` node should contain at least one subnode for E51 and the number + of subnodes should match QEMU's ``-smp`` option. -Direct Kernel Boot ------------------- +* The ``/memory`` reg size should match QEMU’s selected RAM size via the ``-m`` + option. -Sometimes we just want to test booting a new kernel, and transforming the -kernel image to the format required by the HSS bootflow is tedious. We can -use '-kernel' for direct kernel booting just like other RISC-V machines do. +* It should contain a node for the CLINT device with a compatible string + "riscv,clint0". -In this mode, the OpenSBI fw_dynamic BIOS image for 'generic' platform is -used to boot an S-mode payload like U-Boot or OS kernel directly. +When ``-bios`` is not specified or set to ``default``, the OpenSBI +``fw_dynamic`` BIOS image for the ``generic`` platform is used to boot an +S-mode payload like U-Boot or OS kernel directly. For example, the following commands show building a U-Boot image from U-Boot mainline v2021.07 for the Microchip Icicle Kit board: @@ -146,4 +99,13 @@ CAVEATS: ``u-boot.bin`` has to be used which does contain one. To use the ELF image, we need to change to CONFIG_OF_EMBED or CONFIG_OF_PRIOR_STAGE. +Running HSS +----------- + +The machine ``microchip-icicle-kit`` used to run the Hart Software Services +(HSS_), however, the HSS development progressed and the QEMU machine +implementation lacks behind. Currently, running the HSS no longer works. +There is missing support in the clock and memory controller devices. In +particular, reading from the SD card does not work. + .. _HSS: https://github.com/polarfire-soc/hart-software-services diff --git a/docs/system/riscv/xiangshan-kunminghu.rst b/docs/system/riscv/xiangshan-kunminghu.rst new file mode 100644 index 0000000..46e7cee --- /dev/null +++ b/docs/system/riscv/xiangshan-kunminghu.rst @@ -0,0 +1,39 @@ +BOSC Xiangshan Kunminghu FPGA prototype platform (``xiangshan-kunminghu``) +========================================================================== +The ``xiangshan-kunminghu`` machine is compatible with our FPGA prototype +platform. + +XiangShan is an open-source high-performance RISC-V processor project. +The third generation processor is called Kunminghu. Kunminghu is a 64-bit +RV64GCBSUHV processor core. More information can be found in our Github +repository: +https://github.com/OpenXiangShan/XiangShan + +Supported devices +----------------- +The ``xiangshan-kunminghu`` machine supports the following devices: + +* Up to 16 xiangshan-kunminghu cores +* Core Local Interruptor (CLINT) +* Incoming MSI Controller (IMSIC) +* Advanced Platform-Level Interrupt Controller (APLIC) +* 1 UART + +Boot options +------------ +The ``xiangshan-kunminghu`` machine can start using the standard ``-bios`` +functionality for loading the boot image. You need to compile and link +the firmware, kernel, and Device Tree (FDT) into a single binary file, +such as ``fw_payload.bin``. + +Running +------- +Below is an example command line for running the ``xiangshan-kunminghu`` +machine: + +.. code-block:: bash + + $ qemu-system-riscv64 -machine xiangshan-kunminghu \ + -smp 16 -m 16G \ + -bios path/to/opensbi/platform/generic/firmware/fw_payload.bin \ + -nographic diff --git a/docs/system/security.rst b/docs/system/security.rst index f2092c8..5399204 100644 --- a/docs/system/security.rst +++ b/docs/system/security.rst @@ -35,6 +35,32 @@ malicious: Bugs affecting these entities are evaluated on whether they can cause damage in real-world use cases and treated as security bugs if this is the case. +To be covered by this security support policy you must: + +- use a virtualization accelerator like KVM or HVF +- use one of the machine types listed below + +It may be possible to use other machine types with a virtualization +accelerator to provide improved performance with a trusted guest +workload, but any machine type not listed here should not be +considered to be providing guest isolation or security guarantees, +and falls under the "non-virtualization use case". + +Supported machine types for the virtualization use case, by target architecture: + +aarch64 + ``virt`` +i386, x86_64 + ``microvm``, ``xenfv``, ``xenpv``, ``xenpvh``, ``pc``, ``q35`` +s390x + ``s390-ccw-virtio`` +loongarch64: + ``virt`` +ppc64: + ``pseries`` +riscv32, riscv64: + ``virt`` + Non-virtualization Use Case ''''''''''''''''''''''''''' diff --git a/docs/system/sriov.rst b/docs/system/sriov.rst new file mode 100644 index 0000000..b19e787 --- /dev/null +++ b/docs/system/sriov.rst @@ -0,0 +1,37 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +Composable SR-IOV device +======================== + +SR-IOV (Single Root I/O Virtualization) is an optional extended capability of a +PCI Express device. It allows a single physical function (PF) to appear as +multiple virtual functions (VFs) for the main purpose of eliminating software +overhead in I/O from virtual machines. + +There are devices with predefined SR-IOV configurations, but it is also possible +to compose an SR-IOV device yourself. Composing an SR-IOV device is currently +only supported by virtio-net-pci. + +Users can configure an SR-IOV-capable virtio-net device by adding +virtio-net-pci functions to a bus. Below is a command line example: + +.. code-block:: shell + + -netdev user,id=n -netdev user,id=o + -netdev user,id=p -netdev user,id=q + -device pcie-root-port,id=b + -device virtio-net-pci,bus=b,addr=0x0.0x3,netdev=q,sriov-pf=f + -device virtio-net-pci,bus=b,addr=0x0.0x2,netdev=p,sriov-pf=f + -device virtio-net-pci,bus=b,addr=0x0.0x1,netdev=o,sriov-pf=f + -device virtio-net-pci,bus=b,addr=0x0.0x0,netdev=n,id=f + +The VFs specify the paired PF with ``sriov-pf`` property. The PF must be +added after all VFs. It is the user's responsibility to ensure that VFs have +function numbers larger than one of the PF, and that the function numbers +have a consistent stride. Both the PF and VFs are ARI-capable so you can have +255 VFs at maximum. + +You may also need to perform additional steps to activate the SR-IOV feature on +your guest. For Linux, refer to [1]_. + +.. [1] https://docs.kernel.org/PCI/pci-iov-howto.html diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst index b96a05a..a96d186 100644 --- a/docs/system/target-arm.rst +++ b/docs/system/target-arm.rst @@ -71,6 +71,7 @@ Board-specific documentation .. toctree:: :maxdepth: 1 + arm/max78000 arm/integratorcp arm/mps2 arm/musca diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst index ab7af1a..2374391 100644 --- a/docs/system/target-i386.rst +++ b/docs/system/target-i386.rst @@ -31,11 +31,10 @@ Architectural features i386/kvm-pv i386/sgx i386/amd-memory-encryption + i386/tdx OS requirements ~~~~~~~~~~~~~~~ On x86_64 hosts, the default set of CPU features enabled by the KVM -accelerator require the host to be running Linux v4.5 or newer. Red Hat -Enterprise Linux 7 is also supported, since the required -functionality was backported. +accelerator require the host to be running Linux v4.5 or newer. diff --git a/docs/system/target-loongarch.rst b/docs/system/target-loongarch.rst new file mode 100644 index 0000000..316c604 --- /dev/null +++ b/docs/system/target-loongarch.rst @@ -0,0 +1,19 @@ +.. _LoongArch-System-emulator: + +LoongArch System emulator +------------------------- + +QEMU can emulate loongArch 64 bit systems via the +``qemu-system-loongarch64`` binary. Only one machine type ``virt`` is +supported. + +When using KVM as accelerator, QEMU can emulate la464 cpu model. And when +using the default cpu model with TCG as accelerator, QEMU will emulate a +subset of la464 cpu features that should be enough to run distributions +built for the la464. + +Board-specific documentation +============================ + +.. toctree:: + loongarch/virt diff --git a/docs/system/target-mips.rst b/docs/system/target-mips.rst index 83239fb..2a152e1 100644 --- a/docs/system/target-mips.rst +++ b/docs/system/target-mips.rst @@ -12,8 +12,6 @@ machine types are emulated: - An ACER Pica \"pica61\". This machine needs the 64-bit emulator. -- MIPS emulator pseudo board \"mipssim\" - - A MIPS Magnum R4000 machine \"magnum\". This machine needs the 64-bit emulator. @@ -80,15 +78,6 @@ The Loongson-3 virtual platform emulation supports: - Both KVM and TCG supported -The mipssim pseudo board emulation provides an environment similar to -what the proprietary MIPS emulator uses for running Linux. It supports: - -- A range of MIPS CPUs, default is the 24Kf - -- PC style serial port - -- MIPSnet network emulation - .. include:: cpu-models-mips.rst.inc .. _nanoMIPS-System-emulator: @@ -112,5 +101,5 @@ https://mipsdistros.mips.com/LinuxDistro/nanomips/kernels/v4.15.18-432-gb2eb9a8b Start system emulation of Malta board with nanoMIPS I7200 CPU:: qemu-system-mipsel -cpu I7200 -kernel <kernel_image_file> \ - -M malta -serial stdio -m <memory_size> -hda <disk_image_file> \ + -M malta -serial stdio -m <memory_size> -drive file=<disk_image_file>,format=raw \ -append "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" diff --git a/docs/system/target-riscv.rst b/docs/system/target-riscv.rst index 95457af..89b2cb7 100644 --- a/docs/system/target-riscv.rst +++ b/docs/system/target-riscv.rst @@ -71,6 +71,7 @@ undocumented; you can get a complete list by running riscv/shakti-c riscv/sifive_u riscv/virt + riscv/xiangshan-kunminghu RISC-V CPU firmware ------------------- diff --git a/docs/system/targets.rst b/docs/system/targets.rst index 224fada..38e2418 100644 --- a/docs/system/targets.rst +++ b/docs/system/targets.rst @@ -18,6 +18,7 @@ Contents: target-arm target-avr + target-loongarch target-m68k target-mips target-ppc diff --git a/docs/system/tls.rst b/docs/system/tls.rst index e284c82..03fa1d8 100644 --- a/docs/system/tls.rst +++ b/docs/system/tls.rst @@ -36,8 +36,58 @@ server and exposing it directly to remote browser clients. In such a case it might be useful to use a commercial CA to avoid needing to install custom CA certs in the web browsers. -The recommendation is for the server to keep its certificates in either -``/etc/pki/qemu`` or for unprivileged users in ``$HOME/.pki/qemu``. +.. _tls_cert_file_naming: + +Certificate file naming +~~~~~~~~~~~~~~~~~~~~~~~ + +In a simple setup, where all QEMU instances on a machine share the +same TLS configuration, it is suggested that QEMU certificates be +kept in either ``/etc/pki/qemu`` or, for unprivileged users, in +``$HOME/.pki/qemu``. Where different QEMU subsystems require +different certificate configurations, sub-dirs of these locations +may be chosen. + +The default file names that QEMU will traditionally load are: + +* ``ca-cert.pem`` - mandatory; for both client and server configurations +* ``ca-crl.pem`` - optional; for server configurations only +* ``server-cert.pem`` - mandatory; for server configurations only +* ``server-key.pem`` - mandatory; for server configurations only +* ``client-cert.pem`` - optional; for client configurations only +* ``client-key.pem`` - optional; for client configurations only +* ``dh-params.pem`` - optional; for server configurations only + +Since QEMU 10.2.0, there is support for loading upto four additional +identities: + +* ``server-cert-[IDX].pem`` - optional; for server configurations only +* ``server-key-[IDX].pem`` - optional; for server configurations only +* ``client-cert-[IDX].pem`` - optional; for client configurations only +* ``client-key-[IDX].pem`` - optional; for client configurations only + +where ``-[IDX]`` is one of the digits 0-3. Loading will terminate at +the first absent index. The index based certificate files may be used +as a replacement for, or in addition to, the traditional non-index +based certificate files. The traditional certificate files will be +loaded first, if present, then the index based certificates. Where +multiple certificates are compatible with a TLS session, the first +loaded certificate will preferred. IOW file naming can influence +which certificates are used for a session. + +The use of multiple sets of certificates is intended to allow an +incremental transition to certificates using different crytographic +algorithms. This allows a newly deployed QEMU to introduce use of +stronger cryptographic algorithms that will be preferred when talking +to other newly deployed QEMU instances, while retaining compatbility +with certificates issued to a historically deployed QEMU. This is +notably useful to support live migration from an old QEMU deployed +on older operating system releases, which may support fewer crypto +algorithm choices than the current OS. + +The certificate creation commands below will be illustrated using +the traditional naming scheme, but their args can be substituted +to use the indexed naming in the obvious manner. .. _tls_005fgenerate_005fca: @@ -118,7 +168,6 @@ information for each server, and use it to issue server certificates. ip_address = 2620:0:cafe::87 ip_address = 2001:24::92 tls_www_server - encryption_key signing_key EOF # certtool --generate-privkey > server-hostNNN-key.pem @@ -134,9 +183,8 @@ the subject alt name extension data. The ``tls_www_server`` keyword is the key purpose extension to indicate this certificate is intended for usage in a web server. Although QEMU network services are not in fact HTTP servers (except for VNC websockets), setting this key purpose is -still recommended. The ``encryption_key`` and ``signing_key`` keyword is -the key usage extension to indicate this certificate is intended for -usage in the data session. +still recommended. The ``signing_key`` keyword is the key usage extension +to indicate this certificate is intended for usage in the data session. The ``server-hostNNN-key.pem`` and ``server-hostNNN-cert.pem`` files should now be securely copied to the server for which they were @@ -171,7 +219,6 @@ certificates. organization = Name of your organization cn = hostNNN.foo.example.com tls_www_client - encryption_key signing_key EOF # certtool --generate-privkey > client-hostNNN-key.pem @@ -187,9 +234,8 @@ the ``dns_name`` and ``ip_address`` fields are not included. The ``tls_www_client`` keyword is the key purpose extension to indicate this certificate is intended for usage in a web client. Although QEMU network clients are not in fact HTTP clients, setting this key purpose is still -recommended. The ``encryption_key`` and ``signing_key`` keyword is the -key usage extension to indicate this certificate is intended for usage -in the data session. +recommended. The ``signing_key`` keyword is the key usage extension to +indicate this certificate is intended for usage in the data session. The ``client-hostNNN-key.pem`` and ``client-hostNNN-cert.pem`` files should now be securely copied to the client for which they were @@ -222,7 +268,6 @@ client and server instructions in one. ip_address = 2001:24::92 tls_www_server tls_www_client - encryption_key signing_key EOF # certtool --generate-privkey > both-hostNNN-key.pem @@ -256,11 +301,13 @@ When specifying the object, the ``dir`` parameters specifies which directory contains the credential files. This directory is expected to contain files with the names mentioned previously, ``ca-cert.pem``, ``server-key.pem``, ``server-cert.pem``, ``client-key.pem`` and -``client-cert.pem`` as appropriate. It is also possible to include a set -of pre-generated Diffie-Hellman (DH) parameters in a file -``dh-params.pem``, which can be created using the -``certtool --generate-dh-params`` command. If omitted, QEMU will -dynamically generate DH parameters when loading the credentials. +``client-cert.pem`` as appropriate. + +While it is possible to include a set of pre-generated Diffie-Hellman +(DH) parameters in a file ``dh-params.pem``, this facility is now +deprecated and will be removed in a future release. When omitted the +DH parameters will be automatically negotiated in accordance with +RFC7919. The ``endpoint`` parameter indicates whether the credentials will be used for a network client or server, and determines which PEM files are @@ -298,6 +345,74 @@ example with VNC: .. _tls_005fpsk: +TLS certificates for Post-Quantum Cryptography +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Given a new enough gnutls release, suitably integrated & configured with the +operating system crypto policies, QEMU is able to support post-quantum +crytography on TLS enabled services, either exclusively or in a hybrid mode. + +In exclusive mode, only a single set of certificates need to be configured +for QEMU, with PQC compliant algorithms. Such a QEMU configuration will only +be able to interoperate with other services (including other QEMU's) that +also have PQC enabled. This can result in compatibility concerns during the +period of transition over to PQC compliant algorithms. + +In hybrid mode, multiple sets of certificates need to be configured for QEMU, +at least one set with traditional (non-PQC compliant) algorithms, and at least +one other set with modern (PQC compliant) algorithms. At time of the TLS +handshake, the GNUTLS algorithm priorities should ensure that PQC compliant +algorithms are negotiated if both sides of the connection support PQC. If one +side lacks PQC, the TLS handshake should fallback to the non-PQC algorithms. +This can assist with interoperability during the transition to PQC, but has a +potential weakness wrt downgrade attacks forcing use of non-PQC algorithms. +Exclusive PQC mode should be preferred where both peers in the TLS connections +are known to support PQC. + +Key generation parameters +^^^^^^^^^^^^^^^^^^^^^^^^^ + +To create certificates with PQC compliant algorithms, the ``--key-type`` +argument must be passed to ``certtool`` when creating private keys. No +extra arguments are required for the other ``certtool`` commands, as +their behaviour will be determined by the private key type. + +The typical PQC compliant algorithms to use are ``ML-DSA-44``, ``ML-DSA-65`` +and ``ML-DSA-87``, with ``ML-DSA-65`` being a suitable default choice in +the absence of explicit requirements. + +Taking the example earlier, for creating a key for a client certificate, +to use ``ML-DSA-65`` the command line would be modified to look like:: + + # certtool --generate-privkey --key-type=mldsa65 > client-hostNNN-key.pem + +The equivalent modification applies to the creation of the private keys +used for server certs, or root/intermediate CA certs. + +For hybrid mode, the additional indexed certificate naming must be used. +If multiple configured certificates are compatible with the mutually +supported crypto algorithms between the client and server, then the +first matching certificate will be used. + +IOW, to ensure that PQC certificates are preferred, they must use a +non-index based filename, or use an index that is smaller than any +non-PQC certificates. ie, ``server-cert.pem`` for PQC and ``server-cert-0.pem`` +for non-PQC, or ``server-cert-0.pem`` for PQC and ``server-cert-1.pem`` for +non-PQC. + +Force disabling PQC via crypto priority +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In the OS configuration for system crypto algorithm priorities has +enabled PQC, this can (optionally) be overriden in QEMU configuration +disable use of PQC using the ``priority`` parameter to the ``tls-creds-x509`` +object:: + + NO_MLDSA="-SIGN-ML-DSA-65:-SIGN-ML-DSA-44:-SIGN-ML-DSA-87" + NO_MLKEM="-GROUP-X25519-MLKEM768:-GROUP-SECP256R1-MLKEM768:-GROUP-SECP384R1-MLKEM1024" + # qemu-nbd --object tls-creds-x509,id=tls0,endpoint=server,dir=....,priority=@SYSTEM:$NO_MLDSA:$NO_MLKEM + + TLS Pre-Shared Keys (PSK) ~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/docs/system/virtio-net-failover.rst b/docs/system/virtio-net-failover.rst index 6002dc5..0cc4654 100644 --- a/docs/system/virtio-net-failover.rst +++ b/docs/system/virtio-net-failover.rst @@ -26,43 +26,48 @@ and standby devices are not plugged into the same PCIe slot. Usecase ------- - Virtio-net standby allows easy migration while using a passed-through fast - networking device by falling back to a virtio-net device for the duration of - the migration. It is like a simple version of a bond, the difference is that it - requires no configuration in the guest. When a guest is live-migrated to - another host QEMU will unplug the primary device via the PCIe based hotplug - handler and traffic will go through the virtio-net device. On the target - system the primary device will be automatically plugged back and the - net_failover module registers it again as the primary device. +Virtio-net standby allows easy migration while using a passed-through +fast networking device by falling back to a virtio-net device for the +duration of the migration. It is like a simple version of a bond, the +difference is that it requires no configuration in the guest. When a +guest is live-migrated to another host QEMU will unplug the primary +device via the PCIe based hotplug handler and traffic will go through +the virtio-net device. On the target system the primary device will be +automatically plugged back and the net_failover module registers it +again as the primary device. Usage ----- - The primary device can be hotplugged or be part of the startup configuration +The primary device can be hotplugged or be part of the startup configuration - -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \ - bus=root2,failover=on +.. code-block:: shell - With the parameter failover=on the VIRTIO_NET_F_STANDBY feature will be enabled. + -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on + +With the parameter ``failover=on`` the VIRTIO_NET_F_STANDBY feature will be enabled. + +.. code-block:: shell -device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,failover_pair_id=net1 - failover_pair_id references the id of the virtio-net standby device. This - is only for pairing the devices within QEMU. The guest kernel module - net_failover will match devices with identical MAC addresses. +``failover_pair_id`` references the id of the virtio-net standby device. +This is only for pairing the devices within QEMU. The guest kernel +module net_failover will match devices with identical MAC addresses. Hotplug ------- - Both primary and standby device can be hotplugged via the QEMU monitor. Note - that if the virtio-net device is plugged first a warning will be issued that it - couldn't find the primary device. +Both primary and standby device can be hotplugged via the QEMU +monitor. Note that if the virtio-net device is plugged first a warning +will be issued that it couldn't find the primary device. Migration --------- - A new migration state wait-unplug was added for this feature. If failover primary - devices are present in the configuration, migration will go into this state. - It will wait until the device unplug is completed in the guest and then move into - active state. On the target system the primary devices will be automatically hotplugged - when the feature bit was negotiated for the virtio-net standby device. +A new migration state wait-unplug was added for this feature. If +failover primary devices are present in the configuration, migration +will go into this state. It will wait until the device unplug is +completed in the guest and then move into active state. On the target +system the primary devices will be automatically hotplugged when the +feature bit was negotiated for the virtio-net standby device. |
