diff options
Diffstat (limited to 'docs')
-rw-r--r-- | docs/conf.py | 2 | ||||
-rw-r--r-- | docs/devel/migration/CPR.rst | 112 | ||||
-rw-r--r-- | docs/devel/migration/main.rst | 19 | ||||
-rw-r--r-- | docs/interop/firmware.json | 4 | ||||
-rw-r--r-- | docs/specs/riscv-iommu.rst | 35 | ||||
-rw-r--r-- | docs/sphinx-static/theme_overrides.css | 3 |
6 files changed, 147 insertions, 28 deletions
diff --git a/docs/conf.py b/docs/conf.py index e09769e..0c9ec74 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -1,5 +1,3 @@ -# -*- coding: utf-8 -*- -# # QEMU documentation build configuration file, created by # sphinx-quickstart on Thu Jan 31 16:40:14 2019. # diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst index 0a0fd4f..b617856 100644 --- a/docs/devel/migration/CPR.rst +++ b/docs/devel/migration/CPR.rst @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the VM is migrated to a new QEMU instance on the same host. It is intended for use when the goal is to update host software components that run the VM, such as QEMU or even the host kernel. At this time, -the cpr-reboot and cpr-transfer modes are available. +the cpr-reboot, cpr-transfer, and cpr-exec modes are available. Because QEMU is restarted on the same host, with access to the same local devices, CPR is allowed in certain cases where normal migration @@ -324,3 +324,113 @@ descriptors from old to new QEMU. In the future, descriptors for vhost, and char devices could be transferred, preserving those devices and their kernel state without interruption, even if they do not explicitly support live migration. + +cpr-exec mode +------------- + +In this mode, QEMU stops the VM, writes VM state to the migration +URI, and directly exec's a new version of QEMU on the same host, +replacing the original process while retaining its PID. Guest RAM is +preserved in place, albeit with new virtual addresses. The user +completes the migration by specifying the ``-incoming`` option, and +by issuing the ``migrate-incoming`` command if necessary; see details +below. + +This mode supports VFIO/IOMMUFD devices by preserving device +descriptors and hence kernel state across the exec, even for devices +that do not support live migration. + +Because the old and new QEMU instances are not active concurrently, +the URI cannot be a type that streams data from one instance to the +other. + +This mode does not require a channel of type ``cpr``. The information +that is passed over that channel for cpr-transfer mode is instead +serialized to a memfd, the number of the fd is saved in the +QEMU_CPR_EXEC_STATE environment variable during the exec of new QEMU. +and new QEMU mmaps the memfd. + +Usage +^^^^^ + +Arguments for the new QEMU process are taken from the +@cpr-exec-command parameter. The first argument should be the +path of a new QEMU binary, or a prefix command that exec's the +new QEMU binary, and the arguments should include the ''-incoming'' +option. + +Memory backend objects must have the ``share=on`` attribute. +The VM must be started with the ``-machine aux-ram-share=on`` option. + +Outgoing: + * Set the migration mode parameter to ``cpr-exec``. + * Set the ``cpr-exec-command`` parameter. + * Issue the ``migrate`` command. It is recommended that the URI be + a ``file`` type, but one can use other types such as ``exec``, + provided the command captures all the data from the outgoing side, + and provides all the data to the incoming side. + +Incoming: + * You do not need to explicitly start new QEMU. It is started as + a side effect of the migrate command above. + * If the VM was running when the outgoing ``migrate`` command was + issued, then QEMU automatically resumes VM execution. + +Example 1: incoming URI +^^^^^^^^^^^^^^^^^^^^^^^ + +In these examples, we simply restart the same version of QEMU, but in +a real scenario one would set a new QEMU binary path in +cpr-exec-command. + +:: + + # qemu-kvm -monitor stdio + -object memory-backend-memfd,id=ram0,size=4G + -machine memory-backend=ram0 + -machine aux-ram-share=on + ... + + QEMU 10.2.50 monitor - type 'help' for more information + (qemu) info status + VM status: running + (qemu) migrate_set_parameter mode cpr-exec + (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming file:vm.state + (qemu) migrate -d file:vm.state + (qemu) QEMU 10.2.50 monitor - type 'help' for more information + (qemu) info status + VM status: running + +Example 2: incoming defer +^^^^^^^^^^^^^^^^^^^^^^^^^ +:: + + # qemu-kvm -monitor stdio + -object memory-backend-memfd,id=ram0,size=4G + -machine memory-backend=ram0 + -machine aux-ram-share=on + ... + + QEMU 10.2.50 monitor - type 'help' for more information + (qemu) info status + VM status: running + (qemu) migrate_set_parameter mode cpr-exec + (qemu) migrate_set_parameter cpr-exec-command qemu-kvm ... -incoming defer + (qemu) migrate -d file:vm.state + (qemu) QEMU 10.2.50 monitor - type 'help' for more information + (qemu) info status + status: paused (inmigrate) + (qemu) migrate_incoming file:vm.state + (qemu) info status + VM status: running + +Caveats +^^^^^^^ + +cpr-exec mode may not be used with postcopy, background-snapshot, +or COLO. + +cpr-exec mode requires permission to use the exec system call, which +is denied by certain sandbox options, such as spawn. + +The guest pause time increases for large guest RAM backed by small pages. diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 6493c1d..1afe7b9 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -444,6 +444,25 @@ The functions to do that are inside a vmstate definition, and are called: This function is called after we save the state of one device (even upon failure, unless the call to pre_save returned an error). +Following are the errp variants of these functions. + +- ``int (*pre_load_errp)(void *opaque, Error **errp);`` + + This function is called before we load the state of one device. + +- ``int (*post_load_errp)(void *opaque, int version_id, Error **errp);`` + + This function is called after we load the state of one device. + +- ``int (*pre_save_errp)(void *opaque, Error **errp);`` + + This function is called before we save the state of one device. + +New impls should preferentally use 'errp' variants of these +methods and existing impls incrementally converted. +The variants without 'errp' are intended to be removed +once all usage is converted. + Example: You can look at hpet.c, that uses the first three functions to massage the state that is transferred. diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json index 6bbe2cc..ccbfaf8 100644 --- a/docs/interop/firmware.json +++ b/docs/interop/firmware.json @@ -85,12 +85,14 @@ # # @loongarch64: 64-bit LoongArch. (since: 7.1) # +# @riscv64: 64-bit RISC-V. +# # @x86_64: 64-bit x86. # # Since: 3.0 ## { 'enum' : 'FirmwareArchitecture', - 'data' : [ 'aarch64', 'arm', 'i386', 'loongarch64', 'x86_64' ] } + 'data' : [ 'aarch64', 'arm', 'i386', 'loongarch64', 'riscv64', 'x86_64' ] } ## # @FirmwareTarget: diff --git a/docs/specs/riscv-iommu.rst b/docs/specs/riscv-iommu.rst index 991d376..571a6a6 100644 --- a/docs/specs/riscv-iommu.rst +++ b/docs/specs/riscv-iommu.rst @@ -30,15 +30,15 @@ This will add a RISC-V IOMMU PCI device in the board following any additional PCI parameters (like PCI bus address). The behavior of the RISC-V IOMMU is defined by the spec but its operation is OS dependent. -As of this writing the existing Linux kernel support `linux-v8`_, not yet merged, -does not have support for features like VFIO passthrough. The IOMMU emulation -was tested using a public Ventana Micro Systems kernel repository in -`ventana-linux`_. This kernel is based on `linux-v8`_ with additional patches that -enable features like KVM VFIO passthrough with irqbypass. Until the kernel support -is feature complete feel free to use the kernel available in the Ventana Micro Systems -mirror. - -The current Linux kernel support will use the IOMMU device to create IOMMU groups +Linux kernel iommu support was merged in v6.13. QEMU IOMMU emulation can be +used with mainline kernels for simple IOMMU PCIe support. + +As of v6.17, it does not have support for features like VFIO passthrough. +There is a `VFIO`_ RFC series that is not yet merged. The public Ventana Micro +Systems kernel repository in `ventana-linux`_ can be used for testing the VFIO +functions. + +The v6.13+ Linux kernel support uses the IOMMU device to create IOMMU groups with any eligible cards available in the system, regardless of factors such as the order in which the devices are added in the command line. @@ -49,7 +49,7 @@ IOMMU kernel driver behaves: $ qemu-system-riscv64 \ -M virt,aia=aplic-imsic,aia-guests=5 \ - -device riscv-iommu-pci,addr=1.0,vendor-id=0x1efd,device-id=0xedf1 \ + -device riscv-iommu-pci,addr=1.0 \ -device e1000e,netdev=net1 -netdev user,id=net1,net=192.168.0.0/24 \ -device e1000e,netdev=net2 -netdev user,id=net2,net=192.168.200.0/24 \ (...) @@ -58,21 +58,11 @@ IOMMU kernel driver behaves: -M virt,aia=aplic-imsic,aia-guests=5 \ -device e1000e,netdev=net1 -netdev user,id=net1,net=192.168.0.0/24 \ -device e1000e,netdev=net2 -netdev user,id=net2,net=192.168.200.0/24 \ - -device riscv-iommu-pci,addr=1.0,vendor-id=0x1efd,device-id=0xedf1 \ + -device riscv-iommu-pci,addr=3.0 \ (...) Both will create iommu groups for the two e1000e cards. -Another thing to notice on `linux-v8`_ and `ventana-linux`_ is that the kernel driver -considers an IOMMU identified as a Rivos device, i.e. it uses Rivos vendor ID. To -use the riscv-iommu-pci device with the existing kernel support we need to emulate -a Rivos PCI IOMMU by setting 'vendor-id' and 'device-id': - -.. code-block:: bash - - $ qemu-system-riscv64 -M virt \ - -device riscv-iommu-pci,vendor-id=0x1efd,device-id=0xedf1 (...) - Several options are available to control the capabilities of the device, namely: - "bus": the bus that the IOMMU device uses @@ -84,6 +74,7 @@ Several options are available to control the capabilities of the device, namely: - "g-stage": enable g-stage support - "hpm-counters": number of hardware performance counters available. Maximum value is 31. Default value is 31. Use 0 (zero) to disable HPM support +- "vendor-id"/"device-id": pci device ID. Defaults to 1b36:0014 (Redhat) riscv-iommu-sys device ---------------------- @@ -111,6 +102,6 @@ riscv-iommu options: .. _iommu1.0.0: https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0.0/riscv-iommu.pdf -.. _linux-v8: https://lore.kernel.org/linux-riscv/cover.1718388908.git.tjeznach@rivosinc.com/ +.. _VFIO: https://lore.kernel.org/linux-riscv/20241114161845.502027-17-ajones@ventanamicro.com/ .. _ventana-linux: https://github.com/ventanamicro/linux/tree/dev-upstream diff --git a/docs/sphinx-static/theme_overrides.css b/docs/sphinx-static/theme_overrides.css index b225bf7..f312e9b 100644 --- a/docs/sphinx-static/theme_overrides.css +++ b/docs/sphinx-static/theme_overrides.css @@ -1,5 +1,4 @@ -/* -*- coding: utf-8; mode: css -*- - * +/* * Sphinx HTML theme customization: read the doc * Based on Linux Documentation/sphinx-static/theme_overrides.css */ |