aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/about/deprecated.rst115
-rw-r--r--docs/about/emulation.rst650
-rw-r--r--docs/about/removed-features.rst71
-rw-r--r--docs/conf.py10
-rw-r--r--docs/devel/atomics.rst6
-rw-r--r--docs/devel/blkdebug.txt162
-rw-r--r--docs/devel/build-system.rst6
-rw-r--r--docs/devel/clocks.rst6
-rw-r--r--docs/devel/crypto.rst10
-rw-r--r--docs/devel/index-api.rst1
-rw-r--r--docs/devel/index-build.rst14
-rw-r--r--docs/devel/index-internals.rst3
-rw-r--r--docs/devel/index.rst1
-rw-r--r--docs/devel/loads-stores.rst2
-rw-r--r--docs/devel/lockcnt.rst (renamed from docs/devel/lockcnt.txt)89
-rw-r--r--docs/devel/luks-detached-header.rst182
-rw-r--r--docs/devel/maintainers.rst4
-rw-r--r--docs/devel/migration/features.rst1
-rw-r--r--docs/devel/migration/main.rst6
-rw-r--r--docs/devel/migration/mapped-ram.rst4
-rw-r--r--docs/devel/migration/qatzip-compression.rst165
-rw-r--r--docs/devel/migration/uadk-compression.rst4
-rw-r--r--docs/devel/multiple-iothreads.rst139
-rw-r--r--docs/devel/multiple-iothreads.txt130
-rw-r--r--docs/devel/nested-papr.txt119
-rw-r--r--docs/devel/qapi-code-gen.rst58
-rw-r--r--docs/devel/rcu.rst (renamed from docs/devel/rcu.txt)172
-rw-r--r--docs/devel/replay.rst3
-rw-r--r--docs/devel/reset.rst20
-rw-r--r--docs/devel/tcg-plugins.rst496
-rw-r--r--docs/devel/testing/acpi-bits.rst (renamed from docs/devel/acpi-bits.rst)86
-rw-r--r--docs/devel/testing/avocado.rst581
-rw-r--r--docs/devel/testing/blkdebug.rst177
-rw-r--r--docs/devel/testing/blkverify.rst (renamed from docs/devel/blkverify.txt)30
-rw-r--r--docs/devel/testing/ci-definitions.rst.inc (renamed from docs/devel/ci-definitions.rst.inc)0
-rw-r--r--docs/devel/testing/ci-jobs.rst.inc (renamed from docs/devel/ci-jobs.rst.inc)0
-rw-r--r--docs/devel/testing/ci-runners.rst.inc (renamed from docs/devel/ci-runners.rst.inc)0
-rw-r--r--docs/devel/testing/ci.rst (renamed from docs/devel/ci.rst)0
-rw-r--r--docs/devel/testing/functional.rst338
-rw-r--r--docs/devel/testing/fuzzing.rst (renamed from docs/devel/fuzzing.rst)9
-rw-r--r--docs/devel/testing/index.rst18
-rw-r--r--docs/devel/testing/main.rst (renamed from docs/devel/testing.rst)617
-rw-r--r--docs/devel/testing/qgraph.rst (renamed from docs/devel/qgraph.rst)0
-rw-r--r--docs/devel/testing/qtest.rst (renamed from docs/devel/qtest.rst)0
-rw-r--r--docs/interop/firmware.json47
-rw-r--r--docs/interop/index.rst3
-rw-r--r--docs/interop/live-block-operations.rst4
-rw-r--r--docs/interop/nbd.rst89
-rw-r--r--docs/interop/nbd.txt72
-rw-r--r--docs/interop/parallels.rst (renamed from docs/interop/parallels.txt)108
-rw-r--r--docs/interop/prl-xml.rst192
-rw-r--r--docs/interop/prl-xml.txt158
-rw-r--r--docs/interop/qemu-ga.rst20
-rw-r--r--docs/meson.build6
-rw-r--r--docs/pcie_sriov.txt8
-rw-r--r--docs/specs/acpi_hw_reduced_hotplug.rst3
-rw-r--r--docs/specs/fw_cfg.rst4
-rw-r--r--docs/specs/index.rst3
-rw-r--r--docs/specs/pci-ids.rst8
-rw-r--r--docs/specs/rapl-msr.rst154
-rw-r--r--docs/specs/rocker.rst (renamed from docs/specs/rocker.txt)181
-rw-r--r--docs/specs/spdm.rst134
-rw-r--r--docs/sphinx-static/theme_overrides.css49
-rw-r--r--docs/sphinx/depfile.py2
-rw-r--r--docs/sphinx/hxtool.py21
-rw-r--r--docs/sphinx/kerneldoc.py38
-rw-r--r--docs/sphinx/kernellog.py28
-rw-r--r--docs/sphinx/qapidoc.py158
-rw-r--r--docs/system/arm/aspeed.rst2
-rw-r--r--docs/system/arm/cubieboard.rst1
-rw-r--r--docs/system/arm/emulation.rst1
-rw-r--r--docs/system/arm/gumstix.rst21
-rw-r--r--docs/system/arm/mainstone.rst25
-rw-r--r--docs/system/arm/nseries.rst33
-rw-r--r--docs/system/arm/palm.rst23
-rw-r--r--docs/system/arm/stm32.rst3
-rw-r--r--docs/system/arm/xscale.rst35
-rw-r--r--docs/system/i386/xenpvh.rst49
-rw-r--r--docs/system/loongarch/virt.rst2
-rw-r--r--docs/system/ppc/powermac.rst4
-rw-r--r--docs/system/target-arm.rst5
-rw-r--r--docs/system/target-i386.rst1
-rw-r--r--docs/tools/index.rst2
-rw-r--r--docs/tools/qemu-vmsr-helper.rst89
-rw-r--r--docs/tools/virtfs-proxy-helper.rst75
-rw-r--r--docs/user/main.rst4
86 files changed, 3859 insertions, 2511 deletions
diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 20b7a17..ce38a3d 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -184,6 +184,25 @@ be an effective use of its limited resources, and thus intends to discontinue
it. Since all recent x86 hardware from the past >10 years is capable of the
64-bit x86 extensions, a corresponding 64-bit OS should be used instead.
+TCG Plugin support not enabled by default on 32-bit hosts (since 9.2)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+While it is still possible to enable TCG plugin support for 32-bit
+hosts there are a number of potential pitfalls when instrumenting
+64-bit guests. The plugin APIs typically pass most addresses as
+uint64_t but practices like encoding that address in a host pointer
+for passing as user-data will lose data. As most software analysis
+benefits from having plenty of host memory it seems reasonable to
+encourage users to use 64 bit builds of QEMU for analysis work
+whatever targets they are instrumenting.
+
+TCG Plugin support not enabled by default with TCI (since 9.2)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+While the TCG interpreter can interpret the TCG ops used by plugins it
+is going to be so much slower it wouldn't make sense for any serious
+instrumentation. Due to implementation differences there will also be
+anomalies in things like memory instrumentation.
System emulator CPUs
--------------------
@@ -206,14 +225,6 @@ in the QEMU object model anymore. ``Sun-UltraSparc-IIIi+`` and
but for consistency these will get removed in a future release, too.
Use ``Sun-UltraSparc-IIIi-plus`` and ``Sun-UltraSparc-IV-plus`` instead.
-CRIS CPU architecture (since 9.0)
-'''''''''''''''''''''''''''''''''
-
-The CRIS architecture was pulled from Linux in 4.17 and the compiler
-is no longer packaged in any distro making it harder to run the
-``check-tcg`` tests. Unless we can improve the testing situation there
-is a chance the code will bitrot without anyone noticing.
-
System emulator machines
------------------------
@@ -232,12 +243,6 @@ These old machine types are quite neglected nowadays and thus might have
various pitfalls with regards to live migration. Use a newer machine type
instead.
-``shix`` (since 9.0)
-''''''''''''''''''''
-
-The machine is no longer in existence and has been long unmaintained
-in QEMU. This also holds for the TC51828 16MiB flash that it uses.
-
``pseries-2.1`` up to ``pseries-2.12`` (since 9.0)
''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -246,21 +251,6 @@ to correct issues, mostly regarding migration compatibility. These are
no longer maintained and removing them will make the code easier to
read and maintain. Use versions 3.0 and above as a replacement.
-Arm machines ``akita``, ``borzoi``, ``cheetah``, ``connex``, ``mainstone``, ``n800``, ``n810``, ``spitz``, ``terrier``, ``tosa``, ``verdex``, ``z2`` (since 9.0)
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-QEMU includes models of some machine types where the QEMU code that
-emulates their SoCs is very old and unmaintained. This code is now
-blocking our ability to move forward with various changes across
-the codebase, and over many years nobody has been interested in
-trying to modernise it. We don't expect any of these machines to have
-a large number of users, because they're all modelling hardware that
-has now passed away into history. We are therefore dropping support
-for all machine types using the PXA2xx and OMAP2 SoCs. We are also
-dropping the ``cheetah`` OMAP1 board, because we don't have any
-test images for it and don't know of anybody who does; the ``sx1``
-and ``sx1-v1`` OMAP1 machines remain supported for now.
-
PPC 405 ``ref405ep`` machine (since 9.1)
''''''''''''''''''''''''''''''''''''''''
@@ -324,41 +314,6 @@ the addition of volatile memory support, it is now necessary to distinguish
between persistent and volatile memory backends. As such, memdev is deprecated
in favor of persistent-memdev.
-``-fsdev proxy`` and ``-virtfs proxy`` (since 8.1)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The 9p ``proxy`` filesystem backend driver has been deprecated and will be
-removed (along with its proxy helper daemon) in a future version of QEMU. Please
-use ``-fsdev local`` or ``-virtfs local`` for using the 9p ``local`` filesystem
-backend, or alternatively consider deploying virtiofsd instead.
-
-The 9p ``proxy`` backend was originally developed as an alternative to the 9p
-``local`` backend. The idea was to enhance security by dispatching actual low
-level filesystem operations from 9p server (QEMU process) over to a separate
-process (the virtfs-proxy-helper binary). However this alternative never gained
-momentum. The proxy backend is much slower than the local backend, hasn't seen
-any development in years, and showed to be less secure, especially due to the
-fact that its helper daemon must be run as root, whereas with the local backend
-QEMU is typically run as unprivileged user and allows to tighten behaviour by
-mapping permissions et al by using its 'mapped' security model option.
-
-Nowadays it would make sense to reimplement the ``proxy`` backend by using
-QEMU's ``vhost`` feature, which would eliminate the high latency costs under
-which the 9p ``proxy`` backend currently suffers. However as of to date nobody
-has indicated plans for such kind of reimplementation unfortunately.
-
-RISC-V 'any' CPU type ``-cpu any`` (since 8.2)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The 'any' CPU type was introduced back in 2018 and has been around since the
-initial RISC-V QEMU port. Its usage has always been unclear: users don't know
-what to expect from a CPU called 'any', and in fact the CPU does not do anything
-special that isn't already done by the default CPUs rv32/rv64.
-
-After the introduction of the 'max' CPU type, RISC-V now has a good coverage
-of generic CPUs: rv32 and rv64 as default CPUs and 'max' as a feature complete
-CPU for both 32 and 64 bit builds. Users are then discouraged to use the 'any'
-CPU type starting in 8.2.
RISC-V CPU properties which start with capital 'Z' (since 8.2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -422,6 +377,15 @@ Specifying the iSCSI password in plain text on the command line using the
used instead, to refer to a ``--object secret...`` instance that provides
a password via a file, or encrypted.
+``gluster`` backend (since 9.2)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+According to https://marc.info/?l=fedora-devel-list&m=171934833215726
+the GlusterFS development effectively ended. Unless the development
+gains momentum again, the QEMU project will remove the gluster backend
+in a future release.
+
+
Character device options
''''''''''''''''''''''''
@@ -430,6 +394,12 @@ Backend ``memory`` (since 9.0)
``memory`` is a deprecated synonym for ``ringbuf``.
+``reconnect`` (since 9.2)
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``reconnect`` option only allows specifiying second granularity timeouts,
+which is not enough for all types of use cases, use ``reconnect-ms`` instead.
+
CPU device properties
'''''''''''''''''''''
@@ -479,6 +449,17 @@ versions, aliases will point to newer CPU model versions
depending on the machine type, so management software must
resolve CPU model aliases before starting a virtual machine.
+RISC-V "virt" board "riscv,delegate" DT property (since 9.1)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The "riscv,delegate" DT property was added in QEMU 7.0 as part of
+the AIA APLIC support. The property changed name during the review
+process in Linux and the correct name ended up being
+"riscv,delegation". Changing the DT property name will break all
+available firmwares that are using the current (wrong) name. The
+property is kept as is in 9.1, together with "riscv,delegation", to
+give more time for firmware developers to change their code.
+
Migration
---------
@@ -492,3 +473,9 @@ usage of providing a file descriptor to a plain file has been
deprecated in favor of explicitly using the ``file:`` URI with the
file descriptor being passed as an ``fdset``. Refer to the ``add-fd``
command documentation for details on the ``fdset`` usage.
+
+``zero-blocks`` capability (since 9.2)
+''''''''''''''''''''''''''''''''''''''
+
+The ``zero-blocks`` capability was part of the block migration which
+doesn't exist anymore since it was removed in QEMU v9.1.
diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst
index b5ff9c5..3028d5f 100644
--- a/docs/about/emulation.rst
+++ b/docs/about/emulation.rst
@@ -26,10 +26,6 @@ depending on the guest architecture.
- :ref:`Yes<AVR-System-emulator>`
- No
- 8 bit micro controller, often used in maker projects
- * - Cris
- - Yes
- - Yes
- - Embedded RISC chip developed by AXIS
* - Hexagon
- No
- Yes
@@ -42,7 +38,7 @@ depending on the guest architecture.
- :ref:`Yes<QEMU-PC-System-emulator>`
- Yes
- The ubiquitous desktop PC CPU architecture, 32 and 64 bit.
- * - Loongarch
+ * - LoongArch
- Yes
- Yes
- A MIPS-like 64bit RISC architecture developed in China
@@ -95,9 +91,6 @@ depending on the guest architecture.
- Yes
- A configurable 32 bit soft core now owned by Cadence
-A number of features are only available when running under
-emulation including :ref:`Record/Replay<replay>` and :ref:`TCG Plugins`.
-
.. _Semihosting:
Semihosting
@@ -182,3 +175,644 @@ for that architecture.
* - Xtensa
- System
- Tensilica ISS SIMCALL
+
+TCG Plugins
+-----------
+
+QEMU TCG plugins provide a way for users to run experiments taking
+advantage of the total system control emulation can have over a guest.
+It provides a mechanism for plugins to subscribe to events during
+translation and execution and optionally callback into the plugin
+during these events. TCG plugins are unable to change the system state
+only monitor it passively. However they can do this down to an
+individual instruction granularity including potentially subscribing
+to all load and store operations.
+
+See the developer section of the manual for details about
+:ref:`writing plugins<TCG Plugins>`.
+
+Usage
+~~~~~
+
+Any QEMU binary with TCG support has plugins enabled by default.
+Earlier releases needed to be explicitly enabled with::
+
+ configure --enable-plugins
+
+Once built a program can be run with multiple plugins loaded each with
+their own arguments::
+
+ $QEMU $OTHER_QEMU_ARGS \
+ -plugin contrib/plugins/libhowvec.so,inline=on,count=hint \
+ -plugin contrib/plugins/libhotblocks.so
+
+Arguments are plugin specific and can be used to modify their
+behaviour. In this case the howvec plugin is being asked to use inline
+ops to count and break down the hint instructions by type.
+
+Linux user-mode emulation also evaluates the environment variable
+``QEMU_PLUGIN``::
+
+ QEMU_PLUGIN="file=contrib/plugins/libhowvec.so,inline=on,count=hint" $QEMU
+
+QEMU plugins avoid to write directly to stdin/stderr, and use the log provided
+by the API (see function ``qemu_plugin_outs``).
+To show output, you may use this additional parameter::
+
+ $QEMU $OTHER_QEMU_ARGS \
+ -d plugin \
+ -plugin contrib/plugins/libhowvec.so,inline=on,count=hint
+
+Example Plugins
+~~~~~~~~~~~~~~~
+
+There are a number of plugins included with QEMU and you are
+encouraged to contribute your own plugins plugins upstream. There is a
+``contrib/plugins`` directory where they can go. There are also some
+basic plugins that are used to test and exercise the API during the
+``make check-tcg`` target in ``tests/tcg/plugins`` that are never the
+less useful for basic analysis.
+
+Empty
+.....
+
+``tests/tcg/plugins/empty.c``
+
+Purely a test plugin for measuring the overhead of the plugins system
+itself. Does no instrumentation.
+
+Basic Blocks
+............
+
+``tests/tcg/plugins/bb.c``
+
+A very basic plugin which will measure execution in coarse terms as
+each basic block is executed. By default the results are shown once
+execution finishes::
+
+ $ qemu-aarch64 -plugin tests/plugin/libbb.so \
+ -d plugin ./tests/tcg/aarch64-linux-user/sha1
+ SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
+ bb's: 2277338, insns: 158483046
+
+Behaviour can be tweaked with the following arguments:
+
+.. list-table:: Basic Block plugin arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - inline=true|false
+ - Use faster inline addition of a single counter.
+ * - idle=true|false
+ - Dump the current execution stats whenever the guest vCPU idles
+
+Basic Block Vectors
+...................
+
+``contrib/plugins/bbv.c``
+
+The bbv plugin allows you to generate basic block vectors for use with the
+`SimPoint <https://cseweb.ucsd.edu/~calder/simpoint/>`__ analysis tool.
+
+.. list-table:: Basic block vectors arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - interval=N
+ - The interval to generate a basic block vector specified by the number of
+ instructions (Default: N = 100000000)
+ * - outfile=PATH
+ - The path to output files.
+ It will be suffixed with ``.N.bb`` where ``N`` is a vCPU index.
+
+Example::
+
+ $ qemu-aarch64 \
+ -plugin contrib/plugins/libbbv.so,interval=100,outfile=sha1 \
+ tests/tcg/aarch64-linux-user/sha1
+ SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
+ $ du sha1.0.bb
+ 23128 sha1.0.bb
+
+Instruction
+...........
+
+``tests/tcg/plugins/insn.c``
+
+This is a basic instruction level instrumentation which can count the
+number of instructions executed on each core/thread::
+
+ $ qemu-aarch64 -plugin tests/plugin/libinsn.so \
+ -d plugin ./tests/tcg/aarch64-linux-user/threadcount
+ Created 10 threads
+ Done
+ cpu 0 insns: 46765
+ cpu 1 insns: 3694
+ cpu 2 insns: 3694
+ cpu 3 insns: 2994
+ cpu 4 insns: 1497
+ cpu 5 insns: 1497
+ cpu 6 insns: 1497
+ cpu 7 insns: 1497
+ total insns: 63135
+
+Behaviour can be tweaked with the following arguments:
+
+.. list-table:: Instruction plugin arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - inline=true|false
+ - Use faster inline addition of a single counter.
+ * - sizes=true|false
+ - Give a summary of the instruction sizes for the execution
+ * - match=<string>
+ - Only instrument instructions matching the string prefix
+
+The ``match`` option will show some basic stats including how many
+instructions have executed since the last execution. For
+example::
+
+ $ qemu-aarch64 -plugin tests/plugin/libinsn.so,match=bl \
+ -d plugin ./tests/tcg/aarch64-linux-user/sha512-vector
+ ...
+ 0x40069c, 'bl #0x4002b0', 10 hits, 1093 match hits, Δ+1257 since last match, 98 avg insns/match
+ 0x4006ac, 'bl #0x403690', 10 hits, 1094 match hits, Δ+47 since last match, 98 avg insns/match
+ 0x4037fc, 'bl #0x4002b0', 18 hits, 1095 match hits, Δ+22 since last match, 98 avg insns/match
+ 0x400720, 'bl #0x403690', 10 hits, 1096 match hits, Δ+58 since last match, 98 avg insns/match
+ 0x4037fc, 'bl #0x4002b0', 19 hits, 1097 match hits, Δ+22 since last match, 98 avg insns/match
+ 0x400730, 'bl #0x403690', 10 hits, 1098 match hits, Δ+33 since last match, 98 avg insns/match
+ 0x4037ac, 'bl #0x4002b0', 12 hits, 1099 match hits, Δ+20 since last match, 98 avg insns/match
+ ...
+
+For more detailed execution tracing see the ``execlog`` plugin for
+other options.
+
+Memory
+......
+
+``tests/tcg/plugins/mem.c``
+
+Basic instruction level memory instrumentation::
+
+ $ qemu-aarch64 -plugin tests/plugin/libmem.so,inline=true \
+ -d plugin ./tests/tcg/aarch64-linux-user/sha1
+ SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
+ inline mem accesses: 79525013
+
+Behaviour can be tweaked with the following arguments:
+
+.. list-table:: Memory plugin arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - inline=true|false
+ - Use faster inline addition of a single counter
+ * - callback=true|false
+ - Use callbacks on each memory instrumentation.
+ * - hwaddr=true|false
+ - Count IO accesses (only for system emulation)
+
+System Calls
+............
+
+``tests/tcg/plugins/syscall.c``
+
+A basic syscall tracing plugin. This only works for user-mode. By
+default it will give a summary of syscall stats at the end of the
+run::
+
+ $ qemu-aarch64 -plugin tests/plugin/libsyscall \
+ -d plugin ./tests/tcg/aarch64-linux-user/threadcount
+ Created 10 threads
+ Done
+ syscall no. calls errors
+ 226 12 0
+ 99 11 11
+ 115 11 0
+ 222 11 0
+ 93 10 0
+ 220 10 0
+ 233 10 0
+ 215 8 0
+ 214 4 0
+ 134 2 0
+ 64 2 0
+ 96 1 0
+ 94 1 0
+ 80 1 0
+ 261 1 0
+ 78 1 0
+ 160 1 0
+ 135 1 0
+
+Behaviour can be tweaked with the following arguments:
+
+.. list-table:: Syscall plugin arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - print=true|false
+ - Print the number of times each syscall is called
+ * - log_writes=true|false
+ - Log the buffer of each write syscall in hexdump format
+
+Test inline operations
+......................
+
+``tests/plugins/inline.c``
+
+This plugin is used for testing all inline operations, conditional callbacks and
+scoreboard. It prints a per-cpu summary of all events.
+
+
+Hot Blocks
+..........
+
+``contrib/plugins/hotblocks.c``
+
+The hotblocks plugin allows you to examine the where hot paths of
+execution are in your program. Once the program has finished you will
+get a sorted list of blocks reporting the starting PC, translation
+count, number of instructions and execution count. This will work best
+with linux-user execution as system emulation tends to generate
+re-translations as blocks from different programs get swapped in and
+out of system memory.
+
+Example::
+
+ $ qemu-aarch64 \
+ -plugin contrib/plugins/libhotblocks.so -d plugin \
+ ./tests/tcg/aarch64-linux-user/sha1
+ SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
+ collected 903 entries in the hash table
+ pc, tcount, icount, ecount
+ 0x0000000041ed10, 1, 5, 66087
+ 0x000000004002b0, 1, 4, 66087
+ ...
+
+
+Hot Pages
+.........
+
+``contrib/plugins/hotpages.c``
+
+Similar to hotblocks but this time tracks memory accesses::
+
+ $ qemu-aarch64 \
+ -plugin contrib/plugins/libhotpages.so -d plugin \
+ ./tests/tcg/aarch64-linux-user/sha1
+ SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
+ Addr, RCPUs, Reads, WCPUs, Writes
+ 0x000055007fe000, 0x0001, 31747952, 0x0001, 8835161
+ 0x000055007ff000, 0x0001, 29001054, 0x0001, 8780625
+ 0x00005500800000, 0x0001, 687465, 0x0001, 335857
+ 0x0000000048b000, 0x0001, 130594, 0x0001, 355
+ 0x0000000048a000, 0x0001, 1826, 0x0001, 11
+
+The hotpages plugin can be configured using the following arguments:
+
+.. list-table:: Hot pages arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - sortby=reads|writes|address
+ - Log the data sorted by either the number of reads, the number of writes, or
+ memory address. (Default: entries are sorted by the sum of reads and writes)
+ * - io=on
+ - Track IO addresses. Only relevant to full system emulation. (Default: off)
+ * - pagesize=N
+ - The page size used. (Default: N = 4096)
+
+Instruction Distribution
+........................
+
+``contrib/plugins/howvec.c``
+
+This is an instruction classifier so can be used to count different
+types of instructions. It has a number of options to refine which get
+counted. You can give a value to the ``count`` argument for a class of
+instructions to break it down fully, so for example to see all the system
+registers accesses::
+
+ $ qemu-system-aarch64 $(QEMU_ARGS) \
+ -append "root=/dev/sda2 systemd.unit=benchmark.service" \
+ -smp 4 -plugin ./contrib/plugins/libhowvec.so,count=sreg -d plugin
+
+which will lead to a sorted list after the class breakdown::
+
+ Instruction Classes:
+ Class: UDEF not counted
+ Class: SVE (68 hits)
+ Class: PCrel addr (47789483 hits)
+ Class: Add/Sub (imm) (192817388 hits)
+ Class: Logical (imm) (93852565 hits)
+ Class: Move Wide (imm) (76398116 hits)
+ Class: Bitfield (44706084 hits)
+ Class: Extract (5499257 hits)
+ Class: Cond Branch (imm) (147202932 hits)
+ Class: Exception Gen (193581 hits)
+ Class: NOP not counted
+ Class: Hints (6652291 hits)
+ Class: Barriers (8001661 hits)
+ Class: PSTATE (1801695 hits)
+ Class: System Insn (6385349 hits)
+ Class: System Reg counted individually
+ Class: Branch (reg) (69497127 hits)
+ Class: Branch (imm) (84393665 hits)
+ Class: Cmp & Branch (110929659 hits)
+ Class: Tst & Branch (44681442 hits)
+ Class: AdvSimd ldstmult (736 hits)
+ Class: ldst excl (9098783 hits)
+ Class: Load Reg (lit) (87189424 hits)
+ Class: ldst noalloc pair (3264433 hits)
+ Class: ldst pair (412526434 hits)
+ Class: ldst reg (imm) (314734576 hits)
+ Class: Loads & Stores (2117774 hits)
+ Class: Data Proc Reg (223519077 hits)
+ Class: Scalar FP (31657954 hits)
+ Individual Instructions:
+ Instr: mrs x0, sp_el0 (2682661 hits) (op=0xd5384100/ System Reg)
+ Instr: mrs x1, tpidr_el2 (1789339 hits) (op=0xd53cd041/ System Reg)
+ Instr: mrs x2, tpidr_el2 (1513494 hits) (op=0xd53cd042/ System Reg)
+ Instr: mrs x0, tpidr_el2 (1490823 hits) (op=0xd53cd040/ System Reg)
+ Instr: mrs x1, sp_el0 (933793 hits) (op=0xd5384101/ System Reg)
+ Instr: mrs x2, sp_el0 (699516 hits) (op=0xd5384102/ System Reg)
+ Instr: mrs x4, tpidr_el2 (528437 hits) (op=0xd53cd044/ System Reg)
+ Instr: mrs x30, ttbr1_el1 (480776 hits) (op=0xd538203e/ System Reg)
+ Instr: msr ttbr1_el1, x30 (480713 hits) (op=0xd518203e/ System Reg)
+ Instr: msr vbar_el1, x30 (480671 hits) (op=0xd518c01e/ System Reg)
+ ...
+
+To find the argument shorthand for the class you need to examine the
+source code of the plugin at the moment, specifically the ``*opt``
+argument in the InsnClassExecCount tables.
+
+Lockstep Execution
+..................
+
+``contrib/plugins/lockstep.c``
+
+This is a debugging tool for developers who want to find out when and
+where execution diverges after a subtle change to TCG code generation.
+It is not an exact science and results are likely to be mixed once
+asynchronous events are introduced. While the use of -icount can
+introduce determinism to the execution flow it doesn't always follow
+the translation sequence will be exactly the same. Typically this is
+caused by a timer firing to service the GUI causing a block to end
+early. However in some cases it has proved to be useful in pointing
+people at roughly where execution diverges. The only argument you need
+for the plugin is a path for the socket the two instances will
+communicate over::
+
+
+ $ qemu-system-sparc -monitor none -parallel none \
+ -net none -M SS-20 -m 256 -kernel day11/zImage.elf \
+ -plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \
+ -d plugin,nochain
+
+which will eventually report::
+
+ qemu-system-sparc: warning: nic lance.0 has no peer
+ @ 0x000000ffd06678 vs 0x000000ffd001e0 (2/1 since last)
+ @ 0x000000ffd07d9c vs 0x000000ffd06678 (3/1 since last)
+ Δ insn_count @ 0x000000ffd07d9c (809900609) vs 0x000000ffd06678 (809900612)
+ previously @ 0x000000ffd06678/10 (809900609 insns)
+ previously @ 0x000000ffd001e0/4 (809900599 insns)
+ previously @ 0x000000ffd080ac/2 (809900595 insns)
+ previously @ 0x000000ffd08098/5 (809900593 insns)
+ previously @ 0x000000ffd080c0/1 (809900588 insns)
+
+
+Hardware Profile
+................
+
+``contrib/plugins/hwprofile.c``
+
+The hwprofile tool can only be used with system emulation and allows
+the user to see what hardware is accessed how often. It has a number of options:
+
+.. list-table:: Hardware Profile arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - track=[read|write]
+ - By default the plugin tracks both reads and writes. You can use
+ this option to limit the tracking to just one class of accesses.
+ * - source
+ - Will include a detailed break down of what the guest PC that made the
+ access was. Not compatible with the pattern option. Example output::
+
+ cirrus-low-memory @ 0xfffffd00000a0000
+ pc:fffffc0000005cdc, 1, 256
+ pc:fffffc0000005ce8, 1, 256
+ pc:fffffc0000005cec, 1, 256
+
+ * - pattern
+ - Instead break down the accesses based on the offset into the HW
+ region. This can be useful for seeing the most used registers of
+ a device. Example output::
+
+ pci0-conf @ 0xfffffd01fe000000
+ off:00000004, 1, 1
+ off:00000010, 1, 3
+ off:00000014, 1, 3
+ off:00000018, 1, 2
+ off:0000001c, 1, 2
+ off:00000020, 1, 2
+ ...
+
+
+Execution Log
+.............
+
+``contrib/plugins/execlog.c``
+
+The execlog tool traces executed instructions with memory access. It can be used
+for debugging and security analysis purposes.
+Please be aware that this will generate a lot of output.
+
+The plugin needs default argument::
+
+ $ qemu-system-arm $(QEMU_ARGS) \
+ -plugin ./contrib/plugins/libexeclog.so -d plugin
+
+which will output an execution trace following this structure::
+
+ # vCPU, vAddr, opcode, disassembly[, load/store, memory addr, device]...
+ 0, 0xa12, 0xf8012400, "movs r4, #0"
+ 0, 0xa14, 0xf87f42b4, "cmp r4, r6"
+ 0, 0xa16, 0xd206, "bhs #0xa26"
+ 0, 0xa18, 0xfff94803, "ldr r0, [pc, #0xc]", load, 0x00010a28, RAM
+ 0, 0xa1a, 0xf989f000, "bl #0xd30"
+ 0, 0xd30, 0xfff9b510, "push {r4, lr}", store, 0x20003ee0, RAM, store, 0x20003ee4, RAM
+ 0, 0xd32, 0xf9893014, "adds r0, #0x14"
+ 0, 0xd34, 0xf9c8f000, "bl #0x10c8"
+ 0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM
+
+Please note that you need to configure QEMU with Capstone support to get disassembly.
+
+The output can be filtered to only track certain instructions or
+addresses using the ``ifilter`` or ``afilter`` options. You can stack the
+arguments if required::
+
+ $ qemu-system-arm $(QEMU_ARGS) \
+ -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d plugin
+
+This plugin can also dump registers when they change value. Specify the name of the
+registers with multiple ``reg`` options. You can also use glob style matching if you wish::
+
+ $ qemu-system-arm $(QEMU_ARGS) \
+ -plugin ./contrib/plugins/libexeclog.so,reg=\*_el2,reg=sp -d plugin
+
+Be aware that each additional register to check will slow down
+execution quite considerably. You can optimise the number of register
+checks done by using the rdisas option. This will only instrument
+instructions that mention the registers in question in disassembly.
+This is not foolproof as some instructions implicitly change
+instructions. You can use the ifilter to catch these cases::
+
+ $ qemu-system-arm $(QEMU_ARGS) \
+ -plugin ./contrib/plugins/libexeclog.so,ifilter=msr,ifilter=blr,reg=x30,reg=\*_el1,rdisas=on
+
+Cache Modelling
+...............
+
+``contrib/plugins/cache.c``
+
+Cache modelling plugin that measures the performance of a given L1 cache
+configuration, and optionally a unified L2 per-core cache when a given working
+set is run::
+
+ $ qemu-x86_64 -plugin ./contrib/plugins/libcache.so \
+ -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs
+
+will report the following::
+
+ core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
+ 0 996695 508 0.0510% 2642799 18617 0.7044%
+
+ address, data misses, instruction
+ 0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx)
+ 0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax)
+ 0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax)
+ 0x454d48 (__tunables_init), 20, cmpb $0, (%r8)
+ ...
+
+ address, fetch misses, instruction
+ 0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx
+ 0x41f0a0 (_IO_setb), 744, endbr64
+ 0x415882 (__vfprintf_internal), 744, movq %r12, %rdi
+ 0x4268a0 (__malloc), 696, andq $0xfffffffffffffff0, %rax
+ ...
+
+The plugin has a number of arguments, all of them are optional:
+
+.. list-table:: Cache modelling arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - limit=N
+ - Print top N icache and dcache thrashing instructions along with
+ their address, number of misses, and its disassembly. (default: 32)
+ * - icachesize=N
+ iblksize=B
+ iassoc=A
+ - Instruction cache configuration arguments. They specify the
+ cache size, block size, and associativity of the instruction
+ cache, respectively. (default: N = 16384, B = 64, A = 8)
+ * - dcachesize=N
+ - Data cache size (default: 16834)
+ * - dblksize=B
+ - Data cache block size (default: 64)
+ * - dassoc=A
+ - Data cache associativity (default: 8)
+ * - evict=POLICY
+ - Sets the eviction policy to POLICY. Available policies are:
+ ``lru``, ``fifo``, and ``rand``. The plugin will use
+ the specified policy for both instruction and data caches.
+ (default: POLICY = ``lru``)
+ * - cores=N
+ - Sets the number of cores for which we maintain separate icache
+ and dcache. (default: for linux-user, N = 1, for full system
+ emulation: N = cores available to guest)
+ * - l2=on
+ - Simulates a unified L2 cache (stores blocks for both
+ instructions and data) using the default L2 configuration (cache
+ size = 2MB, associativity = 16-way, block size = 64B).
+ * - l2cachesize=N
+ - L2 cache size (default: 2097152 (2MB)), implies ``l2=on``
+ * - l2blksize=B
+ - L2 cache block size (default: 64), implies ``l2=on``
+ * - l2assoc=A
+ - L2 cache associativity (default: 16), implies ``l2=on``
+
+Stop on Trigger
+...............
+
+``contrib/plugins/stoptrigger.c``
+
+The stoptrigger plugin allows to setup triggers to stop emulation.
+It can be used for research purposes to launch some code and precisely stop it
+and understand where its execution flow went.
+
+Two types of triggers can be configured: a count of instructions to stop at,
+or an address to stop at. Multiple triggers can be set at once.
+
+By default, QEMU will exit with return code 0. A custom return code can be
+configured for each trigger using ``:CODE`` syntax.
+
+For example, to stop at the 20-th instruction with return code 41, at address
+0xd4 with return code 0 or at address 0xd8 with return code 42::
+
+ $ qemu-system-aarch64 $(QEMU_ARGS) \
+ -plugin ./contrib/plugins/libstoptrigger.so,icount=20:41,addr=0xd4,addr=0xd8:42 -d plugin
+
+The plugin will log the reason of exit, for example::
+
+ 0xd4 reached, exiting
+
+Limit instructions per second
+.............................
+
+This plugin can limit the number of Instructions Per Second that are executed::
+
+ # get number of instructions
+ $ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total | sed -e 's/.*: //')
+ # limit speed to execute in 10 seconds
+ $ time ./build/qemu-x86_64 -plugin ./build/contrib/plugins/libips.so,ips=$(($num_insn/10)) /bin/true
+ real 10.000s
+
+
+.. list-table:: IPS arguments
+ :widths: 20 80
+ :header-rows: 1
+
+ * - Option
+ - Description
+ * - ips=N
+ - Maximum number of instructions per cpu that can be executed in one second.
+ The plugin will sleep when the given number of instructions is reached.
+
+Other emulation features
+------------------------
+
+When running system emulation you can also enable deterministic
+execution which allows for repeatable record/replay debugging. See
+:ref:`Record/Replay<replay>` for more details.
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index fc7b28e..912e0a1 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -517,6 +517,43 @@ The virtio-blk SCSI passthrough feature is a legacy VIRTIO feature. VIRTIO 1.0
and later do not support it because the virtio-scsi device was introduced for
full SCSI support. Use virtio-scsi instead when SCSI passthrough is required.
+``-fsdev proxy`` and ``-virtfs proxy`` (since 9.2)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The 9p ``proxy`` filesystem backend driver was originally developed to
+enhance security by dispatching low level filesystem operations from 9p
+server (QEMU process) over to a separate process (the virtfs-proxy-helper
+binary). However the proxy backend was much slower than the local backend,
+didn't see any development in years, and showed to be less secure,
+especially due to the fact that its helper daemon must be run as root.
+
+Use ``local``, possibly mapping permissions et al by using its 'mapped'
+security model option, or switch to ``virtiofs``. The virtiofs daemon
+``virtiofsd`` uses vhost to eliminate the high latency costs of the 9p
+``proxy`` backend.
+
+``-portrait`` and ``-rotate`` (since 9.2)
+'''''''''''''''''''''''''''''''''''''''''
+
+The ``-portrait`` and ``-rotate`` options were documented as only
+working with the PXA LCD device, and all the machine types using
+that display device were removed in 9.2, so these options also
+have been dropped.
+
+These options were intended to simulate a mobile device being
+rotated by the user, and had three effects:
+
+* the display output was rotated by 90, 180 or 270 degrees
+* the mouse/trackpad input was rotated the opposite way
+* the machine model would signal to the guest about its
+ orientation
+
+Of these three things, the input-rotation was coded without being
+restricted to boards which supported the full set of device-rotation
+handling, so in theory the options were usable on other machine models
+to produce an odd effect (rotating input but not display output). But
+this was never intended or documented behaviour, so we have dropped
+the options along with the machine models they were intended for.
User-mode emulator command line arguments
-----------------------------------------
@@ -850,6 +887,14 @@ The RISC-V no MMU cpus have been removed. The two CPUs: ``rv32imacu-nommu`` and
``rv64imacu-nommu`` can no longer be used. Instead the MMU status can be specified
via the CPU ``mmu`` option when using the ``rv32`` or ``rv64`` CPUs.
+RISC-V 'any' CPU type ``-cpu any`` (removed in 9.2)
+'''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The 'any' CPU type was introduced back in 2018 and was around since the
+initial RISC-V QEMU port. Its usage was always been unclear: users don't know
+what to expect from a CPU called 'any', and in fact the CPU does not do anything
+special that isn't already done by the default CPUs rv32/rv64.
+
``compat`` property of server class POWER CPUs (removed in 6.0)
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -889,6 +934,13 @@ Nios II CPU (removed in 9.1)
QEMU Nios II architecture was orphan; Intel has EOL'ed the Nios II
processor IP (see `Intel discontinuance notification`_).
+CRIS CPU architecture (removed in 9.2)
+''''''''''''''''''''''''''''''''''''''
+
+The CRIS architecture was pulled from Linux in 4.17 and the compiler
+was no longer packaged in any distro making it harder to run the
+``check-tcg`` tests.
+
System accelerators
-------------------
@@ -978,6 +1030,25 @@ Nios II ``10m50-ghrd`` and ``nios2-generic-nommu`` machines (removed in 9.1)
The Nios II architecture was orphan.
+``shix`` (removed in 9.2)
+'''''''''''''''''''''''''
+
+The machine was unmaintained.
+
+Arm machines ``akita``, ``borzoi``, ``cheetah``, ``connex``, ``mainstone``, ``n800``, ``n810``, ``spitz``, ``terrier``, ``tosa``, ``verdex``, ``z2`` (removed in 9.2)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+QEMU included models of some machine types where the QEMU code that
+emulates their SoCs was very old and unmaintained. This code was
+blocking our ability to move forward with various changes across
+the codebase, and over many years nobody has been interested in
+trying to modernise it. We don't expect any of these machines to have
+a large number of users, because they're all modelling hardware that
+has now passed away into history. We are therefore dropping support
+for all machine types using the PXA2xx and OMAP2 SoCs. We are also
+dropping the ``cheetah`` OMAP1 board, because we don't have any
+test images for it and don't know of anybody who does.
+
linux-user mode CPUs
--------------------
diff --git a/docs/conf.py b/docs/conf.py
index aae0304..c11a6ea 100644
--- a/docs/conf.py
+++ b/docs/conf.py
@@ -53,10 +53,9 @@ sys.path.insert(0, os.path.join(qemu_docdir, "../scripts"))
# If your documentation needs a minimal Sphinx version, state it here.
#
-# Sphinx 1.5 and earlier can't build our docs because they are too
-# picky about the syntax of the argument to the option:: directive
-# (see Sphinx bugs #646, #3366).
-needs_sphinx = '1.6'
+# 3.4.3 is the oldest version of Sphinx that ships on a platform we
+# pledge build support for.
+needs_sphinx = '3.4.3'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
@@ -276,9 +275,6 @@ man_pages = [
('tools/qemu-trace-stap', 'qemu-trace-stap',
'QEMU SystemTap trace tool',
[], 1),
- ('tools/virtfs-proxy-helper', 'virtfs-proxy-helper',
- 'QEMU 9p virtfs proxy filesystem helper',
- ['M. Mohan Kumar'], 1),
]
man_make_section_directory = False
diff --git a/docs/devel/atomics.rst b/docs/devel/atomics.rst
index b77c6e1..95c7b77 100644
--- a/docs/devel/atomics.rst
+++ b/docs/devel/atomics.rst
@@ -204,7 +204,7 @@ They come in six kinds:
before the second with respect to the other components of the system.
Therefore, unlike ``smp_rmb()`` or ``qatomic_load_acquire()``,
``smp_read_barrier_depends()`` can be just a compiler barrier on
- weakly-ordered architectures such as Arm or PPC[#]_.
+ weakly-ordered architectures such as Arm or PPC\ [#alpha]_.
Note that the first load really has to have a _data_ dependency and not
a control dependency. If the address for the second load is dependent
@@ -212,7 +212,7 @@ They come in six kinds:
than actually loading the address itself, then it's a _control_
dependency and a full read barrier or better is required.
-.. [#] The DEC Alpha is an exception, because ``smp_read_barrier_depends()``
+.. [#alpha] The DEC Alpha is an exception, because ``smp_read_barrier_depends()``
needs a processor barrier. On strongly-ordered architectures such
as x86 or s390, ``smp_rmb()`` and ``qatomic_load_acquire()`` can
also be compiler barriers only.
@@ -295,7 +295,7 @@ Acquire/release pairing and the *synchronizes-with* relation
------------------------------------------------------------
Atomic operations other than ``qatomic_set()`` and ``qatomic_read()`` have
-either *acquire* or *release* semantics [#rmw]_. This has two effects:
+either *acquire* or *release* semantics\ [#rmw]_. This has two effects:
.. [#rmw] Read-modify-write operations can have both---acquire applies to the
read part, and release to the write.
diff --git a/docs/devel/blkdebug.txt b/docs/devel/blkdebug.txt
deleted file mode 100644
index 0b0c128..0000000
--- a/docs/devel/blkdebug.txt
+++ /dev/null
@@ -1,162 +0,0 @@
-Block I/O error injection using blkdebug
-----------------------------------------
-Copyright (C) 2014-2015 Red Hat Inc
-
-This work is licensed under the terms of the GNU GPL, version 2 or later. See
-the COPYING file in the top-level directory.
-
-The blkdebug block driver is a rule-based error injection engine. It can be
-used to exercise error code paths in block drivers including ENOSPC (out of
-space) and EIO.
-
-This document gives an overview of the features available in blkdebug.
-
-Background
-----------
-Block drivers have many error code paths that handle I/O errors. Image formats
-are especially complex since metadata I/O errors during cluster allocation or
-while updating tables happen halfway through request processing and require
-discipline to keep image files consistent.
-
-Error injection allows test cases to trigger I/O errors at specific points.
-This way, all error paths can be tested to make sure they are correct.
-
-Rules
------
-The blkdebug block driver takes a list of "rules" that tell the error injection
-engine when to fail an I/O request.
-
-Each I/O request is evaluated against the rules. If a rule matches the request
-then its "action" is executed.
-
-Rules can be placed in a configuration file; the configuration file
-follows the same .ini-like format used by QEMU's -readconfig option, and
-each section of the file represents a rule.
-
-The following configuration file defines a single rule:
-
- $ cat blkdebug.conf
- [inject-error]
- event = "read_aio"
- errno = "28"
-
-This rule fails all aio read requests with ENOSPC (28). Note that the errno
-value depends on the host. On Linux, see
-/usr/include/asm-generic/errno-base.h for errno values.
-
-Invoke QEMU as follows:
-
- $ qemu-system-x86_64
- -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
- -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
-
-Rules support the following attributes:
-
- event - which type of operation to match (e.g. read_aio, write_aio,
- flush_to_os, flush_to_disk). See the "Events" section for
- information on events.
-
- state - (optional) the engine must be in this state number in order for this
- rule to match. See the "State transitions" section for information
- on states.
-
- errno - the numeric errno value to return when a request matches this rule.
- The errno values depend on the host since the numeric values are not
- standardized in the POSIX specification.
-
- sector - (optional) a sector number that the request must overlap in order to
- match this rule
-
- once - (optional, default "off") only execute this action on the first
- matching request
-
- immediately - (optional, default "off") return a NULL BlockAIOCB
- pointer and fail without an errno instead. This
- exercises the code path where BlockAIOCB fails and the
- caller's BlockCompletionFunc is not invoked.
-
-Events
-------
-Block drivers provide information about the type of I/O request they are about
-to make so rules can match specific types of requests. For example, the qcow2
-block driver tells blkdebug when it accesses the L1 table so rules can match
-only L1 table accesses and not other metadata or guest data requests.
-
-The core events are:
-
- read_aio - guest data read
-
- write_aio - guest data write
-
- flush_to_os - write out unwritten block driver state (e.g. cached metadata)
-
- flush_to_disk - flush the host block device's disk cache
-
-See qapi/block-core.json:BlkdebugEvent for the full list of events.
-You may need to grep block driver source code to understand the
-meaning of specific events.
-
-State transitions
------------------
-There are cases where more power is needed to match a particular I/O request in
-a longer sequence of requests. For example:
-
- write_aio
- flush_to_disk
- write_aio
-
-How do we match the 2nd write_aio but not the first? This is where state
-transitions come in.
-
-The error injection engine has an integer called the "state" that always starts
-initialized to 1. The state integer is internal to blkdebug and cannot be
-observed from outside but rules can interact with it for powerful matching
-behavior.
-
-Rules can be conditional on the current state and they can transition to a new
-state.
-
-When a rule's "state" attribute is non-zero then the current state must equal
-the attribute in order for the rule to match.
-
-For example, to match the 2nd write_aio:
-
- [set-state]
- event = "write_aio"
- state = "1"
- new_state = "2"
-
- [inject-error]
- event = "write_aio"
- state = "2"
- errno = "5"
-
-The first write_aio request matches the set-state rule and transitions from
-state 1 to state 2. Once state 2 has been entered, the set-state rule no
-longer matches since it requires state 1. But the inject-error rule now
-matches the next write_aio request and injects EIO (5).
-
-State transition rules support the following attributes:
-
- event - which type of operation to match (e.g. read_aio, write_aio,
- flush_to_os, flush_to_disk). See the "Events" section for
- information on events.
-
- state - (optional) the engine must be in this state number in order for this
- rule to match
-
- new_state - transition to this state number
-
-Suspend and resume
-------------------
-Exercising code paths in block drivers may require specific ordering amongst
-concurrent requests. The "breakpoint" feature allows requests to be halted on
-a blkdebug event and resumed later. This makes it possible to achieve
-deterministic ordering when multiple requests are in flight.
-
-Breakpoints on blkdebug events are associated with a user-defined "tag" string.
-This tag serves as an identifier by which the request can be resumed at a later
-point.
-
-See the qemu-io(1) break, resume, remove_break, and wait_break commands for
-details.
diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst
index 79eceb1..d42045a 100644
--- a/docs/devel/build-system.rst
+++ b/docs/devel/build-system.rst
@@ -145,13 +145,13 @@ was installed in the ``site-packages`` directory of another interpreter,
or with the wrong ``pip`` program.
If a package is available for the chosen interpreter, ``configure``
-prepares a small script that invokes it from the venv itself[#distlib]_.
+prepares a small script that invokes it from the venv itself\ [#distlib]_.
If not, ``configure`` can also optionally install dependencies in the
virtual environment with ``pip``, either from wheels in ``python/wheels``
or by downloading the package with PyPI. Downloading can be disabled with
``--disable-download``; and anyway, it only happens when a ``configure``
option (currently, only ``--enable-docs``) is explicitly enabled but
-the dependencies are not present[#pip]_.
+the dependencies are not present\ [#pip]_.
.. [#distlib] The scripts are created based on the package's metadata,
specifically the ``console_script`` entry points. This is the
@@ -333,7 +333,7 @@ into each emulator:
``default-configs/targets/*.mak``
These files mostly define symbols that appear in the ``*-config-target.h``
- file for each emulator [#cfgtarget]_. However, the ``TARGET_ARCH``
+ file for each emulator\ [#cfgtarget]_. However, the ``TARGET_ARCH``
and ``TARGET_BASE_ARCH`` will also be used to select the ``hw/`` and
``target/`` subdirectories that are compiled into each target.
diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
index 177ee1c..3f744f2 100644
--- a/docs/devel/clocks.rst
+++ b/docs/devel/clocks.rst
@@ -358,6 +358,12 @@ humans (for instance in debugging), use ``clock_display_freq()``,
which returns a prettified string-representation, e.g. "33.3 MHz".
The caller must free the string with g_free() after use.
+It's also possible to retrieve the clock period from a QTest by
+accessing QOM property ``qtest-clock-period`` using a QMP command.
+This property is only present when the device is being run under
+the ``qtest`` accelerator; it is not available when QEMU is
+being run normally.
+
Calculating expiry deadlines
----------------------------
diff --git a/docs/devel/crypto.rst b/docs/devel/crypto.rst
new file mode 100644
index 0000000..39b1c91
--- /dev/null
+++ b/docs/devel/crypto.rst
@@ -0,0 +1,10 @@
+.. _crypto-ref:
+
+====================
+Cryptography in QEMU
+====================
+
+.. toctree::
+ :maxdepth: 2
+
+ luks-detached-header
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
index fe01b2b..1c487c1 100644
--- a/docs/devel/index-api.rst
+++ b/docs/devel/index-api.rst
@@ -9,6 +9,7 @@ generated from in-code annotations to function prototypes.
bitops
loads-stores
+ lockcnt
memory
modules
pci
diff --git a/docs/devel/index-build.rst b/docs/devel/index-build.rst
index 90b406c..0023953 100644
--- a/docs/devel/index-build.rst
+++ b/docs/devel/index-build.rst
@@ -1,9 +1,8 @@
-QEMU Build and Test System
---------------------------
+QEMU Build System
+-----------------
-Details about how QEMU's build system works and how it is integrated
-into our testing infrastructure. You will need to understand some of
-the basics if you are adding new files and targets to the build.
+Details about how QEMU's build system works. You will need to understand
+some of the basics if you are adding new files and targets to the build.
.. toctree::
:maxdepth: 3
@@ -11,10 +10,5 @@ the basics if you are adding new files and targets to the build.
build-system
kconfig
docs
- testing
- acpi-bits
- qtest
- ci
qapi-code-gen
- fuzzing
control-flow-integrity
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 5636e9c..ab9fbc4 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -8,6 +8,7 @@ Details about QEMU's various subsystems including how to add features to them.
qom
atomics
+ rcu
block-coroutine-wrapper
clocks
ebpf_rss
@@ -20,3 +21,5 @@ Details about QEMU's various subsystems including how to add features to them.
vfio-iommufd
writing-monitor-commands
virtio-backends
+ crypto
+ multiple-iothreads
diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index abf6045..a53f1bf 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -31,6 +31,7 @@ the :ref:`tcg_internals`.
index-process
index-build
+ testing/index
index-api
index-internals
index-tcg
diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst
index ec627aa..9471bac 100644
--- a/docs/devel/loads-stores.rst
+++ b/docs/devel/loads-stores.rst
@@ -95,7 +95,7 @@ guest CPU state in case of a guest CPU exception. This is passed
to ``cpu_restore_state()``. Therefore the value should either be 0,
to indicate that the guest CPU state is already synchronized, or
the result of ``GETPC()`` from the top level ``HELPER(foo)``
-function, which is a return address into the generated code [#gpc]_.
+function, which is a return address into the generated code\ [#gpc]_.
.. [#gpc] Note that ``GETPC()`` should be used with great care: calling
it in other functions that are *not* the top level
diff --git a/docs/devel/lockcnt.txt b/docs/devel/lockcnt.rst
index a3fb3bc..8b43578 100644
--- a/docs/devel/lockcnt.txt
+++ b/docs/devel/lockcnt.rst
@@ -1,9 +1,9 @@
-DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt)
-===================================================
+Locked Counters (aka ``QemuLockCnt``)
+=====================================
QEMU often uses reference counts to track data structures that are being
accessed and should not be freed. For example, a loop that invoke
-callbacks like this is not safe:
+callbacks like this is not safe::
QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
if (ioh->revents & G_IO_OUT) {
@@ -11,11 +11,11 @@ callbacks like this is not safe:
}
}
-QLIST_FOREACH_SAFE protects against deletion of the current node (ioh)
-by stashing away its "next" pointer. However, ioh->fd_write could
+``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
+by stashing away its ``next`` pointer. However, ``ioh->fd_write`` could
actually delete the next node from the list. The simplest way to
avoid this is to mark the node as deleted, and remove it from the
-list in the above loop:
+list in the above loop::
QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
if (ioh->deleted) {
@@ -29,7 +29,7 @@ list in the above loop:
}
If however this loop must also be reentrant, i.e. it is possible that
-ioh->fd_write invokes the loop again, some kind of counting is needed:
+``ioh->fd_write`` invokes the loop again, some kind of counting is needed::
walking_handlers++;
QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
@@ -46,8 +46,8 @@ ioh->fd_write invokes the loop again, some kind of counting is needed:
}
walking_handlers--;
-One may think of using the RCU primitives, rcu_read_lock() and
-rcu_read_unlock(); effectively, the RCU nesting count would take
+One may think of using the RCU primitives, ``rcu_read_lock()`` and
+``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the walking_handlers global variable. Indeed,
reference counting and RCU have similar purposes, but their usage in
general is complementary:
@@ -70,14 +70,14 @@ general is complementary:
this can improve performance, but also delay reclamation undesirably.
With reference counting, reclamation is deterministic.
-This file documents QemuLockCnt, an abstraction for using reference
+This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.
-QemuLockCnt concepts
---------------------
+``QemuLockCnt`` concepts
+------------------------
-A QemuLockCnt comprises both a counter and a mutex; it has primitives
+A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex. The counter notes how many visits to the data structures are
taking place (the visits could be from different threads, or there could
@@ -95,13 +95,14 @@ not just frees, though there could be cases where this is not necessary.
Reads, instead, can be done without taking the mutex, as long as the
readers and writers use the same macros that are used for RCU, for
-example qatomic_rcu_read, qatomic_rcu_set, QLIST_FOREACH_RCU, etc. This is
-because the reads are done outside a lock and a set or QLIST_INSERT_HEAD
+example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
+etc. This is because the reads are done outside a lock and a set
+or ``QLIST_INSERT_HEAD``
can happen concurrently with the read. The RCU API ensures that the
processor and the compiler see all required memory barriers.
This could be implemented simply by protecting the counter with the
-mutex, for example:
+mutex, for example::
// (1)
qemu_mutex_lock(&walking_handlers_mutex);
@@ -125,33 +126,33 @@ mutex, for example:
Here, no frees can happen in the code represented by the ellipsis.
If another thread is executing critical section (2), that part of
the code cannot be entered, because the thread will not be able
-to increment the walking_handlers variable. And of course
+to increment the ``walking_handlers`` variable. And of course
during the visit any other thread will see a nonzero value for
-walking_handlers, as in the single-threaded code.
+``walking_handlers``, as in the single-threaded code.
Note that it is possible for multiple concurrent accesses to delay
-the cleanup arbitrarily; in other words, for the walking_handlers
+the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero. For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.
However, critical sections are easy to forget since you have to do
-them for each modification of the counter. QemuLockCnt ensures that
+them for each modification of the counter. ``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:
- it avoids taking the lock for many operations (for example
incrementing the counter while it is non-zero);
-- on some platforms, one can implement QemuLockCnt to hold the lock
+- on some platforms, one can implement ``QemuLockCnt`` to hold the lock
and the mutex in a single word, making the fast path no more expensive
than simply managing a counter using atomic operations (see
- docs/devel/atomics.rst). This can be very helpful if concurrent access to
+ :doc:`atomics`). This can be very helpful if concurrent access to
the data structure is expected to be rare.
Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
-zero and the mutex is taken---even if the mutex is taken by a write,
+zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:
@@ -163,27 +164,27 @@ assumptions are valid:
- writes are frequent, but this kind of write (e.g. appending to a
list) has a very small critical section.
-For example, QEMU uses QemuLockCnt to manage an AioContext's list of
+For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers. Modifications to the list
of file descriptor handlers are rare. Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
-three instructions in the critical path, two assignments and a smp_wmb().
+three instructions in the critical path, two assignments and a ``smp_wmb()``.
-QemuLockCnt API
----------------
+``QemuLockCnt`` API
+-------------------
-The QemuLockCnt API is described in include/qemu/thread.h.
+.. kernel-doc:: include/qemu/lockcnt.h
-QemuLockCnt usage
------------------
+``QemuLockCnt`` usage
+---------------------
-This section explains the typical usage patterns for QemuLockCnt functions.
+This section explains the typical usage patterns for ``QemuLockCnt`` functions.
Setting a variable to a non-NULL value can be done between
-qemu_lockcnt_lock and qemu_lockcnt_unlock:
+``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::
qemu_lockcnt_lock(&xyz_lockcnt);
if (!xyz) {
@@ -193,8 +194,8 @@ qemu_lockcnt_lock and qemu_lockcnt_unlock:
}
qemu_lockcnt_unlock(&xyz_lockcnt);
-Accessing the value can be done between qemu_lockcnt_inc and
-qemu_lockcnt_dec:
+Accessing the value can be done between ``qemu_lockcnt_inc`` and
+``qemu_lockcnt_dec``::
qemu_lockcnt_inc(&xyz_lockcnt);
if (xyz) {
@@ -204,11 +205,11 @@ qemu_lockcnt_dec:
}
qemu_lockcnt_dec(&xyz_lockcnt);
-Freeing the object can similarly use qemu_lockcnt_lock and
-qemu_lockcnt_unlock, but you also need to ensure that the count
-is zero (i.e. there is no concurrent visit). Because qemu_lockcnt_inc
-takes the QemuLockCnt's lock, the count cannot become non-zero while
-the object is being freed. Freeing an object looks like this:
+Freeing the object can similarly use ``qemu_lockcnt_lock`` and
+``qemu_lockcnt_unlock``, but you also need to ensure that the count
+is zero (i.e. there is no concurrent visit). Because ``qemu_lockcnt_inc``
+takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
+the object is being freed. Freeing an object looks like this::
qemu_lockcnt_lock(&xyz_lockcnt);
if (!qemu_lockcnt_count(&xyz_lockcnt)) {
@@ -218,7 +219,7 @@ the object is being freed. Freeing an object looks like this:
qemu_lockcnt_unlock(&xyz_lockcnt);
If an object has to be freed right after a visit, you can combine
-the decrement, the locking and the check on count as follows:
+the decrement, the locking and the check on count as follows::
qemu_lockcnt_inc(&xyz_lockcnt);
if (xyz) {
@@ -232,7 +233,7 @@ the decrement, the locking and the check on count as follows:
qemu_lockcnt_unlock(&xyz_lockcnt);
}
-QemuLockCnt can also be used to access a list as follows:
+``QemuLockCnt`` can also be used to access a list as follows::
qemu_lockcnt_inc(&io_handlers_lockcnt);
QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
@@ -252,10 +253,10 @@ QemuLockCnt can also be used to access a list as follows:
}
Again, the RCU primitives are used because new items can be added to the
-list during the walk. QLIST_FOREACH_RCU ensures that the processor and
+list during the walk. ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.
-An alternative pattern uses qemu_lockcnt_dec_if_lock:
+An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::
qemu_lockcnt_inc(&io_handlers_lockcnt);
QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
@@ -273,5 +274,5 @@ An alternative pattern uses qemu_lockcnt_dec_if_lock:
}
qemu_lockcnt_dec(&io_handlers_lockcnt);
-Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock,
+Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because there is no special task to do if the count goes from 1 to 0.
diff --git a/docs/devel/luks-detached-header.rst b/docs/devel/luks-detached-header.rst
new file mode 100644
index 0000000..94ec285
--- /dev/null
+++ b/docs/devel/luks-detached-header.rst
@@ -0,0 +1,182 @@
+================================
+LUKS volume with detached header
+================================
+
+Introduction
+============
+
+This document gives an overview of the design of LUKS volume with detached
+header and how to use it.
+
+Background
+==========
+
+The LUKS format has ability to store the header in a separate volume from
+the payload. We could extend the LUKS driver in QEMU to support this use
+case.
+
+Normally a LUKS volume has a layout:
+
+::
+
+ +-----------------------------------------------+
+ | | | |
+ disk | header | key material | disk payload data |
+ | | | |
+ +-----------------------------------------------+
+
+With a detached LUKS header, you need 2 disks so getting:
+
+::
+
+ +--------------------------+
+ disk1 | header | key material |
+ +--------------------------+
+ +---------------------+
+ disk2 | disk payload data |
+ +---------------------+
+
+There are a variety of benefits to doing this:
+
+ * Secrecy - the disk2 cannot be identified as containing LUKS
+ volume since there's no header
+ * Control - if access to the disk1 is restricted, then even
+ if someone has access to disk2 they can't unlock
+ it. Might be useful if you have disks on NFS but
+ want to restrict which host can launch a VM
+ instance from it, by dynamically providing access
+ to the header to a designated host
+ * Flexibility - your application data volume may be a given
+ size and it is inconvenient to resize it to
+ add encryption.You can store the LUKS header
+ separately and use the existing storage
+ volume for payload
+ * Recovery - corruption of a bit in the header may make the
+ entire payload inaccessible. It might be
+ convenient to take backups of the header. If
+ your primary disk header becomes corrupt, you
+ can unlock the data still by pointing to the
+ backup detached header
+
+Architecture
+============
+
+Take the qcow2 encryption, for example. The architecture of the
+LUKS volume with detached header is shown in the diagram below.
+
+There are two children of the root node: a file and a header.
+Data from the disk payload is stored in the file node. The
+LUKS header and key material are located in the header node,
+as previously mentioned.
+
+::
+
+ +-----------------------------+
+ Root node | foo[luks] |
+ +-----------------------------+
+ | |
+ file | header |
+ | |
+ +---------------------+ +------------------+
+ Child node |payload-format[qcow2]| |header-format[raw]|
+ +---------------------+ +------------------+
+ | |
+ file | file |
+ | |
+ +----------------------+ +---------------------+
+ Child node |payload-protocol[file]| |header-protocol[file]|
+ +----------------------+ +---------------------+
+ | |
+ | |
+ | |
+ Host storage Host storage
+
+Usage
+=====
+
+Create a LUKS disk with a detached header using qemu-img
+--------------------------------------------------------
+
+Shell commandline::
+
+ # qemu-img create --object secret,id=sec0,data=abc123 -f luks \
+ -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \
+ -o detached-header=true test-header.img
+ # qemu-img create -f qcow2 test-payload.qcow2 200G
+ # qemu-img info 'json:{"driver":"luks","file":{"filename": \
+ "test-payload.img"},"header":{"filename":"test-header.img"}}'
+
+Set up a VM's LUKS volume with a detached header
+------------------------------------------------
+
+Qemu commandline::
+
+ # qemu-system-x86_64 ... \
+ -object '{"qom-type":"secret","id":"libvirt-3-format-secret", \
+ "data":"abc123"}' \
+ -blockdev '{"driver":"file","filename":"/path/to/test-header.img", \
+ "node-name":"libvirt-1-storage"}' \
+ -blockdev '{"node-name":"libvirt-1-format","read-only":false, \
+ "driver":"raw","file":"libvirt-1-storage"}' \
+ -blockdev '{"driver":"file","filename":"/path/to/test-payload.qcow2", \
+ "node-name":"libvirt-2-storage"}' \
+ -blockdev '{"node-name":"libvirt-2-format","read-only":false, \
+ "driver":"qcow2","file":"libvirt-2-storage"}' \
+ -blockdev '{"node-name":"libvirt-3-format","driver":"luks", \
+ "file":"libvirt-2-format","header":"libvirt-1-format","key-secret": \
+ "libvirt-3-format-secret"}' \
+ -device '{"driver":"virtio-blk-pci","bus":XXX,"addr":YYY,"drive": \
+ "libvirt-3-format","id":"virtio-disk1"}'
+
+Add LUKS volume to a VM with a detached header
+----------------------------------------------
+
+1. object-add the secret for decrypting the cipher stored in
+ LUKS header above::
+
+ # virsh qemu-monitor-command vm '{"execute":"object-add", \
+ "arguments":{"qom-type":"secret", "id": \
+ "libvirt-4-format-secret", "data":"abc123"}}'
+
+2. block-add the protocol node for LUKS header::
+
+ # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \
+ "arguments":{"node-name":"libvirt-1-storage", "driver":"file", \
+ "filename": "/path/to/test-header.img" }}'
+
+3. block-add the raw-drived node for LUKS header::
+
+ # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \
+ "arguments":{"node-name":"libvirt-1-format", "driver":"raw", \
+ "file":"libvirt-1-storage"}}'
+
+4. block-add the protocol node for disk payload image::
+
+ # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \
+ "arguments":{"node-name":"libvirt-2-storage", "driver":"file", \
+ "filename":"/path/to/test-payload.qcow2"}}'
+
+5. block-add the qcow2-drived format node for disk payload data::
+
+ # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \
+ "arguments":{"node-name":"libvirt-2-format", "driver":"qcow2", \
+ "file":"libvirt-2-storage"}}'
+
+6. block-add the luks-drived format node to link the qcow2 disk
+ with the LUKS header by specifying the field "header"::
+
+ # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \
+ "arguments":{"node-name":"libvirt-3-format", "driver":"luks", \
+ "file":"libvirt-2-format", "header":"libvirt-1-format", \
+ "key-secret":"libvirt-2-format-secret"}}'
+
+7. hot-plug the virtio-blk device finally::
+
+ # virsh qemu-monitor-command vm '{"execute":"device_add", \
+ "arguments": {"driver":"virtio-blk-pci", \
+ "drive": "libvirt-3-format", "id":"virtio-disk2"}}
+
+TODO
+====
+
+1. Support the shared detached LUKS header within the VM.
diff --git a/docs/devel/maintainers.rst b/docs/devel/maintainers.rst
index 5c907d9..88a613e 100644
--- a/docs/devel/maintainers.rst
+++ b/docs/devel/maintainers.rst
@@ -99,9 +99,9 @@ members of the QEMU community, you should make arrangements to attend
a `KeySigningParty <https://wiki.qemu.org/KeySigningParty>`__ (for
example at KVM Forum) or make alternative arrangements to have your
key signed by an attendee. Key signing requires meeting another
-community member **in person** [#]_ so please make appropriate
+community member **in person**\ [#2020]_ so please make appropriate
arrangements.
-.. [#] In recent pandemic times we have had to exercise some
+.. [#2020] In recent pandemic times we have had to exercise some
flexibility here. Maintainers still need to sign their pull
requests though.
diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst
index 58f8fd9..8f431d5 100644
--- a/docs/devel/migration/features.rst
+++ b/docs/devel/migration/features.rst
@@ -14,3 +14,4 @@ Migration has plenty of features to support different use cases.
CPR
qpl-compression
uadk-compression
+ qatzip-compression
diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 784c899..c2857fc 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -465,6 +465,12 @@ Examples of such API functions are:
- portio_list_set_address()
- portio_list_set_enabled()
+Since the order of device save/restore is not defined, you must
+avoid accessing or changing any other device's state in one of these
+callbacks. (For instance, don't do anything that calls ``update_irq()``
+in a ``post_load`` hook.) Otherwise, restore will not be deterministic,
+and this will break execution record/replay.
+
Iterative device migration
--------------------------
diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
index d352b54..b08c2b4 100644
--- a/docs/devel/migration/mapped-ram.rst
+++ b/docs/devel/migration/mapped-ram.rst
@@ -44,7 +44,7 @@ Use-cases
The mapped-ram feature was designed for use cases where the migration
stream will be directed to a file in the filesystem and not
-immediately restored on the destination VM [#]_. These could be
+immediately restored on the destination VM\ [#alternatives]_. These could be
thought of as snapshots. We can further categorize them into live and
non-live.
@@ -70,7 +70,7 @@ mapped-ram in this scenario is portability since background-snapshot
depends on async dirty tracking (KVM_GET_DIRTY_LOG) which is not
supported outside of Linux.
-.. [#] While this same effect could be obtained with the usage of
+.. [#alternatives] While this same effect could be obtained with the usage of
snapshots or the ``file:`` migration alone, mapped-ram provides
a performance increase for VMs with larger RAM sizes (10s to
100s of GiBs), specially if the VM has been stopped beforehand.
diff --git a/docs/devel/migration/qatzip-compression.rst b/docs/devel/migration/qatzip-compression.rst
new file mode 100644
index 0000000..862b383
--- /dev/null
+++ b/docs/devel/migration/qatzip-compression.rst
@@ -0,0 +1,165 @@
+==================
+QATzip Compression
+==================
+In scenarios with limited network bandwidth, the ``QATzip`` solution can help
+users save a lot of host CPU resources by accelerating compression and
+decompression through the Intel QuickAssist Technology(``QAT``) hardware.
+
+
+The following test was conducted using 8 multifd channels and 10Gbps network
+bandwidth. The results show that, compared to zstd, ``QATzip`` significantly
+saves CPU resources on the sender and reduces migration time. Compared to the
+uncompressed solution, ``QATzip`` greatly improves the dirty page processing
+capability, indicated by the Pages per Second metric, and also reduces the
+total migration time.
+
+::
+
+ VM Configuration: 16 vCPU and 64G memory
+ VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data.
+ QAT Devices: 4
+ |-----------|--------|---------|----------|----------|------|------|
+ |8 Channels |Total |down |throughput|pages per | send | recv |
+ | |time(ms)|time(ms) |(mbps) |second | cpu %| cpu% |
+ |-----------|--------|---------|----------|----------|------|------|
+ |qatzip | 16630| 28| 10467| 2940235| 160| 360|
+ |-----------|--------|---------|----------|----------|------|------|
+ |zstd | 20165| 24| 8579| 2391465| 810| 340|
+ |-----------|--------|---------|----------|----------|------|------|
+ |none | 46063| 40| 10848| 330240| 45| 85|
+ |-----------|--------|---------|----------|----------|------|------|
+
+
+QATzip Compression Framework
+============================
+
+``QATzip`` is a user space library which builds on top of the Intel QuickAssist
+Technology to provide extended accelerated compression and decompression
+services.
+
+For more ``QATzip`` introduction, please refer to `QATzip Introduction
+<https://github.com/intel/QATzip?tab=readme-ov-file#introductionl>`_
+
+::
+
+ +----------------+
+ | MultiFd Thread |
+ +-------+--------+
+ |
+ | compress/decompress
+ +-------+--------+
+ | QATzip library |
+ +-------+--------+
+ |
+ +-------+--------+
+ | QAT library |
+ +-------+--------+
+ | user space
+ --------+---------------------
+ | kernel space
+ +------+-------+
+ | QAT Driver |
+ +------+-------+
+ |
+ +------+-------+
+ | QAT Devices |
+ +--------------+
+
+
+QATzip Installation
+-------------------
+
+The ``QATzip`` installation package has been integrated into some Linux
+distributions and can be installed directly. For example, the Ubuntu Server
+24.04 LTS system can be installed using below command
+
+.. code-block:: shell
+
+ #apt search qatzip
+ libqatzip-dev/noble 1.2.0-0ubuntu3 amd64
+ Intel QuickAssist user space library development files
+
+ libqatzip3/noble 1.2.0-0ubuntu3 amd64
+ Intel QuickAssist user space library
+
+ qatzip/noble,now 1.2.0-0ubuntu3 amd64 [installed]
+ Compression user-space tool for Intel QuickAssist Technology
+
+ #sudo apt install libqatzip-dev libqatzip3 qatzip
+
+If your system does not support the ``QATzip`` installation package, you can
+use the source code to build and install, please refer to `QATzip source code installation
+<https://github.com/intel/QATzip?tab=readme-ov-file#build-intel-quickassist-technology-driver>`_
+
+QAT Hardware Deployment
+-----------------------
+
+``QAT`` supports physical functions(PFs) and virtual functions(VFs) for
+deployment, and users can configure ``QAT`` resources for migration according
+to actual needs. For more details about ``QAT`` deployment, please refer to
+`Intel QuickAssist Technology Documentation
+<https://intel.github.io/quickassist/index.html>`_
+
+For more ``QAT`` hardware introduction, please refer to `intel-quick-assist-technology-overview
+<https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html>`_
+
+How To Use QATzip Compression
+=============================
+
+1 - Install ``QATzip`` library
+
+2 - Build ``QEMU`` with ``--enable-qatzip`` parameter
+
+ E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qatzip``
+
+3 - Set ``migrate_set_parameter multifd-compression qatzip``
+
+4 - Set ``migrate_set_parameter multifd-qatzip-level comp_level``, the default
+comp_level value is 1, and it supports levels from 1 to 9
+
+QAT Memory Requirements
+=======================
+
+The user needs to reserve system memory for the QAT memory management to
+allocate DMA memory. The size of the reserved system memory depends on the
+number of devices used for migration and the number of multifd channels.
+
+Because memory usage depends on QAT configuration, please refer to `QAT Memory
+Driver Queries
+<https://intel.github.io/quickassist/PG/infrastructure_debugability.html?highlight=memory>`_
+for memory usage calculation.
+
+.. list-table:: An example of a PF used for migration
+ :header-rows: 1
+
+ * - Number of channels
+ - Sender memory usage
+ - Receiver memory usage
+ * - 2
+ - 10M
+ - 10M
+ * - 4
+ - 12M
+ - 14M
+ * - 8
+ - 16M
+ - 20M
+
+How To Choose Between QATzip and QPL
+====================================
+Starting from 4th Gen Intel Xeon Scalable processors, codenamed Sapphire Rapids
+processor(``SPR``), multiple built-in accelerators are supported including
+``QAT`` and ``IAA``. The former can accelerate ``QATzip`` and the latter is
+used to accelerate ``QPL``.
+
+Here are some suggestions:
+
+1 - If the live migration scenario is limited by network bandwidth and ``QAT``
+hardware resources exceed ``IAA``, use the ``QATzip`` method, which can save a
+lot of host CPU resources for compression.
+
+2 - If the system cannot support shared virtual memory (SVM) technology, use
+the ``QATzip`` method because ``QPL`` performance is not good without SVM
+support.
+
+3 - For other scenarios, use the ``QPL`` method first.
diff --git a/docs/devel/migration/uadk-compression.rst b/docs/devel/migration/uadk-compression.rst
index 3f73345..64cadeb 100644
--- a/docs/devel/migration/uadk-compression.rst
+++ b/docs/devel/migration/uadk-compression.rst
@@ -114,7 +114,7 @@ Make sure all these above kernel configurations are selected.
Accelerator dev node permissions
--------------------------------
-Harware accelerators(eg: HiSilicon Kunpeng Zip accelerator) gets registered to
+Hardware accelerators (eg: HiSilicon Kunpeng Zip accelerator) gets registered to
UADK and char devices are created in dev directory. In order to access resources
on hardware accelerator devices, write permission should be provided to user.
@@ -134,7 +134,7 @@ How To Use UADK Compression In QEMU Migration
Set ``migrate_set_parameter multifd-compression uadk``
Since UADK uses Shared Virtual Addressing(SVA) and device access virtual memory
-directly it is possible that SMMUv3 may enounter page faults while walking the
+directly it is possible that SMMUv3 may encounter page faults while walking the
IO page tables. This may impact the performance. In order to mitigate this,
please make sure to specify ``-mem-prealloc`` parameter to the destination VM
boot parameters.
diff --git a/docs/devel/multiple-iothreads.rst b/docs/devel/multiple-iothreads.rst
new file mode 100644
index 0000000..d1f3fc4
--- /dev/null
+++ b/docs/devel/multiple-iothreads.rst
@@ -0,0 +1,139 @@
+Using Multiple ``IOThread``\ s
+==============================
+
+..
+ Copyright (c) 2014-2017 Red Hat Inc.
+
+ This work is licensed under the terms of the GNU GPL, version 2 or later. See
+ the COPYING file in the top-level directory.
+
+
+This document explains the ``IOThread`` feature and how to write code that runs
+outside the BQL.
+
+The main loop and ``IOThread``\ s
+---------------------------------
+QEMU is an event-driven program that can do several things at once using an
+event loop. The VNC server and the QMP monitor are both processed from the
+same event loop, which monitors their file descriptors until they become
+readable and then invokes a callback.
+
+The default event loop is called the main loop (see ``main-loop.c``). It is
+possible to create additional event loop threads using
+``-object iothread,id=my-iothread``.
+
+Side note: The main loop and ``IOThread`` are both event loops but their code is
+not shared completely. Sometimes it is useful to remember that although they
+are conceptually similar they are currently not interchangeable.
+
+Why ``IOThread``\ s are useful
+------------------------------
+``IOThread``\ s allow the user to control the placement of work. The main loop is a
+scalability bottleneck on hosts with many CPUs. Work can be spread across
+several ``IOThread``\ s instead of just one main loop. When set up correctly this
+can improve I/O latency and reduce jitter seen by the guest.
+
+The main loop is also deeply associated with the BQL, which is a
+scalability bottleneck in itself. vCPU threads and the main loop use the BQL
+to serialize execution of QEMU code. This mutex is necessary because a lot of
+QEMU's code historically was not thread-safe.
+
+The fact that all I/O processing is done in a single main loop and that the
+BQL is contended by all vCPU threads and the main loop explain
+why it is desirable to place work into ``IOThread``\ s.
+
+The experimental ``virtio-blk`` data-plane implementation has been benchmarked and
+shows these effects:
+ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
+
+.. _how-to-program:
+
+How to program for ``IOThread``\ s
+----------------------------------
+The main difference between legacy code and new code that can run in an
+``IOThread`` is dealing explicitly with the event loop object, ``AioContext``
+(see ``include/block/aio.h``). Code that only works in the main loop
+implicitly uses the main loop's ``AioContext``. Code that supports running
+in ``IOThread``\ s must be aware of its ``AioContext``.
+
+AioContext supports the following services:
+ * File descriptor monitoring (read/write/error on POSIX hosts)
+ * Event notifiers (inter-thread signalling)
+ * Timers
+ * Bottom Halves (BH) deferred callbacks
+
+There are several old APIs that use the main loop AioContext:
+ * LEGACY ``qemu_aio_set_fd_handler()`` - monitor a file descriptor
+ * LEGACY ``qemu_aio_set_event_notifier()`` - monitor an event notifier
+ * LEGACY ``timer_new_ms()`` - create a timer
+ * LEGACY ``qemu_bh_new()`` - create a BH
+ * LEGACY ``qemu_bh_new_guarded()`` - create a BH with a device re-entrancy guard
+ * LEGACY ``qemu_aio_wait()`` - run an event loop iteration
+
+Since they implicitly work on the main loop they cannot be used in code that
+runs in an ``IOThread``. They might cause a crash or deadlock if called from an
+``IOThread`` since the BQL is not held.
+
+Instead, use the ``AioContext`` functions directly (see ``include/block/aio.h``):
+ * ``aio_set_fd_handler()`` - monitor a file descriptor
+ * ``aio_set_event_notifier()`` - monitor an event notifier
+ * ``aio_timer_new()`` - create a timer
+ * ``aio_bh_new()`` - create a BH
+ * ``aio_bh_new_guarded()`` - create a BH with a device re-entrancy guard
+ * ``aio_poll()`` - run an event loop iteration
+
+The ``qemu_bh_new_guarded``/``aio_bh_new_guarded`` APIs accept a
+``MemReentrancyGuard``
+argument, which is used to check for and prevent re-entrancy problems. For
+BHs associated with devices, the reentrancy-guard is contained in the
+corresponding ``DeviceState`` and named ``mem_reentrancy_guard``.
+
+The ``AioContext`` can be obtained from the ``IOThread`` using
+``iothread_get_aio_context()`` or for the main loop using
+``qemu_get_aio_context()``. Code that takes an ``AioContext`` argument
+works both in ``IOThread``\ s or the main loop, depending on which ``AioContext``
+instance the caller passes in.
+
+How to synchronize with an ``IOThread``
+---------------------------------------
+Variables that can be accessed by multiple threads require some form of
+synchronization such as ``qemu_mutex_lock()``, ``rcu_read_lock()``, etc.
+
+``AioContext`` functions like ``aio_set_fd_handler()``,
+``aio_set_event_notifier()``, ``aio_bh_new()``, and ``aio_timer_new()``
+are thread-safe. They can be used to trigger activity in an ``IOThread``.
+
+Side note: the best way to schedule a function call across threads is to call
+``aio_bh_schedule_oneshot()``.
+
+The main loop thread can wait synchronously for a condition using
+``AIO_WAIT_WHILE()``.
+
+``AioContext`` and the block layer
+----------------------------------
+The ``AioContext`` originates from the QEMU block layer, even though nowadays
+``AioContext`` is a generic event loop that can be used by any QEMU subsystem.
+
+The block layer has support for ``AioContext`` integrated. Each
+``BlockDriverState`` is associated with an ``AioContext`` using
+``bdrv_try_change_aio_context()`` and ``bdrv_get_aio_context()``.
+This allows block layer code to process I/O inside the
+right ``AioContext``. Other subsystems may wish to follow a similar approach.
+
+Block layer code must therefore expect to run in an ``IOThread`` and avoid using
+old APIs that implicitly use the main loop. See
+`How to program for IOThreads`_ for information on how to do that.
+
+Code running in the monitor typically needs to ensure that past
+requests from the guest are completed. When a block device is running
+in an ``IOThread``, the ``IOThread`` can also process requests from the guest
+(via ioeventfd). To achieve both objects, wrap the code between
+``bdrv_drained_begin()`` and ``bdrv_drained_end()``, thus creating a "drained
+section".
+
+Long-running jobs (usually in the form of coroutines) are often scheduled in
+the ``BlockDriverState``'s ``AioContext``. The functions
+``bdrv_add``/``remove_aio_context_notifier``, or alternatively
+``blk_add``/``remove_aio_context_notifier`` if you use ``BlockBackends``,
+can be used to get a notification whenever ``bdrv_try_change_aio_context()``
+moves a ``BlockDriverState`` to a different ``AioContext``.
diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
deleted file mode 100644
index de85767..0000000
--- a/docs/devel/multiple-iothreads.txt
+++ /dev/null
@@ -1,130 +0,0 @@
-Copyright (c) 2014-2017 Red Hat Inc.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later. See
-the COPYING file in the top-level directory.
-
-
-This document explains the IOThread feature and how to write code that runs
-outside the BQL.
-
-The main loop and IOThreads
----------------------------
-QEMU is an event-driven program that can do several things at once using an
-event loop. The VNC server and the QMP monitor are both processed from the
-same event loop, which monitors their file descriptors until they become
-readable and then invokes a callback.
-
-The default event loop is called the main loop (see main-loop.c). It is
-possible to create additional event loop threads using -object
-iothread,id=my-iothread.
-
-Side note: The main loop and IOThread are both event loops but their code is
-not shared completely. Sometimes it is useful to remember that although they
-are conceptually similar they are currently not interchangeable.
-
-Why IOThreads are useful
-------------------------
-IOThreads allow the user to control the placement of work. The main loop is a
-scalability bottleneck on hosts with many CPUs. Work can be spread across
-several IOThreads instead of just one main loop. When set up correctly this
-can improve I/O latency and reduce jitter seen by the guest.
-
-The main loop is also deeply associated with the BQL, which is a
-scalability bottleneck in itself. vCPU threads and the main loop use the BQL
-to serialize execution of QEMU code. This mutex is necessary because a lot of
-QEMU's code historically was not thread-safe.
-
-The fact that all I/O processing is done in a single main loop and that the
-BQL is contended by all vCPU threads and the main loop explain
-why it is desirable to place work into IOThreads.
-
-The experimental virtio-blk data-plane implementation has been benchmarked and
-shows these effects:
-ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
-
-How to program for IOThreads
-----------------------------
-The main difference between legacy code and new code that can run in an
-IOThread is dealing explicitly with the event loop object, AioContext
-(see include/block/aio.h). Code that only works in the main loop
-implicitly uses the main loop's AioContext. Code that supports running
-in IOThreads must be aware of its AioContext.
-
-AioContext supports the following services:
- * File descriptor monitoring (read/write/error on POSIX hosts)
- * Event notifiers (inter-thread signalling)
- * Timers
- * Bottom Halves (BH) deferred callbacks
-
-There are several old APIs that use the main loop AioContext:
- * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
- * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
- * LEGACY timer_new_ms() - create a timer
- * LEGACY qemu_bh_new() - create a BH
- * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
- * LEGACY qemu_aio_wait() - run an event loop iteration
-
-Since they implicitly work on the main loop they cannot be used in code that
-runs in an IOThread. They might cause a crash or deadlock if called from an
-IOThread since the BQL is not held.
-
-Instead, use the AioContext functions directly (see include/block/aio.h):
- * aio_set_fd_handler() - monitor a file descriptor
- * aio_set_event_notifier() - monitor an event notifier
- * aio_timer_new() - create a timer
- * aio_bh_new() - create a BH
- * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
- * aio_poll() - run an event loop iteration
-
-The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
-argument, which is used to check for and prevent re-entrancy problems. For
-BHs associated with devices, the reentrancy-guard is contained in the
-corresponding DeviceState and named "mem_reentrancy_guard".
-
-The AioContext can be obtained from the IOThread using
-iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
-Code that takes an AioContext argument works both in IOThreads or the main
-loop, depending on which AioContext instance the caller passes in.
-
-How to synchronize with an IOThread
------------------------------------
-Variables that can be accessed by multiple threads require some form of
-synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
-
-AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
-aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
-activity in an IOThread.
-
-Side note: the best way to schedule a function call across threads is to call
-aio_bh_schedule_oneshot().
-
-The main loop thread can wait synchronously for a condition using
-AIO_WAIT_WHILE().
-
-AioContext and the block layer
-------------------------------
-The AioContext originates from the QEMU block layer, even though nowadays
-AioContext is a generic event loop that can be used by any QEMU subsystem.
-
-The block layer has support for AioContext integrated. Each BlockDriverState
-is associated with an AioContext using bdrv_try_change_aio_context() and
-bdrv_get_aio_context(). This allows block layer code to process I/O inside the
-right AioContext. Other subsystems may wish to follow a similar approach.
-
-Block layer code must therefore expect to run in an IOThread and avoid using
-old APIs that implicitly use the main loop. See the "How to program for
-IOThreads" above for information on how to do that.
-
-Code running in the monitor typically needs to ensure that past
-requests from the guest are completed. When a block device is running
-in an IOThread, the IOThread can also process requests from the guest
-(via ioeventfd). To achieve both objects, wrap the code between
-bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
-section".
-
-Long-running jobs (usually in the form of coroutines) are often scheduled in
-the BlockDriverState's AioContext. The functions
-bdrv_add/remove_aio_context_notifier, or alternatively
-blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
-get a notification whenever bdrv_try_change_aio_context() moves a
-BlockDriverState to a different AioContext.
diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt
deleted file mode 100644
index 9094365..0000000
--- a/docs/devel/nested-papr.txt
+++ /dev/null
@@ -1,119 +0,0 @@
-Nested PAPR API (aka KVM on PowerVM)
-====================================
-
-This API aims at providing support to enable nested virtualization with
-KVM on PowerVM. While the existing support for nested KVM on PowerNV was
-introduced with cap-nested-hv option, however, with a slight design change,
-to enable this on papr/pseries, a new cap-nested-papr option is added. eg:
-
- qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ...
-
-Work by:
- Michael Neuling <mikey@neuling.org>
- Vaibhav Jain <vaibhav@linux.ibm.com>
- Jordan Niethe <jniethe5@gmail.com>
- Harsh Prateek Bora <harshpb@linux.ibm.com>
- Shivaprasad G Bhat <sbhat@linux.ibm.com>
- Kautuk Consul <kconsul@linux.vnet.ibm.com>
-
-Below taken from the kernel documentation:
-
-Introduction
-============
-
-This document explains how a guest operating system can act as a
-hypervisor and run nested guests through the use of hypercalls, if the
-hypervisor has implemented them. The terms L0, L1, and L2 are used to
-refer to different software entities. L0 is the hypervisor mode entity
-that would normally be called the "host" or "hypervisor". L1 is a
-guest virtual machine that is directly run under L0 and is initiated
-and controlled by L0. L2 is a guest virtual machine that is initiated
-and controlled by L1 acting as a hypervisor. A significant design change
-wrt existing API is that now the entire L2 state is maintained within L0.
-
-Existing Nested-HV API
-======================
-
-Linux/KVM has had support for Nesting as an L0 or L1 since 2018
-
-The L0 code was added::
-
- commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce
- Author: Paul Mackerras <paulus@ozlabs.org>
- Date: Mon Oct 8 16:31:03 2018 +1100
- KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization
-
-The L1 code was added::
-
- commit 360cae313702cdd0b90f82c261a8302fecef030a
- Author: Paul Mackerras <paulus@ozlabs.org>
- Date: Mon Oct 8 16:31:04 2018 +1100
- KVM: PPC: Book3S HV: Nested guest entry via hypercall
-
-This API works primarily using a signal hcall h_enter_nested(). This
-call made by the L1 to tell the L0 to start an L2 vCPU with the given
-state. The L0 then starts this L2 and runs until an L2 exit condition
-is reached. Once the L2 exits, the state of the L2 is given back to
-the L1 by the L0. The full L2 vCPU state is always transferred from
-and to L1 when the L2 is run. The L0 doesn't keep any state on the L2
-vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2
--> L1 exit).
-
-The only state kept by the L0 is the partition table. The L1 registers
-it's partition table using the h_set_partition_table() hcall. All
-other state held by the L0 about the L2s is cached state (such as
-shadow page tables).
-
-The L1 may run any L2 or vCPU without first informing the L0. It
-simply starts the vCPU using h_enter_nested(). The creation of L2s and
-vCPUs is done implicitly whenever h_enter_nested() is called.
-
-In this document, we call this existing API the v1 API.
-
-New PAPR API
-===============
-
-The new PAPR API changes from the v1 API such that the creating L2 and
-associated vCPUs is explicit. In this document, we call this the v2
-API.
-
-h_enter_nested() is replaced with H_GUEST_VCPU_RUN(). Before this can
-be called the L1 must explicitly create the L2 using h_guest_create()
-and any associated vCPUs() created with h_guest_create_vCPU(). Getting
-and setting vCPU state can also be performed using h_guest_{g|s}et
-hcall.
-
-The basic execution flow is for an L1 to create an L2, run it, and
-delete it is:
-
-- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES()
- (normally at L1 boot time).
-
-- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token
-
-- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU()
-
-- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall
-
-- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall
-
-- L1 deletes L2 with H_GUEST_DELETE()
-
-For more details, please refer:
-
-[1] Linux Kernel documentation (upstream documentation commit):
-
-commit 476652297f94a2e5e5ef29e734b0da37ade94110
-Author: Michael Neuling <mikey@neuling.org>
-Date: Thu Sep 14 13:06:00 2023 +1000
-
- docs: powerpc: Document nested KVM on POWER
-
- Document support for nested KVM on POWER using the existing API as well
- as the new PAPR API. This includes the new HCALL interface and how it
- used by KVM.
-
- Signed-off-by: Michael Neuling <mikey@neuling.org>
- Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
- Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
- Link: https://msgid.link/20230914030600.16993-12-jniethe5@gmail.com
diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst
index ae97b33..583207a 100644
--- a/docs/devel/qapi-code-gen.rst
+++ b/docs/devel/qapi-code-gen.rst
@@ -899,7 +899,7 @@ Documentation markup
~~~~~~~~~~~~~~~~~~~~
Documentation comments can use most rST markup. In particular,
-a ``::`` literal block can be used for examples::
+a ``::`` literal block can be used for pre-formatted text::
# ::
#
@@ -995,8 +995,8 @@ line "Features:", like this::
# @feature: Description text
A tagged section begins with a paragraph that starts with one of the
-following words: "Since:", "Example:"/"Examples:", "Returns:",
-"Errors:", "TODO:". It ends with the start of a new section.
+following words: "Since:", "Returns:", "Errors:", "TODO:". It ends with
+the start of a new section.
The second and subsequent lines of tagged sections must be indented
like this::
@@ -1020,13 +1020,53 @@ detailing a relevant error condition. For example::
A "Since: x.y.z" tagged section lists the release that introduced the
definition.
-An "Example" or "Examples" section is rendered entirely
-as literal fixed-width text. "TODO" sections are not rendered at all
-(they are for developers, not users of QMP). In other sections, the
-text is formatted, and rST markup can be used.
+"TODO" sections are not rendered (they are for developers, not users of
+QMP). In other sections, the text is formatted, and rST markup can be
+used.
+
+QMP Examples can be added by using the ``.. qmp-example::``
+directive. In its simplest form, this can be used to contain a single
+QMP code block which accepts standard JSON syntax with additional server
+directionality indicators (``->`` and ``<-``), and elisions (``...``).
+
+Optionally, a plaintext title may be provided by using the ``:title:``
+directive option. If the title is omitted, the example title will
+default to "Example:".
+
+A simple QMP example::
+
+ # .. qmp-example::
+ # :title: Using query-block
+ #
+ # -> { "execute": "query-block" }
+ # <- { ... }
+
+More complex or multi-step examples where exposition is needed before or
+between QMP code blocks can be created by using the ``:annotated:``
+directive option. When using this option, nested QMP code blocks must be
+entered explicitly with rST's ``::`` syntax.
+
+Highlighting in non-QMP languages can be accomplished by using the
+``.. code-block:: lang`` directive, and non-highlighted text can be
+achieved by omitting the language argument.
For example::
+ # .. qmp-example::
+ # :annotated:
+ # :title: A more complex demonstration
+ #
+ # This is a more complex example that can use
+ # ``arbitrary rST syntax`` in its exposition::
+ #
+ # -> { "execute": "query-block" }
+ # <- { ... }
+ #
+ # Above, lengthy output has been omitted for brevity.
+
+
+Examples of complete definition documentation::
+
##
# @BlockStats:
#
@@ -1058,11 +1098,11 @@ For example::
#
# Since: 0.14
#
- # Example:
+ # .. qmp-example::
#
# -> { "execute": "query-blockstats" }
# <- {
- # ... lots of output ...
+ # ...
# }
##
{ 'command': 'query-blockstats',
diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.rst
index 2e6cc60..dd07c1d 100644
--- a/docs/devel/rcu.txt
+++ b/docs/devel/rcu.rst
@@ -20,7 +20,7 @@ for the execution of all *currently running* critical sections before
proceeding, or before asynchronously executing a callback.
The key point here is that only the currently running critical sections
-are waited for; critical sections that are started _after_ the beginning
+are waited for; critical sections that are started **after** the beginning
of the wait do not extend the wait, despite running concurrently with
the updater. This is the reason why RCU is more scalable than,
for example, reader-writer locks. It is so much more scalable that
@@ -37,7 +37,7 @@ do not matter; as soon as all previous critical sections have finished,
there cannot be any readers who hold references to the data structure,
and these can now be safely reclaimed (e.g., freed or unref'ed).
-Here is a picture:
+Here is a picture::
thread 1 thread 2 thread 3
------------------- ------------------------ -------------------
@@ -58,43 +58,38 @@ that critical section.
RCU API
-=======
+-------
The core RCU API is small:
- void rcu_read_lock(void);
-
+``void rcu_read_lock(void);``
Used by a reader to inform the reclaimer that the reader is
entering an RCU read-side critical section.
- void rcu_read_unlock(void);
-
+``void rcu_read_unlock(void);``
Used by a reader to inform the reclaimer that the reader is
exiting an RCU read-side critical section. Note that RCU
read-side critical sections may be nested and/or overlapping.
- void synchronize_rcu(void);
-
+``void synchronize_rcu(void);``
Blocks until all pre-existing RCU read-side critical sections
on all threads have completed. This marks the end of the removal
phase and the beginning of reclamation phase.
Note that it would be valid for another update to come while
- synchronize_rcu is running. Because of this, it is better that
+ ``synchronize_rcu`` is running. Because of this, it is better that
the updater releases any locks it may hold before calling
- synchronize_rcu. If this is not possible (for example, because
- the updater is protected by the BQL), you can use call_rcu.
+ ``synchronize_rcu``. If this is not possible (for example, because
+ the updater is protected by the BQL), you can use ``call_rcu``.
- void call_rcu1(struct rcu_head * head,
- void (*func)(struct rcu_head *head));
-
- This function invokes func(head) after all pre-existing RCU
+``void call_rcu1(struct rcu_head * head, void (*func)(struct rcu_head *head));``
+ This function invokes ``func(head)`` after all pre-existing RCU
read-side critical sections on all threads have completed. This
marks the end of the removal phase, with func taking care
asynchronously of the reclamation phase.
- The foo struct needs to have an rcu_head structure added,
- perhaps as follows:
+ The ``foo`` struct needs to have an ``rcu_head`` structure added,
+ perhaps as follows::
struct foo {
struct rcu_head rcu;
@@ -103,8 +98,8 @@ The core RCU API is small:
long c;
};
- so that the reclaimer function can fetch the struct foo address
- and free it:
+ so that the reclaimer function can fetch the ``struct foo`` address
+ and free it::
call_rcu1(&foo.rcu, foo_reclaim);
@@ -114,29 +109,27 @@ The core RCU API is small:
g_free(fp);
}
- For the common case where the rcu_head member is the first of the
- struct, you can use the following macro.
+ ``call_rcu1`` is typically used via either the ``call_rcu`` or
+ ``g_free_rcu`` macros, which handle the common case where the
+ ``rcu_head`` member is the first of the struct.
- void call_rcu(T *p,
- void (*func)(T *p),
- field-name);
- void g_free_rcu(T *p,
- field-name);
+``void call_rcu(T *p, void (*func)(T *p), field-name);``
+ If the ``struct rcu_head`` is the first field in the struct, you can
+ use this macro instead of ``call_rcu1``.
- call_rcu1 is typically used through these macro, in the common case
- where the "struct rcu_head" is the first field in the struct. If
- the callback function is g_free, in particular, g_free_rcu can be
- used. In the above case, one could have written simply:
+``void g_free_rcu(T *p, field-name);``
+ This is a special-case version of ``call_rcu`` where the callback
+ function is ``g_free``.
+ In the example given in ``call_rcu1``, one could have written simply::
g_free_rcu(&foo, rcu);
- typeof(*p) qatomic_rcu_read(p);
-
- qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes
- some assumptions on the code that calls it. This allows a more
- optimized implementation.
+``typeof(*p) qatomic_rcu_read(p);``
+ ``qatomic_rcu_read()`` is similar to ``qatomic_load_acquire()``, but
+ it makes some assumptions on the code that calls it. This allows a
+ more optimized implementation.
- qatomic_rcu_read assumes that whenever a single RCU critical
+ ``qatomic_rcu_read`` assumes that whenever a single RCU critical
section reads multiple shared data, these reads are either
data-dependent or need no ordering. This is almost always the
case when using RCU, because read-side critical sections typically
@@ -144,7 +137,7 @@ The core RCU API is small:
every update) until reaching a data structure of interest,
and then read from there.
- RCU read-side critical sections must use qatomic_rcu_read() to
+ RCU read-side critical sections must use ``qatomic_rcu_read()`` to
read data, unless concurrent writes are prevented by another
synchronization mechanism.
@@ -152,18 +145,17 @@ The core RCU API is small:
data structure in a single direction, opposite to the direction
in which the updater initializes it.
- void qatomic_rcu_set(p, typeof(*p) v);
-
- qatomic_rcu_set() is similar to qatomic_store_release(), though it also
- makes assumptions on the code that calls it in order to allow a more
- optimized implementation.
+``void qatomic_rcu_set(p, typeof(*p) v);``
+ ``qatomic_rcu_set()`` is similar to ``qatomic_store_release()``,
+ though it also makes assumptions on the code that calls it in
+ order to allow a more optimized implementation.
- In particular, qatomic_rcu_set() suffices for synchronization
+ In particular, ``qatomic_rcu_set()`` suffices for synchronization
with readers, if the updater never mutates a field within a
data item that is already accessible to readers. This is the
case when initializing a new copy of the RCU-protected data
- structure; just ensure that initialization of *p is carried out
- before qatomic_rcu_set() makes the data item visible to readers.
+ structure; just ensure that initialization of ``*p`` is carried out
+ before ``qatomic_rcu_set()`` makes the data item visible to readers.
If this rule is observed, writes will happen in the opposite
order as reads in the RCU read-side critical sections (or if
there is just one update), and there will be no need for other
@@ -171,58 +163,54 @@ The core RCU API is small:
The following APIs must be used before RCU is used in a thread:
- void rcu_register_thread(void);
-
+``void rcu_register_thread(void);``
Mark a thread as taking part in the RCU mechanism. Such a thread
will have to report quiescent points regularly, either manually
- or through the QemuCond/QemuSemaphore/QemuEvent APIs.
-
- void rcu_unregister_thread(void);
+ or through the ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs.
+``void rcu_unregister_thread(void);``
Mark a thread as not taking part anymore in the RCU mechanism.
It is not a problem if such a thread reports quiescent points,
- either manually or by using the QemuCond/QemuSemaphore/QemuEvent
- APIs.
+ either manually or by using the
+ ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs.
-Note that these APIs are relatively heavyweight, and should _not_ be
+Note that these APIs are relatively heavyweight, and should **not** be
nested.
Convenience macros
-==================
+------------------
Two macros are provided that automatically release the read lock at the
end of the scope.
- RCU_READ_LOCK_GUARD()
-
+``RCU_READ_LOCK_GUARD()``
Takes the lock and will release it at the end of the block it's
used in.
- WITH_RCU_READ_LOCK_GUARD() { code }
-
+``WITH_RCU_READ_LOCK_GUARD() { code }``
Is used at the head of a block to protect the code within the block.
-Note that 'goto'ing out of the guarded block will also drop the lock.
+Note that a ``goto`` out of the guarded block will also drop the lock.
-DIFFERENCES WITH LINUX
-======================
+Differences with Linux
+----------------------
- Waiting on a mutex is possible, though discouraged, within an RCU critical
section. This is because spinlocks are rarely (if ever) used in userspace
programming; not allowing this would prevent upgrading an RCU read-side
critical section to become an updater.
-- qatomic_rcu_read and qatomic_rcu_set replace rcu_dereference and
- rcu_assign_pointer. They take a _pointer_ to the variable being accessed.
+- ``qatomic_rcu_read`` and ``qatomic_rcu_set`` replace ``rcu_dereference`` and
+ ``rcu_assign_pointer``. They take a **pointer** to the variable being accessed.
-- call_rcu is a macro that has an extra argument (the name of the first
- field in the struct, which must be a struct rcu_head), and expects the
+- ``call_rcu`` is a macro that has an extra argument (the name of the first
+ field in the struct, which must be a struct ``rcu_head``), and expects the
type of the callback's argument to be the type of the first argument.
- call_rcu1 is the same as Linux's call_rcu.
+ ``call_rcu1`` is the same as Linux's ``call_rcu``.
-RCU PATTERNS
-============
+RCU Patterns
+------------
Many patterns using read-writer locks translate directly to RCU, with
the advantages of higher scalability and deadlock immunity.
@@ -243,28 +231,28 @@ Here are some frequently-used RCU idioms that are worth noting.
RCU list processing
--------------------
+^^^^^^^^^^^^^^^^^^^
TBD (not yet used in QEMU)
RCU reference counting
-----------------------
+^^^^^^^^^^^^^^^^^^^^^^
Because grace periods are not allowed to complete while there is an RCU
read-side critical section in progress, the RCU read-side primitives
may be used as a restricted reference-counting mechanism. For example,
-consider the following code fragment:
+consider the following code fragment::
rcu_read_lock();
p = qatomic_rcu_read(&foo);
/* do something with p. */
rcu_read_unlock();
-The RCU read-side critical section ensures that the value of "p" remains
-valid until after the rcu_read_unlock(). In some sense, it is acquiring
-a reference to p that is later released when the critical section ends.
-The write side looks simply like this (with appropriate locking):
+The RCU read-side critical section ensures that the value of ``p`` remains
+valid until after the ``rcu_read_unlock()``. In some sense, it is acquiring
+a reference to ``p`` that is later released when the critical section ends.
+The write side looks simply like this (with appropriate locking)::
qemu_mutex_lock(&foo_mutex);
old = foo;
@@ -274,7 +262,7 @@ The write side looks simply like this (with appropriate locking):
free(old);
If the processing cannot be done purely within the critical section, it
-is possible to combine this idiom with a "real" reference count:
+is possible to combine this idiom with a "real" reference count::
rcu_read_lock();
p = qatomic_rcu_read(&foo);
@@ -283,7 +271,7 @@ is possible to combine this idiom with a "real" reference count:
/* do something with p. */
foo_unref(p);
-The write side can be like this:
+The write side can be like this::
qemu_mutex_lock(&foo_mutex);
old = foo;
@@ -292,7 +280,7 @@ The write side can be like this:
synchronize_rcu();
foo_unref(old);
-or with call_rcu:
+or with ``call_rcu``::
qemu_mutex_lock(&foo_mutex);
old = foo;
@@ -301,10 +289,10 @@ or with call_rcu:
call_rcu(foo_unref, old, rcu);
In both cases, the write side only performs removal. Reclamation
-happens when the last reference to a "foo" object is dropped.
-Using synchronize_rcu() is undesirably expensive, because the
+happens when the last reference to a ``foo`` object is dropped.
+Using ``synchronize_rcu()`` is undesirably expensive, because the
last reference may be dropped on the read side. Hence you can
-use call_rcu() instead:
+use ``call_rcu()`` instead::
foo_unref(struct foo *p) {
if (qatomic_fetch_dec(&p->refcount) == 1) {
@@ -314,7 +302,7 @@ use call_rcu() instead:
Note that the same idioms would be possible with reader/writer
-locks:
+locks::
read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock);
p = foo; p = foo;
@@ -334,15 +322,15 @@ locks:
foo_unref(p);
read_unlock(&foo_rwlock);
-foo_unref could use a mechanism such as bottom halves to move deallocation
+``foo_unref`` could use a mechanism such as bottom halves to move deallocation
out of the write-side critical section.
RCU resizable arrays
---------------------
+^^^^^^^^^^^^^^^^^^^^
Resizable arrays can be used with RCU. The expensive RCU synchronization
-(or call_rcu) only needs to take place when the array is resized.
+(or ``call_rcu``) only needs to take place when the array is resized.
The two items to take care of are:
- ensuring that the old version of the array is available between removal
@@ -351,10 +339,10 @@ The two items to take care of are:
- avoiding mismatches in the read side between the array data and the
array size.
-The first problem is avoided simply by not using realloc. Instead,
+The first problem is avoided simply by not using ``realloc``. Instead,
each resize will allocate a new array and copy the old data into it.
The second problem would arise if the size and the data pointers were
-two members of a larger struct:
+two members of a larger struct::
struct mystuff {
...
@@ -364,7 +352,7 @@ two members of a larger struct:
...
};
-Instead, we store the size of the array with the array itself:
+Instead, we store the size of the array with the array itself::
struct arr {
int size;
@@ -400,7 +388,7 @@ Instead, we store the size of the array with the array itself:
}
-SOURCES
-=======
+References
+----------
-* Documentation/RCU/ from the Linux kernel
+* The `Linux kernel RCU documentation <https://docs.kernel.org/RCU/>`__
diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst
index effd856..40f58d9 100644
--- a/docs/devel/replay.rst
+++ b/docs/devel/replay.rst
@@ -202,6 +202,9 @@ into the log.
Saving/restoring the VM state
-----------------------------
+Record/replay relies on VM state save and restore being complete and
+deterministic.
+
All fields in the device state structure (including virtual timers)
should be restored by loadvm to the same values they had before savevm.
diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst
index 9746a4e..74c7c01 100644
--- a/docs/devel/reset.rst
+++ b/docs/devel/reset.rst
@@ -44,6 +44,26 @@ The Resettable interface handles reset types with an enum ``ResetType``:
value on each cold reset, such as RNG seed information, and which they
must not reinitialize on a snapshot-load reset.
+``RESET_TYPE_WAKEUP``
+ If the machine supports waking up from a suspended state and needs to reset
+ its devices during wake-up (from the ``MachineClass::wakeup()`` method), this
+ reset type should be used for such a request. Devices can utilize this reset
+ type to differentiate the reset requested during machine wake-up from other
+ reset requests. For example, RAM content must not be lost during wake-up, and
+ memory devices like virtio-mem that provide additional RAM must not reset
+ such state during wake-ups, but might do so during cold resets. However, this
+ reset type should not be used for wake-up detection, as not every machine
+ type issues a device reset request during wake-up.
+
+``RESET_TYPE_S390_CPU_NORMAL``
+ This is only used for S390 CPU objects; it clears interrupts, stops
+ processing, and clears the TLB, but does not touch register contents.
+
+``RESET_TYPE_S390_CPU_INITIAL``
+ This is only used for S390 CPU objects; it does everything
+ ``RESET_TYPE_S390_CPU_NORMAL`` does and also clears the PSW, prefix,
+ FPC, timer and control registers. It does not touch gprs, fprs or acrs.
+
Devices which implement reset methods must treat any unknown ``ResetType``
as equivalent to ``RESET_TYPE_COLD``; this will reduce the amount of
existing code we need to change if we add more types in future.
diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index f7d7b9e..9463692 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -8,38 +8,6 @@
QEMU TCG Plugins
================
-QEMU TCG plugins provide a way for users to run experiments taking
-advantage of the total system control emulation can have over a guest.
-It provides a mechanism for plugins to subscribe to events during
-translation and execution and optionally callback into the plugin
-during these events. TCG plugins are unable to change the system state
-only monitor it passively. However they can do this down to an
-individual instruction granularity including potentially subscribing
-to all load and store operations.
-
-Usage
------
-
-Any QEMU binary with TCG support has plugins enabled by default.
-Earlier releases needed to be explicitly enabled with::
-
- configure --enable-plugins
-
-Once built a program can be run with multiple plugins loaded each with
-their own arguments::
-
- $QEMU $OTHER_QEMU_ARGS \
- -plugin contrib/plugin/libhowvec.so,inline=on,count=hint \
- -plugin contrib/plugin/libhotblocks.so
-
-Arguments are plugin specific and can be used to modify their
-behaviour. In this case the howvec plugin is being asked to use inline
-ops to count and break down the hint instructions by type.
-
-Linux user-mode emulation also evaluates the environment variable
-``QEMU_PLUGIN``::
-
- QEMU_PLUGIN="file=contrib/plugins/libhowvec.so,inline=on,count=hint" $QEMU
Writing plugins
---------------
@@ -93,11 +61,14 @@ translation event the plugin has an option to enumerate the
instructions in a block of instructions and optionally register
callbacks to some or all instructions when they are executed.
-There is also a facility to add an inline event where code to
-increment a counter can be directly inlined with the translation.
-Currently only a simple increment is supported. This is not atomic so
-can miss counts. If you want absolute precision you should use a
-callback which can then ensure atomicity itself.
+There is also a facility to add inline instructions doing various operations,
+like adding or storing an immediate value. It is also possible to execute a
+callback conditionally, with condition being evaluated inline. All those inline
+operations are associated to a ``scoreboard``, which is a thread-local storage
+automatically expanded when new cores/threads are created and that can be
+accessed/modified in a thread-safe way without any lock needed. Combining inline
+operations and conditional callbacks offer a more efficient way to instrument
+binaries, compared to classic callbacks.
Finally when QEMU exits all the registered *atexit* callbacks are
invoked.
@@ -191,457 +162,6 @@ which means callbacks may still occur after the uninstall operation is
requested. The plugin isn't completely uninstalled until the safe work
has executed while all vCPUs are quiescent.
-Example Plugins
-===============
-
-There are a number of plugins included with QEMU and you are
-encouraged to contribute your own plugins plugins upstream. There is a
-``contrib/plugins`` directory where they can go. There are also some
-basic plugins that are used to test and exercise the API during the
-``make check-tcg`` target in ``tests\plugins``.
-
-- tests/plugins/empty.c
-
-Purely a test plugin for measuring the overhead of the plugins system
-itself. Does no instrumentation.
-
-- tests/plugins/bb.c
-
-A very basic plugin which will measure execution in course terms as
-each basic block is executed. By default the results are shown once
-execution finishes::
-
- $ qemu-aarch64 -plugin tests/plugin/libbb.so \
- -d plugin ./tests/tcg/aarch64-linux-user/sha1
- SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
- bb's: 2277338, insns: 158483046
-
-Behaviour can be tweaked with the following arguments:
-
- * inline=true|false
-
- Use faster inline addition of a single counter. Not per-cpu and not
- thread safe.
-
- * idle=true|false
-
- Dump the current execution stats whenever the guest vCPU idles
-
-- tests/plugins/insn.c
-
-This is a basic instruction level instrumentation which can count the
-number of instructions executed on each core/thread::
-
- $ qemu-aarch64 -plugin tests/plugin/libinsn.so \
- -d plugin ./tests/tcg/aarch64-linux-user/threadcount
- Created 10 threads
- Done
- cpu 0 insns: 46765
- cpu 1 insns: 3694
- cpu 2 insns: 3694
- cpu 3 insns: 2994
- cpu 4 insns: 1497
- cpu 5 insns: 1497
- cpu 6 insns: 1497
- cpu 7 insns: 1497
- total insns: 63135
-
-Behaviour can be tweaked with the following arguments:
-
- * inline=true|false
-
- Use faster inline addition of a single counter. Not per-cpu and not
- thread safe.
-
- * sizes=true|false
-
- Give a summary of the instruction sizes for the execution
-
- * match=<string>
-
- Only instrument instructions matching the string prefix. Will show
- some basic stats including how many instructions have executed since
- the last execution. For example::
-
- $ qemu-aarch64 -plugin tests/plugin/libinsn.so,match=bl \
- -d plugin ./tests/tcg/aarch64-linux-user/sha512-vector
- ...
- 0x40069c, 'bl #0x4002b0', 10 hits, 1093 match hits, Δ+1257 since last match, 98 avg insns/match
- 0x4006ac, 'bl #0x403690', 10 hits, 1094 match hits, Δ+47 since last match, 98 avg insns/match
- 0x4037fc, 'bl #0x4002b0', 18 hits, 1095 match hits, Δ+22 since last match, 98 avg insns/match
- 0x400720, 'bl #0x403690', 10 hits, 1096 match hits, Δ+58 since last match, 98 avg insns/match
- 0x4037fc, 'bl #0x4002b0', 19 hits, 1097 match hits, Δ+22 since last match, 98 avg insns/match
- 0x400730, 'bl #0x403690', 10 hits, 1098 match hits, Δ+33 since last match, 98 avg insns/match
- 0x4037ac, 'bl #0x4002b0', 12 hits, 1099 match hits, Δ+20 since last match, 98 avg insns/match
- ...
-
-For more detailed execution tracing see the ``execlog`` plugin for
-other options.
-
-- tests/plugins/mem.c
-
-Basic instruction level memory instrumentation::
-
- $ qemu-aarch64 -plugin tests/plugin/libmem.so,inline=true \
- -d plugin ./tests/tcg/aarch64-linux-user/sha1
- SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
- inline mem accesses: 79525013
-
-Behaviour can be tweaked with the following arguments:
-
- * inline=true|false
-
- Use faster inline addition of a single counter. Not per-cpu and not
- thread safe.
-
- * callback=true|false
-
- Use callbacks on each memory instrumentation.
-
- * hwaddr=true|false
-
- Count IO accesses (only for system emulation)
-
-- tests/plugins/syscall.c
-
-A basic syscall tracing plugin. This only works for user-mode. By
-default it will give a summary of syscall stats at the end of the
-run::
-
- $ qemu-aarch64 -plugin tests/plugin/libsyscall \
- -d plugin ./tests/tcg/aarch64-linux-user/threadcount
- Created 10 threads
- Done
- syscall no. calls errors
- 226 12 0
- 99 11 11
- 115 11 0
- 222 11 0
- 93 10 0
- 220 10 0
- 233 10 0
- 215 8 0
- 214 4 0
- 134 2 0
- 64 2 0
- 96 1 0
- 94 1 0
- 80 1 0
- 261 1 0
- 78 1 0
- 160 1 0
- 135 1 0
-
-- contrib/plugins/hotblocks.c
-
-The hotblocks plugin allows you to examine the where hot paths of
-execution are in your program. Once the program has finished you will
-get a sorted list of blocks reporting the starting PC, translation
-count, number of instructions and execution count. This will work best
-with linux-user execution as system emulation tends to generate
-re-translations as blocks from different programs get swapped in and
-out of system memory.
-
-If your program is single-threaded you can use the ``inline`` option for
-slightly faster (but not thread safe) counters.
-
-Example::
-
- $ qemu-aarch64 \
- -plugin contrib/plugins/libhotblocks.so -d plugin \
- ./tests/tcg/aarch64-linux-user/sha1
- SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
- collected 903 entries in the hash table
- pc, tcount, icount, ecount
- 0x0000000041ed10, 1, 5, 66087
- 0x000000004002b0, 1, 4, 66087
- ...
-
-- contrib/plugins/hotpages.c
-
-Similar to hotblocks but this time tracks memory accesses::
-
- $ qemu-aarch64 \
- -plugin contrib/plugins/libhotpages.so -d plugin \
- ./tests/tcg/aarch64-linux-user/sha1
- SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6
- Addr, RCPUs, Reads, WCPUs, Writes
- 0x000055007fe000, 0x0001, 31747952, 0x0001, 8835161
- 0x000055007ff000, 0x0001, 29001054, 0x0001, 8780625
- 0x00005500800000, 0x0001, 687465, 0x0001, 335857
- 0x0000000048b000, 0x0001, 130594, 0x0001, 355
- 0x0000000048a000, 0x0001, 1826, 0x0001, 11
-
-The hotpages plugin can be configured using the following arguments:
-
- * sortby=reads|writes|address
-
- Log the data sorted by either the number of reads, the number of writes, or
- memory address. (Default: entries are sorted by the sum of reads and writes)
-
- * io=on
-
- Track IO addresses. Only relevant to full system emulation. (Default: off)
-
- * pagesize=N
-
- The page size used. (Default: N = 4096)
-
-- contrib/plugins/howvec.c
-
-This is an instruction classifier so can be used to count different
-types of instructions. It has a number of options to refine which get
-counted. You can give a value to the ``count`` argument for a class of
-instructions to break it down fully, so for example to see all the system
-registers accesses::
-
- $ qemu-system-aarch64 $(QEMU_ARGS) \
- -append "root=/dev/sda2 systemd.unit=benchmark.service" \
- -smp 4 -plugin ./contrib/plugins/libhowvec.so,count=sreg -d plugin
-
-which will lead to a sorted list after the class breakdown::
-
- Instruction Classes:
- Class: UDEF not counted
- Class: SVE (68 hits)
- Class: PCrel addr (47789483 hits)
- Class: Add/Sub (imm) (192817388 hits)
- Class: Logical (imm) (93852565 hits)
- Class: Move Wide (imm) (76398116 hits)
- Class: Bitfield (44706084 hits)
- Class: Extract (5499257 hits)
- Class: Cond Branch (imm) (147202932 hits)
- Class: Exception Gen (193581 hits)
- Class: NOP not counted
- Class: Hints (6652291 hits)
- Class: Barriers (8001661 hits)
- Class: PSTATE (1801695 hits)
- Class: System Insn (6385349 hits)
- Class: System Reg counted individually
- Class: Branch (reg) (69497127 hits)
- Class: Branch (imm) (84393665 hits)
- Class: Cmp & Branch (110929659 hits)
- Class: Tst & Branch (44681442 hits)
- Class: AdvSimd ldstmult (736 hits)
- Class: ldst excl (9098783 hits)
- Class: Load Reg (lit) (87189424 hits)
- Class: ldst noalloc pair (3264433 hits)
- Class: ldst pair (412526434 hits)
- Class: ldst reg (imm) (314734576 hits)
- Class: Loads & Stores (2117774 hits)
- Class: Data Proc Reg (223519077 hits)
- Class: Scalar FP (31657954 hits)
- Individual Instructions:
- Instr: mrs x0, sp_el0 (2682661 hits) (op=0xd5384100/ System Reg)
- Instr: mrs x1, tpidr_el2 (1789339 hits) (op=0xd53cd041/ System Reg)
- Instr: mrs x2, tpidr_el2 (1513494 hits) (op=0xd53cd042/ System Reg)
- Instr: mrs x0, tpidr_el2 (1490823 hits) (op=0xd53cd040/ System Reg)
- Instr: mrs x1, sp_el0 (933793 hits) (op=0xd5384101/ System Reg)
- Instr: mrs x2, sp_el0 (699516 hits) (op=0xd5384102/ System Reg)
- Instr: mrs x4, tpidr_el2 (528437 hits) (op=0xd53cd044/ System Reg)
- Instr: mrs x30, ttbr1_el1 (480776 hits) (op=0xd538203e/ System Reg)
- Instr: msr ttbr1_el1, x30 (480713 hits) (op=0xd518203e/ System Reg)
- Instr: msr vbar_el1, x30 (480671 hits) (op=0xd518c01e/ System Reg)
- ...
-
-To find the argument shorthand for the class you need to examine the
-source code of the plugin at the moment, specifically the ``*opt``
-argument in the InsnClassExecCount tables.
-
-- contrib/plugins/lockstep.c
-
-This is a debugging tool for developers who want to find out when and
-where execution diverges after a subtle change to TCG code generation.
-It is not an exact science and results are likely to be mixed once
-asynchronous events are introduced. While the use of -icount can
-introduce determinism to the execution flow it doesn't always follow
-the translation sequence will be exactly the same. Typically this is
-caused by a timer firing to service the GUI causing a block to end
-early. However in some cases it has proved to be useful in pointing
-people at roughly where execution diverges. The only argument you need
-for the plugin is a path for the socket the two instances will
-communicate over::
-
-
- $ qemu-system-sparc -monitor none -parallel none \
- -net none -M SS-20 -m 256 -kernel day11/zImage.elf \
- -plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \
- -d plugin,nochain
-
-which will eventually report::
-
- qemu-system-sparc: warning: nic lance.0 has no peer
- @ 0x000000ffd06678 vs 0x000000ffd001e0 (2/1 since last)
- @ 0x000000ffd07d9c vs 0x000000ffd06678 (3/1 since last)
- Δ insn_count @ 0x000000ffd07d9c (809900609) vs 0x000000ffd06678 (809900612)
- previously @ 0x000000ffd06678/10 (809900609 insns)
- previously @ 0x000000ffd001e0/4 (809900599 insns)
- previously @ 0x000000ffd080ac/2 (809900595 insns)
- previously @ 0x000000ffd08098/5 (809900593 insns)
- previously @ 0x000000ffd080c0/1 (809900588 insns)
-
-- contrib/plugins/hwprofile.c
-
-The hwprofile tool can only be used with system emulation and allows
-the user to see what hardware is accessed how often. It has a number of options:
-
- * track=read or track=write
-
- By default the plugin tracks both reads and writes. You can use one
- of these options to limit the tracking to just one class of accesses.
-
- * source
-
- Will include a detailed break down of what the guest PC that made the
- access was. Not compatible with the pattern option. Example output::
-
- cirrus-low-memory @ 0xfffffd00000a0000
- pc:fffffc0000005cdc, 1, 256
- pc:fffffc0000005ce8, 1, 256
- pc:fffffc0000005cec, 1, 256
-
- * pattern
-
- Instead break down the accesses based on the offset into the HW
- region. This can be useful for seeing the most used registers of a
- device. Example output::
-
- pci0-conf @ 0xfffffd01fe000000
- off:00000004, 1, 1
- off:00000010, 1, 3
- off:00000014, 1, 3
- off:00000018, 1, 2
- off:0000001c, 1, 2
- off:00000020, 1, 2
- ...
-
-- contrib/plugins/execlog.c
-
-The execlog tool traces executed instructions with memory access. It can be used
-for debugging and security analysis purposes.
-Please be aware that this will generate a lot of output.
-
-The plugin needs default argument::
-
- $ qemu-system-arm $(QEMU_ARGS) \
- -plugin ./contrib/plugins/libexeclog.so -d plugin
-
-which will output an execution trace following this structure::
-
- # vCPU, vAddr, opcode, disassembly[, load/store, memory addr, device]...
- 0, 0xa12, 0xf8012400, "movs r4, #0"
- 0, 0xa14, 0xf87f42b4, "cmp r4, r6"
- 0, 0xa16, 0xd206, "bhs #0xa26"
- 0, 0xa18, 0xfff94803, "ldr r0, [pc, #0xc]", load, 0x00010a28, RAM
- 0, 0xa1a, 0xf989f000, "bl #0xd30"
- 0, 0xd30, 0xfff9b510, "push {r4, lr}", store, 0x20003ee0, RAM, store, 0x20003ee4, RAM
- 0, 0xd32, 0xf9893014, "adds r0, #0x14"
- 0, 0xd34, 0xf9c8f000, "bl #0x10c8"
- 0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM
-
-Please note that you need to configure QEMU with Capstone support to get disassembly.
-
-The output can be filtered to only track certain instructions or
-addresses using the ``ifilter`` or ``afilter`` options. You can stack the
-arguments if required::
-
- $ qemu-system-arm $(QEMU_ARGS) \
- -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d plugin
-
-This plugin can also dump registers when they change value. Specify the name of the
-registers with multiple ``reg`` options. You can also use glob style matching if you wish::
-
- $ qemu-system-arm $(QEMU_ARGS) \
- -plugin ./contrib/plugins/libexeclog.so,reg=\*_el2,reg=sp -d plugin
-
-Be aware that each additional register to check will slow down
-execution quite considerably. You can optimise the number of register
-checks done by using the rdisas option. This will only instrument
-instructions that mention the registers in question in disassembly.
-This is not foolproof as some instructions implicitly change
-instructions. You can use the ifilter to catch these cases:
-
- $ qemu-system-arm $(QEMU_ARGS) \
- -plugin ./contrib/plugins/libexeclog.so,ifilter=msr,ifilter=blr,reg=x30,reg=\*_el1,rdisas=on
-
-- contrib/plugins/cache.c
-
-Cache modelling plugin that measures the performance of a given L1 cache
-configuration, and optionally a unified L2 per-core cache when a given working
-set is run::
-
- $ qemu-x86_64 -plugin ./contrib/plugins/libcache.so \
- -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs
-
-will report the following::
-
- core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate
- 0 996695 508 0.0510% 2642799 18617 0.7044%
-
- address, data misses, instruction
- 0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx)
- 0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax)
- 0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax)
- 0x454d48 (__tunables_init), 20, cmpb $0, (%r8)
- ...
-
- address, fetch misses, instruction
- 0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx
- 0x41f0a0 (_IO_setb), 744, endbr64
- 0x415882 (__vfprintf_internal), 744, movq %r12, %rdi
- 0x4268a0 (__malloc), 696, andq $0xfffffffffffffff0, %rax
- ...
-
-The plugin has a number of arguments, all of them are optional:
-
- * limit=N
-
- Print top N icache and dcache thrashing instructions along with their
- address, number of misses, and its disassembly. (default: 32)
-
- * icachesize=N
- * iblksize=B
- * iassoc=A
-
- Instruction cache configuration arguments. They specify the cache size, block
- size, and associativity of the instruction cache, respectively.
- (default: N = 16384, B = 64, A = 8)
-
- * dcachesize=N
- * dblksize=B
- * dassoc=A
-
- Data cache configuration arguments. They specify the cache size, block size,
- and associativity of the data cache, respectively.
- (default: N = 16384, B = 64, A = 8)
-
- * evict=POLICY
-
- Sets the eviction policy to POLICY. Available policies are: :code:`lru`,
- :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for
- both instruction and data caches. (default: POLICY = :code:`lru`)
-
- * cores=N
-
- Sets the number of cores for which we maintain separate icache and dcache.
- (default: for linux-user, N = 1, for full system emulation: N = cores
- available to guest)
-
- * l2=on
-
- Simulates a unified L2 cache (stores blocks for both instructions and data)
- using the default L2 configuration (cache size = 2MB, associativity = 16-way,
- block size = 64B).
-
- * l2cachesize=N
- * l2blksize=B
- * l2assoc=A
-
- L2 cache configuration arguments. They specify the cache size, block size, and
- associativity of the L2 cache, respectively. Setting any of the L2
- configuration arguments implies ``l2=on``.
- (default: N = 2097152 (2MB), B = 64, A = 16)
-
Plugin API
==========
diff --git a/docs/devel/acpi-bits.rst b/docs/devel/testing/acpi-bits.rst
index 1ec394f..9a4d716 100644
--- a/docs/devel/acpi-bits.rst
+++ b/docs/devel/testing/acpi-bits.rst
@@ -1,6 +1,6 @@
-=============================================================================
-ACPI/SMBIOS avocado tests using biosbits
-=============================================================================
+==================================
+ACPI/SMBIOS testing using biosbits
+==================================
************
Introduction
************
@@ -30,26 +30,31 @@ OS modules are generally written using low level languages such as C and
low level assembly machine language. Writing test routines in a low level
language makes things more cumbersome. These and other reasons makes using
bios-bits very attractive for testing bioses. More details on the inspiration
-for developing biosbits and its real life uses can be found in [#a]_ and [#b]_.
+for developing biosbits and its real life uses were presented `at Plumbers
+in 2011 <Plumbers_>`__ and `at Linux.conf.au in 2012 <Linux.conf.au_>`__.
-For QEMU, we maintain a fork of bios bits in gitlab along with all the
-dependent submodules `here <https://gitlab.com/qemu-project/biosbits-bits>`__.
-This fork contains numerous fixes, a newer acpica and changes specific to
-running this avocado QEMU tests using bits. The author of this document
-is the sole maintainer of the QEMU fork of bios bits repository. For more
-information, please see author's `FOSDEM talk on this bios-bits based test
-framework <https://fosdem.org/2024/schedule/event/fosdem-2024-2262-exercising-qemu-generated-acpi-smbios-tables-using-biosbits-from-within-a-guest-vm-/>`__.
+For QEMU, we maintain a fork of bios bits in `gitlab`_, along with all
+the dependent submodules. This fork contains numerous fixes, a newer
+acpica and changes specific to running these functional QEMU tests using
+bits. The author of this document is the current maintainer of the QEMU
+fork of bios bits repository. For more information, please see `the
+author's FOSDEM presentation <FOSDEM_>`__ on this bios-bits based test framework.
+
+.. _Plumbers: https://blog.linuxplumbersconf.org/2011/ocw/system/presentations/867/original/bits.pdf
+.. _Linux.conf.au: https://www.youtube.com/watch?v=36QIepyUuhg
+.. _gitlab: https://gitlab.com/qemu-project/biosbits-bits
+.. _FOSDEM: https://fosdem.org/2024/schedule/event/fosdem-2024-2262-exercising-qemu-generated-acpi-smbios-tables-using-biosbits-from-within-a-guest-vm-/
*********************************
Description of the test framework
*********************************
-Under the directory ``tests/avocado/``, ``acpi-bits.py`` is a QEMU avocado
-test that drives all this.
+Under the directory ``tests/functional/``, ``test_acpi_bits.py`` is a QEMU
+functional test that drives all this.
A brief description of the various test files follows.
-Under ``tests/avocado/`` as the root we have:
+Under ``tests/functional/`` as the root we have:
::
@@ -60,12 +65,12 @@ Under ``tests/avocado/`` as the root we have:
│ ├── smbios.py2
│ ├── testacpi.py2
│ └── testcpuid.py2
- ├── acpi-bits.py
+ ├── test_acpi_bits.py
-* ``tests/avocado``:
+* ``tests/functional``:
- ``acpi-bits.py``:
- This is the main python avocado test script that generates a
+ ``test_acpi_bits.py``:
+ This is the main python functional test script that generates a
biosbits iso. It then spawns a QEMU VM with it, collects the log and reports
test failures. This is the script one would be interested in if they wanted
to add or change some component of the log parsing, add a new command line
@@ -79,35 +84,22 @@ Under ``tests/avocado/`` as the root we have:
you to inspect and run the specific commands manually.
In order to run this test, please perform the following steps from the QEMU
- build directory:
- ::
-
- $ make check-venv (needed only the first time to create the venv)
- $ ./pyvenv/bin/avocado run -t acpi tests/avocado
-
- The above will run all acpi avocado tests including this one.
- In order to run the individual tests, perform the following:
+ build directory (assuming that the sources are in ".."):
::
- $ ./pyvenv/bin/avocado run tests/avocado/acpi-bits.py --tap -
-
- The above will produce output in tap format. You can omit "--tap -" in the
- end and it will produce output like the following:
- ::
+ $ export PYTHONPATH=../python:../tests/functional
+ $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
+ $ python3 ../tests/functional/test_acpi_bits.py
- $ ./pyvenv/bin/avocado run tests/avocado/acpi-bits.py
- Fetching asset from tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits
- JOB ID : eab225724da7b64c012c65705dc2fa14ab1defef
- JOB LOG : /home/anisinha/avocado/job-results/job-2022-10-10T17.58-eab2257/job.log
- (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS (33.09 s)
- RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0
- JOB TIME : 39.22 s
+ The above will run all acpi-bits functional tests (producing output in
+ tap format).
- You can inspect the log file for more information about the run or in order
- to diagnoze issues. If you pass V=1 in the environment, more diagnostic logs
- would be found in the test log.
+ You can inspect the log files in tests/functional/x86_64/test_acpi_bits.*/
+ for more information about the run or in order to diagnoze issues.
+ If you pass V=1 in the environment, more diagnostic logs will be put into
+ the test log.
-* ``tests/avocado/acpi-bits/bits-config``:
+* ``tests/functional/acpi-bits/bits-config``:
This location contains biosbits configuration files that determine how the
software runs the tests.
@@ -117,7 +109,7 @@ Under ``tests/avocado/`` as the root we have:
or actions are performed by bits. The description of the config options are
provided in the file itself.
-* ``tests/avocado/acpi-bits/bits-tests``:
+* ``tests/functional/acpi-bits/bits-tests``:
This directory contains biosbits python based tests that are run from within
the biosbits environment in the spawned VM. New additions of test cases can
@@ -155,13 +147,9 @@ Under ``tests/avocado/`` as the root we have:
(a) They are python2.7 based scripts and not python 3 scripts.
(b) They are run from within the bios bits VM and is not subjected to QEMU
build/test python script maintenance and dependency resolutions.
- (c) They need not be loaded by avocado framework when running tests.
+ (c) They need not be loaded by the test framework by accident when running
+ tests.
Author: Ani Sinha <anisinha@redhat.com>
-References:
------------
-.. [#a] https://blog.linuxplumbersconf.org/2011/ocw/system/presentations/867/original/bits.pdf
-.. [#b] https://www.youtube.com/watch?v=36QIepyUuhg
-.. [#c] https://fosdem.org/2024/schedule/event/fosdem-2024-2262-exercising-qemu-generated-acpi-smbios-tables-using-biosbits-from-within-a-guest-vm-/
diff --git a/docs/devel/testing/avocado.rst b/docs/devel/testing/avocado.rst
new file mode 100644
index 0000000..eda76fe
--- /dev/null
+++ b/docs/devel/testing/avocado.rst
@@ -0,0 +1,581 @@
+.. _checkavocado-ref:
+
+
+Integration testing with Avocado
+================================
+
+The ``tests/avocado`` directory hosts integration tests. They're usually
+higher level tests, and may interact with external resources and with
+various guest operating systems.
+
+These tests are written using the Avocado Testing Framework (which must be
+installed separately) in conjunction with a the ``avocado_qemu.QemuSystemTest``
+class, implemented at ``tests/avocado/avocado_qemu``.
+
+Tests based on ``avocado_qemu.QemuSystemTest`` can easily:
+
+ * Customize the command line arguments given to the convenience
+ ``self.vm`` attribute (a QEMUMachine instance)
+
+ * Interact with the QEMU monitor, send QMP commands and check
+ their results
+
+ * Interact with the guest OS, using the convenience console device
+ (which may be useful to assert the effectiveness and correctness of
+ command line arguments or QMP commands)
+
+ * Interact with external data files that accompany the test itself
+ (see ``self.get_data()``)
+
+ * Download (and cache) remote data files, such as firmware and kernel
+ images
+
+ * Have access to a library of guest OS images (by means of the
+ ``avocado.utils.vmimage`` library)
+
+ * Make use of various other test related utilities available at the
+ test class itself and at the utility library:
+
+ - http://avocado-framework.readthedocs.io/en/latest/api/test/avocado.html#avocado.Test
+ - http://avocado-framework.readthedocs.io/en/latest/api/utils/avocado.utils.html
+
+Running tests
+-------------
+
+You can run the avocado tests simply by executing:
+
+.. code::
+
+ make check-avocado
+
+This involves the automatic installation, from PyPI, of all the
+necessary avocado-framework dependencies into the QEMU venv within the
+build tree (at ``./pyvenv``). Test results are also saved within the
+build tree (at ``tests/results``).
+
+Note: the build environment must be using a Python 3 stack, and have
+the ``venv`` and ``pip`` packages installed. If necessary, make sure
+``configure`` is called with ``--python=`` and that those modules are
+available. On Debian and Ubuntu based systems, depending on the
+specific version, they may be on packages named ``python3-venv`` and
+``python3-pip``.
+
+It is also possible to run tests based on tags using the
+``make check-avocado`` command and the ``AVOCADO_TAGS`` environment
+variable:
+
+.. code::
+
+ make check-avocado AVOCADO_TAGS=quick
+
+Note that tags separated with commas have an AND behavior, while tags
+separated by spaces have an OR behavior. For more information on Avocado
+tags, see:
+
+ https://avocado-framework.readthedocs.io/en/latest/guides/user/chapters/tags.html
+
+To run a single test file, a couple of them, or a test within a file
+using the ``make check-avocado`` command, set the ``AVOCADO_TESTS``
+environment variable with the test files or test names. To run all
+tests from a single file, use:
+
+ .. code::
+
+ make check-avocado AVOCADO_TESTS=$FILEPATH
+
+The same is valid to run tests from multiple test files:
+
+ .. code::
+
+ make check-avocado AVOCADO_TESTS='$FILEPATH1 $FILEPATH2'
+
+To run a single test within a file, use:
+
+ .. code::
+
+ make check-avocado AVOCADO_TESTS=$FILEPATH:$TESTCLASS.$TESTNAME
+
+The same is valid to run single tests from multiple test files:
+
+ .. code::
+
+ make check-avocado AVOCADO_TESTS='$FILEPATH1:$TESTCLASS1.$TESTNAME1 $FILEPATH2:$TESTCLASS2.$TESTNAME2'
+
+The scripts installed inside the virtual environment may be used
+without an "activation". For instance, the Avocado test runner
+may be invoked by running:
+
+ .. code::
+
+ pyvenv/bin/avocado run $OPTION1 $OPTION2 tests/avocado/
+
+Note that if ``make check-avocado`` was not executed before, it is
+possible to create the Python virtual environment with the dependencies
+needed running:
+
+ .. code::
+
+ make check-venv
+
+It is also possible to run tests from a single file or a single test within
+a test file. To run tests from a single file within the build tree, use:
+
+ .. code::
+
+ pyvenv/bin/avocado run tests/avocado/$TESTFILE
+
+To run a single test within a test file, use:
+
+ .. code::
+
+ pyvenv/bin/avocado run tests/avocado/$TESTFILE:$TESTCLASS.$TESTNAME
+
+Valid test names are visible in the output from any previous execution
+of Avocado or ``make check-avocado``, and can also be queried using:
+
+ .. code::
+
+ pyvenv/bin/avocado list tests/avocado
+
+Manual Installation
+-------------------
+
+To manually install Avocado and its dependencies, run:
+
+.. code::
+
+ pip install --user avocado-framework
+
+Alternatively, follow the instructions on this link:
+
+ https://avocado-framework.readthedocs.io/en/latest/guides/user/chapters/installing.html
+
+Overview
+--------
+
+The ``tests/avocado/avocado_qemu`` directory provides the
+``avocado_qemu`` Python module, containing the ``avocado_qemu.QemuSystemTest``
+class. Here's a simple usage example:
+
+.. code::
+
+ from avocado_qemu import QemuSystemTest
+
+
+ class Version(QemuSystemTest):
+ """
+ :avocado: tags=quick
+ """
+ def test_qmp_human_info_version(self):
+ self.vm.launch()
+ res = self.vm.cmd('human-monitor-command',
+ command_line='info version')
+ self.assertRegex(res, r'^(\d+\.\d+\.\d)')
+
+To execute your test, run:
+
+.. code::
+
+ avocado run version.py
+
+Tests may be classified according to a convention by using docstring
+directives such as ``:avocado: tags=TAG1,TAG2``. To run all tests
+in the current directory, tagged as "quick", run:
+
+.. code::
+
+ avocado run -t quick .
+
+The ``avocado_qemu.QemuSystemTest`` base test class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``avocado_qemu.QemuSystemTest`` class has a number of characteristics
+that are worth being mentioned right away.
+
+First of all, it attempts to give each test a ready to use QEMUMachine
+instance, available at ``self.vm``. Because many tests will tweak the
+QEMU command line, launching the QEMUMachine (by using ``self.vm.launch()``)
+is left to the test writer.
+
+The base test class has also support for tests with more than one
+QEMUMachine. The way to get machines is through the ``self.get_vm()``
+method which will return a QEMUMachine instance. The ``self.get_vm()``
+method accepts arguments that will be passed to the QEMUMachine creation
+and also an optional ``name`` attribute so you can identify a specific
+machine and get it more than once through the tests methods. A simple
+and hypothetical example follows:
+
+.. code::
+
+ from avocado_qemu import QemuSystemTest
+
+
+ class MultipleMachines(QemuSystemTest):
+ def test_multiple_machines(self):
+ first_machine = self.get_vm()
+ second_machine = self.get_vm()
+ self.get_vm(name='third_machine').launch()
+
+ first_machine.launch()
+ second_machine.launch()
+
+ first_res = first_machine.cmd(
+ 'human-monitor-command',
+ command_line='info version')
+
+ second_res = second_machine.cmd(
+ 'human-monitor-command',
+ command_line='info version')
+
+ third_res = self.get_vm(name='third_machine').cmd(
+ 'human-monitor-command',
+ command_line='info version')
+
+ self.assertEqual(first_res, second_res, third_res)
+
+At test "tear down", ``avocado_qemu.QemuSystemTest`` handles all the
+QEMUMachines shutdown.
+
+The ``avocado_qemu.LinuxTest`` base test class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``avocado_qemu.LinuxTest`` is further specialization of the
+``avocado_qemu.QemuSystemTest`` class, so it contains all the characteristics
+of the later plus some extra features.
+
+First of all, this base class is intended for tests that need to
+interact with a fully booted and operational Linux guest. At this
+time, it uses a Fedora 31 guest image. The most basic example looks
+like this:
+
+.. code::
+
+ from avocado_qemu import LinuxTest
+
+
+ class SomeTest(LinuxTest):
+
+ def test(self):
+ self.launch_and_wait()
+ self.ssh_command('some_command_to_be_run_in_the_guest')
+
+Please refer to tests that use ``avocado_qemu.LinuxTest`` under
+``tests/avocado`` for more examples.
+
+QEMUMachine
+-----------
+
+The QEMUMachine API is already widely used in the Python iotests,
+device-crash-test and other Python scripts. It's a wrapper around the
+execution of a QEMU binary, giving its users:
+
+ * the ability to set command line arguments to be given to the QEMU
+ binary
+
+ * a ready to use QMP connection and interface, which can be used to
+ send commands and inspect its results, as well as asynchronous
+ events
+
+ * convenience methods to set commonly used command line arguments in
+ a more succinct and intuitive way
+
+QEMU binary selection
+^^^^^^^^^^^^^^^^^^^^^
+
+The QEMU binary used for the ``self.vm`` QEMUMachine instance will
+primarily depend on the value of the ``qemu_bin`` parameter. If it's
+not explicitly set, its default value will be the result of a dynamic
+probe in the same source tree. A suitable binary will be one that
+targets the architecture matching host machine.
+
+Based on this description, test writers will usually rely on one of
+the following approaches:
+
+1) Set ``qemu_bin``, and use the given binary
+
+2) Do not set ``qemu_bin``, and use a QEMU binary named like
+ "qemu-system-${arch}", either in the current
+ working directory, or in the current source tree.
+
+The resulting ``qemu_bin`` value will be preserved in the
+``avocado_qemu.QemuSystemTest`` as an attribute with the same name.
+
+Attribute reference
+-------------------
+
+Test
+^^^^
+
+Besides the attributes and methods that are part of the base
+``avocado.Test`` class, the following attributes are available on any
+``avocado_qemu.QemuSystemTest`` instance.
+
+vm
+""
+
+A QEMUMachine instance, initially configured according to the given
+``qemu_bin`` parameter.
+
+arch
+""""
+
+The architecture can be used on different levels of the stack, e.g. by
+the framework or by the test itself. At the framework level, it will
+currently influence the selection of a QEMU binary (when one is not
+explicitly given).
+
+Tests are also free to use this attribute value, for their own needs.
+A test may, for instance, use the same value when selecting the
+architecture of a kernel or disk image to boot a VM with.
+
+The ``arch`` attribute will be set to the test parameter of the same
+name. If one is not given explicitly, it will either be set to
+``None``, or, if the test is tagged with one (and only one)
+``:avocado: tags=arch:VALUE`` tag, it will be set to ``VALUE``.
+
+cpu
+"""
+
+The cpu model that will be set to all QEMUMachine instances created
+by the test.
+
+The ``cpu`` attribute will be set to the test parameter of the same
+name. If one is not given explicitly, it will either be set to
+``None ``, or, if the test is tagged with one (and only one)
+``:avocado: tags=cpu:VALUE`` tag, it will be set to ``VALUE``.
+
+machine
+"""""""
+
+The machine type that will be set to all QEMUMachine instances created
+by the test.
+
+The ``machine`` attribute will be set to the test parameter of the same
+name. If one is not given explicitly, it will either be set to
+``None``, or, if the test is tagged with one (and only one)
+``:avocado: tags=machine:VALUE`` tag, it will be set to ``VALUE``.
+
+qemu_bin
+""""""""
+
+The preserved value of the ``qemu_bin`` parameter or the result of the
+dynamic probe for a QEMU binary in the current working directory or
+source tree.
+
+LinuxTest
+^^^^^^^^^
+
+Besides the attributes present on the ``avocado_qemu.QemuSystemTest`` base
+class, the ``avocado_qemu.LinuxTest`` adds the following attributes:
+
+distro
+""""""
+
+The name of the Linux distribution used as the guest image for the
+test. The name should match the **Provider** column on the list
+of images supported by the avocado.utils.vmimage library:
+
+https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
+
+distro_version
+""""""""""""""
+
+The version of the Linux distribution as the guest image for the
+test. The name should match the **Version** column on the list
+of images supported by the avocado.utils.vmimage library:
+
+https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
+
+distro_checksum
+"""""""""""""""
+
+The sha256 hash of the guest image file used for the test.
+
+If this value is not set in the code or by a test parameter (with the
+same name), no validation on the integrity of the image will be
+performed.
+
+Parameter reference
+-------------------
+
+To understand how Avocado parameters are accessed by tests, and how
+they can be passed to tests, please refer to::
+
+ https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#accessing-test-parameters
+
+Parameter values can be easily seen in the log files, and will look
+like the following:
+
+.. code::
+
+ PARAMS (key=qemu_bin, path=*, default=./qemu-system-x86_64) => './qemu-system-x86_64
+
+Test
+^^^^
+
+arch
+""""
+
+The architecture that will influence the selection of a QEMU binary
+(when one is not explicitly given).
+
+Tests are also free to use this parameter value, for their own needs.
+A test may, for instance, use the same value when selecting the
+architecture of a kernel or disk image to boot a VM with.
+
+This parameter has a direct relation with the ``arch`` attribute. If
+not given, it will default to None.
+
+cpu
+"""
+
+The cpu model that will be set to all QEMUMachine instances created
+by the test.
+
+machine
+"""""""
+
+The machine type that will be set to all QEMUMachine instances created
+by the test.
+
+qemu_bin
+""""""""
+
+The exact QEMU binary to be used on QEMUMachine.
+
+LinuxTest
+^^^^^^^^^
+
+Besides the parameters present on the ``avocado_qemu.QemuSystemTest`` base
+class, the ``avocado_qemu.LinuxTest`` adds the following parameters:
+
+distro
+""""""
+
+The name of the Linux distribution used as the guest image for the
+test. The name should match the **Provider** column on the list
+of images supported by the avocado.utils.vmimage library:
+
+https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
+
+distro_version
+""""""""""""""
+
+The version of the Linux distribution as the guest image for the
+test. The name should match the **Version** column on the list
+of images supported by the avocado.utils.vmimage library:
+
+https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
+
+distro_checksum
+"""""""""""""""
+
+The sha256 hash of the guest image file used for the test.
+
+If this value is not set in the code or by this parameter no
+validation on the integrity of the image will be performed.
+
+Skipping tests
+--------------
+
+The Avocado framework provides Python decorators which allow for easily skip
+tests running under certain conditions. For example, on the lack of a binary
+on the test system or when the running environment is a CI system. For further
+information about those decorators, please refer to::
+
+ https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#skipping-tests
+
+While the conditions for skipping tests are often specifics of each one, there
+are recurring scenarios identified by the QEMU developers and the use of
+environment variables became a kind of standard way to enable/disable tests.
+
+Here is a list of the most used variables:
+
+AVOCADO_ALLOW_LARGE_STORAGE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Tests which are going to fetch or produce assets considered *large* are not
+going to run unless that ``AVOCADO_ALLOW_LARGE_STORAGE=1`` is exported on
+the environment.
+
+The definition of *large* is a bit arbitrary here, but it usually means an
+asset which occupies at least 1GB of size on disk when uncompressed.
+
+SPEED
+^^^^^
+Tests which have a long runtime will not be run unless ``SPEED=slow`` is
+exported on the environment.
+
+The definition of *long* is a bit arbitrary here, and it depends on the
+usefulness of the test too. A unique test is worth spending more time on,
+small variations on existing tests perhaps less so. As a rough guide,
+a test or set of similar tests which take more than 100 seconds to
+complete.
+
+AVOCADO_ALLOW_UNTRUSTED_CODE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+There are tests which will boot a kernel image or firmware that can be
+considered not safe to run on the developer's workstation, thus they are
+skipped by default. The definition of *not safe* is also arbitrary but
+usually it means a blob which either its source or build process aren't
+public available.
+
+You should export ``AVOCADO_ALLOW_UNTRUSTED_CODE=1`` on the environment in
+order to allow tests which make use of those kind of assets.
+
+AVOCADO_TIMEOUT_EXPECTED
+^^^^^^^^^^^^^^^^^^^^^^^^
+The Avocado framework has a timeout mechanism which interrupts tests to avoid the
+test suite of getting stuck. The timeout value can be set via test parameter or
+property defined in the test class, for further details::
+
+ https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#setting-a-test-timeout
+
+Even though the timeout can be set by the test developer, there are some tests
+that may not have a well-defined limit of time to finish under certain
+conditions. For example, tests that take longer to execute when QEMU is
+compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` variable
+has been used to determine whether those tests should run or not.
+
+QEMU_TEST_FLAKY_TESTS
+^^^^^^^^^^^^^^^^^^^^^
+Some tests are not working reliably and thus are disabled by default.
+This includes tests that don't run reliably on GitLab's CI which
+usually expose real issues that are rarely seen on developer machines
+due to the constraints of the CI environment. If you encounter a
+similar situation then raise a bug and then mark the test as shown on
+the code snippet below:
+
+.. code::
+
+ # See https://gitlab.com/qemu-project/qemu/-/issues/nnnn
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
+ def test(self):
+ do_something()
+
+You can also add ``:avocado: tags=flaky`` to the test meta-data so
+only the flaky tests can be run as a group:
+
+.. code::
+
+ env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado \
+ run tests/avocado -filter-by-tags=flaky
+
+Tests should not live in this state forever and should either be fixed
+or eventually removed.
+
+
+Uninstalling Avocado
+--------------------
+
+If you've followed the manual installation instructions above, you can
+easily uninstall Avocado. Start by listing the packages you have
+installed::
+
+ pip list --user
+
+And remove any package you want with::
+
+ pip uninstall <package_name>
+
+If you've used ``make check-avocado``, the Python virtual environment where
+Avocado is installed will be cleaned up as part of ``make check-clean``.
diff --git a/docs/devel/testing/blkdebug.rst b/docs/devel/testing/blkdebug.rst
new file mode 100644
index 0000000..63887c9
--- /dev/null
+++ b/docs/devel/testing/blkdebug.rst
@@ -0,0 +1,177 @@
+Block I/O error injection using ``blkdebug``
+============================================
+
+..
+ Copyright (C) 2014-2015 Red Hat Inc
+
+ This work is licensed under the terms of the GNU GPL, version 2 or later. See
+ the COPYING file in the top-level directory.
+
+The ``blkdebug`` block driver is a rule-based error injection engine. It can be
+used to exercise error code paths in block drivers including ``ENOSPC`` (out of
+space) and ``EIO``.
+
+This document gives an overview of the features available in ``blkdebug``.
+
+Background
+----------
+Block drivers have many error code paths that handle I/O errors. Image formats
+are especially complex since metadata I/O errors during cluster allocation or
+while updating tables happen halfway through request processing and require
+discipline to keep image files consistent.
+
+Error injection allows test cases to trigger I/O errors at specific points.
+This way, all error paths can be tested to make sure they are correct.
+
+Rules
+-----
+The ``blkdebug`` block driver takes a list of "rules" that tell the error injection
+engine when to fail an I/O request.
+
+Each I/O request is evaluated against the rules. If a rule matches the request
+then its "action" is executed.
+
+Rules can be placed in a configuration file; the configuration file
+follows the same .ini-like format used by QEMU's ``-readconfig`` option, and
+each section of the file represents a rule.
+
+The following configuration file defines a single rule::
+
+ $ cat blkdebug.conf
+ [inject-error]
+ event = "read_aio"
+ errno = "28"
+
+This rule fails all aio read requests with ``ENOSPC`` (28). Note that the errno
+value depends on the host. On Linux, see
+``/usr/include/asm-generic/errno-base.h`` for errno values.
+
+Invoke QEMU as follows::
+
+ $ qemu-system-x86_64
+ -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
+ -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
+
+Rules support the following attributes:
+
+``event``
+ which type of operation to match (e.g. ``read_aio``, ``write_aio``,
+ ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
+ information on events.
+
+``state``
+ (optional) the engine must be in this state number in order for this
+ rule to match. See `State transitions`_ for information
+ on states.
+
+``errno``
+ the numeric errno value to return when a request matches this rule.
+ The errno values depend on the host since the numeric values are not
+ standardized in the POSIX specification.
+
+``sector``
+ (optional) a sector number that the request must overlap in order to
+ match this rule
+
+``once``
+ (optional, default ``off``) only execute this action on the first
+ matching request
+
+``immediately``
+ (optional, default ``off``) return a NULL ``BlockAIOCB``
+ pointer and fail without an errno instead. This
+ exercises the code path where ``BlockAIOCB`` fails and the
+ caller's ``BlockCompletionFunc`` is not invoked.
+
+Events
+------
+Block drivers provide information about the type of I/O request they are about
+to make so rules can match specific types of requests. For example, the ``qcow2``
+block driver tells ``blkdebug`` when it accesses the L1 table so rules can match
+only L1 table accesses and not other metadata or guest data requests.
+
+The core events are:
+
+``read_aio``
+ guest data read
+
+``write_aio``
+ guest data write
+
+``flush_to_os``
+ write out unwritten block driver state (e.g. cached metadata)
+
+``flush_to_disk``
+ flush the host block device's disk cache
+
+See ``qapi/block-core.json:BlkdebugEvent`` for the full list of events.
+You may need to grep block driver source code to understand the
+meaning of specific events.
+
+State transitions
+-----------------
+There are cases where more power is needed to match a particular I/O request in
+a longer sequence of requests. For example::
+
+ write_aio
+ flush_to_disk
+ write_aio
+
+How do we match the 2nd ``write_aio`` but not the first? This is where state
+transitions come in.
+
+The error injection engine has an integer called the "state" that always starts
+initialized to 1. The state integer is internal to ``blkdebug`` and cannot be
+observed from outside but rules can interact with it for powerful matching
+behavior.
+
+Rules can be conditional on the current state and they can transition to a new
+state.
+
+When a rule's "state" attribute is non-zero then the current state must equal
+the attribute in order for the rule to match.
+
+For example, to match the 2nd write_aio::
+
+ [set-state]
+ event = "write_aio"
+ state = "1"
+ new_state = "2"
+
+ [inject-error]
+ event = "write_aio"
+ state = "2"
+ errno = "5"
+
+The first ``write_aio`` request matches the ``set-state`` rule and transitions from
+state 1 to state 2. Once state 2 has been entered, the ``set-state`` rule no
+longer matches since it requires state 1. But the ``inject-error`` rule now
+matches the next ``write_aio`` request and injects ``EIO`` (5).
+
+State transition rules support the following attributes:
+
+``event``
+ which type of operation to match (e.g. ``read_aio``, ``write_aio``,
+ ``flush_to_os`, ``flush_to_disk``). See `Events`_ for
+ information on events.
+
+``state``
+ (optional) the engine must be in this state number in order for this
+ rule to match
+
+``new_state``
+ transition to this state number
+
+Suspend and resume
+------------------
+Exercising code paths in block drivers may require specific ordering amongst
+concurrent requests. The "breakpoint" feature allows requests to be halted on
+a ``blkdebug`` event and resumed later. This makes it possible to achieve
+deterministic ordering when multiple requests are in flight.
+
+Breakpoints on ``blkdebug`` events are associated with a user-defined ``tag`` string.
+This tag serves as an identifier by which the request can be resumed at a later
+point.
+
+See the ``qemu-io(1)`` ``break``, ``resume``, ``remove_break``, and ``wait_break``
+commands for details.
diff --git a/docs/devel/blkverify.txt b/docs/devel/testing/blkverify.rst
index aca826c..2a71778 100644
--- a/docs/devel/blkverify.txt
+++ b/docs/devel/testing/blkverify.rst
@@ -1,8 +1,10 @@
-= Block driver correctness testing with blkverify =
+Block driver correctness testing with ``blkverify``
+===================================================
-== Introduction ==
+Introduction
+------------
-This document describes how to use the blkverify protocol to test that a block
+This document describes how to use the ``blkverify`` protocol to test that a block
driver is operating correctly.
It is difficult to test and debug block drivers against real guests. Often
@@ -11,12 +13,13 @@ of the executable. Other times obscure errors are raised by a program inside
the guest. These issues are extremely hard to trace back to bugs in the block
driver.
-Blkverify solves this problem by catching data corruption inside QEMU the first
+``blkverify`` solves this problem by catching data corruption inside QEMU the first
time bad data is read and reporting the disk sector that is corrupted.
-== How it works ==
+How it works
+------------
-The blkverify protocol has two child block devices, the "test" device and the
+The ``blkverify`` protocol has two child block devices, the "test" device and the
"raw" device. Read/write operations are mirrored to both devices so their
state should always be in sync.
@@ -25,13 +28,14 @@ contents to the "test" image. The idea is that the "raw" device will handle
read/write operations correctly and not corrupt data. It can be used as a
reference for comparison against the "test" device.
-After a mirrored read operation completes, blkverify will compare the data and
+After a mirrored read operation completes, ``blkverify`` will compare the data and
raise an error if it is not identical. This makes it possible to catch the
first instance where corrupt data is read.
-== Example ==
+Example
+-------
-Imagine raw.img has 0xcd repeated throughout its first sector:
+Imagine raw.img has 0xcd repeated throughout its first sector::
$ ./qemu-io -c 'read -v 0 512' raw.img
00000000: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................
@@ -42,7 +46,7 @@ Imagine raw.img has 0xcd repeated throughout its first sector:
read 512/512 bytes at offset 0
512.000000 bytes, 1 ops; 0.0000 sec (97.656 MiB/sec and 200000.0000 ops/sec)
-And test.img is corrupt, its first sector is zeroed when it shouldn't be:
+And test.img is corrupt, its first sector is zeroed when it shouldn't be::
$ ./qemu-io -c 'read -v 0 512' test.img
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
@@ -53,17 +57,17 @@ And test.img is corrupt, its first sector is zeroed when it shouldn't be:
read 512/512 bytes at offset 0
512.000000 bytes, 1 ops; 0.0000 sec (81.380 MiB/sec and 166666.6667 ops/sec)
-This error is caught by blkverify:
+This error is caught by ``blkverify``::
$ ./qemu-io -c 'read 0 512' blkverify:a.img:b.img
blkverify: read sector_num=0 nb_sectors=4 contents mismatch in sector 0
-A more realistic scenario is verifying the installation of a guest OS:
+A more realistic scenario is verifying the installation of a guest OS::
$ ./qemu-img create raw.img 16G
$ ./qemu-img create -f qcow2 test.qcow2 16G
$ ./qemu-system-x86_64 -cdrom debian.iso \
-drive file=blkverify:raw.img:test.qcow2
-If the installation is aborted when blkverify detects corruption, use qemu-io
+If the installation is aborted when ``blkverify`` detects corruption, use ``qemu-io``
to explore the contents of the disk image at the sector in question.
diff --git a/docs/devel/ci-definitions.rst.inc b/docs/devel/testing/ci-definitions.rst.inc
index 6d5c6fd..6d5c6fd 100644
--- a/docs/devel/ci-definitions.rst.inc
+++ b/docs/devel/testing/ci-definitions.rst.inc
diff --git a/docs/devel/ci-jobs.rst.inc b/docs/devel/testing/ci-jobs.rst.inc
index 3756bbe..3756bbe 100644
--- a/docs/devel/ci-jobs.rst.inc
+++ b/docs/devel/testing/ci-jobs.rst.inc
diff --git a/docs/devel/ci-runners.rst.inc b/docs/devel/testing/ci-runners.rst.inc
index 67b23d3..67b23d3 100644
--- a/docs/devel/ci-runners.rst.inc
+++ b/docs/devel/testing/ci-runners.rst.inc
diff --git a/docs/devel/ci.rst b/docs/devel/testing/ci.rst
index ed88a20..ed88a20 100644
--- a/docs/devel/ci.rst
+++ b/docs/devel/testing/ci.rst
diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst
new file mode 100644
index 0000000..bf6f1bb
--- /dev/null
+++ b/docs/devel/testing/functional.rst
@@ -0,0 +1,338 @@
+.. _checkfunctional-ref:
+
+Functional testing with Python
+==============================
+
+The ``tests/functional`` directory hosts functional tests written in
+Python. They are usually higher level tests, and may interact with
+external resources and with various guest operating systems.
+The functional tests have initially evolved from the Avocado tests, so there
+is a lot of similarity to those tests here (see :ref:`checkavocado-ref` for
+details about the Avocado tests).
+
+The tests should be written in the style of the Python `unittest`_ framework,
+using stdio for the TAP protocol. The folder ``tests/functional/qemu_test``
+provides classes (e.g. the ``QemuBaseTest``, ``QemuUserTest`` and the
+``QemuSystemTest`` classes) and utility functions that help to get your test
+into the right shape, e.g. by replacing the 'stdout' python object to redirect
+the normal output of your test to stderr instead.
+
+Note that if you don't use one of the QemuBaseTest based classes for your
+test, or if you spawn subprocesses from your test, you have to make sure
+that there is no TAP-incompatible output written to stdio, e.g. either by
+prefixing every line with a "# " to mark the output as a TAP comment, or
+e.g. by capturing the stdout output of subprocesses (redirecting it to
+stderr is OK).
+
+Tests based on ``qemu_test.QemuSystemTest`` can easily:
+
+ * Customize the command line arguments given to the convenience
+ ``self.vm`` attribute (a QEMUMachine instance)
+
+ * Interact with the QEMU monitor, send QMP commands and check
+ their results
+
+ * Interact with the guest OS, using the convenience console device
+ (which may be useful to assert the effectiveness and correctness of
+ command line arguments or QMP commands)
+
+ * Download (and cache) remote data files, such as firmware and kernel
+ images
+
+Running tests
+-------------
+
+You can run the functional tests simply by executing:
+
+.. code::
+
+ make check-functional
+
+It is also possible to run tests for a certain target only, for example
+the following line will only run the tests for the x86_64 target:
+
+.. code::
+
+ make check-functional-x86_64
+
+To run a single test file without the meson test runner, you can also
+execute the file directly by specifying two environment variables first,
+the PYTHONPATH that has to include the python folder and the tests/functional
+folder of the source tree, and QEMU_TEST_QEMU_BINARY that has to point
+to the QEMU binary that should be used for the test, for example::
+
+ $ export PYTHONPATH=../python:../tests/functional
+ $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
+ $ python3 ../tests/functional/test_file.py
+
+Overview
+--------
+
+The ``tests/functional/qemu_test`` directory provides the ``qemu_test``
+Python module, containing the ``qemu_test.QemuSystemTest`` class.
+Here is a simple usage example:
+
+.. code::
+
+ #!/usr/bin/env python3
+
+ from qemu_test import QemuSystemTest
+
+ class Version(QemuSystemTest):
+
+ def test_qmp_human_info_version(self):
+ self.vm.launch()
+ res = self.vm.cmd('human-monitor-command',
+ command_line='info version')
+ self.assertRegex(res, r'^(\d+\.\d+\.\d)')
+
+ if __name__ == '__main__':
+ QemuSystemTest.main()
+
+By providing the "hash bang" line at the beginning of the script, marking
+the file as executable and by calling into QemuSystemTest.main(), the test
+can also be run stand-alone, without a test runner. OTOH when run via a test
+runner, the QemuSystemTest.main() function takes care of running the test
+functions in the right fassion (e.g. with TAP output that is required by the
+meson test runner).
+
+The ``qemu_test.QemuSystemTest`` base test class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``qemu_test.QemuSystemTest`` class has a number of characteristics
+that are worth being mentioned.
+
+First of all, it attempts to give each test a ready to use QEMUMachine
+instance, available at ``self.vm``. Because many tests will tweak the
+QEMU command line, launching the QEMUMachine (by using ``self.vm.launch()``)
+is left to the test writer.
+
+The base test class has also support for tests with more than one
+QEMUMachine. The way to get machines is through the ``self.get_vm()``
+method which will return a QEMUMachine instance. The ``self.get_vm()``
+method accepts arguments that will be passed to the QEMUMachine creation
+and also an optional ``name`` attribute so you can identify a specific
+machine and get it more than once through the tests methods. A simple
+and hypothetical example follows:
+
+.. code::
+
+ from qemu_test import QemuSystemTest
+
+ class MultipleMachines(QemuSystemTest):
+ def test_multiple_machines(self):
+ first_machine = self.get_vm()
+ second_machine = self.get_vm()
+ self.get_vm(name='third_machine').launch()
+
+ first_machine.launch()
+ second_machine.launch()
+
+ first_res = first_machine.cmd(
+ 'human-monitor-command',
+ command_line='info version')
+
+ second_res = second_machine.cmd(
+ 'human-monitor-command',
+ command_line='info version')
+
+ third_res = self.get_vm(name='third_machine').cmd(
+ 'human-monitor-command',
+ command_line='info version')
+
+ self.assertEqual(first_res, second_res, third_res)
+
+At test "tear down", ``qemu_test.QemuSystemTest`` handles all the QEMUMachines
+shutdown.
+
+QEMUMachine
+-----------
+
+The QEMUMachine API is already widely used in the Python iotests,
+device-crash-test and other Python scripts. It's a wrapper around the
+execution of a QEMU binary, giving its users:
+
+ * the ability to set command line arguments to be given to the QEMU
+ binary
+
+ * a ready to use QMP connection and interface, which can be used to
+ send commands and inspect its results, as well as asynchronous
+ events
+
+ * convenience methods to set commonly used command line arguments in
+ a more succinct and intuitive way
+
+QEMU binary selection
+^^^^^^^^^^^^^^^^^^^^^
+
+The QEMU binary used for the ``self.vm`` QEMUMachine instance will
+primarily depend on the value of the ``qemu_bin`` class attribute.
+If it is not explicitly set by the test code, its default value will
+be the result the QEMU_TEST_QEMU_BINARY environment variable.
+
+Attribute reference
+-------------------
+
+QemuBaseTest
+^^^^^^^^^^^^
+
+The following attributes are available on any ``qemu_test.QemuBaseTest``
+instance.
+
+arch
+""""
+
+The target architecture of the QEMU binary.
+
+Tests are also free to use this attribute value, for their own needs.
+A test may, for instance, use this value when selecting the architecture
+of a kernel or disk image to boot a VM with.
+
+qemu_bin
+""""""""
+
+The preserved value of the ``QEMU_TEST_QEMU_BINARY`` environment
+variable.
+
+QemuUserTest
+^^^^^^^^^^^^
+
+The QemuUserTest class can be used for running an executable via the
+usermode emulation binaries.
+
+QemuSystemTest
+^^^^^^^^^^^^^^
+
+The QemuSystemTest class can be used for running tests via one of the
+qemu-system-* binaries.
+
+vm
+""
+
+A QEMUMachine instance, initially configured according to the given
+``qemu_bin`` parameter.
+
+cpu
+"""
+
+The cpu model that will be set to all QEMUMachine instances created
+by the test.
+
+machine
+"""""""
+
+The machine type that will be set to all QEMUMachine instances created
+by the test. By using the set_machine() function of the QemuSystemTest
+class to set this attribute, you can automatically check whether the
+machine is available to skip the test in case it is not built into the
+QEMU binary.
+
+Asset handling
+--------------
+
+Many functional tests download assets (e.g. Linux kernels, initrds,
+firmware images, etc.) from the internet to be able to run tests with
+them. This imposes additional challenges to the test framework.
+
+First there is the the problem that some people might not have an
+unconstrained internet connection, so such tests should not be run by
+default when running ``make check``. To accomplish this situation,
+the tests that download files should only be added to the "thorough"
+speed mode in the meson.build file, while the "quick" speed mode is
+fine for functional tests that can be run without downloading files.
+``make check`` then only runs the quick functional tests along with
+the other quick tests from the other test suites. If you choose to
+run only run ``make check-functional``, the "thorough" tests will be
+executed, too. And to run all functional tests along with the others,
+you can use something like::
+
+ make -j$(nproc) check SPEED=thorough
+
+The second problem with downloading files from the internet are time
+constraints. The time for downloading files should not be taken into
+account when the test is running and the timeout of the test is ticking
+(since downloading can be very slow, depending on the network bandwidth).
+This problem is solved by downloading the assets ahead of time, before
+the tests are run. This pre-caching is done with the qemu_test.Asset
+class. To use it in your test, declare an asset in your test class with
+its URL and SHA256 checksum like this::
+
+ ASSET_somename = (
+ ('https://www.qemu.org/assets/images/qemu_head_200.png'),
+ '34b74cad46ea28a2966c1d04e102510daf1fd73e6582b6b74523940d5da029dd')
+
+In your test function, you can then get the file name of the cached
+asset like this::
+
+ def test_function(self):
+ file_path = self.ASSET_somename.fetch()
+
+The pre-caching will be done automatically when running
+``make check-functional`` (but not when running e.g.
+``make check-functional-<target>``). In case you just want to download
+the assets without running the tests, you can do so by running::
+
+ make precache-functional
+
+The cache is populated in the ``~/.cache/qemu/download`` directory by
+default, but the location can be changed by setting the
+``QEMU_TEST_CACHE_DIR`` environment variable.
+
+Skipping tests
+--------------
+
+Since the test framework is based on the common Python unittest framework,
+you can use the usual Python decorators which allow for easily skipping
+tests running under certain conditions, for example, on the lack of a binary
+on the test system or when the running environment is a CI system. For further
+information about those decorators, please refer to:
+
+ https://docs.python.org/3/library/unittest.html#skipping-tests-and-expected-failures
+
+While the conditions for skipping tests are often specifics of each one, there
+are recurring scenarios identified by the QEMU developers and the use of
+environment variables became a kind of standard way to enable/disable tests.
+
+Here is a list of the most used variables:
+
+QEMU_TEST_ALLOW_LARGE_STORAGE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Tests which are going to fetch or produce assets considered *large* are not
+going to run unless that ``QEMU_TEST_ALLOW_LARGE_STORAGE=1`` is exported on
+the environment.
+
+The definition of *large* is a bit arbitrary here, but it usually means an
+asset which occupies at least 1GB of size on disk when uncompressed.
+
+QEMU_TEST_ALLOW_UNTRUSTED_CODE
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+There are tests which will boot a kernel image or firmware that can be
+considered not safe to run on the developer's workstation, thus they are
+skipped by default. The definition of *not safe* is also arbitrary but
+usually it means a blob which either its source or build process aren't
+public available.
+
+You should export ``QEMU_TEST_ALLOW_UNTRUSTED_CODE=1`` on the environment in
+order to allow tests which make use of those kind of assets.
+
+QEMU_TEST_FLAKY_TESTS
+^^^^^^^^^^^^^^^^^^^^^
+Some tests are not working reliably and thus are disabled by default.
+This includes tests that don't run reliably on GitLab's CI which
+usually expose real issues that are rarely seen on developer machines
+due to the constraints of the CI environment. If you encounter a
+similar situation then raise a bug and then mark the test as shown on
+the code snippet below:
+
+.. code::
+
+ # See https://gitlab.com/qemu-project/qemu/-/issues/nnnn
+ @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
+ def test(self):
+ do_something()
+
+Tests should not live in this state forever and should either be fixed
+or eventually removed.
+
+
+.. _unittest: https://docs.python.org/3/library/unittest.html
diff --git a/docs/devel/fuzzing.rst b/docs/devel/testing/fuzzing.rst
index 3bfcb33..c3ac084 100644
--- a/docs/devel/fuzzing.rst
+++ b/docs/devel/testing/fuzzing.rst
@@ -21,11 +21,12 @@ Building the fuzzers
To build the fuzzers, install a recent version of clang:
Configure with (substitute the clang binaries with the version you installed).
-Here, enable-sanitizers, is optional but it allows us to reliably detect bugs
-such as out-of-bounds accesses, use-after-frees, double-frees etc.::
+Here, enable-asan and enable-ubsan are optional but they allow us to reliably
+detect bugs such as out-of-bounds accesses, uses-after-free, double-frees
+etc.::
- CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \
- --enable-sanitizers
+ CC=clang-8 CXX=clang++-8 /path/to/configure \
+ --enable-fuzzing --enable-asan --enable-ubsan
Fuzz targets are built similarly to system targets::
diff --git a/docs/devel/testing/index.rst b/docs/devel/testing/index.rst
new file mode 100644
index 0000000..1171f7d
--- /dev/null
+++ b/docs/devel/testing/index.rst
@@ -0,0 +1,18 @@
+Testing QEMU
+------------
+
+Details about how to test QEMU and how it is integrated into our CI
+testing infrastructure.
+
+.. toctree::
+ :maxdepth: 3
+
+ main
+ qtest
+ functional
+ avocado
+ acpi-bits
+ ci
+ fuzzing
+ blkdebug
+ blkverify
diff --git a/docs/devel/testing.rst b/docs/devel/testing/main.rst
index 23d3f44..09725e8 100644
--- a/docs/devel/testing.rst
+++ b/docs/devel/testing/main.rst
@@ -3,13 +3,28 @@
Testing in QEMU
===============
-This document describes the testing infrastructure in QEMU.
+QEMU's testing infrastructure is fairly complex as it covers
+everything from unit testing and exercising specific sub-systems all
+the way to full blown acceptance tests. To get an overview of the
+tests you can run ``make check-help`` from either the source or build
+tree.
+
+Most (but not all) tests are also integrated into the meson build
+system so can be run directly from the build tree, for example:
+
+.. code::
+
+ [./pyvenv/bin/]meson test --suite qemu:softfloat
+
+will run just the softfloat tests.
+
+The rest of this document will cover the details for specific test
+groups.
Testing with "make check"
-------------------------
-The "make check" testing family includes most of the C based tests in QEMU. For
-a quick help, run ``make check-help`` from the source tree.
+The "make check" testing family includes most of the C based tests in QEMU.
The usual way to run these tests is:
@@ -485,12 +500,6 @@ first to contribute the mapping to the ``libvirt-ci`` project:
`CI <https://www.qemu.org/docs/master/devel/ci.html>`__ documentation
page on how to trigger gitlab CI pipelines on your change.
- * Please also trigger gitlab container generation pipelines on your change
- for as many OS distros as practical to make sure that there are no
- obvious breakages when adding the new pre-requisite. Please see
- `CI <https://www.qemu.org/docs/master/devel/ci.html>`__ documentation
- page on how to trigger gitlab CI pipelines on your change.
-
For enterprise distros that default to old, end-of-life versions of the
Python runtime, QEMU uses a separate set of mappings that work with more
recent versions. These can be found in ``tests/lcitool/mappings.yml``.
@@ -847,46 +856,24 @@ supported. To start the fuzzer, run
Alternatively, some command different from ``qemu-img info`` can be tested, by
changing the ``-c`` option.
-Integration tests using the Avocado Framework
----------------------------------------------
-
-The ``tests/avocado`` directory hosts integration tests. They're usually
-higher level tests, and may interact with external resources and with
-various guest operating systems.
-
-These tests are written using the Avocado Testing Framework (which must
-be installed separately) in conjunction with a the ``avocado_qemu.Test``
-class, implemented at ``tests/avocado/avocado_qemu``.
-
-Tests based on ``avocado_qemu.Test`` can easily:
-
- * Customize the command line arguments given to the convenience
- ``self.vm`` attribute (a QEMUMachine instance)
-
- * Interact with the QEMU monitor, send QMP commands and check
- their results
-
- * Interact with the guest OS, using the convenience console device
- (which may be useful to assert the effectiveness and correctness of
- command line arguments or QMP commands)
+Functional tests using Python
+-----------------------------
- * Interact with external data files that accompany the test itself
- (see ``self.get_data()``)
+The ``tests/functional`` directory hosts functional tests written in
+Python. You can run the functional tests simply by executing:
- * Download (and cache) remote data files, such as firmware and kernel
- images
+.. code::
- * Have access to a library of guest OS images (by means of the
- ``avocado.utils.vmimage`` library)
+ make check-functional
- * Make use of various other test related utilities available at the
- test class itself and at the utility library:
+See :ref:`checkfunctional-ref` for more details.
- - http://avocado-framework.readthedocs.io/en/latest/api/test/avocado.html#avocado.Test
- - http://avocado-framework.readthedocs.io/en/latest/api/utils/avocado.utils.html
+Integration tests using the Avocado Framework
+---------------------------------------------
-Running tests
-~~~~~~~~~~~~~
+The ``tests/avocado`` directory hosts integration tests. They're usually
+higher level tests, and may interact with external resources and with
+various guest operating systems.
You can run the avocado tests simply by executing:
@@ -894,537 +881,8 @@ You can run the avocado tests simply by executing:
make check-avocado
-This involves the automatic installation, from PyPI, of all the
-necessary avocado-framework dependencies into the QEMU venv within the
-build tree (at ``./pyvenv``). Test results are also saved within the
-build tree (at ``tests/results``).
-
-Note: the build environment must be using a Python 3 stack, and have
-the ``venv`` and ``pip`` packages installed. If necessary, make sure
-``configure`` is called with ``--python=`` and that those modules are
-available. On Debian and Ubuntu based systems, depending on the
-specific version, they may be on packages named ``python3-venv`` and
-``python3-pip``.
-
-It is also possible to run tests based on tags using the
-``make check-avocado`` command and the ``AVOCADO_TAGS`` environment
-variable:
-
-.. code::
-
- make check-avocado AVOCADO_TAGS=quick
-
-Note that tags separated with commas have an AND behavior, while tags
-separated by spaces have an OR behavior. For more information on Avocado
-tags, see:
-
- https://avocado-framework.readthedocs.io/en/latest/guides/user/chapters/tags.html
-
-To run a single test file, a couple of them, or a test within a file
-using the ``make check-avocado`` command, set the ``AVOCADO_TESTS``
-environment variable with the test files or test names. To run all
-tests from a single file, use:
-
- .. code::
-
- make check-avocado AVOCADO_TESTS=$FILEPATH
-
-The same is valid to run tests from multiple test files:
-
- .. code::
-
- make check-avocado AVOCADO_TESTS='$FILEPATH1 $FILEPATH2'
-
-To run a single test within a file, use:
-
- .. code::
-
- make check-avocado AVOCADO_TESTS=$FILEPATH:$TESTCLASS.$TESTNAME
-
-The same is valid to run single tests from multiple test files:
-
- .. code::
-
- make check-avocado AVOCADO_TESTS='$FILEPATH1:$TESTCLASS1.$TESTNAME1 $FILEPATH2:$TESTCLASS2.$TESTNAME2'
-
-The scripts installed inside the virtual environment may be used
-without an "activation". For instance, the Avocado test runner
-may be invoked by running:
-
- .. code::
-
- pyvenv/bin/avocado run $OPTION1 $OPTION2 tests/avocado/
-
-Note that if ``make check-avocado`` was not executed before, it is
-possible to create the Python virtual environment with the dependencies
-needed running:
-
- .. code::
-
- make check-venv
-
-It is also possible to run tests from a single file or a single test within
-a test file. To run tests from a single file within the build tree, use:
-
- .. code::
-
- pyvenv/bin/avocado run tests/avocado/$TESTFILE
-
-To run a single test within a test file, use:
-
- .. code::
-
- pyvenv/bin/avocado run tests/avocado/$TESTFILE:$TESTCLASS.$TESTNAME
-
-Valid test names are visible in the output from any previous execution
-of Avocado or ``make check-avocado``, and can also be queried using:
-
- .. code::
-
- pyvenv/bin/avocado list tests/avocado
-
-Manual Installation
-~~~~~~~~~~~~~~~~~~~
-
-To manually install Avocado and its dependencies, run:
-
-.. code::
-
- pip install --user avocado-framework
-
-Alternatively, follow the instructions on this link:
-
- https://avocado-framework.readthedocs.io/en/latest/guides/user/chapters/installing.html
-
-Overview
-~~~~~~~~
-
-The ``tests/avocado/avocado_qemu`` directory provides the
-``avocado_qemu`` Python module, containing the ``avocado_qemu.Test``
-class. Here's a simple usage example:
-
-.. code::
-
- from avocado_qemu import QemuSystemTest
-
-
- class Version(QemuSystemTest):
- """
- :avocado: tags=quick
- """
- def test_qmp_human_info_version(self):
- self.vm.launch()
- res = self.vm.cmd('human-monitor-command',
- command_line='info version')
- self.assertRegex(res, r'^(\d+\.\d+\.\d)')
-
-To execute your test, run:
-
-.. code::
-
- avocado run version.py
-
-Tests may be classified according to a convention by using docstring
-directives such as ``:avocado: tags=TAG1,TAG2``. To run all tests
-in the current directory, tagged as "quick", run:
-
-.. code::
-
- avocado run -t quick .
-
-The ``avocado_qemu.Test`` base test class
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The ``avocado_qemu.Test`` class has a number of characteristics that
-are worth being mentioned right away.
-
-First of all, it attempts to give each test a ready to use QEMUMachine
-instance, available at ``self.vm``. Because many tests will tweak the
-QEMU command line, launching the QEMUMachine (by using ``self.vm.launch()``)
-is left to the test writer.
-
-The base test class has also support for tests with more than one
-QEMUMachine. The way to get machines is through the ``self.get_vm()``
-method which will return a QEMUMachine instance. The ``self.get_vm()``
-method accepts arguments that will be passed to the QEMUMachine creation
-and also an optional ``name`` attribute so you can identify a specific
-machine and get it more than once through the tests methods. A simple
-and hypothetical example follows:
-
-.. code::
-
- from avocado_qemu import QemuSystemTest
-
-
- class MultipleMachines(QemuSystemTest):
- def test_multiple_machines(self):
- first_machine = self.get_vm()
- second_machine = self.get_vm()
- self.get_vm(name='third_machine').launch()
-
- first_machine.launch()
- second_machine.launch()
-
- first_res = first_machine.cmd(
- 'human-monitor-command',
- command_line='info version')
-
- second_res = second_machine.cmd(
- 'human-monitor-command',
- command_line='info version')
-
- third_res = self.get_vm(name='third_machine').cmd(
- 'human-monitor-command',
- command_line='info version')
-
- self.assertEqual(first_res, second_res, third_res)
-
-At test "tear down", ``avocado_qemu.Test`` handles all the QEMUMachines
-shutdown.
+See :ref:`checkavocado-ref` for more details.
-The ``avocado_qemu.LinuxTest`` base test class
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The ``avocado_qemu.LinuxTest`` is further specialization of the
-``avocado_qemu.Test`` class, so it contains all the characteristics of
-the later plus some extra features.
-
-First of all, this base class is intended for tests that need to
-interact with a fully booted and operational Linux guest. At this
-time, it uses a Fedora 31 guest image. The most basic example looks
-like this:
-
-.. code::
-
- from avocado_qemu import LinuxTest
-
-
- class SomeTest(LinuxTest):
-
- def test(self):
- self.launch_and_wait()
- self.ssh_command('some_command_to_be_run_in_the_guest')
-
-Please refer to tests that use ``avocado_qemu.LinuxTest`` under
-``tests/avocado`` for more examples.
-
-QEMUMachine
-~~~~~~~~~~~
-
-The QEMUMachine API is already widely used in the Python iotests,
-device-crash-test and other Python scripts. It's a wrapper around the
-execution of a QEMU binary, giving its users:
-
- * the ability to set command line arguments to be given to the QEMU
- binary
-
- * a ready to use QMP connection and interface, which can be used to
- send commands and inspect its results, as well as asynchronous
- events
-
- * convenience methods to set commonly used command line arguments in
- a more succinct and intuitive way
-
-QEMU binary selection
-^^^^^^^^^^^^^^^^^^^^^
-
-The QEMU binary used for the ``self.vm`` QEMUMachine instance will
-primarily depend on the value of the ``qemu_bin`` parameter. If it's
-not explicitly set, its default value will be the result of a dynamic
-probe in the same source tree. A suitable binary will be one that
-targets the architecture matching host machine.
-
-Based on this description, test writers will usually rely on one of
-the following approaches:
-
-1) Set ``qemu_bin``, and use the given binary
-
-2) Do not set ``qemu_bin``, and use a QEMU binary named like
- "qemu-system-${arch}", either in the current
- working directory, or in the current source tree.
-
-The resulting ``qemu_bin`` value will be preserved in the
-``avocado_qemu.Test`` as an attribute with the same name.
-
-Attribute reference
-~~~~~~~~~~~~~~~~~~~
-
-Test
-^^^^
-
-Besides the attributes and methods that are part of the base
-``avocado.Test`` class, the following attributes are available on any
-``avocado_qemu.Test`` instance.
-
-vm
-''
-
-A QEMUMachine instance, initially configured according to the given
-``qemu_bin`` parameter.
-
-arch
-''''
-
-The architecture can be used on different levels of the stack, e.g. by
-the framework or by the test itself. At the framework level, it will
-currently influence the selection of a QEMU binary (when one is not
-explicitly given).
-
-Tests are also free to use this attribute value, for their own needs.
-A test may, for instance, use the same value when selecting the
-architecture of a kernel or disk image to boot a VM with.
-
-The ``arch`` attribute will be set to the test parameter of the same
-name. If one is not given explicitly, it will either be set to
-``None``, or, if the test is tagged with one (and only one)
-``:avocado: tags=arch:VALUE`` tag, it will be set to ``VALUE``.
-
-cpu
-'''
-
-The cpu model that will be set to all QEMUMachine instances created
-by the test.
-
-The ``cpu`` attribute will be set to the test parameter of the same
-name. If one is not given explicitly, it will either be set to
-``None ``, or, if the test is tagged with one (and only one)
-``:avocado: tags=cpu:VALUE`` tag, it will be set to ``VALUE``.
-
-machine
-'''''''
-
-The machine type that will be set to all QEMUMachine instances created
-by the test.
-
-The ``machine`` attribute will be set to the test parameter of the same
-name. If one is not given explicitly, it will either be set to
-``None``, or, if the test is tagged with one (and only one)
-``:avocado: tags=machine:VALUE`` tag, it will be set to ``VALUE``.
-
-qemu_bin
-''''''''
-
-The preserved value of the ``qemu_bin`` parameter or the result of the
-dynamic probe for a QEMU binary in the current working directory or
-source tree.
-
-LinuxTest
-^^^^^^^^^
-
-Besides the attributes present on the ``avocado_qemu.Test`` base
-class, the ``avocado_qemu.LinuxTest`` adds the following attributes:
-
-distro
-''''''
-
-The name of the Linux distribution used as the guest image for the
-test. The name should match the **Provider** column on the list
-of images supported by the avocado.utils.vmimage library:
-
-https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
-
-distro_version
-''''''''''''''
-
-The version of the Linux distribution as the guest image for the
-test. The name should match the **Version** column on the list
-of images supported by the avocado.utils.vmimage library:
-
-https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
-
-distro_checksum
-'''''''''''''''
-
-The sha256 hash of the guest image file used for the test.
-
-If this value is not set in the code or by a test parameter (with the
-same name), no validation on the integrity of the image will be
-performed.
-
-Parameter reference
-~~~~~~~~~~~~~~~~~~~
-
-To understand how Avocado parameters are accessed by tests, and how
-they can be passed to tests, please refer to::
-
- https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#accessing-test-parameters
-
-Parameter values can be easily seen in the log files, and will look
-like the following:
-
-.. code::
-
- PARAMS (key=qemu_bin, path=*, default=./qemu-system-x86_64) => './qemu-system-x86_64
-
-Test
-^^^^
-
-arch
-''''
-
-The architecture that will influence the selection of a QEMU binary
-(when one is not explicitly given).
-
-Tests are also free to use this parameter value, for their own needs.
-A test may, for instance, use the same value when selecting the
-architecture of a kernel or disk image to boot a VM with.
-
-This parameter has a direct relation with the ``arch`` attribute. If
-not given, it will default to None.
-
-cpu
-'''
-
-The cpu model that will be set to all QEMUMachine instances created
-by the test.
-
-machine
-'''''''
-
-The machine type that will be set to all QEMUMachine instances created
-by the test.
-
-qemu_bin
-''''''''
-
-The exact QEMU binary to be used on QEMUMachine.
-
-LinuxTest
-^^^^^^^^^
-
-Besides the parameters present on the ``avocado_qemu.Test`` base
-class, the ``avocado_qemu.LinuxTest`` adds the following parameters:
-
-distro
-''''''
-
-The name of the Linux distribution used as the guest image for the
-test. The name should match the **Provider** column on the list
-of images supported by the avocado.utils.vmimage library:
-
-https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
-
-distro_version
-''''''''''''''
-
-The version of the Linux distribution as the guest image for the
-test. The name should match the **Version** column on the list
-of images supported by the avocado.utils.vmimage library:
-
-https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images
-
-distro_checksum
-'''''''''''''''
-
-The sha256 hash of the guest image file used for the test.
-
-If this value is not set in the code or by this parameter no
-validation on the integrity of the image will be performed.
-
-Skipping tests
-~~~~~~~~~~~~~~
-
-The Avocado framework provides Python decorators which allow for easily skip
-tests running under certain conditions. For example, on the lack of a binary
-on the test system or when the running environment is a CI system. For further
-information about those decorators, please refer to::
-
- https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#skipping-tests
-
-While the conditions for skipping tests are often specifics of each one, there
-are recurring scenarios identified by the QEMU developers and the use of
-environment variables became a kind of standard way to enable/disable tests.
-
-Here is a list of the most used variables:
-
-AVOCADO_ALLOW_LARGE_STORAGE
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Tests which are going to fetch or produce assets considered *large* are not
-going to run unless that ``AVOCADO_ALLOW_LARGE_STORAGE=1`` is exported on
-the environment.
-
-The definition of *large* is a bit arbitrary here, but it usually means an
-asset which occupies at least 1GB of size on disk when uncompressed.
-
-SPEED
-^^^^^
-Tests which have a long runtime will not be run unless ``SPEED=slow`` is
-exported on the environment.
-
-The definition of *long* is a bit arbitrary here, and it depends on the
-usefulness of the test too. A unique test is worth spending more time on,
-small variations on existing tests perhaps less so. As a rough guide,
-a test or set of similar tests which take more than 100 seconds to
-complete.
-
-AVOCADO_ALLOW_UNTRUSTED_CODE
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-There are tests which will boot a kernel image or firmware that can be
-considered not safe to run on the developer's workstation, thus they are
-skipped by default. The definition of *not safe* is also arbitrary but
-usually it means a blob which either its source or build process aren't
-public available.
-
-You should export ``AVOCADO_ALLOW_UNTRUSTED_CODE=1`` on the environment in
-order to allow tests which make use of those kind of assets.
-
-AVOCADO_TIMEOUT_EXPECTED
-^^^^^^^^^^^^^^^^^^^^^^^^
-The Avocado framework has a timeout mechanism which interrupts tests to avoid the
-test suite of getting stuck. The timeout value can be set via test parameter or
-property defined in the test class, for further details::
-
- https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#setting-a-test-timeout
-
-Even though the timeout can be set by the test developer, there are some tests
-that may not have a well-defined limit of time to finish under certain
-conditions. For example, tests that take longer to execute when QEMU is
-compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` variable
-has been used to determine whether those tests should run or not.
-
-QEMU_TEST_FLAKY_TESTS
-^^^^^^^^^^^^^^^^^^^^^
-Some tests are not working reliably and thus are disabled by default.
-This includes tests that don't run reliably on GitLab's CI which
-usually expose real issues that are rarely seen on developer machines
-due to the constraints of the CI environment. If you encounter a
-similar situation then raise a bug and then mark the test as shown on
-the code snippet below:
-
-.. code::
-
- # See https://gitlab.com/qemu-project/qemu/-/issues/nnnn
- @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab')
- def test(self):
- do_something()
-
-You can also add ``:avocado: tags=flaky`` to the test meta-data so
-only the flaky tests can be run as a group:
-
-.. code::
-
- env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado \
- run tests/avocado -filter-by-tags=flaky
-
-Tests should not live in this state forever and should either be fixed
-or eventually removed.
-
-
-Uninstalling Avocado
-~~~~~~~~~~~~~~~~~~~~
-
-If you've followed the manual installation instructions above, you can
-easily uninstall Avocado. Start by listing the packages you have
-installed::
-
- pip list --user
-
-And remove any package you want with::
-
- pip uninstall <package_name>
-
-If you've used ``make check-avocado``, the Python virtual environment where
-Avocado is installed will be cleaned up as part of ``make check-clean``.
.. _checktcg-ref:
@@ -1475,6 +933,19 @@ And run with::
Adding ``V=1`` to the invocation will show the details of how to
invoke QEMU for the test which is useful for debugging tests.
+Running individual tests
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Tests can also be run directly from the test build directory. If you
+run ``make help`` from the test build directory you will get a list of
+all the tests that can be run. Please note that same binaries are used
+in multiple tests, for example::
+
+ make run-plugin-test-mmap-with-libinline.so
+
+will run the mmap test with the ``libinline.so`` TCG plugin. The
+gdbstub tests also re-use the test binaries but while exercising gdb.
+
TCG test dependencies
~~~~~~~~~~~~~~~~~~~~~
diff --git a/docs/devel/qgraph.rst b/docs/devel/testing/qgraph.rst
index 43342d9..43342d9 100644
--- a/docs/devel/qgraph.rst
+++ b/docs/devel/testing/qgraph.rst
diff --git a/docs/devel/qtest.rst b/docs/devel/testing/qtest.rst
index c5b8546..c5b8546 100644
--- a/docs/devel/qtest.rst
+++ b/docs/devel/testing/qtest.rst
diff --git a/docs/interop/firmware.json b/docs/interop/firmware.json
index 54a1fc6..57f55f6 100644
--- a/docs/interop/firmware.json
+++ b/docs/interop/firmware.json
@@ -14,8 +14,10 @@
# = Firmware
##
-{ 'include' : 'machine.json' }
-{ 'include' : 'block-core.json' }
+{ 'pragma': {
+ 'member-name-exceptions': [
+ 'FirmwareArchitecture' # x86_64
+ ] } }
##
# @FirmwareOSInterface:
@@ -61,6 +63,27 @@
'data' : [ 'flash', 'kernel', 'memory' ] }
##
+# @FirmwareArchitecture:
+#
+# Enumeration of architectures for which Qemu uses additional
+# firmware files.
+#
+# @aarch64: 64-bit Arm.
+#
+# @arm: 32-bit Arm.
+#
+# @i386: 32-bit x86.
+#
+# @loongarch64: 64-bit LoongArch. (since: 7.1)
+#
+# @x86_64: 64-bit x86.
+#
+# Since: 3.0
+##
+{ 'enum' : 'FirmwareArchitecture',
+ 'data' : [ 'aarch64', 'arm', 'i386', 'loongarch64', 'x86_64' ] }
+
+##
# @FirmwareTarget:
#
# Defines the machine types that firmware may execute on.
@@ -81,7 +104,7 @@
# Since: 3.0
##
{ 'struct' : 'FirmwareTarget',
- 'data' : { 'architecture' : 'SysEmuTarget',
+ 'data' : { 'architecture' : 'FirmwareArchitecture',
'machines' : [ 'str' ] } }
##
@@ -201,6 +224,20 @@
'verbose-dynamic', 'verbose-static' ] }
##
+# @FirmwareFormat:
+#
+# Formats that are supported for firmware images.
+#
+# @raw: Raw disk image format.
+#
+# @qcow2: The QCOW2 image format.
+#
+# Since: 3.0
+##
+{ 'enum': 'FirmwareFormat',
+ 'data': [ 'raw', 'qcow2' ] }
+
+##
# @FirmwareFlashFile:
#
# Defines common properties that are necessary for loading a firmware
@@ -219,7 +256,7 @@
##
{ 'struct' : 'FirmwareFlashFile',
'data' : { 'filename' : 'str',
- 'format' : 'BlockdevDriver' } }
+ 'format' : 'FirmwareFormat' } }
##
@@ -433,7 +470,7 @@
#
# Since: 3.0
#
-# Examples:
+# .. qmp-example::
#
# {
# "description": "SeaBIOS",
diff --git a/docs/interop/index.rst b/docs/interop/index.rst
index ed65395..999e44e 100644
--- a/docs/interop/index.rst
+++ b/docs/interop/index.rst
@@ -14,6 +14,9 @@ are useful for making QEMU interoperate with other software.
dbus-vmstate
dbus-display
live-block-operations
+ nbd
+ parallels
+ prl-xml
pr-helper
qmp-spec
qemu-ga
diff --git a/docs/interop/live-block-operations.rst b/docs/interop/live-block-operations.rst
index 691429c..6b549ed 100644
--- a/docs/interop/live-block-operations.rst
+++ b/docs/interop/live-block-operations.rst
@@ -931,8 +931,8 @@ Shutdown the guest, by issuing the ``quit`` QMP command::
}
-Live disk backup --- ``blockdev-backup`` and the deprecated``drive-backup``
----------------------------------------------------------------------------
+Live disk backup --- ``blockdev-backup`` and the deprecated ``drive-backup``
+----------------------------------------------------------------------------
The ``blockdev-backup`` (and the deprecated ``drive-backup``) allows
you to create a point-in-time snapshot.
diff --git a/docs/interop/nbd.rst b/docs/interop/nbd.rst
new file mode 100644
index 0000000..de079d3
--- /dev/null
+++ b/docs/interop/nbd.rst
@@ -0,0 +1,89 @@
+QEMU NBD protocol support
+=========================
+
+QEMU supports the NBD protocol, and has an internal NBD client (see
+``block/nbd.c``), an internal NBD server (see ``blockdev-nbd.c``), and an
+external NBD server tool (see ``qemu-nbd.c``). The common code is placed
+in ``nbd/*``.
+
+The NBD protocol is specified here:
+https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
+
+The following paragraphs describe some specific properties of NBD
+protocol realization in QEMU.
+
+Metadata namespaces
+-------------------
+
+QEMU supports the ``base:allocation`` metadata context as defined in the
+NBD protocol specification, and also defines an additional metadata
+namespace ``qemu``.
+
+``qemu`` namespace
+------------------
+
+The ``qemu`` namespace currently contains two available metadata context
+types. The first is related to exposing the contents of a dirty
+bitmap alongside the associated disk contents. That metadata context
+is named with the following form::
+
+ qemu:dirty-bitmap:<dirty-bitmap-export-name>
+
+Each dirty-bitmap metadata context defines only one flag for extents
+in reply for ``NBD_CMD_BLOCK_STATUS``:
+
+bit 0:
+ ``NBD_STATE_DIRTY``, set when the extent is "dirty"
+
+The second is related to exposing the source of various extents within
+the image, with a single metadata context named::
+
+ qemu:allocation-depth
+
+In the allocation depth context, the entire 32-bit value represents a
+depth of which layer in a thin-provisioned backing chain provided the
+data (0 for unallocated, 1 for the active layer, 2 for the first
+backing layer, and so forth).
+
+For ``NBD_OPT_LIST_META_CONTEXT`` the following queries are supported
+in addition to the specific ``qemu:allocation-depth`` and
+``qemu:dirty-bitmap:<dirty-bitmap-export-name>``:
+
+``qemu:``
+ returns list of all available metadata contexts in the namespace
+``qemu:dirty-bitmap:``
+ returns list of all available dirty-bitmap metadata contexts
+
+Features by version
+-------------------
+
+The following list documents which qemu version first implemented
+various features (both as a server exposing the feature, and as a
+client taking advantage of the feature when present), to make it
+easier to plan for cross-version interoperability. Note that in
+several cases, the initial release containing a feature may require
+additional patches from the corresponding stable branch to fix bugs in
+the operation of that feature.
+
+2.6
+ ``NBD_OPT_STARTTLS`` with TLS X.509 Certificates
+2.8
+ ``NBD_CMD_WRITE_ZEROES``
+2.10
+ ``NBD_OPT_GO``, ``NBD_INFO_BLOCK``
+2.11
+ ``NBD_OPT_STRUCTURED_REPLY``
+2.12
+ ``NBD_CMD_BLOCK_STATUS`` for ``base:allocation``
+3.0
+ ``NBD_OPT_STARTTLS`` with TLS Pre-Shared Keys (PSK),
+ ``NBD_CMD_BLOCK_STATUS`` for ``qemu:dirty-bitmap:``, ``NBD_CMD_CACHE``
+4.2
+ ``NBD_FLAG_CAN_MULTI_CONN`` for shareable read-only exports,
+ ``NBD_CMD_FLAG_FAST_ZERO``
+5.2
+ ``NBD_CMD_BLOCK_STATUS`` for ``qemu:allocation-depth``
+7.1
+ ``NBD_FLAG_CAN_MULTI_CONN`` for shareable writable exports
+8.2
+ ``NBD_OPT_EXTENDED_HEADERS``, ``NBD_FLAG_BLOCK_STATUS_PAYLOAD``
diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
deleted file mode 100644
index 18efb25..0000000
--- a/docs/interop/nbd.txt
+++ /dev/null
@@ -1,72 +0,0 @@
-QEMU supports the NBD protocol, and has an internal NBD client (see
-block/nbd.c), an internal NBD server (see blockdev-nbd.c), and an
-external NBD server tool (see qemu-nbd.c). The common code is placed
-in nbd/*.
-
-The NBD protocol is specified here:
-https://github.com/NetworkBlockDevice/nbd/blob/master/doc/proto.md
-
-The following paragraphs describe some specific properties of NBD
-protocol realization in QEMU.
-
-= Metadata namespaces =
-
-QEMU supports the "base:allocation" metadata context as defined in the
-NBD protocol specification, and also defines an additional metadata
-namespace "qemu".
-
-== "qemu" namespace ==
-
-The "qemu" namespace currently contains two available metadata context
-types. The first is related to exposing the contents of a dirty
-bitmap alongside the associated disk contents. That metadata context
-is named with the following form:
-
- qemu:dirty-bitmap:<dirty-bitmap-export-name>
-
-Each dirty-bitmap metadata context defines only one flag for extents
-in reply for NBD_CMD_BLOCK_STATUS:
-
- bit 0: NBD_STATE_DIRTY, set when the extent is "dirty"
-
-The second is related to exposing the source of various extents within
-the image, with a single metadata context named:
-
- qemu:allocation-depth
-
-In the allocation depth context, the entire 32-bit value represents a
-depth of which layer in a thin-provisioned backing chain provided the
-data (0 for unallocated, 1 for the active layer, 2 for the first
-backing layer, and so forth).
-
-For NBD_OPT_LIST_META_CONTEXT the following queries are supported
-in addition to the specific "qemu:allocation-depth" and
-"qemu:dirty-bitmap:<dirty-bitmap-export-name>":
-
-* "qemu:" - returns list of all available metadata contexts in the
- namespace.
-* "qemu:dirty-bitmap:" - returns list of all available dirty-bitmap
- metadata contexts.
-
-= Features by version =
-
-The following list documents which qemu version first implemented
-various features (both as a server exposing the feature, and as a
-client taking advantage of the feature when present), to make it
-easier to plan for cross-version interoperability. Note that in
-several cases, the initial release containing a feature may require
-additional patches from the corresponding stable branch to fix bugs in
-the operation of that feature.
-
-* 2.6: NBD_OPT_STARTTLS with TLS X.509 Certificates
-* 2.8: NBD_CMD_WRITE_ZEROES
-* 2.10: NBD_OPT_GO, NBD_INFO_BLOCK
-* 2.11: NBD_OPT_STRUCTURED_REPLY
-* 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
-* 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
-NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
-* 4.2: NBD_FLAG_CAN_MULTI_CONN for shareable read-only exports,
-NBD_CMD_FLAG_FAST_ZERO
-* 5.2: NBD_CMD_BLOCK_STATUS for "qemu:allocation-depth"
-* 7.1: NBD_FLAG_CAN_MULTI_CONN for shareable writable exports
-* 8.2: NBD_OPT_EXTENDED_HEADERS, NBD_FLAG_BLOCK_STATUS_PAYLOAD
diff --git a/docs/interop/parallels.txt b/docs/interop/parallels.rst
index bb3fadf..7b328a4 100644
--- a/docs/interop/parallels.txt
+++ b/docs/interop/parallels.rst
@@ -1,41 +1,46 @@
-= License =
+Parallels Expandable Image File Format
+======================================
-Copyright (c) 2015 Denis Lunev
-Copyright (c) 2015 Vladimir Sementsov-Ogievskiy
+..
+ Copyright (c) 2015 Denis Lunev
+ Copyright (c) 2015 Vladimir Sementsov-Ogievskiy
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
+ This work is licensed under the terms of the GNU GPL, version 2 or later.
+ See the COPYING file in the top-level directory.
-= Parallels Expandable Image File Format =
A Parallels expandable image file consists of three consecutive parts:
- * header
- * BAT
- * data area
+
+* header
+* BAT
+* data area
All numbers in a Parallels expandable image are stored in little-endian byte
order.
-== Definitions ==
-
- Sector A 512-byte data chunk.
+Definitions
+-----------
- Cluster A data chunk of the size specified in the image header.
- Currently, the default size is 1MiB (2048 sectors). In previous
- versions, cluster sizes of 63 sectors, 256 and 252 kilobytes were
- used.
+Sector
+ A 512-byte data chunk.
- BAT Block Allocation Table, an entity that contains information for
- guest-to-host I/O data address translation.
+Cluster
+ A data chunk of the size specified in the image header.
+ Currently, the default size is 1MiB (2048 sectors). In previous
+ versions, cluster sizes of 63 sectors, 256 and 252 kilobytes were used.
+BAT
+ Block Allocation Table, an entity that contains information for
+ guest-to-host I/O data address translation.
-== Header ==
+Header
+------
The header is placed at the start of an image and contains the following
-fields:
+fields::
-Bytes:
+ Bytes:
0 - 15: magic
Must contain "WithoutFreeSpace" or "WithouFreSpacExt".
@@ -103,44 +108,46 @@ Bytes:
ext_off must meet the same requirements as cluster offsets
defined by BAT entries (see below).
-
-== BAT ==
+BAT
+---
BAT is placed immediately after the image header. In the file, BAT is a
contiguous array of 32-bit unsigned little-endian integers with
-(bat_entries * 4) bytes size.
+``(bat_entries * 4)`` bytes size.
Each BAT entry contains an offset from the start of the file to the
-corresponding cluster. The offset set in clusters for "WithouFreSpacExt" images
-and in sectors for "WithoutFreeSpace" images.
+corresponding cluster. The offset set in clusters for ``WithouFreSpacExt``
+images and in sectors for ``WithoutFreeSpace`` images.
If a BAT entry is zero, the corresponding cluster is not allocated and should
be considered as filled with zeroes.
Cluster offsets specified by BAT entries must meet the following requirements:
- - the value must not be lower than data offset (provided by header.data_off
- or calculated as specified above),
- - the value must be lower than the desired file size,
- - the value must be unique among all BAT entries,
- - the result of (cluster offset - data offset) must be aligned to cluster
- size.
+- the value must not be lower than data offset (provided by ``header.data_off``
+ or calculated as specified above)
+- the value must be lower than the desired file size
+- the value must be unique among all BAT entries
+- the result of ``(cluster offset - data offset)`` must be aligned to
+ cluster size
-== Data Area ==
+Data Area
+---------
-The data area is an area from the data offset (provided by header.data_off or
-calculated as specified above) to the end of the file. It represents a
+The data area is an area from the data offset (provided by ``header.data_off``
+or calculated as specified above) to the end of the file. It represents a
contiguous array of clusters. Most of them are allocated by the BAT, some may
-be allocated by the ext_off field in the header while other may be allocated by
-extensions. All clusters allocated by ext_off and extensions should meet the
-same requirements as clusters specified by BAT entries.
+be allocated by the ``ext_off`` field in the header while other may be
+allocated by extensions. All clusters allocated by ``ext_off`` and extensions
+should meet the same requirements as clusters specified by BAT entries.
-== Format Extension ==
+Format Extension
+----------------
The Format Extension is an area 1 cluster in size that provides additional
format features. This cluster is addressed by the ext_off field in the header.
-The format of the Format Extension area is the following:
+The format of the Format Extension area is the following::
0 - 7: magic
Must be 0xAB234CEF23DCEA87
@@ -149,10 +156,10 @@ The format of the Format Extension area is the following:
The MD5 checksum of the entire Header Extension cluster except
the first 24 bytes.
- The above are followed by feature sections or "extensions". The last
- extension must be "End of features" (see below).
+The above are followed by feature sections or "extensions". The last
+extension must be "End of features" (see below).
-Each feature section has the following format:
+Each feature section has the following format::
0 - 7: magic
The identifier of the feature:
@@ -183,16 +190,17 @@ Each feature section has the following format:
variable: data (data_size bytes)
- The above is followed by padding to the next 8 bytes boundary, then the
- next extension starts.
+The above is followed by padding to the next 8 bytes boundary, then the
+next extension starts.
- The last extension must be "End of features" with all the fields set to 0.
+The last extension must be "End of features" with all the fields set to 0.
-=== Dirty bitmaps feature ===
+Dirty bitmaps feature
+---------------------
This feature provides a way of storing dirty bitmaps in the image. The fields
-of its data area are:
+of its data area are::
0 - 7: size
The bitmap size, should be equal to disk size in sectors.
@@ -215,7 +223,7 @@ clusters inside the Parallels image file. The offsets of these clusters are
saved in the L1 offset table specified by the feature extension. Each L1 table
entry is a 64 bit integer as described below:
-Given an offset in bytes into the bitmap data, corresponding L1 entry is
+Given an offset in bytes into the bitmap data, corresponding L1 entry is::
l1_table[offset / cluster_size]
@@ -227,6 +235,6 @@ are assumed to be 1.
If an L1 table entry is not 0 or 1, it contains the corresponding cluster
offset (in 512b sectors). Given an offset in bytes into the bitmap data the
-offset in bytes into the image file can be obtained as follows:
+offset in bytes into the image file can be obtained as follows::
offset = l1_table[offset / cluster_size] * 512 + (offset % cluster_size)
diff --git a/docs/interop/prl-xml.rst b/docs/interop/prl-xml.rst
new file mode 100644
index 0000000..5bb63bb
--- /dev/null
+++ b/docs/interop/prl-xml.rst
@@ -0,0 +1,192 @@
+Parallels Disk Format
+=====================
+
+..
+ Copyright (c) 2015-2017, Virtuozzo, Inc.
+ Authors:
+ 2015 Denis Lunev <den@openvz.org>
+ 2015 Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
+ 2016-2017 Klim Kireev <klim.kireev@virtuozzo.com>
+ 2016-2017 Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
+
+ This work is licensed under the terms of the GNU GPL, version 2 or later.
+ See the COPYING file in the top-level directory.
+
+This specification contains minimal information about Parallels Disk Format,
+which is enough to properly work with QEMU. Nevertheless, Parallels Cloud Server
+and Parallels Desktop are able to add some unspecified nodes to the xml and use
+them, but they are for internal work and don't affect functionality. Also it
+uses auxiliary xml ``Snapshot.xml``, which allows storage of optional snapshot
+information, but this doesn't influence open/read/write functionality. QEMU and
+other software should not use fields not covered in this document or the
+``Snapshot.xml`` file, and must leave them as is.
+
+A Parallels disk consists of two parts: the set of snapshots and the disk
+descriptor file, which stores information about all files and snapshots.
+
+Definitions
+-----------
+
+Snapshot
+ a record of the contents captured at a particular time, capable
+ of storing current state. A snapshot has a UUID and a parent UUID.
+
+Snapshot image
+ an overlay representing the difference between this
+ snapshot and some earlier snapshot.
+
+Overlay
+ an image storing the different sectors between two captured states.
+
+Root image
+ a snapshot image with no parent, the root of the snapshot tree.
+
+Storage
+ the backing storage for a subset of the virtual disk. When
+ there is more than one storage in a Parallels disk then that
+ is referred to as a split image. In this case every storage
+ covers a specific address space area of the disk and has its
+ particular root image. Split images are not considered here
+ and are not supported. Each storage consists of disk
+ parameters and a list of images. The list of images always
+ contains a root image and may also contain overlays. The
+ root image can be an expandable Parallels image file or
+ plain. Overlays must be expandable.
+
+Description file
+ ``DiskDescriptor.xml`` stores information about disk parameters,
+ snapshots, and storages.
+
+Top Snapshot
+ The overlay between actual state and some previous snapshot.
+ It is not a snapshot in the classical sense because it
+ serves as the active image that the guest writes to.
+
+Sector
+ a 512-byte data chunk.
+
+Description file
+----------------
+
+All information is placed in a single XML element
+``Parallels_disk_image``.
+The element has only one attribute, ``Version``, which must be ``1.0``.
+
+The schema of ``DiskDescriptor.xml``::
+
+ <Parallels_disk_image Version="1.0">
+ <Disk_Parameters>
+ ...
+ </Disk_Parameters>
+ <StorageData>
+ ...
+ </StorageData>
+ <Snapshots>
+ ...
+ </Snapshots>
+ </Parallels_disk_image>
+
+``Disk_Parameters`` element
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``Disk_Parameters`` element describes the physical layout of the
+virtual disk and some general settings.
+
+The ``Disk_Parameters`` element MUST contain the following child elements:
+
+* ``Disk_size`` - number of sectors in the disk,
+ desired size of the disk.
+* ``Cylinders`` - number of the disk cylinders.
+* ``Heads`` - number of the disk heads.
+* ``Sectors`` - number of the disk sectors per cylinder
+ (sector size is 512 bytes)
+ Limitation: The product of the ``Heads``, ``Sectors`` and ``Cylinders``
+ values MUST be equal to the value of the Disk_size parameter.
+* ``Padding`` - must be 0. Parallels Cloud Server and Parallels Desktop may
+ use padding set to 1; however this case is not covered
+ by this specification. QEMU and other software should not open
+ such disks and should not create them.
+
+``StorageData`` element
+^^^^^^^^^^^^^^^^^^^^^^^
+
+This element of the file describes the root image and all snapshot images.
+
+The ``StorageData`` element consists of the ``Storage`` child element,
+as shown below::
+
+ <StorageData>
+ <Storage>
+ ...
+ </Storage>
+ </StorageData>
+
+A ``Storage`` element has the following child elements:
+
+* ``Start`` - start sector of the storage, in case of non split storage
+ equals to 0.
+* ``End`` - number of sector following the last sector, in case of non
+ split storage equals to ``Disk_size``.
+* ``Blocksize`` - storage cluster size, number of sectors per one cluster.
+ The cluster size for each "Compressed" (see below) image in
+ a parallels disk must be equal to this field. Note: the cluster
+ size for a Parallels Expandable Image is in the ``tracks`` field of
+ its header (see :doc:`parallels`).
+* Several ``Image`` child elements.
+
+Each ``Image`` element has the following child elements:
+
+* ``GUID`` - image identifier, UUID in curly brackets.
+ For instance, ``{12345678-9abc-def1-2345-6789abcdef12}.``
+ The GUID is used by the Snapshots element to reference images
+ (see below)
+* ``Type`` - image type of the element. It can be:
+
+ * ``Plain`` for raw files.
+ * ``Compressed`` for expanding disks.
+
+* ``File`` - path to image file. The path can be relative to
+ ``DiskDescriptor.xml`` or absolute.
+
+``Snapshots`` element
+^^^^^^^^^^^^^^^^^^^^^
+
+The ``Snapshots`` element describes the snapshot relations with the snapshot tree.
+
+The element contains the set of ``Shot`` child elements, as shown below::
+
+ <Snapshots>
+ <TopGUID> ... </TopGUID> /* Optional child element */
+ <Shot>
+ ...
+ </Shot>
+ <Shot>
+ ...
+ </Shot>
+ ...
+ </Snapshots>
+
+Each ``Shot`` element contains the following child elements:
+
+* ``GUID`` - an image GUID.
+* ``ParentGUID`` - GUID of the image of the parent snapshot.
+
+The software may traverse snapshots from child to parent using the
+``<ParentGUID>`` field as reference. The ``ParentGUID`` of the root
+snapshot is ``{00000000-0000-0000-0000-000000000000}``.
+There should be only one root snapshot.
+
+The Top snapshot could be
+described via two ways: via the ``TopGUID`` child
+element of the ``Snapshots`` element, or via the predefined GUID
+``{5fbaabe3-6958-40ff-92a7-860e329aab41}``. If ``TopGUID`` is defined,
+the predefined GUID is interpreted as a normal GUID. All snapshot images
+(except the Top Snapshot) should be
+opened read-only.
+
+There is another predefined GUID,
+``BackupID = {704718e1-2314-44c8-9087-d78ed36b0f4e}``, which is used by
+original and some third-party software for backup. QEMU and other
+software may operate with images with ``GUID = BackupID`` as usual.
+However, it is not recommended to use this
+GUID for new disks. The Top snapshot cannot have this GUID.
diff --git a/docs/interop/prl-xml.txt b/docs/interop/prl-xml.txt
deleted file mode 100644
index cf9b3fb..0000000
--- a/docs/interop/prl-xml.txt
+++ /dev/null
@@ -1,158 +0,0 @@
-= License =
-
-Copyright (c) 2015-2017, Virtuozzo, Inc.
-Authors:
- 2015 Denis Lunev <den@openvz.org>
- 2015 Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
- 2016-2017 Klim Kireev <klim.kireev@virtuozzo.com>
- 2016-2017 Edgar Kaziakhmedov <edgar.kaziakhmedov@virtuozzo.com>
-
-This work is licensed under the terms of the GNU GPL, version 2 or later.
-See the COPYING file in the top-level directory.
-
-This specification contains minimal information about Parallels Disk Format,
-which is enough to proper work with QEMU. Nevertheless, Parallels Cloud Server
-and Parallels Desktop are able to add some unspecified nodes to xml and use
-them, but they are for internal work and don't affect functionality. Also it
-uses auxiliary xml "Snapshot.xml", which allows to store optional snapshot
-information, but it doesn't influence open/read/write functionality. QEMU and
-other software should not use fields not covered in this document and
-Snapshot.xml file and must leave them as is.
-
-= Parallels Disk Format =
-
-Parallels disk consists of two parts: the set of snapshots and the disk
-descriptor file, which stores information about all files and snapshots.
-
-== Definitions ==
- Snapshot a record of the contents captured at a particular time,
- capable of storing current state. A snapshot has UUID and
- parent UUID.
-
- Snapshot image an overlay representing the difference between this
- snapshot and some earlier snapshot.
-
- Overlay an image storing the different sectors between two captured
- states.
-
- Root image snapshot image with no parent, the root of snapshot tree.
-
- Storage the backing storage for a subset of the virtual disk. When
- there is more than one storage in a Parallels disk then that
- is referred to as a split image. In this case every storage
- covers specific address space area of the disk and has its
- particular root image. Split images are not considered here
- and are not supported. Each storage consists of disk
- parameters and a list of images. The list of images always
- contains a root image and may also contain overlays. The
- root image can be an expandable Parallels image file or
- plain. Overlays must be expandable.
-
- Description DiskDescriptor.xml stores information about disk parameters,
- file snapshots, storages.
-
- Top The overlay between actual state and some previous snapshot.
- Snapshot It is not a snapshot in the classical sense because it
- serves as the active image that the guest writes to.
-
- Sector a 512-byte data chunk.
-
-== Description file ==
-All information is placed in a single XML element Parallels_disk_image.
-The element has only one attribute "Version", that must be 1.0.
-Schema of DiskDescriptor.xml:
-
-<Parallels_disk_image Version="1.0">
- <Disk_Parameters>
- ...
- </Disk_Parameters>
- <StorageData>
- ...
- </StorageData>
- <Snapshots>
- ...
- </Snapshots>
-</Parallels_disk_image>
-
-== Disk_Parameters element ==
-The Disk_Parameters element describes the physical layout of the virtual disk
-and some general settings.
-
-The Disk_Parameters element MUST contain the following child elements:
- * Disk_size - number of sectors in the disk,
- desired size of the disk.
- * Cylinders - number of the disk cylinders.
- * Heads - number of the disk heads.
- * Sectors - number of the disk sectors per cylinder
- (sector size is 512 bytes)
- Limitation: Product of the Heads, Sectors and Cylinders
- values MUST be equal to the value of the Disk_size parameter.
- * Padding - must be 0. Parallels Cloud Server and Parallels Desktop may
- use padding set to 1, however this case is not covered
- by this spec, QEMU and other software should not open
- such disks and should not create them.
-
-== StorageData element ==
-This element of the file describes the root image and all snapshot images.
-
-The StorageData element consists of the Storage child element, as shown below:
-<StorageData>
- <Storage>
- ...
- </Storage>
-</StorageData>
-
-A Storage element has following child elements:
- * Start - start sector of the storage, in case of non split storage
- equals to 0.
- * End - number of sector following the last sector, in case of non
- split storage equals to Disk_size.
- * Blocksize - storage cluster size, number of sectors per one cluster.
- Cluster size for each "Compressed" (see below) image in
- parallels disk must be equal to this field. Note: cluster
- size for Parallels Expandable Image is in 'tracks' field of
- its header (see docs/interop/parallels.txt).
- * Several Image child elements.
-
-Each Image element has following child elements:
- * GUID - image identifier, UUID in curly brackets.
- For instance, {12345678-9abc-def1-2345-6789abcdef12}.
- The GUID is used by the Snapshots element to reference images
- (see below)
- * Type - image type of the element. It can be:
- "Plain" for raw files.
- "Compressed" for expanding disks.
- * File - path to image file. Path can be relative to DiskDescriptor.xml or
- absolute.
-
-== Snapshots element ==
-The Snapshots element describes the snapshot relations with the snapshot tree.
-
-The element contains the set of Shot child elements, as shown below:
-<Snapshots>
- <TopGUID> ... </TopGUID> /* Optional child element */
- <Shot>
- ...
- </Shot>
- <Shot>
- ...
- </Shot>
- ...
-</Snapshots>
-
-Each Shot element contains the following child elements:
- * GUID - an image GUID.
- * ParentGUID - GUID of the image of the parent snapshot.
-
-The software may traverse snapshots from child to parent using <ParentGUID>
-field as reference. ParentGUID of root snapshot is
-{00000000-0000-0000-0000-000000000000}. There should be only one root
-snapshot. Top snapshot could be described via two ways: via TopGUID child
-element of the Snapshots element or via predefined GUID
-{5fbaabe3-6958-40ff-92a7-860e329aab41}. If TopGUID is defined, predefined GUID is
-interpreted as usual GUID. All snapshot images (except Top Snapshot) should be
-opened read-only. There is another predefined GUID,
-BackupID = {704718e1-2314-44c8-9087-d78ed36b0f4e}, which is used by original and
-some third-party software for backup, QEMU and other software may operate with
-images with GUID = BackupID as usual, however, it is not recommended to use this
-GUID for new disks. Top snapshot cannot have this GUID.
diff --git a/docs/interop/qemu-ga.rst b/docs/interop/qemu-ga.rst
index 72fb75a..11f7bae 100644
--- a/docs/interop/qemu-ga.rst
+++ b/docs/interop/qemu-ga.rst
@@ -28,11 +28,30 @@ configuration options on the command line. For the same key, the last
option wins, but the lists accumulate (see below for configuration
file format).
+If an allowed RPCs list is defined in the configuration, then all
+RPCs will be blocked by default, except for the allowed list.
+
+If a blocked RPCs list is defined in the configuration, then all
+RPCs will be allowed by default, except for the blocked list.
+
+If both allowed and blocked RPCs lists are defined in the configuration,
+then all RPCs will be blocked by default, then the allowed list will
+be applied, followed by the blocked list.
+
+While filesystems are frozen, all except for a designated safe set
+of RPCs will blocked, regardless of what the general configuration
+declares.
+
Options
-------
.. program:: qemu-ga
+.. option:: -c, --config=PATH
+
+ Configuration file path (the default is |CONFDIR|\ ``/qemu-ga.conf``,
+ unless overridden by the QGA_CONF environment variable)
+
.. option:: -m, --method=METHOD
Transport method: one of ``unix-listen``, ``virtio-serial``, or
@@ -131,6 +150,7 @@ fsfreeze-hook string
statedir string
verbose boolean
block-rpcs string list
+allow-rpcs string list
============= ===========
See also
diff --git a/docs/meson.build b/docs/meson.build
index 9040f86..3676f81 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -54,7 +54,6 @@ if build_docs
'qemu-pr-helper.8': (have_tools ? 'man8' : ''),
'qemu-storage-daemon.1': (have_tools ? 'man1' : ''),
'qemu-trace-stap.1': (stap.found() ? 'man1' : ''),
- 'virtfs-proxy-helper.1': (have_virtfs_proxy_helper ? 'man1' : ''),
'qemu.1': 'man1',
'qemu-block-drivers.7': 'man7',
'qemu-cpu-models.7': 'man7'
@@ -99,3 +98,8 @@ if build_docs
alias_target('html', sphinxdocs)
alias_target('man', sphinxmans)
endif
+
+test('QAPI firmware.json regression tests', qapi_gen,
+ args: ['-o', meson.current_build_dir() / 'qapi',
+ meson.current_source_dir() / 'interop/firmware.json'],
+ suite: ['qapi-schema', 'qapi-interop'])
diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index ab21428..a47aad0 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -52,11 +52,9 @@ setting up a BAR for a VF.
...
/* Add and initialize the SR/IOV capability */
- if (!pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
- vf_devid, initial_vfs, total_vfs,
- fun_offset, stride, errp)) {
- return;
- }
+ pcie_sriov_pf_init(d, 0x200, "your_virtual_dev",
+ vf_devid, initial_vfs, total_vfs,
+ fun_offset, stride);
/* Set up individual VF BARs (parameters as for normal BARs) */
pcie_sriov_pf_init_vf_bar( ... )
diff --git a/docs/specs/acpi_hw_reduced_hotplug.rst b/docs/specs/acpi_hw_reduced_hotplug.rst
index 0bd3f93..3acd6fc 100644
--- a/docs/specs/acpi_hw_reduced_hotplug.rst
+++ b/docs/specs/acpi_hw_reduced_hotplug.rst
@@ -64,7 +64,8 @@ GED IO interface (4 byte access)
0: Memory hotplug event
1: System power down event
2: NVDIMM hotplug event
- 3-31: Reserved
+ 3: CPU hotplug event
+ 4-31: Reserved
**write_access:**
diff --git a/docs/specs/fw_cfg.rst b/docs/specs/fw_cfg.rst
index 5ad47a9..31ae315 100644
--- a/docs/specs/fw_cfg.rst
+++ b/docs/specs/fw_cfg.rst
@@ -54,11 +54,11 @@ Data Register
-------------
* Read/Write (writes ignored as of QEMU v2.4, but see the DMA interface)
-* Location: platform dependent (IOport [#]_ or MMIO)
+* Location: platform dependent (IOport\ [#placement]_ or MMIO)
* Width: 8-bit (if IOport), 8/16/32/64-bit (if MMIO)
* Endianness: string-preserving
-.. [#]
+.. [#placement]
On platforms where the data register is exposed as an IOport, its
port number will always be one greater than the port number of the
selector register. In other words, the two ports overlap, and can not
diff --git a/docs/specs/index.rst b/docs/specs/index.rst
index 1484e3e..6495ed5 100644
--- a/docs/specs/index.rst
+++ b/docs/specs/index.rst
@@ -29,7 +29,10 @@ guest hardware that is specific to QEMU.
edu
ivshmem-spec
pvpanic
+ spdm
standard-vga
virt-ctlr
vmcoreinfo
vmgenid
+ rapl-msr
+ rocker
diff --git a/docs/specs/pci-ids.rst b/docs/specs/pci-ids.rst
index c0a3dec..328ab31 100644
--- a/docs/specs/pci-ids.rst
+++ b/docs/specs/pci-ids.rst
@@ -77,13 +77,17 @@ PCI devices (other than virtio):
1b36:0008
PCIe host bridge
1b36:0009
- PCI Expander Bridge (-device pxb)
+ PCI Expander Bridge (``-device pxb``)
1b36:000a
PCI-PCI bridge (multiseat)
1b36:000b
- PCIe Expander Bridge (-device pxb-pcie)
+ PCIe Expander Bridge (``-device pxb-pcie``)
+1b36:000c
+ PCIe Root Port (``-device pcie-root-port``)
1b36:000d
PCI xhci usb host adapter
+1b36:000e
+ PCIe-to-PCI bridge (``-device pcie-pci-bridge``)
1b36:000f
mdpy (mdev sample device), ``linux/samples/vfio-mdev/mdpy.c``
1b36:0010
diff --git a/docs/specs/rapl-msr.rst b/docs/specs/rapl-msr.rst
new file mode 100644
index 0000000..aaf0db9
--- /dev/null
+++ b/docs/specs/rapl-msr.rst
@@ -0,0 +1,154 @@
+================
+RAPL MSR support
+================
+
+The RAPL interface (Running Average Power Limit) is advertising the accumulated
+energy consumption of various power domains (e.g. CPU packages, DRAM, etc.).
+
+The consumption is reported via MSRs (model specific registers) like
+MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are 64 bits
+registers that represent the accumulated energy consumption in micro Joules.
+
+Thanks to KVM's `MSR filtering <msr-filter-patch_>`__ functionality,
+not all MSRs are handled by KVM. Some of them can now be handled by the
+userspace (QEMU); a list of MSRs is given at VM creation time to KVM, and
+a userspace exit occurs when they are accessed.
+
+.. _msr-filter-patch: https://patchwork.kernel.org/project/kvm/patch/20200916202951.23760-7-graf@amazon.com/
+
+At the moment the following MSRs are involved:
+
+.. code:: C
+
+ #define MSR_RAPL_POWER_UNIT 0x00000606
+ #define MSR_PKG_POWER_LIMIT 0x00000610
+ #define MSR_PKG_ENERGY_STATUS 0x00000611
+ #define MSR_PKG_POWER_INFO 0x00000614
+
+The ``*_POWER_UNIT``, ``*_POWER_LIMIT``, ``*_POWER INFO`` are part of the RAPL
+spec and specify the power limit of the package, provide range of parameter(min
+power, max power,..) and also the information of the multiplier for the energy
+counter to calculate the power. Those MSRs are populated once at the beginning
+by reading the host CPU MSRs and are given back to the guest 1:1 when
+requested.
+
+The MSR_PKG_ENERGY_STATUS is a counter; it represents the total amount of
+energy consumed since the last time the register was cleared. If you multiply
+it with the UNIT provided above you'll get the power in micro-joules. This
+counter is always increasing and it increases more or less faster depending on
+the consumption of the package. This counter is supposed to overflow at some
+point.
+
+Each core belonging to the same Package reading the MSR_PKG_ENERGY_STATUS (i.e
+"rdmsr 0x611") will retrieve the same value. The value represents the energy
+for the whole package. Whatever Core reading it will get the same value and a
+core that belongs to PKG-0 will not be able to get the value of PKG-1 and
+vice-versa.
+
+High level implementation
+-------------------------
+
+In order to update the value of the virtual MSR, a QEMU thread is created.
+The thread is basically just an infinity loop that does:
+
+1. Snapshot of the time metrics of all QEMU threads (Time spent scheduled in
+ Userspace and System)
+
+2. Snapshot of the actual MSR_PKG_ENERGY_STATUS counter of all packages where
+ the QEMU threads are running on.
+
+3. Sleep for 1 second - During this pause the vcpu and other non-vcpu threads
+ will do what they have to do and so the energy counter will increase.
+
+4. Repeat 2. and 3. and calculate the delta of every metrics representing the
+ time spent scheduled for each QEMU thread *and* the energy spent by the
+ packages during the pause.
+
+5. Filter the vcpu threads and the non-vcpu threads.
+
+6. Retrieve the topology of the Virtual Machine. This helps identify which
+ vCPU is running on which virtual package.
+
+7. The total energy spent by the non-vcpu threads is divided by the number
+ of vcpu threads so that each vcpu thread will get an equal part of the
+ energy spent by the QEMU workers.
+
+8. Calculate the ratio of energy spent per vcpu threads.
+
+9. Calculate the energy for each virtual package.
+
+10. The virtual MSRs are updated for each virtual package. Each vCPU that
+ belongs to the same package will return the same value when accessing the
+ the MSR.
+
+11. Loop back to 1.
+
+Ratio calculation
+-----------------
+
+In Linux, a process has an execution time associated with it. The scheduler is
+dividing the time in clock ticks. The number of clock ticks per second can be
+found by the sysconf system call. A typical value of clock ticks per second is
+100. So a core can run a process at the maximum of 100 ticks per second. If a
+package has 4 cores, 400 ticks maximum can be scheduled on all the cores
+of the package for a period of 1 second.
+
+`/proc/[pid]/stat <stat_>`__ is a procfs file that can give the executed
+time of a process with the [pid] as the process ID. It gives the amount
+of ticks the process has been scheduled in userspace (utime) and kernel
+space (stime).
+
+.. _stat: https://man7.org/linux/man-pages/man5/proc.5.html
+
+By reading those metrics for a thread, one can calculate the ratio of time the
+package has spent executing the thread.
+
+Example:
+
+A 4 cores package can schedule a maximum of 400 ticks per second with 100 ticks
+per second per core. If a thread was scheduled for 100 ticks between a second
+on this package, that means my thread has been scheduled for 1/4 of the whole
+package. With that, the calculation of the energy spent by the thread on this
+package during this whole second is 1/4 of the total energy spent by the
+package.
+
+Usage
+-----
+
+Currently this feature is only working on an Intel CPU that has the RAPL driver
+mounted and available in the sysfs. if not, QEMU fails at start-up.
+
+This feature is activated with -accel
+kvm,rapl=true,rapl-helper-socket=/path/sock.sock
+
+It is important that the socket path is the same as the one
+:program:`qemu-vmsr-helper` is listening to.
+
+qemu-vmsr-helper
+----------------
+
+The qemu-vmsr-helper is working very much like the qemu-pr-helper. Instead of
+making persistent reservation, qemu-vmsr-helper is here to overcome the
+CVE-2020-8694 which remove user access to the rapl msr attributes.
+
+A socket communication is established between QEMU processes that has the RAPL
+MSR support activated and the qemu-vmsr-helper. A systemd service and socket
+activation is provided in contrib/systemd/qemu-vmsr-helper.(service/socket).
+
+The systemd socket uses 600, like contrib/systemd/qemu-pr-helper.socket. The
+socket can be passed via SCM_RIGHTS by libvirt, or its permissions can be
+changed (e.g. 660 and root:kvm for a Debian system for example). Libvirt could
+also start a separate helper if needed. All in all, the policy is left to the
+user.
+
+See the qemu-pr-helper documentation or manpage for further details.
+
+Current Limitations
+-------------------
+
+- Works only on Intel host CPUs because AMD CPUs are using different MSR
+ addresses.
+
+- Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at the
+ moment.
+
diff --git a/docs/specs/rocker.txt b/docs/specs/rocker.rst
index 1857b31..3a7fc6a 100644
--- a/docs/specs/rocker.txt
+++ b/docs/specs/rocker.rst
@@ -1,23 +1,23 @@
Rocker Network Switch Register Programming Guide
-Copyright (c) Scott Feldman <sfeldma@gmail.com>
-Copyright (c) Neil Horman <nhorman@tuxdriver.com>
-Version 0.11, 12/29/2014
+************************************************
-LICENSE
-=======
+..
+ Copyright (c) Scott Feldman <sfeldma@gmail.com>
+ Copyright (c) Neil Horman <nhorman@tuxdriver.com>
+ Version 0.11, 12/29/2014
-This program is free software; you can redistribute it and/or modify
-it under the terms of the GNU General Public License as published by
-the Free Software Foundation; either version 2 of the License, or
-(at your option) any later version.
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
-GNU General Public License for more details.
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
-SECTION 1: Introduction
-=======================
+Introduction
+============
Overview
--------
@@ -29,25 +29,25 @@ software.
Notations and Conventions
-------------------------
-o In register descriptions, [n:m] indicates a range from bit n to bit m,
-inclusive.
-o Use of leading 0x indicates a hexadecimal number.
-o Use of leading 0b indicates a binary number.
-o The use of RSVD or Reserved indicates that a bit or field is reserved for
-future use.
-o Field width is in bytes, unless otherwise noted.
-o Register are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear
-on read
-o TLV values in network-byte-order are designated with (N).
+* In register descriptions, [n:m] indicates a range from bit n to bit m,
+ inclusive.
+* Use of leading 0x indicates a hexadecimal number.
+* Use of leading 0b indicates a binary number.
+* The use of RSVD or Reserved indicates that a bit or field is reserved for
+ future use.
+* Field width is in bytes, unless otherwise noted.
+* Register are (R) read-only, (R/W) read/write, (W) write-only, or (COR) clear
+ on read
+* TLV values in network-byte-order are designated with (N).
-SECTION 2: PCI Configuration Registers
-======================================
+PCI Configuration Registers
+===========================
PCI Configuration Space
-----------------------
-Each switch instance registers as a PCI device with PCI configuration space:
+Each switch instance registers as a PCI device with PCI configuration space::
offset width description value
---------------------------------------------
@@ -74,11 +74,10 @@ Each switch instance registers as a PCI device with PCI configuration space:
0x41 1 Retry count
0x42 2 Reserved
+ * Assigned by sub-system implementation
-* Assigned by sub-system implementation
-
-SECTION 3: Memory-Mapped Register Space
-=======================================
+Memory-Mapped Register Space
+============================
There are two memory-mapped BARs. BAR0 maps device register space and is
0x2000 in size. BAR1 maps MSI-X vector and PBA tables and is also 0x2000 in
@@ -89,7 +88,7 @@ byte registers with one 4-byte access, and 8 byte registers with either two
4-byte accesses or a single 8-byte access. In the case of two 4-byte accesses,
access must be lower and then upper 4-bytes, in that order.
-BAR0 device register space is organized as follows:
+BAR0 device register space is organized as follows::
offset description
------------------------------------------------------
@@ -105,7 +104,7 @@ Reads to reserved registers read back as 0.
No fancy stuff like write-combining is enabled on any of the registers.
-BAR1 MSI-X register space is organized as follows:
+BAR1 MSI-X register space is organized as follows::
offset description
------------------------------------------------------
@@ -113,8 +112,8 @@ BAR1 MSI-X register space is organized as follows:
0x1000-0x1fff MSI-X PBA table
-SECTION 4: Interrupts, DMA, and Endianness
-==========================================
+Interrupts, DMA, and Endianness
+===============================
PCI Interrupts
--------------
@@ -122,7 +121,7 @@ PCI Interrupts
The device supports only MSI-X interrupts. BAR1 memory-mapped region contains
the MSI-X vector and PBA tables, with support for up to 256 MSI-X vectors.
-The vector assignment is:
+The vector assignment is::
vector description
-----------------------------------------------------
@@ -134,7 +133,7 @@ The vector assignment is:
Tx vector is even
Rx vector is odd
-A MSI-X vector table entry is 16 bytes:
+A MSI-X vector table entry is 16 bytes::
field offset width description
-------------------------------------------------------------
@@ -170,7 +169,7 @@ ring, and hardware will set this bit when the descriptor is complete.
Descriptor ring sizes must be a power of 2 and range from 2 to 64K entries.
Descriptor rings' base address must be 8-byte aligned. Descriptors must be
packed within ring. Each descriptor in each ring must also be aligned on an 8
-byte boundary. Each descriptor ring will have these registers:
+byte boundary. Each descriptor ring will have these registers::
DMA_DESC_xxx_BASE_ADDR, offset 0x1000 + (x * 32), 64-bit, (R/W)
DMA_DESC_xxx_SIZE, offset 0x1008 + (x * 32), 32-bit, (R/W)
@@ -180,7 +179,7 @@ byte boundary. Each descriptor ring will have these registers:
DMA_DESC_xxx_CREDITS, offset 0x1018 + (x * 32), 32-bit, (R/W)
DMA_DESC_xxx_RSVD1, offset 0x101c + (x * 32), 32-bit, (R/W)
-Where x is descriptor ring index:
+Where x is descriptor ring index::
index ring
--------------------
@@ -203,14 +202,14 @@ written past TAIL. To do so would wrap the ring. An empty ring is when HEAD
== TAIL. A full ring is when HEAD is one position behind TAIL. Both HEAD and
TAIL increment and modulo wrap at the ring size.
-CTRL register bits:
+CTRL register bits::
bit name description
------------------------------------------------------------------------
[0] CTRL_RESET Reset the descriptor ring
[1:31] Reserved
-All descriptor types share some common fields:
+All descriptor types share some common fields::
field width description
-------------------------------------------------------------------
@@ -234,7 +233,7 @@ filled in by the switch. Likewise, the switch will ignore unknown fields
filled in by software.
Descriptor payload buffer is 8-byte aligned and TLVs are 8-byte aligned. The
-value within a TLV is also 8-byte aligned. The (packed, 8 byte) TLV header is:
+value within a TLV is also 8-byte aligned. The (packed, 8 byte) TLV header is::
field width description
-----------------------------
@@ -246,7 +245,7 @@ The alignment requirements for descriptors and TLVs are to avoid unaligned
access exceptions in software. Note that the payload for each TLV is also
8 byte aligned.
-Figure 1 shows an example descriptor buffer with two TLVs.
+Figure 1 shows an example descriptor buffer with two TLVs::
<------- 8 bytes ------->
@@ -316,11 +315,11 @@ network packet data. All non-network-packet TLV multi-byte values will be LE.
TLV values in network-byte-order are designated with (N).
-SECTION 5: Test Registers
-=========================
+Test Registers
+==============
Rocker has several test registers to support troubleshooting register access,
-interrupt generation, and DMA operations:
+interrupt generation, and DMA operations::
TEST_REG, offset 0x0010, 32-bit (R/W)
TEST_REG64, offset 0x0018, 64-bit (R/W)
@@ -338,7 +337,7 @@ for that vector.
To test basic DMA operations, allocate a DMA-able host buffer and put the
buffer address into TEST_DMA_ADDR and size into TEST_DMA_SIZE. Then, write to
-TEST_DMA_CTRL to manipulate the buffer contents. TEST_DMA_CTRL operations are:
+TEST_DMA_CTRL to manipulate the buffer contents. TEST_DMA_CTRL operations are::
operation value description
-----------------------------------------------------------
@@ -351,14 +350,14 @@ issue exists. In particular, buffers that start on odd-8-byte boundary and/or
span multiple PAGE sizes should be tested.
-SECTION 6: Ports
-================
+Ports
+=====
Physical and Logical Ports
------------------------------------
The switch supports up to 62 physical (front-panel) ports. Register
-PORT_PHYS_COUNT returns the actual number of physical ports available:
+PORT_PHYS_COUNT returns the actual number of physical ports available::
PORT_PHYS_COUNT, offset 0x0304, 32-bit, (R)
@@ -369,7 +368,7 @@ Front-panel ports and logical tunnel ports are mapped into a single 32-bit port
space. A special CPU port is assigned port 0. The front-panel ports are
mapped to ports 1-62. A special loopback port is assigned port 63. Logical
tunnel ports are assigned ports 0x0001000-0x0001ffff.
-To summarize the port assignments:
+To summarize the port assignments::
port mapping
-------------------------------------------------------
@@ -391,14 +390,14 @@ set/get the mode for front-panel ports, see port settings, below.
Port Settings
-------------
-Link status for all front-panel ports is available via PORT_PHYS_LINK_STATUS:
+Link status for all front-panel ports is available via PORT_PHYS_LINK_STATUS::
PORT_PHYS_LINK_STATUS, offset 0x0310, 64-bit, (R)
Value is port bitmap. Bits 0 and 63 always read 0. Bits 1-62
read 1 for link UP and 0 for link DOWN for respective front-panel ports.
-Other properties for front-panel ports are available via DMA CMD descriptors:
+Other properties for front-panel ports are available via DMA CMD descriptors::
Get PORT_SETTINGS descriptor:
@@ -438,7 +437,7 @@ Port Enable
-----------
Front-panel ports are initially disabled, which means port ingress and egress
-packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE:
+packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE::
PORT_PHYS_ENABLE: offset 0x0318, 64-bit, (R/W)
@@ -447,15 +446,15 @@ packets will be dropped. To enable or disable a port, use PORT_PHYS_ENABLE:
Default is 0.
-SECTION 7: Switch Control
-=========================
+Switch Control
+==============
This section covers switch-wide register settings.
Control
-------
-This register is used for low level control of the switch.
+This register is used for low level control of the switch::
CONTROL: offset 0x0300, 32-bit, (W)
@@ -468,18 +467,18 @@ Switch ID
---------
The switch has a SWITCH_ID to be used by software to uniquely identify the
-switch:
+switch::
SWITCH_ID: offset 0x0320, 64-bit, (R)
Value is opaque to switch software and no special encoding is implied.
-SECTION 8: Events
-=================
+Events
+======
Non-I/O asynchronous events from the device are notified to the host using the
-event ring. The TLV structure for events is:
+event ring. The TLV structure for events is::
field width description
---------------------------------------------------
@@ -491,7 +490,7 @@ event ring. The TLV structure for events is:
Link Changed Event
------------------
-When link status changes on a physical port, this event is generated.
+When link status changes on a physical port, this event is generated::
field width description
---------------------------------------------------
@@ -510,6 +509,8 @@ driver should install to the device the MAC/VLAN on the port into the bridge
table. Once installed, the MAC/VLAN is known on the port and this event will
no longer be generated.
+::
+
field width description
---------------------------------------------------
INFO <nest>
@@ -518,8 +519,8 @@ no longer be generated.
VLAN 2 VLAN ID
-SECTION 9: CPU Packet Processing
-================================
+CPU Packet Processing
+=====================
Ingress packets directed to the host CPU for further processing are delivered
in the DMA RX ring. Likewise, host CPU originating packets destined to egress
@@ -540,7 +541,7 @@ software that Tx is complete and software resources (e.g. skb) backing packet
can be released.
Figure 2 shows an example 3-fragment packet queued with one Tx descriptor. A
-TLV is used for each packet fragment.
+TLV is used for each packet fragment::
pkt frag 1
+–––––––+ +–+
@@ -570,7 +571,7 @@ TLV is used for each packet fragment.
fig 2.
-The TLVs for Tx descriptor buffer are:
+The TLVs for Tx descriptor buffer are::
field width description
---------------------------------------------------------------------
@@ -600,7 +601,7 @@ The TLVs for Tx descriptor buffer are:
TX_FRAG_ADDR 8 DMA address of packet fragment
TX_FRAG_LEN 2 Packet fragment length
-Possible status return codes in descriptor on completion are:
+Possible status return codes in descriptor on completion are::
DESC_COMP_ERR reason
--------------------------------------------------------------------
@@ -623,7 +624,7 @@ worst-case packet size. A single Rx descriptor will contain the entire Rx
packet data in one RX_FRAG. Other Rx TLVs describe and hardware offloads
performed on the packet, such as checksum validation.
-The TLVs for Rx descriptor buffer are:
+The TLVs for Rx descriptor buffer are::
field width description
---------------------------------------------------
@@ -649,7 +650,7 @@ The TLVs for Rx descriptor buffer are:
Offload forward RX_FLAG indicates the device has already forwarded the packet
so the host CPU should not also forward the packet.
-Possible status return codes in descriptor on completion are:
+Possible status return codes in descriptor on completion are::
DESC_COMP_ERR reason
--------------------------------------------------------------------
@@ -660,14 +661,14 @@ Possible status return codes in descriptor on completion are:
packet data TLV and other TLVs.
-SECTION 10: OF-DPA Mode
-======================
+OF-DPA Mode
+===========
OF-DPA mode allows the switch to offload flow packet processing functions to
hardware. An OpenFlow controller would communicate with an OpenFlow agent
installed on the switch. The OpenFlow agent would (directly or indirectly)
communicate with the Rocker switch driver, which in turn would program switch
-hardware with flow functionality, as defined in OF-DPA. The block diagram is:
+hardware with flow functionality, as defined in OF-DPA. The block diagram is::
+–––––––––––––––----–––+
| OF |
@@ -696,14 +697,14 @@ OF-DPA Flow Table Interface
There are commands to add, modify, delete, and get stats of flow table entries.
The commands are issued using the DMA CMD descriptor ring. The following
-commands are defined:
+commands are defined::
CMD_ADD: add an entry to flow table
CMD_MOD: modify an entry in flow table
CMD_DEL: delete an entry from flow table
CMD_GET_STATS: get stats for flow entry
-TLVs for add and modify commands are:
+TLVs for add and modify commands are::
field width description
----------------------------------------------------
@@ -723,14 +724,14 @@ TLVs for add and modify commands are:
Additional TLVs based on flow table ID:
-Table ID 0: ingress port
+Table ID 0: ingress port::
field width description
----------------------------------------------------
OF_DPA_IN_PPORT 4 ingress physical port number
OF_DPA_GOTO_TBL 2 goto table ID; zero to drop
-Table ID 10: vlan
+Table ID 10: vlan::
field width description
----------------------------------------------------
@@ -740,7 +741,7 @@ Table ID 10: vlan
OF_DPA_GOTO_TBL 2 goto table ID; zero to drop
OF_DPA_NEW_VLAN_ID 2 (N) new vlan ID
-Table ID 20: termination mac
+Table ID 20: termination mac::
field width description
----------------------------------------------------
@@ -757,7 +758,7 @@ Table ID 20: termination mac
OF_DPA_OUT_PPORT 2 if specified, must be
controller, set zero otherwise
-Table ID 30: unicast routing
+Table ID 30: unicast routing::
field width description
----------------------------------------------------
@@ -772,7 +773,7 @@ Table ID 30: unicast routing
OF_DPA_GROUP_ID 4 data for GROUP action must
be an L3 Unicast group entry
-Table ID 40: multicast routing
+Table ID 40: multicast routing::
field width description
----------------------------------------------------
@@ -797,7 +798,7 @@ Table ID 40: multicast routing
OF_DPA_GROUP_ID 4 data for GROUP action must
be an L3 multicast group entry
-Table ID 50: bridging
+Table ID 50: bridging::
field width description
----------------------------------------------------
@@ -818,7 +819,7 @@ Table ID 50: bridging
restricted to CONTROLLER,
set to 0 otherwise
-Table ID 60: acl policy
+Table ID 60: acl policy::
field width description
----------------------------------------------------
@@ -890,7 +891,7 @@ Table ID 60: acl policy
dropped (all other instructions
ignored)
-TLVs for flow delete and get stats command are:
+TLVs for flow delete and get stats command are::
field width description
---------------------------------------------------
@@ -898,7 +899,7 @@ TLVs for flow delete and get stats command are:
OF_DPA_COOKIE 8 Cookie
On completion of get stats command, the descriptor buffer is written back with
-the following TLVs:
+the following TLVs::
field width description
---------------------------------------------------
@@ -906,7 +907,7 @@ the following TLVs:
OF_DPA_STAT_RX_PKTS 8 Received packets
OF_DPA_STAT_TX_PKTS 8 Transmit packets
-Possible status return codes in descriptor on completion are:
+Possible status return codes in descriptor on completion are::
DESC_COMP_ERR command reason
--------------------------------------------------------------------
@@ -928,14 +929,14 @@ Group Table Interface
There are commands to add, modify, delete, and get stats of group table
entries. The commands are issued using the DMA CMD descriptor ring. The
-following commands are defined:
+following commands are defined::
CMD_ADD: add an entry to group table
CMD_MOD: modify an entry in group table
CMD_DEL: delete an entry from group table
CMD_GET_STATS: get stats for group entry
-TLVs for add and modify commands are:
+TLVs for add and modify commands are::
field width description
-----------------------------------------------------------
@@ -969,7 +970,7 @@ TLVs for add and modify commands are:
FLOW_SRC_MAC 6 (types 1, 2, 5)
FLOW_DST_MAC 6 (types 1, 2)
-TLVs for flow delete and get stats command are:
+TLVs for flow delete and get stats command are::
field width description
-----------------------------------------------------------
@@ -977,7 +978,7 @@ TLVs for flow delete and get stats command are:
FLOW_GROUP_ID 2 Flow group ID
On completion of get stats command, the descriptor buffer is written back with
-the following TLVs:
+the following TLVs::
field width description
---------------------------------------------------
@@ -986,7 +987,7 @@ the following TLVs:
FLOW_STAT_REF_COUNT 4 Flow reference count
FLOW_STAT_BUCKET_COUNT 4 Flow bucket count
-Possible status return codes in descriptor on completion are:
+Possible status return codes in descriptor on completion are::
DESC_COMP_ERR command reason
--------------------------------------------------------------------
diff --git a/docs/specs/spdm.rst b/docs/specs/spdm.rst
new file mode 100644
index 0000000..f7de080
--- /dev/null
+++ b/docs/specs/spdm.rst
@@ -0,0 +1,134 @@
+======================================================
+QEMU Security Protocols and Data Models (SPDM) Support
+======================================================
+
+SPDM enables authentication, attestation and key exchange to assist in
+providing infrastructure security enablement. It's a standard published
+by the `DMTF`_.
+
+QEMU supports connecting to a SPDM responder implementation. This allows an
+external application to emulate the SPDM responder logic for an SPDM device.
+
+Setting up a SPDM server
+========================
+
+When using QEMU with SPDM devices QEMU will connect to a server which
+implements the SPDM functionality.
+
+SPDM-Utils
+----------
+
+You can use `SPDM Utils`_ to emulate a responder. This is the simplest method.
+
+SPDM-Utils is a Linux applications to manage, test and develop devices
+supporting DMTF Security Protocol and Data Model (SPDM). It is written in Rust
+and utilises libspdm.
+
+To use SPDM-Utils you will need to do the following steps. Details are included
+in the SPDM-Utils README.
+
+ 1. `Build libspdm`_
+ 2. `Build SPDM Utils`_
+ 3. `Run it as a server`_
+
+spdm-emu
+--------
+
+You can use `spdm emu`_ to model the
+SPDM responder.
+
+.. code-block:: shell
+
+ $ cd spdm-emu
+ $ git submodule init; git submodule update --recursive
+ $ mkdir build; cd build
+ $ cmake -DARCH=x64 -DTOOLCHAIN=GCC -DTARGET=Debug -DCRYPTO=openssl ..
+ $ make -j32
+ $ make copy_sample_key # Build certificates, required for SPDM authentication.
+
+It is worth noting that the certificates should be in compliance with
+PCIe r6.1 sec 6.31.3. This means you will need to add the following to
+openssl.cnf
+
+.. code-block::
+
+ subjectAltName = otherName:2.23.147;UTF8:Vendor=1b36:Device=0010:CC=010802:REV=02:SSVID=1af4:SSID=1100
+ 2.23.147 = ASN1:OID:2.23.147
+
+and then manually regenerate some certificates with:
+
+.. code-block:: shell
+
+ $ openssl req -nodes -newkey ec:param.pem -keyout end_responder.key \
+ -out end_responder.req -sha384 -batch \
+ -subj "/CN=DMTF libspdm ECP384 responder cert"
+
+ $ openssl x509 -req -in end_responder.req -out end_responder.cert \
+ -CA inter.cert -CAkey inter.key -sha384 -days 3650 -set_serial 3 \
+ -extensions v3_end -extfile ../openssl.cnf
+
+ $ openssl asn1parse -in end_responder.cert -out end_responder.cert.der
+
+ $ cat ca.cert.der inter.cert.der end_responder.cert.der > bundle_responder.certchain.der
+
+You can use SPDM-Utils instead as it will generate the correct certificates
+automatically.
+
+The responder can then be launched with
+
+.. code-block:: shell
+
+ $ cd bin
+ $ ./spdm_responder_emu --trans PCI_DOE
+
+Connecting an SPDM NVMe device
+==============================
+
+Once a SPDM server is running we can start QEMU and connect to the server.
+
+For an NVMe device first let's setup a block we can use
+
+.. code-block:: shell
+
+ $ cd qemu-spdm/linux/image
+ $ dd if=/dev/zero of=blknvme bs=1M count=2096 # 2GB NNMe Drive
+
+Then you can add this to your QEMU command line:
+
+.. code-block:: shell
+
+ -drive file=blknvme,if=none,id=mynvme,format=raw \
+ -device nvme,drive=mynvme,serial=deadbeef,spdm_port=2323
+
+At which point QEMU will try to connect to the SPDM server.
+
+Note that if using x64-64 you will want to use the q35 machine instead
+of the default. So the entire QEMU command might look like this
+
+.. code-block:: shell
+
+ qemu-system-x86_64 -M q35 \
+ --kernel bzImage \
+ -drive file=rootfs.ext2,if=virtio,format=raw \
+ -append "root=/dev/vda console=ttyS0" \
+ -net none -nographic \
+ -drive file=blknvme,if=none,id=mynvme,format=raw \
+ -device nvme,drive=mynvme,serial=deadbeef,spdm_port=2323
+
+.. _DMTF:
+ https://www.dmtf.org/standards/SPDM
+
+.. _SPDM Utils:
+ https://github.com/westerndigitalcorporation/spdm-utils
+
+.. _spdm emu:
+ https://github.com/dmtf/spdm-emu
+
+.. _Build libspdm:
+ https://github.com/westerndigitalcorporation/spdm-utils?tab=readme-ov-file#build-libspdm
+
+.. _Build SPDM Utils:
+ https://github.com/westerndigitalcorporation/spdm-utils?tab=readme-ov-file#build-the-binary
+
+.. _Run it as a server:
+ https://github.com/westerndigitalcorporation/spdm-utils#qemu-spdm-device-emulation
diff --git a/docs/sphinx-static/theme_overrides.css b/docs/sphinx-static/theme_overrides.css
index c70ef95..965ecac 100644
--- a/docs/sphinx-static/theme_overrides.css
+++ b/docs/sphinx-static/theme_overrides.css
@@ -87,6 +87,55 @@ div[class^="highlight"] pre {
padding-bottom: 1px;
}
+/* qmp-example directive styling */
+
+.rst-content .admonition-example {
+ /* do not apply the standard admonition background */
+ background-color: transparent;
+ border: solid #ffd2ed 1px;
+}
+
+.rst-content .admonition-example > .admonition-title:before {
+ content: "▷";
+}
+
+.rst-content .admonition-example > .admonition-title {
+ background-color: #5980a6;
+}
+
+.rst-content .admonition-example > div[class^="highlight"] {
+ /* make code boxes take up the full width of the admonition w/o margin */
+ margin-left: -12px;
+ margin-right: -12px;
+
+ border-top: 1px solid #ffd2ed;
+ border-bottom: 1px solid #ffd2ed;
+ border-left: 0px;
+ border-right: 0px;
+}
+
+.rst-content .admonition-example > div[class^="highlight"]:nth-child(2) {
+ /* If a code box is the second element in an example admonition,
+ * it is the first child after the title. let it sit flush against
+ * the title. */
+ margin-top: -12px;
+ border-top: 0px;
+}
+
+.rst-content .admonition-example > div[class^="highlight"]:last-child {
+ /* If a code box is the final element in an example admonition, don't
+ * render margin below it; let it sit flush with the end of the
+ * admonition box */
+ margin-bottom: -12px;
+ border-bottom: 0px;
+}
+
+.rst-content .admonition-example .highlight {
+ background-color: #fffafd;
+}
+
+/* end qmp-example styling */
+
@media screen {
/* content column
diff --git a/docs/sphinx/depfile.py b/docs/sphinx/depfile.py
index afdcbce..e74be6a 100644
--- a/docs/sphinx/depfile.py
+++ b/docs/sphinx/depfile.py
@@ -19,7 +19,7 @@ __version__ = '1.0'
def get_infiles(env):
for x in env.found_docs:
- yield env.doc2path(x)
+ yield str(env.doc2path(x))
yield from ((os.path.join(env.srcdir, dep)
for dep in env.dependencies[x]))
for mod in sys.modules.values():
diff --git a/docs/sphinx/hxtool.py b/docs/sphinx/hxtool.py
index 3729084..a84723b 100644
--- a/docs/sphinx/hxtool.py
+++ b/docs/sphinx/hxtool.py
@@ -24,16 +24,10 @@ from docutils import nodes
from docutils.statemachine import ViewList
from docutils.parsers.rst import directives, Directive
from sphinx.errors import ExtensionError
+from sphinx.util.docutils import switch_source_input
from sphinx.util.nodes import nested_parse_with_titles
import sphinx
-# Sphinx up to 1.6 uses AutodocReporter; 1.7 and later
-# use switch_source_input. Check borrowed from kerneldoc.py.
-Use_SSI = sphinx.__version__[:3] >= '1.7'
-if Use_SSI:
- from sphinx.util.docutils import switch_source_input
-else:
- from sphinx.ext.autodoc import AutodocReporter
__version__ = '1.0'
@@ -185,16 +179,9 @@ class HxtoolDocDirective(Directive):
# of title_styles and section_level that kerneldoc.py does,
# because nested_parse_with_titles() does that for us.
def do_parse(self, result, node):
- if Use_SSI:
- with switch_source_input(self.state, result):
- nested_parse_with_titles(self.state, result, node)
- else:
- save = self.state.memo.reporter
- self.state.memo.reporter = AutodocReporter(result, self.state.memo.reporter)
- try:
- nested_parse_with_titles(self.state, result, node)
- finally:
- self.state.memo.reporter = save
+ with switch_source_input(self.state, result):
+ nested_parse_with_titles(self.state, result, node)
+
def setup(app):
""" Register hxtool-doc directive with Sphinx"""
diff --git a/docs/sphinx/kerneldoc.py b/docs/sphinx/kerneldoc.py
index 72c403a..3aa972f 100644
--- a/docs/sphinx/kerneldoc.py
+++ b/docs/sphinx/kerneldoc.py
@@ -38,20 +38,14 @@ from docutils import nodes, statemachine
from docutils.statemachine import ViewList
from docutils.parsers.rst import directives, Directive
-#
-# AutodocReporter is only good up to Sphinx 1.7
-#
import sphinx
+from sphinx.util import logging
+from sphinx.util.docutils import switch_source_input
-Use_SSI = sphinx.__version__[:3] >= '1.7'
-if Use_SSI:
- from sphinx.util.docutils import switch_source_input
-else:
- from sphinx.ext.autodoc import AutodocReporter
-
-import kernellog
__version__ = '1.0'
+logger = logging.getLogger('kerneldoc')
+
class KernelDocDirective(Directive):
"""Extract kernel-doc comments from the specified file"""
@@ -111,8 +105,7 @@ class KernelDocDirective(Directive):
cmd += [filename]
try:
- kernellog.verbose(env.app,
- 'calling kernel-doc \'%s\'' % (" ".join(cmd)))
+ logger.verbose('calling kernel-doc \'%s\'' % (" ".join(cmd)))
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = p.communicate()
@@ -122,8 +115,10 @@ class KernelDocDirective(Directive):
if p.returncode != 0:
sys.stderr.write(err)
- kernellog.warn(env.app,
- 'kernel-doc \'%s\' failed with return code %d' % (" ".join(cmd), p.returncode))
+ logger.warning(
+ 'kernel-doc \'%s\' failed with return code %d' %
+ (" ".join(cmd), p.returncode)
+ )
return [nodes.error(None, nodes.paragraph(text = "kernel-doc missing"))]
elif env.config.kerneldoc_verbosity > 0:
sys.stderr.write(err)
@@ -149,22 +144,13 @@ class KernelDocDirective(Directive):
return node.children
except Exception as e: # pylint: disable=W0703
- kernellog.warn(env.app, 'kernel-doc \'%s\' processing failed with: %s' %
+ logger.warning('kernel-doc \'%s\' processing failed with: %s' %
(" ".join(cmd), str(e)))
return [nodes.error(None, nodes.paragraph(text = "kernel-doc missing"))]
def do_parse(self, result, node):
- if Use_SSI:
- with switch_source_input(self.state, result):
- self.state.nested_parse(result, 0, node, match_titles=1)
- else:
- save = self.state.memo.title_styles, self.state.memo.section_level, self.state.memo.reporter
- self.state.memo.reporter = AutodocReporter(result, self.state.memo.reporter)
- self.state.memo.title_styles, self.state.memo.section_level = [], 0
- try:
- self.state.nested_parse(result, 0, node, match_titles=1)
- finally:
- self.state.memo.title_styles, self.state.memo.section_level, self.state.memo.reporter = save
+ with switch_source_input(self.state, result):
+ self.state.nested_parse(result, 0, node, match_titles=1)
def setup(app):
diff --git a/docs/sphinx/kernellog.py b/docs/sphinx/kernellog.py
deleted file mode 100644
index af924f5..0000000
--- a/docs/sphinx/kernellog.py
+++ /dev/null
@@ -1,28 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-#
-# Sphinx has deprecated its older logging interface, but the replacement
-# only goes back to 1.6. So here's a wrapper layer to keep around for
-# as long as we support 1.4.
-#
-import sphinx
-
-if sphinx.__version__[:3] >= '1.6':
- UseLogging = True
- from sphinx.util import logging
- logger = logging.getLogger('kerneldoc')
-else:
- UseLogging = False
-
-def warn(app, message):
- if UseLogging:
- logger.warning(message)
- else:
- app.warn(message)
-
-def verbose(app, message):
- if UseLogging:
- logger.verbose(message)
- else:
- app.verbose(message)
-
-
diff --git a/docs/sphinx/qapidoc.py b/docs/sphinx/qapidoc.py
index 2b06750..5f96b46 100644
--- a/docs/sphinx/qapidoc.py
+++ b/docs/sphinx/qapidoc.py
@@ -26,7 +26,9 @@ https://www.sphinx-doc.org/en/master/development/index.html
import os
import re
+import sys
import textwrap
+from typing import List
from docutils import nodes
from docutils.parsers.rst import Directive, directives
@@ -35,22 +37,13 @@ from qapi.error import QAPIError, QAPISemError
from qapi.gen import QAPISchemaVisitor
from qapi.schema import QAPISchema
-import sphinx
+from sphinx import addnodes
+from sphinx.directives.code import CodeBlock
from sphinx.errors import ExtensionError
+from sphinx.util.docutils import switch_source_input
from sphinx.util.nodes import nested_parse_with_titles
-# Sphinx up to 1.6 uses AutodocReporter; 1.7 and later
-# use switch_source_input. Check borrowed from kerneldoc.py.
-USE_SSI = sphinx.__version__[:3] >= "1.7"
-if USE_SSI:
- from sphinx.util.docutils import switch_source_input
-else:
- from sphinx.ext.autodoc import ( # pylint: disable=no-name-in-module
- AutodocReporter,
- )
-
-
__version__ = "1.0"
@@ -395,6 +388,7 @@ class QAPISchemaGenRSTVisitor(QAPISchemaVisitor):
self._active_headings[level - 1] += snode
self._active_headings = self._active_headings[:level]
self._active_headings.append(snode)
+ return snode
def _add_node_to_current_heading(self, node):
"""Add the node to whatever the current active heading is"""
@@ -424,13 +418,11 @@ class QAPISchemaGenRSTVisitor(QAPISchemaVisitor):
# the first line of the block)
(heading, _, text) = text.partition('\n')
(leader, _, heading) = heading.partition(' ')
- self._start_new_heading(heading, len(leader))
+ node = self._start_new_heading(heading, len(leader))
if text == '':
return
- node = self._make_section(None)
self._parse_text_into_node(text, node)
- self._add_node_to_current_heading(node)
self._cur_doc = None
def _parse_text_into_node(self, doctext, node):
@@ -492,7 +484,25 @@ class QAPISchemaGenDepVisitor(QAPISchemaVisitor):
super().visit_module(name)
-class QAPIDocDirective(Directive):
+class NestedDirective(Directive):
+ def run(self):
+ raise NotImplementedError
+
+ def do_parse(self, rstlist, node):
+ """
+ Parse rST source lines and add them to the specified node
+
+ Take the list of rST source lines rstlist, parse them as
+ rST, and add the resulting docutils nodes as children of node.
+ The nodes are parsed in a way that allows them to include
+ subheadings (titles) without confusing the rendering of
+ anything else.
+ """
+ with switch_source_input(self.state, rstlist):
+ nested_parse_with_titles(self.state, rstlist, node)
+
+
+class QAPIDocDirective(NestedDirective):
"""Extract documentation from the specified QAPI .json file"""
required_argument = 1
@@ -530,39 +540,107 @@ class QAPIDocDirective(Directive):
# so they are displayed nicely to the user
raise ExtensionError(str(err)) from err
- def do_parse(self, rstlist, node):
- """Parse rST source lines and add them to the specified node
- Take the list of rST source lines rstlist, parse them as
- rST, and add the resulting docutils nodes as children of node.
- The nodes are parsed in a way that allows them to include
- subheadings (titles) without confusing the rendering of
- anything else.
- """
- # This is from kerneldoc.py -- it works around an API change in
- # Sphinx between 1.6 and 1.7. Unlike kerneldoc.py, we use
- # sphinx.util.nodes.nested_parse_with_titles() rather than the
- # plain self.state.nested_parse(), and so we can drop the saving
- # of title_styles and section_level that kerneldoc.py does,
- # because nested_parse_with_titles() does that for us.
- if USE_SSI:
- with switch_source_input(self.state, rstlist):
- nested_parse_with_titles(self.state, rstlist, node)
+class QMPExample(CodeBlock, NestedDirective):
+ """
+ Custom admonition for QMP code examples.
+
+ When the :annotated: option is present, the body of this directive
+ is parsed as normal rST, but with any '::' code blocks set to use
+ the QMP lexer. Code blocks must be explicitly written by the user,
+ but this allows for intermingling explanatory paragraphs with
+ arbitrary rST syntax and code blocks for more involved examples.
+
+ When :annotated: is absent, the directive body is treated as a
+ simple standalone QMP code block literal.
+ """
+
+ required_argument = 0
+ optional_arguments = 0
+ has_content = True
+ option_spec = {
+ "annotated": directives.flag,
+ "title": directives.unchanged,
+ }
+
+ def _highlightlang(self) -> addnodes.highlightlang:
+ """Return the current highlightlang setting for the document"""
+ node = None
+ doc = self.state.document
+
+ if hasattr(doc, "findall"):
+ # docutils >= 0.18.1
+ for node in doc.findall(addnodes.highlightlang):
+ pass
else:
- save = self.state.memo.reporter
- self.state.memo.reporter = AutodocReporter(
- rstlist, self.state.memo.reporter
+ for elem in doc.traverse():
+ if isinstance(elem, addnodes.highlightlang):
+ node = elem
+
+ if node:
+ return node
+
+ # No explicit directive found, use defaults
+ node = addnodes.highlightlang(
+ lang=self.env.config.highlight_language,
+ force=False,
+ # Yes, Sphinx uses this value to effectively disable line
+ # numbers and not 0 or None or -1 or something. ¯\_(ツ)_/¯
+ linenothreshold=sys.maxsize,
+ )
+ return node
+
+ def admonition_wrap(self, *content) -> List[nodes.Node]:
+ title = "Example:"
+ if "title" in self.options:
+ title = f"{title} {self.options['title']}"
+
+ admon = nodes.admonition(
+ "",
+ nodes.title("", title),
+ *content,
+ classes=["admonition", "admonition-example"],
+ )
+ return [admon]
+
+ def run_annotated(self) -> List[nodes.Node]:
+ lang_node = self._highlightlang()
+
+ content_node: nodes.Element = nodes.section()
+
+ # Configure QMP highlighting for "::" blocks, if needed
+ if lang_node["lang"] != "QMP":
+ content_node += addnodes.highlightlang(
+ lang="QMP",
+ force=False, # "True" ignores lexing errors
+ linenothreshold=lang_node["linenothreshold"],
)
- try:
- nested_parse_with_titles(self.state, rstlist, node)
- finally:
- self.state.memo.reporter = save
+
+ self.do_parse(self.content, content_node)
+
+ # Restore prior language highlighting, if needed
+ if lang_node["lang"] != "QMP":
+ content_node += addnodes.highlightlang(**lang_node.attributes)
+
+ return content_node.children
+
+ def run(self) -> List[nodes.Node]:
+ annotated = "annotated" in self.options
+
+ if annotated:
+ content_nodes = self.run_annotated()
+ else:
+ self.arguments = ["QMP"]
+ content_nodes = super().run()
+
+ return self.admonition_wrap(*content_nodes)
def setup(app):
"""Register qapi-doc directive with Sphinx"""
app.add_config_value("qapidoc_srctree", None, "env")
app.add_directive("qapi-doc", QAPIDocDirective)
+ app.add_directive("qmp-example", QMPExample)
return {
"version": __version__,
diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst
index cd9559e..6733ffd 100644
--- a/docs/system/arm/aspeed.rst
+++ b/docs/system/arm/aspeed.rst
@@ -123,6 +123,8 @@ To boot the machine from the flash image, use an MTD drive :
Options specific to Aspeed machines are :
+ * ``boot-emmc`` to set or unset boot from eMMC (AST2600).
+
* ``execute-in-place`` which emulates the boot from the CE0 flash
device by using the FMC controller to load the instructions, and
not simply from RAM. This takes a little longer.
diff --git a/docs/system/arm/cubieboard.rst b/docs/system/arm/cubieboard.rst
index 58c4a2d..90d24c7 100644
--- a/docs/system/arm/cubieboard.rst
+++ b/docs/system/arm/cubieboard.rst
@@ -15,4 +15,5 @@ Emulated devices:
- USB controller
- SATA controller
- TWI (I2C) controller
+- SPI controller
- Watchdog timer
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 3ab6e72..35f52a5 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -45,6 +45,7 @@ the following architecture extensions:
- FEAT_DotProd (Advanced SIMD dot product instructions)
- FEAT_DoubleFault (Double Fault Extension)
- FEAT_E0PD (Preventing EL0 access to halves of address maps)
+- FEAT_EBF16 (AArch64 Extended BFloat16 instructions)
- FEAT_ECV (Enhanced Counter Virtualization)
- FEAT_EL0 (Support for execution at EL0)
- FEAT_EL1 (Support for execution at EL1)
diff --git a/docs/system/arm/gumstix.rst b/docs/system/arm/gumstix.rst
deleted file mode 100644
index cb37313..0000000
--- a/docs/system/arm/gumstix.rst
+++ /dev/null
@@ -1,21 +0,0 @@
-Gumstix Connex and Verdex (``connex``, ``verdex``)
-==================================================
-
-These machines model the Gumstix Connex and Verdex boards.
-The Connex has a PXA255 CPU and the Verdex has a PXA270.
-
-Implemented devices:
-
- * NOR flash
- * SMC91C111 ethernet
- * Interrupt controller
- * DMA
- * Timer
- * GPIO
- * MMC/SD card
- * Fast infra-red communications port (FIR)
- * LCD controller
- * Synchronous serial ports (SPI)
- * PCMCIA interface
- * I2C
- * I2S
diff --git a/docs/system/arm/mainstone.rst b/docs/system/arm/mainstone.rst
deleted file mode 100644
index 05310f4..0000000
--- a/docs/system/arm/mainstone.rst
+++ /dev/null
@@ -1,25 +0,0 @@
-Intel Mainstone II board (``mainstone``)
-========================================
-
-The ``mainstone`` board emulates the Intel Mainstone II development
-board, which uses a PXA270 CPU.
-
-Emulated devices:
-
-- Flash memory
-- Keypad
-- MMC controller
-- 91C111 ethernet
-- PIC
-- Timer
-- DMA
-- GPIO
-- FIR
-- Serial
-- LCD controller
-- SSP
-- USB controller
-- RTC
-- PCMCIA
-- I2C
-- I2S
diff --git a/docs/system/arm/nseries.rst b/docs/system/arm/nseries.rst
deleted file mode 100644
index cd9edf5..0000000
--- a/docs/system/arm/nseries.rst
+++ /dev/null
@@ -1,33 +0,0 @@
-Nokia N800 and N810 tablets (``n800``, ``n810``)
-================================================
-
-Nokia N800 and N810 internet tablets (known also as RX-34 and RX-44 /
-48) emulation supports the following elements:
-
-- Texas Instruments OMAP2420 System-on-chip (ARM1136 core)
-
-- RAM and non-volatile OneNAND Flash memories
-
-- Display connected to EPSON remote framebuffer chip and OMAP on-chip
- display controller and a LS041y3 MIPI DBI-C controller
-
-- TI TSC2301 (in N800) and TI TSC2005 (in N810) touchscreen
- controllers driven through SPI bus
-
-- National Semiconductor LM8323-controlled qwerty keyboard driven
- through |I2C| bus
-
-- Secure Digital card connected to OMAP MMC/SD host
-
-- Three OMAP on-chip UARTs and on-chip STI debugging console
-
-- Mentor Graphics \"Inventra\" dual-role USB controller embedded in a
- TI TUSB6010 chip - only USB host mode is supported
-
-- TI TMP105 temperature sensor driven through |I2C| bus
-
-- TI TWL92230C power management companion with an RTC on
- |I2C| bus
-
-- Nokia RETU and TAHVO multi-purpose chips with an RTC, connected
- through CBUS
diff --git a/docs/system/arm/palm.rst b/docs/system/arm/palm.rst
deleted file mode 100644
index 61bc8d3..0000000
--- a/docs/system/arm/palm.rst
+++ /dev/null
@@ -1,23 +0,0 @@
-Palm Tungsten|E PDA (``cheetah``)
-=================================
-
-The Palm Tungsten|E PDA (codename \"Cheetah\") emulation includes the
-following elements:
-
-- Texas Instruments OMAP310 System-on-chip (ARM925T core)
-
-- ROM and RAM memories (ROM firmware image can be loaded with
- -option-rom)
-
-- On-chip LCD controller
-
-- On-chip Real Time Clock
-
-- TI TSC2102i touchscreen controller / analog-digital converter /
- Audio CODEC, connected through MicroWire and |I2S| buses
-
-- GPIO-connected matrix keypad
-
-- Secure Digital card connected to OMAP MMC/SD host
-
-- Three on-chip UARTs
diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
index 3b640f3..ca7a558 100644
--- a/docs/system/arm/stm32.rst
+++ b/docs/system/arm/stm32.rst
@@ -36,6 +36,7 @@ Supported devices
* SPI controller
* System configuration (SYSCFG)
* Timer controller (TIMER)
+ * Reset and Clock Controller (RCC) (STM32F4 only, reset and enable only)
Missing devices
---------------
@@ -53,7 +54,7 @@ Missing devices
* Power supply configuration (PWR)
* Random Number Generator (RNG)
* Real-Time Clock (RTC) controller
- * Reset and Clock Controller (RCC)
+ * Reset and Clock Controller (RCC) (other features than reset and enable)
* Secure Digital Input/Output (SDIO) interface
* USB OTG
* Watchdog controller (IWDG, WWDG)
diff --git a/docs/system/arm/xscale.rst b/docs/system/arm/xscale.rst
deleted file mode 100644
index e239136..0000000
--- a/docs/system/arm/xscale.rst
+++ /dev/null
@@ -1,35 +0,0 @@
-Sharp XScale-based PDA models (``akita``, ``borzoi``, ``spitz``, ``terrier``, ``tosa``)
-=======================================================================================
-
-The Sharp Zaurus are PDAs based on XScale, able to run Linux ('SL series').
-
-The SL-6000 (\"Tosa\"), released in 2005, uses a PXA255 System-on-chip.
-
-The SL-C3000 (\"Spitz\"), SL-C1000 (\"Akita\"), SL-C3100 (\"Borzoi\") and
-SL-C3200 (\"Terrier\") use a PXA270.
-
-The clamshell PDA models emulation includes the following peripherals:
-
-- Intel PXA255/PXA270 System-on-chip (ARMv5TE core)
-
-- NAND Flash memory - not in \"Tosa\"
-
-- IBM/Hitachi DSCM microdrive in a PXA PCMCIA slot - not in \"Akita\"
-
-- On-chip OHCI USB controller - not in \"Tosa\"
-
-- On-chip LCD controller
-
-- On-chip Real Time Clock
-
-- TI ADS7846 touchscreen controller on SSP bus
-
-- Maxim MAX1111 analog-digital converter on |I2C| bus
-
-- GPIO-connected keyboard controller and LEDs
-
-- Secure Digital card connected to PXA MMC/SD host
-
-- Three on-chip UARTs
-
-- WM8750 audio CODEC on |I2C| and |I2S| buses
diff --git a/docs/system/i386/xenpvh.rst b/docs/system/i386/xenpvh.rst
new file mode 100644
index 0000000..354250f
--- /dev/null
+++ b/docs/system/i386/xenpvh.rst
@@ -0,0 +1,49 @@
+Xen PVH machine (``xenpvh``)
+=========================================
+
+Xen supports a spectrum of types of guests that vary in how they depend
+on HW virtualization features, emulation models and paravirtualization.
+PVH is a mode that uses HW virtualization features (like HVM) but tries
+to avoid emulation models and instead use passthrough or
+paravirtualized devices.
+
+QEMU can be used to provide PV virtio devices on an emulated PCIe controller.
+That is the purpose of this minimal machine.
+
+Supported devices
+-----------------
+
+The x86 Xen PVH QEMU machine provide the following devices:
+
+- RAM
+- GPEX host bridge
+- virtio-pci devices
+
+The idea is to only connect virtio-pci devices but in theory any compatible
+PCI device model will work depending on Xen and guest support.
+
+Running
+-------
+
+The Xen tools will typically construct a command-line and launch QEMU
+for you when needed. But here's an example of what it can look like in
+case you need to construct one manually:
+
+.. code-block:: console
+
+ qemu-system-i386 -xen-domid 3 -no-shutdown \
+ -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-3,server=on,wait=off \
+ -mon chardev=libxl-cmd,mode=control \
+ -chardev socket,id=libxenstat-cmd,path=/var/run/xen/qmp-libxenstat-3,server=on,wait=off \
+ -mon chardev=libxenstat-cmd,mode=control \
+ -nodefaults \
+ -no-user-config \
+ -xen-attach -name g0 \
+ -vnc none \
+ -display none \
+ -device virtio-net-pci,id=nic0,netdev=net0,mac=00:16:3e:5c:81:78 \
+ -netdev type=tap,id=net0,ifname=vif3.0-emu,br=xenbr0,script=no,downscript=no \
+ -smp 4,maxcpus=4 \
+ -nographic \
+ -machine xenpvh,ram-low-base=0,ram-low-size=2147483648,ram-high-base=4294967296,ram-high-size=2147483648,pci-ecam-base=824633720832,pci-ecam-size=268435456,pci-mmio-base=4026531840,pci-mmio-size=33554432,pci-mmio-high-base=824902156288,pci-mmio-high-size=68719476736 \
+ -m 4096
diff --git a/docs/system/loongarch/virt.rst b/docs/system/loongarch/virt.rst
index 06d034b..172fba0 100644
--- a/docs/system/loongarch/virt.rst
+++ b/docs/system/loongarch/virt.rst
@@ -64,7 +64,7 @@ Note: You need get the latest cross-tools at https://github.com/loongson/build-t
(3) Build BIOS:
- See: https://github.com/tianocore/edk2-platforms/tree/master/Platform/Loongson/LoongArchQemuPkg#readme
+ See: https://github.com/tianocore/edk2/tree/master/OvmfPkg/LoongArchVirt#readme
Note: To build the release version of the bios, set --buildtarget=RELEASE,
the bios file path: Build/LoongArchQemu/RELEASE_GCC5/FV/QEMU_EFI.fd
diff --git a/docs/system/ppc/powermac.rst b/docs/system/ppc/powermac.rst
index 04334ba..3eac81c 100644
--- a/docs/system/ppc/powermac.rst
+++ b/docs/system/ppc/powermac.rst
@@ -4,8 +4,8 @@ PowerMac family boards (``g3beige``, ``mac99``)
Use the executable ``qemu-system-ppc`` to simulate a complete PowerMac
PowerPC system.
-- ``g3beige`` Heathrow based PowerMAC
-- ``mac99`` Mac99 based PowerMAC
+- ``g3beige`` Heathrow based PowerMac
+- ``mac99`` Mac99 based PowerMac
Supported devices
-----------------
diff --git a/docs/system/target-arm.rst b/docs/system/target-arm.rst
index 7b99272..3c0a584 100644
--- a/docs/system/target-arm.rst
+++ b/docs/system/target-arm.rst
@@ -91,17 +91,12 @@ undocumented; you can get a complete list by running
arm/cubieboard
arm/emcraft-sf2
arm/musicpal
- arm/gumstix
- arm/mainstone
arm/kzm
- arm/nseries
arm/nrf
arm/nuvoton
arm/imx25-pdk
arm/orangepi
- arm/palm
arm/raspi
- arm/xscale
arm/collie
arm/sx1
arm/stellaris
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
index 1b8a1f2..23e84e3 100644
--- a/docs/system/target-i386.rst
+++ b/docs/system/target-i386.rst
@@ -26,6 +26,7 @@ Architectural features
i386/cpu
i386/hyperv
i386/xen
+ i386/xenpvh
i386/kvm-pv
i386/sgx
i386/amd-memory-encryption
diff --git a/docs/tools/index.rst b/docs/tools/index.rst
index 8e65ce0..1e88ae4 100644
--- a/docs/tools/index.rst
+++ b/docs/tools/index.rst
@@ -15,4 +15,4 @@ command line utilities and other standalone programs.
qemu-nbd
qemu-pr-helper
qemu-trace-stap
- virtfs-proxy-helper
+ qemu-vmsr-helper
diff --git a/docs/tools/qemu-vmsr-helper.rst b/docs/tools/qemu-vmsr-helper.rst
new file mode 100644
index 0000000..9ce10b9
--- /dev/null
+++ b/docs/tools/qemu-vmsr-helper.rst
@@ -0,0 +1,89 @@
+==================================
+QEMU virtual RAPL MSR helper
+==================================
+
+Synopsis
+--------
+
+**qemu-vmsr-helper** [*OPTION*]
+
+Description
+-----------
+
+Implements the virtual RAPL MSR helper for QEMU.
+
+Accessing the RAPL (Running Average Power Limit) MSR enables the RAPL powercap
+driver to advertise and monitor the power consumption or accumulated energy
+consumption of different power domains, such as CPU packages, DRAM, and other
+components when available.
+
+However those registers are accessible under privileged access (CAP_SYS_RAWIO).
+QEMU can use an external helper to access those privileged registers.
+
+:program:`qemu-vmsr-helper` is that external helper; it creates a listener
+socket which will accept incoming connections for communication with QEMU.
+
+If you want to run VMs in a setup like this, this helper should be started as a
+system service, and you should read the QEMU manual section on "RAPL MSR
+support" to find out how to configure QEMU to connect to the socket created by
+:program:`qemu-vmsr-helper`.
+
+After connecting to the socket, :program:`qemu-vmsr-helper` can
+optionally drop root privileges, except for those capabilities that
+are needed for its operation.
+
+:program:`qemu-vmsr-helper` can also use the systemd socket activation
+protocol. In this case, the systemd socket unit should specify a
+Unix stream socket, like this::
+
+ [Socket]
+ ListenStream=/var/run/qemu-vmsr-helper.sock
+
+Options
+-------
+
+.. program:: qemu-vmsr-helper
+
+.. option:: -d, --daemon
+
+ run in the background (and create a PID file)
+
+.. option:: -q, --quiet
+
+ decrease verbosity
+
+.. option:: -v, --verbose
+
+ increase verbosity
+
+.. option:: -f, --pidfile=PATH
+
+ PID file when running as a daemon. By default the PID file
+ is created in the system runtime state directory, for example
+ :file:`/var/run/qemu-vmsr-helper.pid`.
+
+.. option:: -k, --socket=PATH
+
+ path to the socket. By default the socket is created in
+ the system runtime state directory, for example
+ :file:`/var/run/qemu-vmsr-helper.sock`.
+
+.. option:: -T, --trace [[enable=]PATTERN][,events=FILE][,file=FILE]
+
+ .. include:: ../qemu-option-trace.rst.inc
+
+.. option:: -u, --user=USER
+
+ user to drop privileges to
+
+.. option:: -g, --group=GROUP
+
+ group to drop privileges to
+
+.. option:: -h, --help
+
+ Display a help message and exit.
+
+.. option:: -V, --version
+
+ Display version information and exit.
diff --git a/docs/tools/virtfs-proxy-helper.rst b/docs/tools/virtfs-proxy-helper.rst
deleted file mode 100644
index bd310eb..0000000
--- a/docs/tools/virtfs-proxy-helper.rst
+++ /dev/null
@@ -1,75 +0,0 @@
-QEMU 9p virtfs proxy filesystem helper
-======================================
-
-Synopsis
---------
-
-**virtfs-proxy-helper** [*OPTIONS*]
-
-Description
------------
-
-NOTE: The 9p 'proxy' backend is deprecated (since QEMU 8.1) and will be
-removed, along with this daemon, in a future version of QEMU!
-
-Pass-through security model in QEMU 9p server needs root privilege to do
-few file operations (like chown, chmod to any mode/uid:gid). There are two
-issues in pass-through security model:
-
-- TOCTTOU vulnerability: Following symbolic links in the server could
- provide access to files beyond 9p export path.
-
-- Running QEMU with root privilege could be a security issue.
-
-To overcome above issues, following approach is used: A new filesystem
-type 'proxy' is introduced. Proxy FS uses chroot + socket combination
-for securing the vulnerability known with following symbolic links.
-Intention of adding a new filesystem type is to allow qemu to run
-in non-root mode, but doing privileged operations using socket IO.
-
-Proxy helper (a stand alone binary part of qemu) is invoked with
-root privileges. Proxy helper chroots into 9p export path and creates
-a socket pair or a named socket based on the command line parameter.
-QEMU and proxy helper communicate using this socket. QEMU proxy fs
-driver sends filesystem request to proxy helper and receives the
-response from it.
-
-The proxy helper is designed so that it can drop root privileges except
-for the capabilities needed for doing filesystem operations.
-
-Options
--------
-
-The following options are supported:
-
-.. program:: virtfs-proxy-helper
-
-.. option:: -h
-
- Display help and exit
-
-.. option:: -p, --path PATH
-
- Path to export for proxy filesystem driver
-
-.. option:: -f, --fd SOCKET_ID
-
- Use given file descriptor as socket descriptor for communicating with
- qemu proxy fs drier. Usually a helper like libvirt will create
- socketpair and pass one of the fds as parameter to this option.
-
-.. option:: -s, --socket SOCKET_FILE
-
- Creates named socket file for communicating with qemu proxy fs driver
-
-.. option:: -u, --uid UID
-
- uid to give access to named socket file; used in combination with -g.
-
-.. option:: -g, --gid GID
-
- gid to give access to named socket file; used in combination with -u.
-
-.. option:: -n, --nodaemon
-
- Run as a normal program. By default program will run in daemon mode
diff --git a/docs/user/main.rst b/docs/user/main.rst
index e04bc2c..7a126ee 100644
--- a/docs/user/main.rst
+++ b/docs/user/main.rst
@@ -130,10 +130,6 @@ Other binaries
The binary format is detected automatically.
-- user mode (Cris)
-
- * ``qemu-cris`` TODO.
-
- user mode (i386)
* ``qemu-i386`` TODO.