Diffstat (limited to 'docs/devel')
61 files changed, 3946 insertions, 2043 deletions
diff --git a/docs/devel/atomics.rst b/docs/devel/atomics.rst index b77c6e1..95c7b77 100644 --- a/docs/devel/atomics.rst +++ b/docs/devel/atomics.rst @@ -204,7 +204,7 @@ They come in six kinds: before the second with respect to the other components of the system. Therefore, unlike ``smp_rmb()`` or ``qatomic_load_acquire()``, ``smp_read_barrier_depends()`` can be just a compiler barrier on - weakly-ordered architectures such as Arm or PPC[#]_. + weakly-ordered architectures such as Arm or PPC\ [#alpha]_. Note that the first load really has to have a _data_ dependency and not a control dependency. If the address for the second load is dependent @@ -212,7 +212,7 @@ They come in six kinds: than actually loading the address itself, then it's a _control_ dependency and a full read barrier or better is required. -.. [#] The DEC Alpha is an exception, because ``smp_read_barrier_depends()`` +.. [#alpha] The DEC Alpha is an exception, because ``smp_read_barrier_depends()`` needs a processor barrier. On strongly-ordered architectures such as x86 or s390, ``smp_rmb()`` and ``qatomic_load_acquire()`` can also be compiler barriers only. @@ -295,7 +295,7 @@ Acquire/release pairing and the *synchronizes-with* relation ------------------------------------------------------------ Atomic operations other than ``qatomic_set()`` and ``qatomic_read()`` have -either *acquire* or *release* semantics [#rmw]_. This has two effects: +either *acquire* or *release* semantics\ [#rmw]_. This has two effects: .. [#rmw] Read-modify-write operations can have both---acquire applies to the read part, and release to the write. diff --git a/docs/devel/blkdebug.txt b/docs/devel/blkdebug.txt deleted file mode 100644 index 0b0c128..0000000 --- a/docs/devel/blkdebug.txt +++ /dev/null @@ -1,162 +0,0 @@ -Block I/O error injection using blkdebug ----------------------------------------- -Copyright (C) 2014-2015 Red Hat Inc - -This work is licensed under the terms of the GNU GPL, version 2 or later. See -the COPYING file in the top-level directory. - -The blkdebug block driver is a rule-based error injection engine. It can be -used to exercise error code paths in block drivers including ENOSPC (out of -space) and EIO. - -This document gives an overview of the features available in blkdebug. - -Background ----------- -Block drivers have many error code paths that handle I/O errors. Image formats -are especially complex since metadata I/O errors during cluster allocation or -while updating tables happen halfway through request processing and require -discipline to keep image files consistent. - -Error injection allows test cases to trigger I/O errors at specific points. -This way, all error paths can be tested to make sure they are correct. - -Rules ------ -The blkdebug block driver takes a list of "rules" that tell the error injection -engine when to fail an I/O request. - -Each I/O request is evaluated against the rules. If a rule matches the request -then its "action" is executed. - -Rules can be placed in a configuration file; the configuration file -follows the same .ini-like format used by QEMU's -readconfig option, and -each section of the file represents a rule. - -The following configuration file defines a single rule: - - $ cat blkdebug.conf - [inject-error] - event = "read_aio" - errno = "28" - -This rule fails all aio read requests with ENOSPC (28). Note that the errno -value depends on the host. On Linux, see -/usr/include/asm-generic/errno-base.h for errno values. 
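For instance, on a typical Linux host the numeric value behind an error name can be confirmed directly from the header mentioned above (the exact path may vary between distributions)::

    $ grep ENOSPC /usr/include/asm-generic/errno-base.h
    #define ENOSPC          28      /* No space left on device */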
- -Invoke QEMU as follows: - - $ qemu-system-x86_64 - -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \ - -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0 - -Rules support the following attributes: - - event - which type of operation to match (e.g. read_aio, write_aio, - flush_to_os, flush_to_disk). See the "Events" section for - information on events. - - state - (optional) the engine must be in this state number in order for this - rule to match. See the "State transitions" section for information - on states. - - errno - the numeric errno value to return when a request matches this rule. - The errno values depend on the host since the numeric values are not - standardized in the POSIX specification. - - sector - (optional) a sector number that the request must overlap in order to - match this rule - - once - (optional, default "off") only execute this action on the first - matching request - - immediately - (optional, default "off") return a NULL BlockAIOCB - pointer and fail without an errno instead. This - exercises the code path where BlockAIOCB fails and the - caller's BlockCompletionFunc is not invoked. - -Events ------- -Block drivers provide information about the type of I/O request they are about -to make so rules can match specific types of requests. For example, the qcow2 -block driver tells blkdebug when it accesses the L1 table so rules can match -only L1 table accesses and not other metadata or guest data requests. - -The core events are: - - read_aio - guest data read - - write_aio - guest data write - - flush_to_os - write out unwritten block driver state (e.g. cached metadata) - - flush_to_disk - flush the host block device's disk cache - -See qapi/block-core.json:BlkdebugEvent for the full list of events. -You may need to grep block driver source code to understand the -meaning of specific events. - -State transitions ------------------ -There are cases where more power is needed to match a particular I/O request in -a longer sequence of requests. For example: - - write_aio - flush_to_disk - write_aio - -How do we match the 2nd write_aio but not the first? This is where state -transitions come in. - -The error injection engine has an integer called the "state" that always starts -initialized to 1. The state integer is internal to blkdebug and cannot be -observed from outside but rules can interact with it for powerful matching -behavior. - -Rules can be conditional on the current state and they can transition to a new -state. - -When a rule's "state" attribute is non-zero then the current state must equal -the attribute in order for the rule to match. - -For example, to match the 2nd write_aio: - - [set-state] - event = "write_aio" - state = "1" - new_state = "2" - - [inject-error] - event = "write_aio" - state = "2" - errno = "5" - -The first write_aio request matches the set-state rule and transitions from -state 1 to state 2. Once state 2 has been entered, the set-state rule no -longer matches since it requires state 1. But the inject-error rule now -matches the next write_aio request and injects EIO (5). - -State transition rules support the following attributes: - - event - which type of operation to match (e.g. read_aio, write_aio, - flush_to_os, flush_to_disk). See the "Events" section for - information on events. 
- - state - (optional) the engine must be in this state number in order for this - rule to match - - new_state - transition to this state number - -Suspend and resume ------------------- -Exercising code paths in block drivers may require specific ordering amongst -concurrent requests. The "breakpoint" feature allows requests to be halted on -a blkdebug event and resumed later. This makes it possible to achieve -deterministic ordering when multiple requests are in flight. - -Breakpoints on blkdebug events are associated with a user-defined "tag" string. -This tag serves as an identifier by which the request can be resumed at a later -point. - -See the qemu-io(1) break, resume, remove_break, and wait_break commands for -details. diff --git a/docs/devel/build-environment.rst b/docs/devel/build-environment.rst new file mode 100644 index 0000000..661f6ea --- /dev/null +++ b/docs/devel/build-environment.rst @@ -0,0 +1,118 @@ + +.. _setup-build-env: + +Setup build environment +======================= + +QEMU uses a lot of dependencies on the host system. glib2 is used everywhere in +the code base, and most of the other dependencies are optional. + +We present here simple instructions to enable native builds on most popular +systems. + +You can find additional instructions on `QEMU wiki <https://wiki.qemu.org/>`_: + +- `Linux <https://wiki.qemu.org/Hosts/Linux>`_ +- `MacOS <https://wiki.qemu.org/Hosts/Mac>`_ +- `Windows <https://wiki.qemu.org/Hosts/W32>`_ +- `BSD <https://wiki.qemu.org/Hosts/BSD>`_ + +Note: Installing dependencies using your package manager build dependencies may +miss out on deps that have been newly introduced in qemu.git. In more, it misses +deps the distribution has decided to exclude. + +Linux +----- + +Fedora +++++++ + +:: + + sudo dnf update && sudo dnf builddep qemu + +Debian/Ubuntu ++++++++++++++ + +You first need to enable `Sources List <https://wiki.debian.org/SourcesList>`_. +Then, use apt to install dependencies: + +:: + + sudo apt update && sudo apt build-dep qemu + +MacOS +----- + +You first need to install `Homebrew <https://brew.sh/>`_. Then, use it to +install dependencies: + +:: + + brew update && brew install $(brew deps --include-build qemu) + +Windows +------- + +You first need to install `MSYS2 <https://www.msys2.org/>`_. +MSYS2 offers `different environments <https://www.msys2.org/docs/environments/>`_. +x86_64 environments are based on GCC, while aarch64 is based on Clang. + +We recommend to use MINGW64 for windows-x86_64 and CLANGARM64 for windows-aarch64 +(only available on windows-aarch64 hosts). + +Then, you can open a windows shell, and enter msys2 env using: + +:: + + c:/msys64/msys2_shell.cmd -defterm -here -no-start -mingw64 + # Replace -ucrt64 by -clangarm64 or -ucrt64 for other environments. + +MSYS2 package manager does not offer a built-in way to install build +dependencies. You can start with this list of packages using pacman: + +Note: Dependencies need to be installed again if you use a different MSYS2 +environment. + +:: + + # update MSYS2 itself, you need to reopen your shell at the end. 
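    # ${MINGW_PACKAGE_PREFIX} is set by MSYS2 for each environment, e.g.
    # mingw-w64-x86_64 under MINGW64 or mingw-w64-clang-aarch64 under
    # CLANGARM64, so the package list below works unchanged in either shell.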
+ pacman -Syu + pacman -S \ + base-devel binutils bison diffutils flex git grep make sed \ + ${MINGW_PACKAGE_PREFIX}-toolchain \ + ${MINGW_PACKAGE_PREFIX}-glib2 \ + ${MINGW_PACKAGE_PREFIX}-gtk3 \ + ${MINGW_PACKAGE_PREFIX}-libnfs \ + ${MINGW_PACKAGE_PREFIX}-libssh \ + ${MINGW_PACKAGE_PREFIX}-ninja \ + ${MINGW_PACKAGE_PREFIX}-pixman \ + ${MINGW_PACKAGE_PREFIX}-pkgconf \ + ${MINGW_PACKAGE_PREFIX}-python \ + ${MINGW_PACKAGE_PREFIX}-SDL2 \ + ${MINGW_PACKAGE_PREFIX}-zstd + +If you want to install all dependencies, it's possible to use recipe used to +build QEMU in MSYS2 itself. + +:: + + pacman -S wget base-devel git + wget https://raw.githubusercontent.com/msys2/MINGW-packages/refs/heads/master/mingw-w64-qemu/PKGBUILD + # Some packages may be missing for your environment, installation will still + # be done though. + makepkg --syncdeps --nobuild PKGBUILD || true + +Build on windows-aarch64 +++++++++++++++++++++++++ + +When trying to cross compile meson for x86_64 using UCRT64 or MINGW64 env, +configure will run into an error because the cpu detected is not correct. + +Meson detects x86_64 processes emulated, so you need to manually set the cpu, +and force a cross compilation (with empty prefix). + +:: + + ./configure --cpu=x86_64 --cross-prefix= + diff --git a/docs/devel/build-system.rst b/docs/devel/build-system.rst index 79eceb1..2c88419 100644 --- a/docs/devel/build-system.rst +++ b/docs/devel/build-system.rst @@ -134,7 +134,7 @@ in how the build process runs Python code. At this stage, ``configure`` also queries the chosen Python interpreter about QEMU's build dependencies. Note that the build process does *not* -look for ``meson``, ``sphinx-build`` or ``avocado`` binaries in the PATH; +look for ``meson`` or ``sphinx-build`` binaries in the PATH; likewise, there are no options such as ``--meson`` or ``--sphinx-build``. This avoids a potential mismatch, where Meson and Sphinx binaries on the PATH might operate in a different Python environment than the one chosen @@ -145,13 +145,13 @@ was installed in the ``site-packages`` directory of another interpreter, or with the wrong ``pip`` program. If a package is available for the chosen interpreter, ``configure`` -prepares a small script that invokes it from the venv itself[#distlib]_. +prepares a small script that invokes it from the venv itself\ [#distlib]_. If not, ``configure`` can also optionally install dependencies in the virtual environment with ``pip``, either from wheels in ``python/wheels`` or by downloading the package with PyPI. Downloading can be disabled with ``--disable-download``; and anyway, it only happens when a ``configure`` option (currently, only ``--enable-docs``) is explicitly enabled but -the dependencies are not present[#pip]_. +the dependencies are not present. .. [#distlib] The scripts are created based on the package's metadata, specifically the ``console_script`` entry points. This is the @@ -164,15 +164,11 @@ the dependencies are not present[#pip]_. because the Python Packaging Authority provides a package ``distlib.scripts`` to perform this task. -.. [#pip] ``pip`` might also be used when running ``make check-avocado`` - if downloading is enabled, to ensure that Avocado is - available. - The required versions of the packages are stored in a configuration file ``pythondeps.toml``. The format is custom to QEMU, but it is documented at the top of the file itself and it should be easy to understand. The requirements should make it possible to use the version that is packaged -that is provided by supported distros. 
+by QEMU's supported distros. When dependencies are downloaded, instead, ``configure`` uses a "known good" version that is also listed in ``pythondeps.toml``. In this @@ -260,7 +256,7 @@ Target-dependent emulator sourcesets: Each emulator also includes sources for files in the ``hw/`` and ``target/`` subdirectories. The subdirectory used for each emulator comes from the target's definition of ``TARGET_BASE_ARCH`` or (if missing) - ``TARGET_ARCH``, as found in ``default-configs/targets/*.mak``. + ``TARGET_ARCH``, as found in ``configs/targets/*.mak``. Each subdirectory in ``hw/`` adds one sourceset to the ``hw_arch`` dictionary, for example:: @@ -317,8 +313,8 @@ Utility sourcesets: The following files concur in the definition of which files are linked into each emulator: -``default-configs/devices/*.mak`` - The files under ``default-configs/devices/`` control the boards and devices +``configs/devices/*.mak`` + The files under ``configs/devices/`` control the boards and devices that are built into each QEMU system emulation targets. They merely contain a list of config variable definitions such as:: @@ -327,13 +323,13 @@ into each emulator: CONFIG_XLNX_VERSAL=y ``*/Kconfig`` - These files are processed together with ``default-configs/devices/*.mak`` and + These files are processed together with ``configs/devices/*.mak`` and describe the dependencies between various features, subsystems and device models. They are described in :ref:`kconfig` -``default-configs/targets/*.mak`` +``configs/targets/*.mak`` These files mostly define symbols that appear in the ``*-config-target.h`` - file for each emulator [#cfgtarget]_. However, the ``TARGET_ARCH`` + file for each emulator\ [#cfgtarget]_. However, the ``TARGET_ARCH`` and ``TARGET_BASE_ARCH`` will also be used to select the ``hw/`` and ``target/`` subdirectories that are compiled into each target. @@ -497,8 +493,7 @@ number of dynamically created files listed later. ``pyvenv/bin``, and calling ``pip`` to install dependencies. ``tests/Makefile.include`` - Rules for external test harnesses. These include the TCG tests - and the Avocado-based integration tests. + Rules for external test harnesses like the TCG tests. ``tests/docker/Makefile.include`` Rules for Docker tests. Like ``tests/Makefile.include``, this file is diff --git a/docs/devel/ci-definitions.rst.inc b/docs/devel/ci-definitions.rst.inc deleted file mode 100644 index 6d5c6fd..0000000 --- a/docs/devel/ci-definitions.rst.inc +++ /dev/null @@ -1,121 +0,0 @@ -Definition of terms -=================== - -This section defines the terms used in this document and correlates them with -what is currently used on QEMU. - -Automated tests ---------------- - -An automated test is written on a test framework using its generic test -functions/classes. The test framework can run the tests and report their -success or failure [1]_. - -An automated test has essentially three parts: - -1. The test initialization of the parameters, where the expected parameters, - like inputs and expected results, are set up; -2. The call to the code that should be tested; -3. An assertion, comparing the result from the previous call with the expected - result set during the initialization of the parameters. If the result - matches the expected result, the test has been successful; otherwise, it has - failed. 
- -Unit testing ------------- - -A unit test is responsible for exercising individual software components as a -unit, like interfaces, data structures, and functionality, uncovering errors -within the boundaries of a component. The verification effort is in the -smallest software unit and focuses on the internal processing logic and data -structures. A test case of unit tests should be designed to uncover errors due -to erroneous computations, incorrect comparisons, or improper control flow [2]_. - -On QEMU, unit testing is represented by the 'check-unit' target from 'make'. - -Functional testing ------------------- - -A functional test focuses on the functional requirement of the software. -Deriving sets of input conditions, the functional tests should fully exercise -all the functional requirements for a program. Functional testing is -complementary to other testing techniques, attempting to find errors like -incorrect or missing functions, interface errors, behavior errors, and -initialization and termination errors [3]_. - -On QEMU, functional testing is represented by the 'check-qtest' target from -'make'. - -System testing --------------- - -System tests ensure all application elements mesh properly while the overall -functionality and performance are achieved [4]_. Some or all system components -are integrated to create a complete system to be tested as a whole. System -testing ensures that components are compatible, interact correctly, and -transfer the right data at the right time across their interfaces. As system -testing focuses on interactions, use case-based testing is a practical approach -to system testing [5]_. Note that, in some cases, system testing may require -interaction with third-party software, like operating system images, databases, -networks, and so on. - -On QEMU, system testing is represented by the 'check-avocado' target from -'make'. - -Flaky tests ------------ - -A flaky test is defined as a test that exhibits both a passing and a failing -result with the same code on different runs. Some usual reasons for an -intermittent/flaky test are async wait, concurrency, and test order dependency -[6]_. - -Gating ------- - -A gate restricts the move of code from one stage to another on a -test/deployment pipeline. The step move is granted with approval. The approval -can be a manual intervention or a set of tests succeeding [7]_. - -On QEMU, the gating process happens during the pull request. The approval is -done by the project leader running its own set of tests. The pull request gets -merged when the tests succeed. - -Continuous Integration (CI) ---------------------------- - -Continuous integration (CI) requires the builds of the entire application and -the execution of a comprehensive set of automated tests every time there is a -need to commit any set of changes [8]_. The automated tests can be composed of -the unit, functional, system, and other tests. - -Keynotes about continuous integration (CI) [9]_: - -1. System tests may depend on external software (operating system images, - firmware, database, network). -2. It may take a long time to build and test. It may be impractical to build - the system being developed several times per day. -3. If the development platform is different from the target platform, it may - not be possible to run system tests in the developer’s private workspace. - There may be differences in hardware, operating system, or installed - software. Therefore, more time is required for testing the system. - -References ----------- - -.. 
[1] Sommerville, Ian (2016). Software Engineering. p. 233. -.. [2] Pressman, Roger S. & Maxim, Bruce R. (2020). Software Engineering, - A Practitioner’s Approach. p. 48, 376, 378, 381. -.. [3] Pressman, Roger S. & Maxim, Bruce R. (2020). Software Engineering, - A Practitioner’s Approach. p. 388. -.. [4] Pressman, Roger S. & Maxim, Bruce R. (2020). Software Engineering, - A Practitioner’s Approach. Software Engineering, p. 377. -.. [5] Sommerville, Ian (2016). Software Engineering. p. 59, 232, 240. -.. [6] Luo, Qingzhou, et al. An empirical analysis of flaky tests. - Proceedings of the 22nd ACM SIGSOFT International Symposium on - Foundations of Software Engineering. 2014. -.. [7] Humble, Jez & Farley, David (2010). Continuous Delivery: - Reliable Software Releases Through Build, Test, and Deployment, p. 122. -.. [8] Humble, Jez & Farley, David (2010). Continuous Delivery: - Reliable Software Releases Through Build, Test, and Deployment, p. 55. -.. [9] Sommerville, Ian (2016). Software Engineering. p. 743. diff --git a/docs/devel/ci.rst b/docs/devel/ci.rst deleted file mode 100644 index ed88a20..0000000 --- a/docs/devel/ci.rst +++ /dev/null @@ -1,14 +0,0 @@ -.. _ci: - -== -CI -== - -Most of QEMU's CI is run on GitLab's infrastructure although a number -of other CI services are used for specialised purposes. The most up to -date information about them and their status can be found on the -`project wiki testing page <https://wiki.qemu.org/Testing/CI>`_. - -.. include:: ci-definitions.rst.inc -.. include:: ci-jobs.rst.inc -.. include:: ci-runners.rst.inc diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst index 177ee1c..3f744f2 100644 --- a/docs/devel/clocks.rst +++ b/docs/devel/clocks.rst @@ -358,6 +358,12 @@ humans (for instance in debugging), use ``clock_display_freq()``, which returns a prettified string-representation, e.g. "33.3 MHz". The caller must free the string with g_free() after use. +It's also possible to retrieve the clock period from a QTest by +accessing QOM property ``qtest-clock-period`` using a QMP command. +This property is only present when the device is being run under +the ``qtest`` accelerator; it is not available when QEMU is +being run normally. + Calculating expiry deadlines ---------------------------- diff --git a/docs/devel/code-provenance.rst b/docs/devel/code-provenance.rst new file mode 100644 index 0000000..b5aae2e --- /dev/null +++ b/docs/devel/code-provenance.rst @@ -0,0 +1,338 @@ +.. _code-provenance: + +Code provenance +=============== + +Certifying patch submissions +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The QEMU community **mandates** all contributors to certify provenance of +patch submissions they make to the project. To put it another way, +contributors must indicate that they are legally permitted to contribute to +the project. + +Certification is achieved with a low overhead by adding a single line to the +bottom of every git commit:: + + Signed-off-by: YOUR NAME <YOUR@EMAIL> + +The addition of this line asserts that the author of the patch is contributing +in accordance with the clauses specified in the +`Developer's Certificate of Origin <https://developercertificate.org>`__: + +.. 
_dco: + + Developer's Certificate of Origin 1.1 + + By making a contribution to this project, I certify that: + + (a) The contribution was created in whole or in part by me and I + have the right to submit it under the open source license + indicated in the file; or + + (b) The contribution is based upon previous work that, to the best + of my knowledge, is covered under an appropriate open source + license and I have the right under that license to submit that + work with modifications, whether created in whole or in part + by me, under the same open source license (unless I am + permitted to submit under a different license), as indicated + in the file; or + + (c) The contribution was provided directly to me by some other + person who certified (a), (b) or (c) and I have not modified + it. + + (d) I understand and agree that this project and the contribution + are public and that a record of the contribution (including all + personal information I submit with it, including my sign-off) is + maintained indefinitely and may be redistributed consistent with + this project or the open source license(s) involved. + +The name used with "Signed-off-by" does not need to be your legal name, nor +birth name, nor appear on any government ID. It is the identity you choose to +be known by in the community, but should not be anonymous, nor misrepresent +whom you are. + +It is generally expected that the name and email addresses used in one of the +``Signed-off-by`` lines, matches that of the git commit ``Author`` field. +It's okay if you subscribe or contribute to the list via more than one +address, but using multiple addresses in one commit just confuses +things. + +If the person sending the mail is not one of the patch authors, they are +nonetheless expected to add their own ``Signed-off-by`` to comply with the +DCO clause (c). + +Multiple authorship +~~~~~~~~~~~~~~~~~~~ + +It is not uncommon for a patch to have contributions from multiple authors. In +this scenario, git commits will usually be expected to have a ``Signed-off-by`` +line for each contributor involved in creation of the patch. Some edge cases: + + * The non-primary author's contributions were so trivial that they can be + considered not subject to copyright. In this case the secondary authors + need not include a ``Signed-off-by``. + + This case most commonly applies where QEMU reviewers give short snippets + of code as suggested fixes to a patch. The reviewers don't need to have + their own ``Signed-off-by`` added unless their code suggestion was + unusually large, but it is common to add ``Suggested-by`` as a credit + for non-trivial code. + + * Both contributors work for the same employer and the employer requires + copyright assignment. + + It can be said that in this case a ``Signed-off-by`` is indicating that + the person has permission to contribute from their employer who is the + copyright holder. It is nonetheless still preferable to include a + ``Signed-off-by`` for each contributor, as in some countries employees are + not able to assign copyright to their employer, and it also covers any + time invested outside working hours. + +When multiple ``Signed-off-by`` tags are present, they should be strictly kept +in order of authorship, from oldest to newest. 
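For example, a patch begun by one contributor and finished by another would end its commit message with both tags, oldest first (the names here are purely illustrative)::

    Signed-off-by: Original Author <original.author@example.com>
    Signed-off-by: Second Author <second.author@example.com>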
+ +Other commit tags +~~~~~~~~~~~~~~~~~ + +While the ``Signed-off-by`` tag is mandatory, there are a number of other tags +that are commonly used during QEMU development: + + * **``Reviewed-by``**: when a QEMU community member reviews a patch on the + mailing list, if they consider the patch acceptable, they should send an + email reply containing a ``Reviewed-by`` tag. Subsystem maintainers who + review a patch should add this even if they are also adding their + ``Signed-off-by`` to the same commit. + + * **``Acked-by``**: when a QEMU subsystem maintainer approves a patch that + touches their subsystem, but intends to allow a different maintainer to + queue it and send a pull request, they would send a mail containing a + ``Acked-by`` tag. Where a patch touches multiple subsystems, ``Acked-by`` + only implies review of the maintainers' own areas of responsibility. If a + maintainer wants to indicate they have done a full review they should use + a ``Reviewed-by`` tag. + + * **``Tested-by``**: when a QEMU community member has functionally tested the + behaviour of the patch in some manner, they should send an email reply + containing a ``Tested-by`` tag. + + * **``Reported-by``**: when a QEMU community member reports a problem via the + mailing list, or some other informal channel that is not the issue tracker, + it is good practice to credit them by including a ``Reported-by`` tag on + any patch fixing the issue. When the problem is reported via the GitLab + issue tracker, however, it is sufficient to just include a link to the + issue. + + * **``Suggested-by``**: when a reviewer or other 3rd party makes non-trivial + suggestions for how to change a patch, it is good practice to credit them + by including a ``Suggested-by`` tag. + +Subsystem maintainer requirements +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +When a subsystem maintainer accepts a patch from a contributor, in addition to +the normal code review points, they are expected to validate the presence of +suitable ``Signed-off-by`` tags. + +At the time they queue the patch in their subsystem tree, the maintainer +**must** also then add their own ``Signed-off-by`` to indicate that they have +done the aforementioned validation. This is in addition to any of their own +``Reviewed-by`` tags the subsystem maintainer may wish to include. + +When the maintainer modifies the patch after pulling into their tree, they +should record their contribution. This is typically done via a note in the +commit message, just prior to the maintainer's ``Signed-off-by``:: + + Signed-off-by: Cory Contributor <cory.contributor@example.com> + [Comment rephrased for clarity] + Signed-off-by: Mary Maintainer <mary.maintainer@mycorp.test> + + +Tools for adding ``Signed-off-by`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +There are a variety of ways tools can support adding ``Signed-off-by`` tags +for patches, avoiding the need for contributors to manually type in this +repetitive text each time. + +git commands +^^^^^^^^^^^^ + +When creating, or amending, a commit the ``-s`` flag to ``git commit`` will +append a suitable line matching the configured git author details. + +If preparing patches using the ``git format-patch`` tool, the ``-s`` flag can +be used to append a suitable line in the emails it creates, without modifying +the local commits. Alternatively to modify all the local commits on a branch:: + + git rebase master -x 'git commit --amend --no-edit -s' + +emacs +^^^^^ + +In the file ``$HOME/.emacs.d/abbrev_defs`` add: + +.. 
code:: elisp + + (define-abbrev-table 'global-abbrev-table + '( + ("8rev" "Reviewed-by: YOUR NAME <your@email.addr>" nil 1) + ("8ack" "Acked-by: YOUR NAME <your@email.addr>" nil 1) + ("8test" "Tested-by: YOUR NAME <your@email.addr>" nil 1) + ("8sob" "Signed-off-by: YOUR NAME <your@email.addr>" nil 1) + )) + +with this change, if you type (for example) ``8rev`` followed by ``<space>`` +or ``<enter>`` it will expand to the whole phrase. + +vim +^^^ + +In the file ``$HOME/.vimrc`` add:: + + iabbrev 8rev Reviewed-by: YOUR NAME <your@email.addr> + iabbrev 8ack Acked-by: YOUR NAME <your@email.addr> + iabbrev 8test Tested-by: YOUR NAME <your@email.addr> + iabbrev 8sob Signed-off-by: YOUR NAME <your@email.addr> + +with this change, if you type (for example) ``8rev`` followed by ``<space>`` +or ``<enter>`` it will expand to the whole phrase. + +Re-starting abandoned work +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +For a variety of reasons there are some patches that get submitted to QEMU but +never merged. An unrelated contributor may decide (months or years later) to +continue working from the abandoned patch and re-submit it with extra changes. + +The general principles when picking up abandoned work are: + + * Continue to credit the original author for their work, by maintaining their + original ``Signed-off-by`` + * Indicate where the original patch was obtained from (mailing list, bug + tracker, author's git repo, etc) when sending it for review + * Acknowledge the extra work of the new contributor by including their + ``Signed-off-by`` in the patch in addition to the orignal author's + * Indicate who is responsible for what parts of the patch. This is typically + done via a note in the commit message, just prior to the new contributor's + ``Signed-off-by``:: + + Signed-off-by: Some Person <some.person@example.com> + [Rebased and added support for 'foo'] + Signed-off-by: New Person <new.person@mycorp.test> + +In complicated cases, or if otherwise unsure, ask for advice on the project +mailing list. + +It is also recommended to attempt to contact the original author to let them +know you are interested in taking over their work, in case they still intended +to return to the work, or had any suggestions about the best way to continue. + +Inclusion of generated files +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Files in patches contributed to QEMU are generally expected to be provided +only in the preferred format for making modifications. The implication of +this is that the output of code generators or compilers is usually not +appropriate to contribute to QEMU. + +For reasons of practicality there are some exceptions to this rule, where +generated code is permitted, provided it is also accompanied by the +corresponding preferred source format. This is done where it is impractical +to expect those building QEMU to run the code generation or compilation +process. A non-exhaustive list of examples is: + + * Images: where an bitmap image is created from a vector file it is common + to include the rendered bitmaps at desired resolution(s), since subtle + changes in the rasterization process / tools may affect quality. The + original vector file is expected to accompany any generated bitmaps. + + * Firmware: QEMU includes pre-compiled binary ROMs for a variety of guest + firmwares. When such binary ROMs are contributed, the corresponding source + must also be provided, either directly, or through a git submodule link. 
+ + * Dockerfiles: the majority of the dockerfiles are automatically generated + from a canonical list of build dependencies maintained in tree, together + with the libvirt-ci git submodule link. The generated dockerfiles are + included in tree because it is desirable to be able to directly build + container images from a clean git checkout. + + * eBPF: QEMU includes some generated eBPF machine code, since the required + eBPF compilation tools are not broadly available on all targetted OS + distributions. The corresponding eBPF C code for the binary is also + provided. This is a time-limited exception until the eBPF toolchain is + sufficiently broadly available in distros. + +In all cases above, the existence of generated files must be acknowledged +and justified in the commit that introduces them. + +Tools which perform changes to existing code with deterministic algorithmic +manipulation, driven by user specified inputs, are not generally considered +to be "generators". + +For instance, using Coccinelle to convert code from one pattern to another +pattern, or fixing documentation typos with a spell checker, or transforming +code using sed / awk / etc, are not considered to be acts of code +generation. Where an automated manipulation is performed on code, however, +this should be declared in the commit message. + +At times contributors may use or create scripts/tools to generate an initial +boilerplate code template which is then filled in to produce the final patch. +The output of such a tool would still be considered the "preferred format", +since it is intended to be a foundation for further human authored changes. +Such tools are acceptable to use, provided there is clearly defined copyright +and licensing for their output. Note in particular the caveats applying to AI +content generators below. + +Use of AI content generators +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +TL;DR: + + **Current QEMU project policy is to DECLINE any contributions which are + believed to include or derive from AI generated content. This includes + ChatGPT, Claude, Copilot, Llama and similar tools.** + +The increasing prevalence of AI-assisted software development results in a +number of difficult legal questions and risks for software projects, including +QEMU. Of particular concern is content generated by `Large Language Models +<https://en.wikipedia.org/wiki/Large_language_model>`__ (LLMs). + +The QEMU community requires that contributors certify their patch submissions +are made in accordance with the rules of the `Developer's Certificate of +Origin (DCO) <dco>`. + +To satisfy the DCO, the patch contributor has to fully understand the +copyright and license status of content they are contributing to QEMU. With AI +content generators, the copyright and license status of the output is +ill-defined with no generally accepted, settled legal foundation. + +Where the training material is known, it is common for it to include large +volumes of material under restrictive licensing/copyright terms. Even where +the training material is all known to be under open source licenses, it is +likely to be under a variety of terms, not all of which will be compatible +with QEMU's licensing requirements. + +How contributors could comply with DCO terms (b) or (c) for the output of AI +content generators commonly available today is unclear. The QEMU project is +not willing or able to accept the legal risks of non-compliance. 
+ +The QEMU project thus requires that contributors refrain from using AI content +generators on patches intended to be submitted to the project, and will +decline any contribution if use of AI is either known or suspected. + +This policy does not apply to other uses of AI, such as researching APIs or +algorithms, static analysis, or debugging, provided their output is not to be +included in contributions. + +Examples of tools impacted by this policy includes GitHub's CoPilot, OpenAI's +ChatGPT, Anthropic's Claude, and Meta's Code Llama, and code/content +generation agents which are built on top of such tools. + +This policy may evolve as AI tools mature and the legal situation is +clarifed. In the meanwhile, requests for exceptions to this policy will be +evaluated by the QEMU project on a case by case basis. To be granted an +exception, a contributor will need to demonstrate clarity of the license and +copyright status for the tool's output in relation to its training model and +code, to the satisfaction of the project maintainers. diff --git a/docs/devel/codebase.rst b/docs/devel/codebase.rst new file mode 100644 index 0000000..2a31437 --- /dev/null +++ b/docs/devel/codebase.rst @@ -0,0 +1,215 @@ +======== +Codebase +======== + +This section presents the various parts of QEMU and how the codebase is +organized. + +Beyond giving succinct descriptions, the goal is to offer links to various +parts of the documentation/codebase. + +Subsystems +---------- + +An exhaustive list of subsystems and associated files can be found in the +`MAINTAINERS <https://gitlab.com/qemu-project/qemu/-/blob/master/MAINTAINERS>`_ +file. + +Some of the main QEMU subsystems are: + +- `Accelerators<Accelerators>` +- Block devices and `disk images<disk images>` support +- `CI<ci>` and `Tests<testing>` +- `Devices<device-emulation>` & Board models +- `Documentation <documentation-root>` +- `GDB support<GDB usage>` +- :ref:`Migration<migration>` +- `Monitor<QEMU monitor>` +- :ref:`QOM (QEMU Object Model)<qom>` +- `System mode<System emulation>` +- :ref:`TCG (Tiny Code Generator)<tcg>` +- `User mode<user-mode>` (`Linux<linux-user-mode>` & `BSD<bsd-user-mode>`) +- User Interfaces + +More documentation on QEMU subsystems can be found on :ref:`internal-subsystem` +page. + +The Grand tour +-------------- + +We present briefly here what every folder in the top directory of the codebase +contains. Hop on! + +The folder name links here will take you to that folder in our gitlab +repository. Other links will take you to more detailed documentation for that +subsystem, where we have it. Unfortunately not every subsystem has documentation +yet, so sometimes the source code is all you have. + +* `accel <https://gitlab.com/qemu-project/qemu/-/tree/master/accel>`_: + Infrastructure and architecture agnostic code related to the various + `accelerators <Accelerators>` supported by QEMU + (TCG, KVM, hvf, whpx, xen, nvmm). + Contains interfaces for operations that will be implemented per + `target <https://gitlab.com/qemu-project/qemu/-/tree/master/target>`_. +* `audio <https://gitlab.com/qemu-project/qemu/-/tree/master/audio>`_: + Audio (host) support. +* `authz <https://gitlab.com/qemu-project/qemu/-/tree/master/authz>`_: + `QEMU Authorization framework<client authorization>`. +* `backends <https://gitlab.com/qemu-project/qemu/-/tree/master/backends>`_: + Various backends that are used to access resources on the host (e.g. for + random number generation, memory backing or cryptographic functions). 
+* `block <https://gitlab.com/qemu-project/qemu/-/tree/master/block>`_: + Block devices and `image formats<disk images>` implementation. +* `bsd-user <https://gitlab.com/qemu-project/qemu/-/tree/master/bsd-user>`_: + `BSD User mode<bsd-user-mode>`. +* build: Where the code built goes by default. You can tell the QEMU build + system to put the built code anywhere else you like. +* `chardev <https://gitlab.com/qemu-project/qemu/-/tree/master/chardev>`_: + Various backends used by char devices. +* `common-user <https://gitlab.com/qemu-project/qemu/-/tree/master/common-user>`_: + User-mode assembly code for dealing with signals occurring during syscalls. +* `configs <https://gitlab.com/qemu-project/qemu/-/tree/master/configs>`_: + Makefiles defining configurations to build QEMU. +* `contrib <https://gitlab.com/qemu-project/qemu/-/tree/master/contrib>`_: + Community contributed devices/plugins/tools. +* `crypto <https://gitlab.com/qemu-project/qemu/-/tree/master/crypto>`_: + Cryptographic algorithms used in QEMU. +* `disas <https://gitlab.com/qemu-project/qemu/-/tree/master/disas>`_: + Disassembly functions used by QEMU target code. +* `docs <https://gitlab.com/qemu-project/qemu/-/tree/master/docs>`_: + QEMU Documentation. +* `dump <https://gitlab.com/qemu-project/qemu/-/tree/master/dump>`_: + Code to dump memory of a running VM. +* `ebpf <https://gitlab.com/qemu-project/qemu/-/tree/master/ebpf>`_: + eBPF program support in QEMU. `virtio-net RSS<ebpf-rss>` uses it. +* `fpu <https://gitlab.com/qemu-project/qemu/-/tree/master/fpu>`_: + Floating-point software emulation. +* `fsdev <https://gitlab.com/qemu-project/qemu/-/tree/master/fsdev>`_: + `VirtFS <https://www.linux-kvm.org/page/VirtFS>`_ support. +* `gdbstub <https://gitlab.com/qemu-project/qemu/-/tree/master/gdbstub>`_: + `GDB <GDB usage>` support. +* `gdb-xml <https://gitlab.com/qemu-project/qemu/-/tree/master/gdb-xml>`_: + Set of XML files describing architectures and used by `gdbstub <GDB usage>`. +* `host <https://gitlab.com/qemu-project/qemu/-/tree/master/host>`_: + Various architecture specific header files (crypto, atomic, memory + operations). +* `linux-headers <https://gitlab.com/qemu-project/qemu/-/tree/master/linux-headers>`_: + A subset of headers imported from Linux kernel and used for implementing + KVM support and user-mode. +* `linux-user <https://gitlab.com/qemu-project/qemu/-/tree/master/linux-user>`_: + `User mode <user-mode>` implementation. Contains one folder per target + architecture. +* `.gitlab-ci.d <https://gitlab.com/qemu-project/qemu/-/tree/master/.gitlab-ci.d>`_: + `CI <ci>` yaml and scripts. +* `include <https://gitlab.com/qemu-project/qemu/-/tree/master/include>`_: + All headers associated to different subsystems in QEMU. The hierarchy used + mirrors source code organization and naming. +* `hw <https://gitlab.com/qemu-project/qemu/-/tree/master/hw>`_: + `Devices <device-emulation>` and boards emulation. Devices are categorized by + type/protocol/architecture and located in associated subfolder. +* `io <https://gitlab.com/qemu-project/qemu/-/tree/master/io>`_: + QEMU `I/O channels <https://lists.gnu.org/archive/html/qemu-devel/2015-11/msg04208.html>`_. +* `libdecnumber <https://gitlab.com/qemu-project/qemu/-/tree/master/libdecnumber>`_: + Import of gcc library, used to implement decimal number arithmetic. +* `migration <https://gitlab.com/qemu-project/qemu/-/tree/master/migration>`__: + :ref:`Migration framework <migration>`. 
+* `monitor <https://gitlab.com/qemu-project/qemu/-/tree/master/monitor>`_: + `Monitor <QEMU monitor>` implementation (HMP & QMP). +* `nbd <https://gitlab.com/qemu-project/qemu/-/tree/master/nbd>`_: + QEMU NBD (Network Block Device) server. +* `net <https://gitlab.com/qemu-project/qemu/-/tree/master/net>`_: + Network (host) support. +* `pc-bios <https://gitlab.com/qemu-project/qemu/-/tree/master/pc-bios>`_: + Contains pre-built firmware binaries and boot images, ready to use in + QEMU without compilation. +* `plugins <https://gitlab.com/qemu-project/qemu/-/tree/master/plugins>`_: + :ref:`TCG plugins <tcg-plugins>` core implementation. Plugins can be found in + `tests <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/tcg/plugins>`__ + and `contrib <https://gitlab.com/qemu-project/qemu/-/tree/master/contrib/plugins>`__ + folders. +* `po <https://gitlab.com/qemu-project/qemu/-/tree/master/po>`_: + Translation files. +* `python <https://gitlab.com/qemu-project/qemu/-/tree/master/python>`_: + Python part of our build/test system. +* `qapi <https://gitlab.com/qemu-project/qemu/-/tree/master/qapi>`_: + `QAPI <qapi>` implementation. +* `qobject <https://gitlab.com/qemu-project/qemu/-/tree/master/qobject>`_: + QEMU Object implementation. +* `qga <https://gitlab.com/qemu-project/qemu/-/tree/master/qga>`_: + QEMU `Guest agent <qemu-ga>` implementation. +* `qom <https://gitlab.com/qemu-project/qemu/-/tree/master/qom>`_: + QEMU :ref:`Object model <qom>` implementation, with monitor associated commands. +* `replay <https://gitlab.com/qemu-project/qemu/-/tree/master/replay>`_: + QEMU :ref:`Record/replay <replay>` implementation. +* `roms <https://gitlab.com/qemu-project/qemu/-/tree/master/roms>`_: + Contains source code for various firmware and ROMs, which can be compiled if + custom or updated versions are needed. +* `rust <https://gitlab.com/qemu-project/qemu/-/tree/master/rust>`_: + Rust integration in QEMU. It contains the new interfaces defined and + associated devices using it. +* `scripts <https://gitlab.com/qemu-project/qemu/-/tree/master/scripts>`_: + Collection of scripts used in build and test systems, and various + tools for QEMU codebase and execution traces. +* `scsi <https://gitlab.com/qemu-project/qemu/-/tree/master/scsi>`_: + Code related to SCSI support, used by SCSI devices. +* `semihosting <https://gitlab.com/qemu-project/qemu/-/tree/master/semihosting>`_: + QEMU `Semihosting <Semihosting>` implementation. +* `stats <https://gitlab.com/qemu-project/qemu/-/tree/master/stats>`_: + `Monitor <QEMU monitor>` stats commands implementation. +* `storage-daemon <https://gitlab.com/qemu-project/qemu/-/tree/master/storage-daemon>`_: + QEMU `Storage daemon <storage-daemon>` implementation. +* `stubs <https://gitlab.com/qemu-project/qemu/-/tree/master/stubs>`_: + Various stubs (empty functions) used to compile QEMU with specific + configurations. +* `subprojects <https://gitlab.com/qemu-project/qemu/-/tree/master/subprojects>`_: + QEMU submodules used by QEMU build system. +* `system <https://gitlab.com/qemu-project/qemu/-/tree/master/system>`_: + QEMU `system mode <System emulation>` implementation (cpu, mmu, boot support). +* `target <https://gitlab.com/qemu-project/qemu/-/tree/master/target>`_: + Contains code for all target architectures supported (one subfolder + per arch). For every architecture, you can find accelerator specific + implementations. +* `tcg <https://gitlab.com/qemu-project/qemu/-/tree/master/tcg>`_: + :ref:`TCG <tcg>` related code. 
+ Contains one subfolder per host supported architecture. +* `tests <https://gitlab.com/qemu-project/qemu/-/tree/master/tests>`_: + QEMU `test <testing>` suite + + - `data <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/data>`_: + Data for various tests. + - `decode <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/decode>`_: + Testsuite for :ref:`decodetree <decodetree>` implementation. + - `docker <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/docker>`_: + Code and scripts to create `containers <container-ref>` used in `CI <ci>`. + - `fp <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/fp>`_: + QEMU testsuite for soft float implementation. + - `functional <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/functional>`_: + `Functional tests <checkfunctional-ref>` (full VM boot). + - `lcitool <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/lcitool>`_: + Generate dockerfiles for CI containers. + - `migration <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/migration>`_: + Test scripts and data for :ref:`Migration framework <migration>`. + - `multiboot <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/multiboot>`_: + Test multiboot functionality for x86_64/i386. + - `qapi-schema <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/qapi-schema>`_: + Test scripts and data for `QAPI <qapi-tests>`. + - `qemu-iotests <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/qemu-iotests>`_: + `Disk image and block tests <qemu-iotests>`. + - `qtest <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/qtest>`_: + `Device emulation testing <qtest>`. + - `tcg <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/tcg>`__: + `TCG related tests <checktcg-ref>`. Contains code per architecture + (subfolder) and multiarch tests as well. + - `tsan <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/tsan>`_: + `Suppressions <tsan-suppressions>` for thread sanitizer. + - `uefi-test-tools <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/uefi-test-tools>`_: + Test tool for UEFI support. + - `unit <https://gitlab.com/qemu-project/qemu/-/tree/master/tests/unit>`_: + QEMU `Unit tests <unit-tests>`. +* `trace <https://gitlab.com/qemu-project/qemu/-/tree/master/trace>`_: + :ref:`Tracing framework <tracing>`. Used to print information associated to various + events during execution. +* `ui <https://gitlab.com/qemu-project/qemu/-/tree/master/ui>`_: + QEMU User interfaces. +* `util <https://gitlab.com/qemu-project/qemu/-/tree/master/util>`_: + Utility code used by other parts of QEMU. diff --git a/docs/devel/control-flow-integrity.rst b/docs/devel/control-flow-integrity.rst index e6b73a4..3d5702f 100644 --- a/docs/devel/control-flow-integrity.rst +++ b/docs/devel/control-flow-integrity.rst @@ -1,3 +1,5 @@ +.. _cfi: + ============================ Control-Flow Integrity (CFI) ============================ diff --git a/docs/devel/crypto.rst b/docs/devel/crypto.rst new file mode 100644 index 0000000..39b1c91 --- /dev/null +++ b/docs/devel/crypto.rst @@ -0,0 +1,10 @@ +.. _crypto-ref: + +==================== +Cryptography in QEMU +==================== + +.. toctree:: + :maxdepth: 2 + + luks-detached-header diff --git a/docs/devel/decodetree.rst b/docs/devel/decodetree.rst index e3392aa..98ad33a 100644 --- a/docs/devel/decodetree.rst +++ b/docs/devel/decodetree.rst @@ -1,3 +1,5 @@ +.. 
_decodetree: + ======================== Decodetree Specification ======================== diff --git a/docs/devel/ebpf_rss.rst b/docs/devel/ebpf_rss.rst index 4a68682..ed5d337 100644 --- a/docs/devel/ebpf_rss.rst +++ b/docs/devel/ebpf_rss.rst @@ -1,3 +1,5 @@ +.. _ebpf-rss: + =========================== eBPF RSS virtio-net support =========================== diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst index fe01b2b..1c487c1 100644 --- a/docs/devel/index-api.rst +++ b/docs/devel/index-api.rst @@ -9,6 +9,7 @@ generated from in-code annotations to function prototypes. bitops loads-stores + lockcnt memory modules pci diff --git a/docs/devel/index-build.rst b/docs/devel/index-build.rst index 90b406c..3f3cb21 100644 --- a/docs/devel/index-build.rst +++ b/docs/devel/index-build.rst @@ -1,20 +1,16 @@ -QEMU Build and Test System --------------------------- +QEMU Build System +----------------- -Details about how QEMU's build system works and how it is integrated -into our testing infrastructure. You will need to understand some of -the basics if you are adding new files and targets to the build. +Details about how QEMU's build system works. You will need to understand +some of the basics if you are adding new files and targets to the build. .. toctree:: :maxdepth: 3 build-system + build-environment kconfig docs - testing - acpi-bits - qtest - ci qapi-code-gen - fuzzing + qapi-domain control-flow-integrity diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst index 5636e9c..7a0678c 100644 --- a/docs/devel/index-internals.rst +++ b/docs/devel/index-internals.rst @@ -1,3 +1,5 @@ +.. _internal-subsystem: + Internal Subsystem Information ------------------------------ @@ -8,6 +10,7 @@ Details about QEMU's various subsystems including how to add features to them. qom atomics + rcu block-coroutine-wrapper clocks ebpf_rss @@ -17,6 +20,9 @@ Details about QEMU's various subsystems including how to add features to them. s390-cpu-topology s390-dasd-ipl tracing + uefi-vars vfio-iommufd writing-monitor-commands virtio-backends + crypto + multiple-iothreads diff --git a/docs/devel/index-process.rst b/docs/devel/index-process.rst index 362f97e..5807752 100644 --- a/docs/devel/index-process.rst +++ b/docs/devel/index-process.rst @@ -13,7 +13,9 @@ Notes about how to interact with the community and how and where to submit patch maintainers style submitting-a-patch + code-provenance trivial-patches stable-process submitting-a-pull-request secure-coding-practices + rust diff --git a/docs/devel/index.rst b/docs/devel/index.rst index abf6045..29f032d 100644 --- a/docs/devel/index.rst +++ b/docs/devel/index.rst @@ -31,6 +31,8 @@ the :ref:`tcg_internals`. index-process index-build + testing/index index-api index-internals index-tcg + codebase diff --git a/docs/devel/kconfig.rst b/docs/devel/kconfig.rst index 52d4b90..493b76c 100644 --- a/docs/devel/kconfig.rst +++ b/docs/devel/kconfig.rst @@ -38,7 +38,7 @@ originated in the Linux kernel, though it was heavily simplified and the handling of dependencies is stricter in QEMU. Unlike Linux, there is no user interface to edit the configuration, which -is instead specified in per-target files under the ``default-configs/`` +is instead specified in per-target files under the ``configs/`` directory of the QEMU source tree. 
This is because, unlike Linux, configuration and dependencies can be treated as a black box when building QEMU; the default configuration that QEMU ships with should be okay in @@ -103,7 +103,7 @@ directives can be included: **default value**: ``default <value> [if <expr>]`` Default values are assigned to the config symbol if no other value was - set by the user via ``default-configs/*.mak`` files, and only if + set by the user via ``configs/*.mak`` files, and only if ``select`` or ``depends on`` directives do not force the value to true or false respectively. ``<value>`` can be ``y`` or ``n``; it cannot be an arbitrary Boolean expression. However, a condition for applying @@ -119,7 +119,7 @@ directives can be included: This is similar to ``select`` as it applies a lower limit of ``y`` to another symbol. However, the lower limit is only a default and the "implied" symbol's value may still be set to ``n`` from a - ``default-configs/*.mak`` files. The following two examples are + ``configs/*.mak`` files. The following two examples are equivalent:: config FOO @@ -146,7 +146,7 @@ declares its dependencies in different ways: bool Subsystems always default to false (they have no ``default`` directive) - and are never visible in ``default-configs/*.mak`` files. It's + and are never visible in ``configs/*.mak`` files. It's up to other symbols to ``select`` whatever subsystems they require. They sometimes have ``select`` directives to bring in other required @@ -238,7 +238,7 @@ declares its dependencies in different ways: include libraries (such as ``FDT``) or ``TARGET_BIG_ENDIAN`` (possibly negated). - Boards are listed for convenience in the ``default-configs/*.mak`` + Boards are listed for convenience in the ``configs/*.mak`` for the target they apply to. **internal elements** @@ -251,18 +251,18 @@ declares its dependencies in different ways: Internal elements group code that is useful in several boards or devices. They are usually enabled with ``select`` and in turn select - other elements; they are never visible in ``default-configs/*.mak`` + other elements; they are never visible in ``configs/*.mak`` files, and often not even in the Makefile. Writing and modifying default configurations -------------------------------------------- In addition to the Kconfig files under hw/, each target also includes -a file called ``default-configs/TARGETNAME-softmmu.mak``. These files +a file called ``configs/TARGETNAME-softmmu.mak``. These files initialize some Kconfig variables to non-default values and provide the starting point to turn on devices and subsystems. -A file in ``default-configs/`` looks like the following example:: +A file in ``configs/`` looks like the following example:: # Default configuration for alpha-softmmu diff --git a/docs/devel/loads-stores.rst b/docs/devel/loads-stores.rst index ec627aa..9471bac 100644 --- a/docs/devel/loads-stores.rst +++ b/docs/devel/loads-stores.rst @@ -95,7 +95,7 @@ guest CPU state in case of a guest CPU exception. This is passed to ``cpu_restore_state()``. Therefore the value should either be 0, to indicate that the guest CPU state is already synchronized, or the result of ``GETPC()`` from the top level ``HELPER(foo)`` -function, which is a return address into the generated code [#gpc]_. +function, which is a return address into the generated code\ [#gpc]_. .. 
[#gpc] Note that ``GETPC()`` should be used with great care: calling it in other functions that are *not* the top level diff --git a/docs/devel/lockcnt.txt b/docs/devel/lockcnt.rst index a3fb3bc..8b43578 100644 --- a/docs/devel/lockcnt.txt +++ b/docs/devel/lockcnt.rst @@ -1,9 +1,9 @@ -DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt) -=================================================== +Locked Counters (aka ``QemuLockCnt``) +===================================== QEMU often uses reference counts to track data structures that are being accessed and should not be freed. For example, a loop that invoke -callbacks like this is not safe: +callbacks like this is not safe:: QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { if (ioh->revents & G_IO_OUT) { @@ -11,11 +11,11 @@ callbacks like this is not safe: } } -QLIST_FOREACH_SAFE protects against deletion of the current node (ioh) -by stashing away its "next" pointer. However, ioh->fd_write could +``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``) +by stashing away its ``next`` pointer. However, ``ioh->fd_write`` could actually delete the next node from the list. The simplest way to avoid this is to mark the node as deleted, and remove it from the -list in the above loop: +list in the above loop:: QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { if (ioh->deleted) { @@ -29,7 +29,7 @@ list in the above loop: } If however this loop must also be reentrant, i.e. it is possible that -ioh->fd_write invokes the loop again, some kind of counting is needed: +``ioh->fd_write`` invokes the loop again, some kind of counting is needed:: walking_handlers++; QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) { @@ -46,8 +46,8 @@ ioh->fd_write invokes the loop again, some kind of counting is needed: } walking_handlers--; -One may think of using the RCU primitives, rcu_read_lock() and -rcu_read_unlock(); effectively, the RCU nesting count would take +One may think of using the RCU primitives, ``rcu_read_lock()`` and +``rcu_read_unlock()``; effectively, the RCU nesting count would take the place of the walking_handlers global variable. Indeed, reference counting and RCU have similar purposes, but their usage in general is complementary: @@ -70,14 +70,14 @@ general is complementary: this can improve performance, but also delay reclamation undesirably. With reference counting, reclamation is deterministic. -This file documents QemuLockCnt, an abstraction for using reference +This file documents ``QemuLockCnt``, an abstraction for using reference counting in code that has to be both thread-safe and reentrant. -QemuLockCnt concepts --------------------- +``QemuLockCnt`` concepts +------------------------ -A QemuLockCnt comprises both a counter and a mutex; it has primitives +A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives to increment and decrement the counter, and to take and release the mutex. The counter notes how many visits to the data structures are taking place (the visits could be from different threads, or there could @@ -95,13 +95,14 @@ not just frees, though there could be cases where this is not necessary. Reads, instead, can be done without taking the mutex, as long as the readers and writers use the same macros that are used for RCU, for -example qatomic_rcu_read, qatomic_rcu_set, QLIST_FOREACH_RCU, etc. This is -because the reads are done outside a lock and a set or QLIST_INSERT_HEAD +example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``, +etc. 
This is because the reads are done outside a lock and a set +or ``QLIST_INSERT_HEAD`` can happen concurrently with the read. The RCU API ensures that the processor and the compiler see all required memory barriers. This could be implemented simply by protecting the counter with the -mutex, for example: +mutex, for example:: // (1) qemu_mutex_lock(&walking_handlers_mutex); @@ -125,33 +126,33 @@ mutex, for example: Here, no frees can happen in the code represented by the ellipsis. If another thread is executing critical section (2), that part of the code cannot be entered, because the thread will not be able -to increment the walking_handlers variable. And of course +to increment the ``walking_handlers`` variable. And of course during the visit any other thread will see a nonzero value for -walking_handlers, as in the single-threaded code. +``walking_handlers``, as in the single-threaded code. Note that it is possible for multiple concurrent accesses to delay -the cleanup arbitrarily; in other words, for the walking_handlers +the cleanup arbitrarily; in other words, for the ``walking_handlers`` counter to never become zero. For this reason, this technique is more easily applicable if concurrent access to the structure is rare. However, critical sections are easy to forget since you have to do -them for each modification of the counter. QemuLockCnt ensures that +them for each modification of the counter. ``QemuLockCnt`` ensures that all modifications of the counter take the lock appropriately, and it can also be more efficient in two ways: - it avoids taking the lock for many operations (for example incrementing the counter while it is non-zero); -- on some platforms, one can implement QemuLockCnt to hold the lock +- on some platforms, one can implement ``QemuLockCnt`` to hold the lock and the mutex in a single word, making the fast path no more expensive than simply managing a counter using atomic operations (see - docs/devel/atomics.rst). This can be very helpful if concurrent access to + :doc:`atomics`). This can be very helpful if concurrent access to the data structure is expected to be rare. Using the same mutex for frees and writes can still incur some small inefficiencies; for example, a visit can never start if the counter is -zero and the mutex is taken---even if the mutex is taken by a write, +zero and the mutex is taken -- even if the mutex is taken by a write, which in principle need not block a visit of the data structure. However, these are usually not a problem if any of the following assumptions are valid: @@ -163,27 +164,27 @@ assumptions are valid: - writes are frequent, but this kind of write (e.g. appending to a list) has a very small critical section. -For example, QEMU uses QemuLockCnt to manage an AioContext's list of +For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of bottom halves and file descriptor handlers. Modifications to the list of file descriptor handlers are rare. Creation of a new bottom half is frequent and can happen on a fast path; however: 1) it is almost never concurrent with a visit to the list of bottom halves; 2) it only has -three instructions in the critical path, two assignments and a smp_wmb(). +three instructions in the critical path, two assignments and a ``smp_wmb()``. -QemuLockCnt API ---------------- +``QemuLockCnt`` API +------------------- -The QemuLockCnt API is described in include/qemu/thread.h. +.. 
kernel-doc:: include/qemu/lockcnt.h -QemuLockCnt usage ------------------ +``QemuLockCnt`` usage +--------------------- -This section explains the typical usage patterns for QemuLockCnt functions. +This section explains the typical usage patterns for ``QemuLockCnt`` functions. Setting a variable to a non-NULL value can be done between -qemu_lockcnt_lock and qemu_lockcnt_unlock: +``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``:: qemu_lockcnt_lock(&xyz_lockcnt); if (!xyz) { @@ -193,8 +194,8 @@ qemu_lockcnt_lock and qemu_lockcnt_unlock: } qemu_lockcnt_unlock(&xyz_lockcnt); -Accessing the value can be done between qemu_lockcnt_inc and -qemu_lockcnt_dec: +Accessing the value can be done between ``qemu_lockcnt_inc`` and +``qemu_lockcnt_dec``:: qemu_lockcnt_inc(&xyz_lockcnt); if (xyz) { @@ -204,11 +205,11 @@ qemu_lockcnt_dec: } qemu_lockcnt_dec(&xyz_lockcnt); -Freeing the object can similarly use qemu_lockcnt_lock and -qemu_lockcnt_unlock, but you also need to ensure that the count -is zero (i.e. there is no concurrent visit). Because qemu_lockcnt_inc -takes the QemuLockCnt's lock, the count cannot become non-zero while -the object is being freed. Freeing an object looks like this: +Freeing the object can similarly use ``qemu_lockcnt_lock`` and +``qemu_lockcnt_unlock``, but you also need to ensure that the count +is zero (i.e. there is no concurrent visit). Because ``qemu_lockcnt_inc`` +takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while +the object is being freed. Freeing an object looks like this:: qemu_lockcnt_lock(&xyz_lockcnt); if (!qemu_lockcnt_count(&xyz_lockcnt)) { @@ -218,7 +219,7 @@ the object is being freed. Freeing an object looks like this: qemu_lockcnt_unlock(&xyz_lockcnt); If an object has to be freed right after a visit, you can combine -the decrement, the locking and the check on count as follows: +the decrement, the locking and the check on count as follows:: qemu_lockcnt_inc(&xyz_lockcnt); if (xyz) { @@ -232,7 +233,7 @@ the decrement, the locking and the check on count as follows: qemu_lockcnt_unlock(&xyz_lockcnt); } -QemuLockCnt can also be used to access a list as follows: +``QemuLockCnt`` can also be used to access a list as follows:: qemu_lockcnt_inc(&io_handlers_lockcnt); QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) { @@ -252,10 +253,10 @@ QemuLockCnt can also be used to access a list as follows: } Again, the RCU primitives are used because new items can be added to the -list during the walk. QLIST_FOREACH_RCU ensures that the processor and +list during the walk. ``QLIST_FOREACH_RCU`` ensures that the processor and the compiler see the appropriate memory barriers. -An alternative pattern uses qemu_lockcnt_dec_if_lock: +An alternative pattern uses ``qemu_lockcnt_dec_if_lock``:: qemu_lockcnt_inc(&io_handlers_lockcnt); QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) { @@ -273,5 +274,5 @@ An alternative pattern uses qemu_lockcnt_dec_if_lock: } qemu_lockcnt_dec(&io_handlers_lockcnt); -Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock, +Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``, because there is no special task to do if the count goes from 1 to 0. 
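
Editor's note: the usage patterns above assume the ``QemuLockCnt`` has already been
set up. As a minimal, hedged sketch (QEMU source tree assumed; ``xyz_subsystem_init``
and ``xyz_subsystem_cleanup`` are illustrative names, not existing functions),
initialisation and teardown could look like this:

.. code-block:: c

   /* Illustrative sketch only: set up the QemuLockCnt used by the
    * examples above before any visit or write can happen. */
   #include "qemu/osdep.h"
   #include "qemu/lockcnt.h"

   static QemuLockCnt xyz_lockcnt;

   static void xyz_subsystem_init(void)
   {
       qemu_lockcnt_init(&xyz_lockcnt);    /* count starts at zero */
   }

   static void xyz_subsystem_cleanup(void)
   {
       /* Legal only once no further visits of the data are possible. */
       qemu_lockcnt_destroy(&xyz_lockcnt);
   }

The init/destroy helpers are declared alongside the functions used in the patterns
above; see the kernel-doc reference to ``include/qemu/lockcnt.h``.
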
diff --git a/docs/devel/luks-detached-header.rst b/docs/devel/luks-detached-header.rst new file mode 100644 index 0000000..94ec285 --- /dev/null +++ b/docs/devel/luks-detached-header.rst @@ -0,0 +1,182 @@ +================================ +LUKS volume with detached header +================================ + +Introduction +============ + +This document gives an overview of the design of LUKS volume with detached +header and how to use it. + +Background +========== + +The LUKS format has ability to store the header in a separate volume from +the payload. We could extend the LUKS driver in QEMU to support this use +case. + +Normally a LUKS volume has a layout: + +:: + + +-----------------------------------------------+ + | | | | + disk | header | key material | disk payload data | + | | | | + +-----------------------------------------------+ + +With a detached LUKS header, you need 2 disks so getting: + +:: + + +--------------------------+ + disk1 | header | key material | + +--------------------------+ + +---------------------+ + disk2 | disk payload data | + +---------------------+ + +There are a variety of benefits to doing this: + + * Secrecy - the disk2 cannot be identified as containing LUKS + volume since there's no header + * Control - if access to the disk1 is restricted, then even + if someone has access to disk2 they can't unlock + it. Might be useful if you have disks on NFS but + want to restrict which host can launch a VM + instance from it, by dynamically providing access + to the header to a designated host + * Flexibility - your application data volume may be a given + size and it is inconvenient to resize it to + add encryption.You can store the LUKS header + separately and use the existing storage + volume for payload + * Recovery - corruption of a bit in the header may make the + entire payload inaccessible. It might be + convenient to take backups of the header. If + your primary disk header becomes corrupt, you + can unlock the data still by pointing to the + backup detached header + +Architecture +============ + +Take the qcow2 encryption, for example. The architecture of the +LUKS volume with detached header is shown in the diagram below. + +There are two children of the root node: a file and a header. +Data from the disk payload is stored in the file node. The +LUKS header and key material are located in the header node, +as previously mentioned. 
+ +:: + + +-----------------------------+ + Root node | foo[luks] | + +-----------------------------+ + | | + file | header | + | | + +---------------------+ +------------------+ + Child node |payload-format[qcow2]| |header-format[raw]| + +---------------------+ +------------------+ + | | + file | file | + | | + +----------------------+ +---------------------+ + Child node |payload-protocol[file]| |header-protocol[file]| + +----------------------+ +---------------------+ + | | + | | + | | + Host storage Host storage + +Usage +===== + +Create a LUKS disk with a detached header using qemu-img +-------------------------------------------------------- + +Shell commandline:: + + # qemu-img create --object secret,id=sec0,data=abc123 -f luks \ + -o cipher-alg=aes-256,cipher-mode=xts -o key-secret=sec0 \ + -o detached-header=true test-header.img + # qemu-img create -f qcow2 test-payload.qcow2 200G + # qemu-img info 'json:{"driver":"luks","file":{"filename": \ + "test-payload.img"},"header":{"filename":"test-header.img"}}' + +Set up a VM's LUKS volume with a detached header +------------------------------------------------ + +Qemu commandline:: + + # qemu-system-x86_64 ... \ + -object '{"qom-type":"secret","id":"libvirt-3-format-secret", \ + "data":"abc123"}' \ + -blockdev '{"driver":"file","filename":"/path/to/test-header.img", \ + "node-name":"libvirt-1-storage"}' \ + -blockdev '{"node-name":"libvirt-1-format","read-only":false, \ + "driver":"raw","file":"libvirt-1-storage"}' \ + -blockdev '{"driver":"file","filename":"/path/to/test-payload.qcow2", \ + "node-name":"libvirt-2-storage"}' \ + -blockdev '{"node-name":"libvirt-2-format","read-only":false, \ + "driver":"qcow2","file":"libvirt-2-storage"}' \ + -blockdev '{"node-name":"libvirt-3-format","driver":"luks", \ + "file":"libvirt-2-format","header":"libvirt-1-format","key-secret": \ + "libvirt-3-format-secret"}' \ + -device '{"driver":"virtio-blk-pci","bus":XXX,"addr":YYY,"drive": \ + "libvirt-3-format","id":"virtio-disk1"}' + +Add LUKS volume to a VM with a detached header +---------------------------------------------- + +1. object-add the secret for decrypting the cipher stored in + LUKS header above:: + + # virsh qemu-monitor-command vm '{"execute":"object-add", \ + "arguments":{"qom-type":"secret", "id": \ + "libvirt-4-format-secret", "data":"abc123"}}' + +2. block-add the protocol node for LUKS header:: + + # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \ + "arguments":{"node-name":"libvirt-1-storage", "driver":"file", \ + "filename": "/path/to/test-header.img" }}' + +3. block-add the raw-drived node for LUKS header:: + + # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \ + "arguments":{"node-name":"libvirt-1-format", "driver":"raw", \ + "file":"libvirt-1-storage"}}' + +4. block-add the protocol node for disk payload image:: + + # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \ + "arguments":{"node-name":"libvirt-2-storage", "driver":"file", \ + "filename":"/path/to/test-payload.qcow2"}}' + +5. block-add the qcow2-drived format node for disk payload data:: + + # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \ + "arguments":{"node-name":"libvirt-2-format", "driver":"qcow2", \ + "file":"libvirt-2-storage"}}' + +6. 
block-add the luks-drived format node to link the qcow2 disk + with the LUKS header by specifying the field "header":: + + # virsh qemu-monitor-command vm '{"execute":"blockdev-add", \ + "arguments":{"node-name":"libvirt-3-format", "driver":"luks", \ + "file":"libvirt-2-format", "header":"libvirt-1-format", \ + "key-secret":"libvirt-2-format-secret"}}' + +7. hot-plug the virtio-blk device finally:: + + # virsh qemu-monitor-command vm '{"execute":"device_add", \ + "arguments": {"driver":"virtio-blk-pci", \ + "drive": "libvirt-3-format", "id":"virtio-disk2"}} + +TODO +==== + +1. Support the shared detached LUKS header within the VM. diff --git a/docs/devel/maintainers.rst b/docs/devel/maintainers.rst index 5c907d9..88a613e 100644 --- a/docs/devel/maintainers.rst +++ b/docs/devel/maintainers.rst @@ -99,9 +99,9 @@ members of the QEMU community, you should make arrangements to attend a `KeySigningParty <https://wiki.qemu.org/KeySigningParty>`__ (for example at KVM Forum) or make alternative arrangements to have your key signed by an attendee. Key signing requires meeting another -community member **in person** [#]_ so please make appropriate +community member **in person**\ [#2020]_ so please make appropriate arrangements. -.. [#] In recent pandemic times we have had to exercise some +.. [#2020] In recent pandemic times we have had to exercise some flexibility here. Maintainers still need to sign their pull requests though. diff --git a/docs/devel/memory.rst b/docs/devel/memory.rst index 69c5e3f..57fb2ae 100644 --- a/docs/devel/memory.rst +++ b/docs/devel/memory.rst @@ -369,4 +369,4 @@ callbacks are called: API Reference ------------- -.. kernel-doc:: include/exec/memory.h +.. kernel-doc:: include/system/memory.h diff --git a/docs/devel/migration/CPR.rst b/docs/devel/migration/CPR.rst index 63c3647..7897873 100644 --- a/docs/devel/migration/CPR.rst +++ b/docs/devel/migration/CPR.rst @@ -5,7 +5,7 @@ CPR is the umbrella name for a set of migration modes in which the VM is migrated to a new QEMU instance on the same host. It is intended for use when the goal is to update host software components that run the VM, such as QEMU or even the host kernel. At this time, -cpr-reboot is the only available mode. +the cpr-reboot and cpr-transfer modes are available. Because QEMU is restarted on the same host, with access to the same local devices, CPR is allowed in certain cases where normal migration @@ -53,7 +53,7 @@ RAM is copied to the migration URI. Outgoing: * Set the migration mode parameter to ``cpr-reboot``. * Set the ``x-ignore-shared`` capability if desired. - * Issue the ``migrate`` command. It is recommended the the URI be a + * Issue the ``migrate`` command. It is recommended the URI be a ``file`` type, but one can use other types such as ``exec``, provided the command captures all the data from the outgoing side, and provides all the data to the incoming side. @@ -145,3 +145,183 @@ Caveats cpr-reboot mode may not be used with postcopy, background-snapshot, or COLO. + +cpr-transfer mode +----------------- + +This mode allows the user to transfer a guest to a new QEMU instance +on the same host with minimal guest pause time, by preserving guest +RAM in place, albeit with new virtual addresses in new QEMU. Devices +and their pinned memory pages will also be preserved in a future QEMU +release. + +The user starts new QEMU on the same host as old QEMU, with command- +line arguments to create the same machine, plus the ``-incoming`` +option for the main migration channel, like normal live migration. 
+In addition, the user adds a second -incoming option with channel +type ``cpr``. This CPR channel must support file descriptor transfer +with SCM_RIGHTS, i.e. it must be a UNIX domain socket. + +To initiate CPR, the user issues a migrate command to old QEMU, +adding a second migration channel of type ``cpr`` in the channels +argument. Old QEMU stops the VM, saves state to the migration +channels, and enters the postmigrate state. Execution resumes in +new QEMU. + +New QEMU reads the CPR channel before opening a monitor, hence +the CPR channel cannot be specified in the list of channels for a +migrate-incoming command. It may only be specified on the command +line. + +Usage +^^^^^ + +Memory backend objects must have the ``share=on`` attribute. + +The VM must be started with the ``-machine aux-ram-share=on`` +option. This causes implicit RAM blocks (those not described by +a memory-backend object) to be allocated by mmap'ing a memfd. +Examples include VGA and ROM. + +Outgoing: + * Set the migration mode parameter to ``cpr-transfer``. + * Issue the ``migrate`` command, containing a main channel and + a cpr channel. + +Incoming: + * Start new QEMU with two ``-incoming`` options. + * If the VM was running when the outgoing ``migrate`` command was + issued, then QEMU automatically resumes VM execution. + +Caveats +^^^^^^^ + +cpr-transfer mode may not be used with postcopy, background-snapshot, +or COLO. + +memory-backend-epc is not supported. + +The main incoming migration channel address cannot be a file type. + +If the main incoming channel address is an inet socket, then the port +cannot be 0 (meaning dynamically choose a port). + +When using ``-incoming defer``, you must issue the migrate command to +old QEMU before issuing any monitor commands to new QEMU, because new +QEMU blocks waiting to read from the cpr channel before starting its +monitor, and old QEMU does not write to the channel until the migrate +command is issued. However, new QEMU does not open and read the +main migration channel until you issue the migrate incoming command. + +Example 1: incoming channel +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In these examples, we simply restart the same version of QEMU, but +in a real scenario one would start new QEMU on the incoming side. +Note that new QEMU does not print the monitor prompt until old QEMU +has issued the migrate command. The outgoing side uses QMP because +HMP cannot specify a CPR channel. Some QMP responses are omitted for +brevity. + +:: + + Outgoing: Incoming: + + # qemu-kvm -qmp stdio + -object memory-backend-file,id=ram0,size=4G, + mem-path=/dev/shm/ram0,share=on -m 4G + -machine memory-backend=ram0 + -machine aux-ram-share=on + ... + # qemu-kvm -monitor stdio + -incoming tcp:0:44444 + -incoming '{"channel-type": "cpr", + "addr": { "transport": "socket", + "type": "unix", "path": "cpr.sock"}}' + ... 
+ {"execute":"qmp_capabilities"} + + {"execute": "query-status"} + {"return": {"status": "running", + "running": true}} + + {"execute":"migrate-set-parameters", + "arguments":{"mode":"cpr-transfer"}} + + {"execute": "migrate", "arguments": { "channels": [ + {"channel-type": "main", + "addr": { "transport": "socket", "type": "inet", + "host": "0", "port": "44444" }}, + {"channel-type": "cpr", + "addr": { "transport": "socket", "type": "unix", + "path": "cpr.sock" }}]}} + + QEMU 10.0.50 monitor + (qemu) info status + VM status: running + + {"execute": "query-status"} + {"return": {"status": "postmigrate", + "running": false}} + +Example 2: incoming defer +^^^^^^^^^^^^^^^^^^^^^^^^^ + +This example uses ``-incoming defer`` to hot plug a device before +accepting the main migration channel. Again note you must issue the +migrate command to old QEMU before you can issue any monitor +commands to new QEMU. + + +:: + + Outgoing: Incoming: + + # qemu-kvm -monitor stdio + -object memory-backend-file,id=ram0,size=4G, + mem-path=/dev/shm/ram0,share=on -m 4G + -machine memory-backend=ram0 + -machine aux-ram-share=on + ... + # qemu-kvm -monitor stdio + -incoming defer + -incoming '{"channel-type": "cpr", + "addr": { "transport": "socket", + "type": "unix", "path": "cpr.sock"}}' + ... + {"execute":"qmp_capabilities"} + + {"execute": "device_add", + "arguments": {"driver": "pcie-root-port"}} + + {"execute":"migrate-set-parameters", + "arguments":{"mode":"cpr-transfer"}} + + {"execute": "migrate", "arguments": { "channels": [ + {"channel-type": "main", + "addr": { "transport": "socket", "type": "inet", + "host": "0", "port": "44444" }}, + {"channel-type": "cpr", + "addr": { "transport": "socket", "type": "unix", + "path": "cpr.sock" }}]}} + + QEMU 10.0.50 monitor + (qemu) info status + VM status: paused (inmigrate) + (qemu) device_add pcie-root-port + (qemu) migrate_incoming tcp:0:44444 + (qemu) info status + VM status: running + + {"execute": "query-status"} + {"return": {"status": "postmigrate", + "running": false}} + +Futures +^^^^^^^ + +cpr-transfer mode is based on a capability to transfer open file +descriptors from old to new QEMU. In the future, descriptors for +vfio, iommufd, vhost, and char devices could be transferred, +preserving those devices and their kernel state without interruption, +even if they do not explicitly support live migration. diff --git a/docs/devel/migration/compatibility.rst b/docs/devel/migration/compatibility.rst index 5a5417e..ecb887e 100644 --- a/docs/devel/migration/compatibility.rst +++ b/docs/devel/migration/compatibility.rst @@ -198,7 +198,7 @@ was done:: The relevant parts for migration are:: - @@ -1281,7 +1284,8 @@ static Property virtio_blk_properties[] = { + @@ -1281,7 +1284,8 @@ static const Property virtio_blk_properties[] = { #endif DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0, true), @@ -395,13 +395,12 @@ the old behaviour or the new behaviour:: index 8a87ccc8b0..5153ad63d6 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c - @@ -79,6 +79,8 @@ static Property pci_props[] = { + @@ -79,6 +79,8 @@ static const Property pci_props[] = { DEFINE_PROP_STRING("failover_pair_id", PCIDevice, failover_pair_id), DEFINE_PROP_UINT32("acpi-index", PCIDevice, acpi_index, 0), + DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present, + QEMU_PCIE_ERR_UNC_MASK_BITNR, true), - DEFINE_PROP_END_OF_LIST() }; Notice that we enable the feature for new machine types. 
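
Editor's note: to make the "new machine types only" pattern concrete, here is a hedged
sketch of the compat-property side of such a change. An entry in the relevant
``hw_compat_*`` array pins the old behaviour for older machine types; the array name
and the ``pci-device`` driver name below are illustrative assumptions, not a quote
from the actual patch:

.. code-block:: c

   /* Sketch only: force the pre-existing behaviour for machine types
    * that predate the new property.  Names are illustrative. */
   #include "qemu/osdep.h"
   #include "hw/qdev-core.h"

   GlobalProperty hw_compat_example[] = {
       { "pci-device", "x-pcie-err-unc-mask", "false" },
   };

Such arrays typically live in ``hw/core/machine.c`` and are applied to the previous
(and older) machine types, so only newly defined machine types see the new default.
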
diff --git a/docs/devel/migration/features.rst b/docs/devel/migration/features.rst index 58f8fd9..8f431d5 100644 --- a/docs/devel/migration/features.rst +++ b/docs/devel/migration/features.rst @@ -14,3 +14,4 @@ Migration has plenty of features to support different use cases. CPR qpl-compression uadk-compression + qatzip-compression diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 784c899..cdd4f4a 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -1,3 +1,5 @@ +.. _migration: + =================== Migration framework =================== @@ -465,6 +467,12 @@ Examples of such API functions are: - portio_list_set_address() - portio_list_set_enabled() +Since the order of device save/restore is not defined, you must +avoid accessing or changing any other device's state in one of these +callbacks. (For instance, don't do anything that calls ``update_irq()`` +in a ``post_load`` hook.) Otherwise, restore will not be deterministic, +and this will break execution record/replay. + Iterative device migration -------------------------- diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst index d352b54..b08c2b4 100644 --- a/docs/devel/migration/mapped-ram.rst +++ b/docs/devel/migration/mapped-ram.rst @@ -44,7 +44,7 @@ Use-cases The mapped-ram feature was designed for use cases where the migration stream will be directed to a file in the filesystem and not -immediately restored on the destination VM [#]_. These could be +immediately restored on the destination VM\ [#alternatives]_. These could be thought of as snapshots. We can further categorize them into live and non-live. @@ -70,7 +70,7 @@ mapped-ram in this scenario is portability since background-snapshot depends on async dirty tracking (KVM_GET_DIRTY_LOG) which is not supported outside of Linux. -.. [#] While this same effect could be obtained with the usage of +.. [#alternatives] While this same effect could be obtained with the usage of snapshots or the ``file:`` migration alone, mapped-ram provides a performance increase for VMs with larger RAM sizes (10s to 100s of GiBs), specially if the VM has been stopped beforehand. diff --git a/docs/devel/migration/qatzip-compression.rst b/docs/devel/migration/qatzip-compression.rst new file mode 100644 index 0000000..862b383 --- /dev/null +++ b/docs/devel/migration/qatzip-compression.rst @@ -0,0 +1,165 @@ +================== +QATzip Compression +================== +In scenarios with limited network bandwidth, the ``QATzip`` solution can help +users save a lot of host CPU resources by accelerating compression and +decompression through the Intel QuickAssist Technology(``QAT``) hardware. + + +The following test was conducted using 8 multifd channels and 10Gbps network +bandwidth. The results show that, compared to zstd, ``QATzip`` significantly +saves CPU resources on the sender and reduces migration time. Compared to the +uncompressed solution, ``QATzip`` greatly improves the dirty page processing +capability, indicated by the Pages per Second metric, and also reduces the +total migration time. + +:: + + VM Configuration: 16 vCPU and 64G memory + VM Workload: all vCPUs are idle and 54G memory is filled with Silesia data. 
+ QAT Devices: 4 + |-----------|--------|---------|----------|----------|------|------| + |8 Channels |Total |down |throughput|pages per | send | recv | + | |time(ms)|time(ms) |(mbps) |second | cpu %| cpu% | + |-----------|--------|---------|----------|----------|------|------| + |qatzip | 16630| 28| 10467| 2940235| 160| 360| + |-----------|--------|---------|----------|----------|------|------| + |zstd | 20165| 24| 8579| 2391465| 810| 340| + |-----------|--------|---------|----------|----------|------|------| + |none | 46063| 40| 10848| 330240| 45| 85| + |-----------|--------|---------|----------|----------|------|------| + + +QATzip Compression Framework +============================ + +``QATzip`` is a user space library which builds on top of the Intel QuickAssist +Technology to provide extended accelerated compression and decompression +services. + +For more ``QATzip`` introduction, please refer to `QATzip Introduction +<https://github.com/intel/QATzip?tab=readme-ov-file#introductionl>`_ + +:: + + +----------------+ + | MultiFd Thread | + +-------+--------+ + | + | compress/decompress + +-------+--------+ + | QATzip library | + +-------+--------+ + | + +-------+--------+ + | QAT library | + +-------+--------+ + | user space + --------+--------------------- + | kernel space + +------+-------+ + | QAT Driver | + +------+-------+ + | + +------+-------+ + | QAT Devices | + +--------------+ + + +QATzip Installation +------------------- + +The ``QATzip`` installation package has been integrated into some Linux +distributions and can be installed directly. For example, the Ubuntu Server +24.04 LTS system can be installed using below command + +.. code-block:: shell + + #apt search qatzip + libqatzip-dev/noble 1.2.0-0ubuntu3 amd64 + Intel QuickAssist user space library development files + + libqatzip3/noble 1.2.0-0ubuntu3 amd64 + Intel QuickAssist user space library + + qatzip/noble,now 1.2.0-0ubuntu3 amd64 [installed] + Compression user-space tool for Intel QuickAssist Technology + + #sudo apt install libqatzip-dev libqatzip3 qatzip + +If your system does not support the ``QATzip`` installation package, you can +use the source code to build and install, please refer to `QATzip source code installation +<https://github.com/intel/QATzip?tab=readme-ov-file#build-intel-quickassist-technology-driver>`_ + +QAT Hardware Deployment +----------------------- + +``QAT`` supports physical functions(PFs) and virtual functions(VFs) for +deployment, and users can configure ``QAT`` resources for migration according +to actual needs. For more details about ``QAT`` deployment, please refer to +`Intel QuickAssist Technology Documentation +<https://intel.github.io/quickassist/index.html>`_ + +For more ``QAT`` hardware introduction, please refer to `intel-quick-assist-technology-overview +<https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html>`_ + +How To Use QATzip Compression +============================= + +1 - Install ``QATzip`` library + +2 - Build ``QEMU`` with ``--enable-qatzip`` parameter + + E.g. configure --target-list=x86_64-softmmu --enable-kvm ``--enable-qatzip`` + +3 - Set ``migrate_set_parameter multifd-compression qatzip`` + +4 - Set ``migrate_set_parameter multifd-qatzip-level comp_level``, the default +comp_level value is 1, and it supports levels from 1 to 9 + +QAT Memory Requirements +======================= + +The user needs to reserve system memory for the QAT memory management to +allocate DMA memory. 
The size of the reserved system memory depends on the +number of devices used for migration and the number of multifd channels. + +Because memory usage depends on QAT configuration, please refer to `QAT Memory +Driver Queries +<https://intel.github.io/quickassist/PG/infrastructure_debugability.html?highlight=memory>`_ +for memory usage calculation. + +.. list-table:: An example of a PF used for migration + :header-rows: 1 + + * - Number of channels + - Sender memory usage + - Receiver memory usage + * - 2 + - 10M + - 10M + * - 4 + - 12M + - 14M + * - 8 + - 16M + - 20M + +How To Choose Between QATzip and QPL +==================================== +Starting from 4th Gen Intel Xeon Scalable processors, codenamed Sapphire Rapids +processor(``SPR``), multiple built-in accelerators are supported including +``QAT`` and ``IAA``. The former can accelerate ``QATzip`` and the latter is +used to accelerate ``QPL``. + +Here are some suggestions: + +1 - If the live migration scenario is limited by network bandwidth and ``QAT`` +hardware resources exceed ``IAA``, use the ``QATzip`` method, which can save a +lot of host CPU resources for compression. + +2 - If the system cannot support shared virtual memory (SVM) technology, use +the ``QATzip`` method because ``QPL`` performance is not good without SVM +support. + +3 - For other scenarios, use the ``QPL`` method first. diff --git a/docs/devel/migration/uadk-compression.rst b/docs/devel/migration/uadk-compression.rst index 3f73345..64cadeb 100644 --- a/docs/devel/migration/uadk-compression.rst +++ b/docs/devel/migration/uadk-compression.rst @@ -114,7 +114,7 @@ Make sure all these above kernel configurations are selected. Accelerator dev node permissions -------------------------------- -Harware accelerators(eg: HiSilicon Kunpeng Zip accelerator) gets registered to +Hardware accelerators (eg: HiSilicon Kunpeng Zip accelerator) gets registered to UADK and char devices are created in dev directory. In order to access resources on hardware accelerator devices, write permission should be provided to user. @@ -134,7 +134,7 @@ How To Use UADK Compression In QEMU Migration Set ``migrate_set_parameter multifd-compression uadk`` Since UADK uses Shared Virtual Addressing(SVA) and device access virtual memory -directly it is possible that SMMUv3 may enounter page faults while walking the +directly it is possible that SMMUv3 may encounter page faults while walking the IO page tables. This may impact the performance. In order to mitigate this, please make sure to specify ``-mem-prealloc`` parameter to the destination VM boot parameters. diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst index c49482e..673e354 100644 --- a/docs/devel/migration/vfio.rst +++ b/docs/devel/migration/vfio.rst @@ -67,15 +67,35 @@ VFIO implements the device hooks for the iterative approach as follows: * A ``switchover_ack_needed`` function that checks if the VFIO device uses "switchover-ack" migration capability when this capability is enabled. -* A ``save_state`` function to save the device config space if it is present. +* A ``switchover_start`` function that in the multifd mode starts a thread that + reassembles the multifd received data and loads it in-order into the device. + In the non-multifd mode this function is a NOP. + +* A ``save_state`` function to save the device config space if it is present + in the non-multifd mode. + In the multifd mode it just emits either a dummy EOS marker. 
* A ``save_live_complete_precopy`` function that sets the VFIO device in _STOP_COPY state and iteratively copies the data for the VFIO device until the vendor driver indicates that no data remains. + In the multifd mode it just emits a dummy EOS marker. + +* A ``save_live_complete_precopy_thread`` function that in the multifd mode + provides thread handler performing multifd device state transfer. + It sets the VFIO device to _STOP_COPY state, iteratively reads the data + from the VFIO device and queues it for multifd transmission until the vendor + driver indicates that no data remains. + After that, it saves the device config space and queues it for multifd + transfer too. + In the non-multifd mode this thread is a NOP. * A ``load_state`` function that loads the config section and the data sections that are generated by the save functions above. +* A ``load_state_buffer`` function that loads the device state and the device + config that arrived via multifd channels. + It's used only in the multifd mode. + * ``cleanup`` functions for both save and load that perform any migration related cleanup. @@ -176,8 +196,11 @@ Live migration save path Then the VFIO device is put in _STOP_COPY state (FINISH_MIGRATE, _ACTIVE, _STOP_COPY) .save_live_complete_precopy() is called for each active device - For the VFIO device, iterate in .save_live_complete_precopy() until + For the VFIO device: in the non-multifd mode iterate in + .save_live_complete_precopy() until pending data is 0 + In the multifd mode this iteration is done in + .save_live_complete_precopy_thread() instead. | (POSTMIGRATE, _COMPLETED, _STOP_COPY) Migraton thread schedules cleanup bottom half and exits @@ -194,6 +217,9 @@ Live migration resume path (RESTORE_VM, _ACTIVE, _STOP) | For each device, .load_state() is called for that device section data + transmitted via the main migration channel. + For data transmitted via multifd channels .load_state_buffer() is called + instead. (RESTORE_VM, _ACTIVE, _RESUMING) | At the end, .load_cleanup() is called for each device and vCPUs are started @@ -206,3 +232,18 @@ Postcopy ======== Postcopy migration is currently not supported for VFIO devices. + +Multifd +======= + +Starting from QEMU version 10.0 there's a possibility to transfer VFIO device +_STOP_COPY state via multifd channels. This helps reduce downtime - especially +with multiple VFIO devices or with devices having a large migration state. +As an additional benefit, setting the VFIO device to _STOP_COPY state and +saving its config space is also parallelized (run in a separate thread) in +such migration mode. + +The multifd VFIO device state transfer is controlled by +"x-migration-multifd-transfer" VFIO device property. This property defaults to +AUTO, which means that VFIO device state transfer via multifd channels is +attempted in configurations that otherwise support it. diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst index d706c27..da9a153 100644 --- a/docs/devel/multi-thread-tcg.rst +++ b/docs/devel/multi-thread-tcg.rst @@ -4,6 +4,8 @@ This work is licensed under the terms of the GNU GPL, version 2 or later. See the COPYING file in the top-level directory. +.. _mttcg: + ================== Multi-threaded TCG ================== @@ -26,16 +28,15 @@ vCPU Scheduling We introduce a new running mode where each vCPU will run on its own user-space thread. 
This is enabled by default for all FE/BE combinations where the host memory model is able to accommodate the -guest (TCG_GUEST_DEFAULT_MO & ~TCG_TARGET_DEFAULT_MO is zero) and the -guest has had the required work done to support this safely -(TARGET_SUPPORTS_MTTCG). +guest (TCGCPUOps::guest_default_memory_order & ~TCG_TARGET_DEFAULT_MO is zero) +and the guest has had the required work done to support this safely +(TCGCPUOps::mttcg_supported). System emulation will fall back to the original round robin approach if: * forced by --accel tcg,thread=single * enabling --icount mode -* 64 bit guests on 32 bit hosts (TCG_OVERSIZED_GUEST) In the general case of running translated code there should be no inter-vCPU dependencies and all vCPUs should be able to run at full diff --git a/docs/devel/multiple-iothreads.rst b/docs/devel/multiple-iothreads.rst new file mode 100644 index 0000000..d1f3fc4 --- /dev/null +++ b/docs/devel/multiple-iothreads.rst @@ -0,0 +1,139 @@ +Using Multiple ``IOThread``\ s +============================== + +.. + Copyright (c) 2014-2017 Red Hat Inc. + + This work is licensed under the terms of the GNU GPL, version 2 or later. See + the COPYING file in the top-level directory. + + +This document explains the ``IOThread`` feature and how to write code that runs +outside the BQL. + +The main loop and ``IOThread``\ s +--------------------------------- +QEMU is an event-driven program that can do several things at once using an +event loop. The VNC server and the QMP monitor are both processed from the +same event loop, which monitors their file descriptors until they become +readable and then invokes a callback. + +The default event loop is called the main loop (see ``main-loop.c``). It is +possible to create additional event loop threads using +``-object iothread,id=my-iothread``. + +Side note: The main loop and ``IOThread`` are both event loops but their code is +not shared completely. Sometimes it is useful to remember that although they +are conceptually similar they are currently not interchangeable. + +Why ``IOThread``\ s are useful +------------------------------ +``IOThread``\ s allow the user to control the placement of work. The main loop is a +scalability bottleneck on hosts with many CPUs. Work can be spread across +several ``IOThread``\ s instead of just one main loop. When set up correctly this +can improve I/O latency and reduce jitter seen by the guest. + +The main loop is also deeply associated with the BQL, which is a +scalability bottleneck in itself. vCPU threads and the main loop use the BQL +to serialize execution of QEMU code. This mutex is necessary because a lot of +QEMU's code historically was not thread-safe. + +The fact that all I/O processing is done in a single main loop and that the +BQL is contended by all vCPU threads and the main loop explain +why it is desirable to place work into ``IOThread``\ s. + +The experimental ``virtio-blk`` data-plane implementation has been benchmarked and +shows these effects: +ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf + +.. _how-to-program: + +How to program for ``IOThread``\ s +---------------------------------- +The main difference between legacy code and new code that can run in an +``IOThread`` is dealing explicitly with the event loop object, ``AioContext`` +(see ``include/block/aio.h``). Code that only works in the main loop +implicitly uses the main loop's ``AioContext``. Code that supports running +in ``IOThread``\ s must be aware of its ``AioContext``. 
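
As a quick, hedged illustration of what that awareness looks like in practice
(QEMU tree assumed; ``do_work()`` is an invented callback, and the ``AioContext``
would typically be obtained with ``iothread_get_aio_context()``, described below):

.. code-block:: c

   /* Illustrative sketch: schedule a one-shot callback in a specific
    * AioContext (for example an IOThread's) from another thread. */
   #include "qemu/osdep.h"
   #include "block/aio.h"

   static void do_work(void *opaque)
   {
       /* Runs once, in the thread that owns the AioContext, without the BQL. */
   }

   static void kick(AioContext *ctx)
   {
       aio_bh_schedule_oneshot(ctx, do_work, NULL);
   }

The ``aio_*`` functions used here are among the thread-safe APIs listed below.
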
+ +AioContext supports the following services: + * File descriptor monitoring (read/write/error on POSIX hosts) + * Event notifiers (inter-thread signalling) + * Timers + * Bottom Halves (BH) deferred callbacks + +There are several old APIs that use the main loop AioContext: + * LEGACY ``qemu_aio_set_fd_handler()`` - monitor a file descriptor + * LEGACY ``qemu_aio_set_event_notifier()`` - monitor an event notifier + * LEGACY ``timer_new_ms()`` - create a timer + * LEGACY ``qemu_bh_new()`` - create a BH + * LEGACY ``qemu_bh_new_guarded()`` - create a BH with a device re-entrancy guard + * LEGACY ``qemu_aio_wait()`` - run an event loop iteration + +Since they implicitly work on the main loop they cannot be used in code that +runs in an ``IOThread``. They might cause a crash or deadlock if called from an +``IOThread`` since the BQL is not held. + +Instead, use the ``AioContext`` functions directly (see ``include/block/aio.h``): + * ``aio_set_fd_handler()`` - monitor a file descriptor + * ``aio_set_event_notifier()`` - monitor an event notifier + * ``aio_timer_new()`` - create a timer + * ``aio_bh_new()`` - create a BH + * ``aio_bh_new_guarded()`` - create a BH with a device re-entrancy guard + * ``aio_poll()`` - run an event loop iteration + +The ``qemu_bh_new_guarded``/``aio_bh_new_guarded`` APIs accept a +``MemReentrancyGuard`` +argument, which is used to check for and prevent re-entrancy problems. For +BHs associated with devices, the reentrancy-guard is contained in the +corresponding ``DeviceState`` and named ``mem_reentrancy_guard``. + +The ``AioContext`` can be obtained from the ``IOThread`` using +``iothread_get_aio_context()`` or for the main loop using +``qemu_get_aio_context()``. Code that takes an ``AioContext`` argument +works both in ``IOThread``\ s or the main loop, depending on which ``AioContext`` +instance the caller passes in. + +How to synchronize with an ``IOThread`` +--------------------------------------- +Variables that can be accessed by multiple threads require some form of +synchronization such as ``qemu_mutex_lock()``, ``rcu_read_lock()``, etc. + +``AioContext`` functions like ``aio_set_fd_handler()``, +``aio_set_event_notifier()``, ``aio_bh_new()``, and ``aio_timer_new()`` +are thread-safe. They can be used to trigger activity in an ``IOThread``. + +Side note: the best way to schedule a function call across threads is to call +``aio_bh_schedule_oneshot()``. + +The main loop thread can wait synchronously for a condition using +``AIO_WAIT_WHILE()``. + +``AioContext`` and the block layer +---------------------------------- +The ``AioContext`` originates from the QEMU block layer, even though nowadays +``AioContext`` is a generic event loop that can be used by any QEMU subsystem. + +The block layer has support for ``AioContext`` integrated. Each +``BlockDriverState`` is associated with an ``AioContext`` using +``bdrv_try_change_aio_context()`` and ``bdrv_get_aio_context()``. +This allows block layer code to process I/O inside the +right ``AioContext``. Other subsystems may wish to follow a similar approach. + +Block layer code must therefore expect to run in an ``IOThread`` and avoid using +old APIs that implicitly use the main loop. See +`How to program for IOThreads`_ for information on how to do that. + +Code running in the monitor typically needs to ensure that past +requests from the guest are completed. When a block device is running +in an ``IOThread``, the ``IOThread`` can also process requests from the guest +(via ioeventfd). 
To achieve both objects, wrap the code between +``bdrv_drained_begin()`` and ``bdrv_drained_end()``, thus creating a "drained +section". + +Long-running jobs (usually in the form of coroutines) are often scheduled in +the ``BlockDriverState``'s ``AioContext``. The functions +``bdrv_add``/``remove_aio_context_notifier``, or alternatively +``blk_add``/``remove_aio_context_notifier`` if you use ``BlockBackends``, +can be used to get a notification whenever ``bdrv_try_change_aio_context()`` +moves a ``BlockDriverState`` to a different ``AioContext``. diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt deleted file mode 100644 index de85767..0000000 --- a/docs/devel/multiple-iothreads.txt +++ /dev/null @@ -1,130 +0,0 @@ -Copyright (c) 2014-2017 Red Hat Inc. - -This work is licensed under the terms of the GNU GPL, version 2 or later. See -the COPYING file in the top-level directory. - - -This document explains the IOThread feature and how to write code that runs -outside the BQL. - -The main loop and IOThreads ---------------------------- -QEMU is an event-driven program that can do several things at once using an -event loop. The VNC server and the QMP monitor are both processed from the -same event loop, which monitors their file descriptors until they become -readable and then invokes a callback. - -The default event loop is called the main loop (see main-loop.c). It is -possible to create additional event loop threads using -object -iothread,id=my-iothread. - -Side note: The main loop and IOThread are both event loops but their code is -not shared completely. Sometimes it is useful to remember that although they -are conceptually similar they are currently not interchangeable. - -Why IOThreads are useful ------------------------- -IOThreads allow the user to control the placement of work. The main loop is a -scalability bottleneck on hosts with many CPUs. Work can be spread across -several IOThreads instead of just one main loop. When set up correctly this -can improve I/O latency and reduce jitter seen by the guest. - -The main loop is also deeply associated with the BQL, which is a -scalability bottleneck in itself. vCPU threads and the main loop use the BQL -to serialize execution of QEMU code. This mutex is necessary because a lot of -QEMU's code historically was not thread-safe. - -The fact that all I/O processing is done in a single main loop and that the -BQL is contended by all vCPU threads and the main loop explain -why it is desirable to place work into IOThreads. - -The experimental virtio-blk data-plane implementation has been benchmarked and -shows these effects: -ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf - -How to program for IOThreads ----------------------------- -The main difference between legacy code and new code that can run in an -IOThread is dealing explicitly with the event loop object, AioContext -(see include/block/aio.h). Code that only works in the main loop -implicitly uses the main loop's AioContext. Code that supports running -in IOThreads must be aware of its AioContext. 
- -AioContext supports the following services: - * File descriptor monitoring (read/write/error on POSIX hosts) - * Event notifiers (inter-thread signalling) - * Timers - * Bottom Halves (BH) deferred callbacks - -There are several old APIs that use the main loop AioContext: - * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor - * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier - * LEGACY timer_new_ms() - create a timer - * LEGACY qemu_bh_new() - create a BH - * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard - * LEGACY qemu_aio_wait() - run an event loop iteration - -Since they implicitly work on the main loop they cannot be used in code that -runs in an IOThread. They might cause a crash or deadlock if called from an -IOThread since the BQL is not held. - -Instead, use the AioContext functions directly (see include/block/aio.h): - * aio_set_fd_handler() - monitor a file descriptor - * aio_set_event_notifier() - monitor an event notifier - * aio_timer_new() - create a timer - * aio_bh_new() - create a BH - * aio_bh_new_guarded() - create a BH with a device re-entrancy guard - * aio_poll() - run an event loop iteration - -The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard" -argument, which is used to check for and prevent re-entrancy problems. For -BHs associated with devices, the reentrancy-guard is contained in the -corresponding DeviceState and named "mem_reentrancy_guard". - -The AioContext can be obtained from the IOThread using -iothread_get_aio_context() or for the main loop using qemu_get_aio_context(). -Code that takes an AioContext argument works both in IOThreads or the main -loop, depending on which AioContext instance the caller passes in. - -How to synchronize with an IOThread ------------------------------------ -Variables that can be accessed by multiple threads require some form of -synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc. - -AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(), -aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger -activity in an IOThread. - -Side note: the best way to schedule a function call across threads is to call -aio_bh_schedule_oneshot(). - -The main loop thread can wait synchronously for a condition using -AIO_WAIT_WHILE(). - -AioContext and the block layer ------------------------------- -The AioContext originates from the QEMU block layer, even though nowadays -AioContext is a generic event loop that can be used by any QEMU subsystem. - -The block layer has support for AioContext integrated. Each BlockDriverState -is associated with an AioContext using bdrv_try_change_aio_context() and -bdrv_get_aio_context(). This allows block layer code to process I/O inside the -right AioContext. Other subsystems may wish to follow a similar approach. - -Block layer code must therefore expect to run in an IOThread and avoid using -old APIs that implicitly use the main loop. See the "How to program for -IOThreads" above for information on how to do that. - -Code running in the monitor typically needs to ensure that past -requests from the guest are completed. When a block device is running -in an IOThread, the IOThread can also process requests from the guest -(via ioeventfd). To achieve both objects, wrap the code between -bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained -section". 
- -Long-running jobs (usually in the form of coroutines) are often scheduled in -the BlockDriverState's AioContext. The functions -bdrv_add/remove_aio_context_notifier, or alternatively -blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to -get a notification whenever bdrv_try_change_aio_context() moves a -BlockDriverState to a different AioContext. diff --git a/docs/devel/nested-papr.txt b/docs/devel/nested-papr.txt deleted file mode 100644 index 9094365..0000000 --- a/docs/devel/nested-papr.txt +++ /dev/null @@ -1,119 +0,0 @@ -Nested PAPR API (aka KVM on PowerVM) -==================================== - -This API aims at providing support to enable nested virtualization with -KVM on PowerVM. While the existing support for nested KVM on PowerNV was -introduced with cap-nested-hv option, however, with a slight design change, -to enable this on papr/pseries, a new cap-nested-papr option is added. eg: - - qemu-system-ppc64 -cpu POWER10 -machine pseries,cap-nested-papr=true ... - -Work by: - Michael Neuling <mikey@neuling.org> - Vaibhav Jain <vaibhav@linux.ibm.com> - Jordan Niethe <jniethe5@gmail.com> - Harsh Prateek Bora <harshpb@linux.ibm.com> - Shivaprasad G Bhat <sbhat@linux.ibm.com> - Kautuk Consul <kconsul@linux.vnet.ibm.com> - -Below taken from the kernel documentation: - -Introduction -============ - -This document explains how a guest operating system can act as a -hypervisor and run nested guests through the use of hypercalls, if the -hypervisor has implemented them. The terms L0, L1, and L2 are used to -refer to different software entities. L0 is the hypervisor mode entity -that would normally be called the "host" or "hypervisor". L1 is a -guest virtual machine that is directly run under L0 and is initiated -and controlled by L0. L2 is a guest virtual machine that is initiated -and controlled by L1 acting as a hypervisor. A significant design change -wrt existing API is that now the entire L2 state is maintained within L0. - -Existing Nested-HV API -====================== - -Linux/KVM has had support for Nesting as an L0 or L1 since 2018 - -The L0 code was added:: - - commit 8e3f5fc1045dc49fd175b978c5457f5f51e7a2ce - Author: Paul Mackerras <paulus@ozlabs.org> - Date: Mon Oct 8 16:31:03 2018 +1100 - KVM: PPC: Book3S HV: Framework and hcall stubs for nested virtualization - -The L1 code was added:: - - commit 360cae313702cdd0b90f82c261a8302fecef030a - Author: Paul Mackerras <paulus@ozlabs.org> - Date: Mon Oct 8 16:31:04 2018 +1100 - KVM: PPC: Book3S HV: Nested guest entry via hypercall - -This API works primarily using a signal hcall h_enter_nested(). This -call made by the L1 to tell the L0 to start an L2 vCPU with the given -state. The L0 then starts this L2 and runs until an L2 exit condition -is reached. Once the L2 exits, the state of the L2 is given back to -the L1 by the L0. The full L2 vCPU state is always transferred from -and to L1 when the L2 is run. The L0 doesn't keep any state on the L2 -vCPU (except in the short sequence in the L0 on L1 -> L2 entry and L2 --> L1 exit). - -The only state kept by the L0 is the partition table. The L1 registers -it's partition table using the h_set_partition_table() hcall. All -other state held by the L0 about the L2s is cached state (such as -shadow page tables). - -The L1 may run any L2 or vCPU without first informing the L0. It -simply starts the vCPU using h_enter_nested(). The creation of L2s and -vCPUs is done implicitly whenever h_enter_nested() is called. 
- -In this document, we call this existing API the v1 API. - -New PAPR API -=============== - -The new PAPR API changes from the v1 API such that the creating L2 and -associated vCPUs is explicit. In this document, we call this the v2 -API. - -h_enter_nested() is replaced with H_GUEST_VCPU_RUN(). Before this can -be called the L1 must explicitly create the L2 using h_guest_create() -and any associated vCPUs() created with h_guest_create_vCPU(). Getting -and setting vCPU state can also be performed using h_guest_{g|s}et -hcall. - -The basic execution flow is for an L1 to create an L2, run it, and -delete it is: - -- L1 and L0 negotiate capabilities with H_GUEST_{G,S}ET_CAPABILITIES() - (normally at L1 boot time). - -- L1 requests the L0 to create an L2 with H_GUEST_CREATE() and receives a token - -- L1 requests the L0 to create an L2 vCPU with H_GUEST_CREATE_VCPU() - -- L1 and L0 communicate the vCPU state using the H_GUEST_{G,S}ET() hcall - -- L1 requests the L0 to run the vCPU using H_GUEST_RUN_VCPU() hcall - -- L1 deletes L2 with H_GUEST_DELETE() - -For more details, please refer: - -[1] Linux Kernel documentation (upstream documentation commit): - -commit 476652297f94a2e5e5ef29e734b0da37ade94110 -Author: Michael Neuling <mikey@neuling.org> -Date: Thu Sep 14 13:06:00 2023 +1000 - - docs: powerpc: Document nested KVM on POWER - - Document support for nested KVM on POWER using the existing API as well - as the new PAPR API. This includes the new HCALL interface and how it - used by KVM. - - Signed-off-by: Michael Neuling <mikey@neuling.org> - Signed-off-by: Jordan Niethe <jniethe5@gmail.com> - Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> - Link: https://msgid.link/20230914030600.16993-12-jniethe5@gmail.com diff --git a/docs/devel/qapi-code-gen.rst b/docs/devel/qapi-code-gen.rst index 583207a..231cc0f 100644 --- a/docs/devel/qapi-code-gen.rst +++ b/docs/devel/qapi-code-gen.rst @@ -9,6 +9,7 @@ How to use the QAPI code generator This work is licensed under the terms of the GNU GPL, version 2 or later. See the COPYING file in the top-level directory. +.. _qapi: Introduction ============ @@ -228,7 +229,8 @@ These are of the form PREFIX_NAME, where PREFIX is derived from the enumeration type's name, and NAME from the value's name. For the example above, the generator maps 'MyEnum' to MY_ENUM and 'value1' to VALUE1, resulting in the enumeration constant MY_ENUM_VALUE1. The -optional 'prefix' member overrides PREFIX. +optional 'prefix' member overrides PREFIX. This is rarely necessary, +and should be used with restraint. The generated C enumeration constants have values 0, 1, ..., N-1 (in QAPI schema order), where N is the number of values. There is an @@ -761,8 +763,8 @@ Names beginning with ``x-`` used to signify "experimental". This convention has been replaced by special feature "unstable". Pragmas ``command-name-exceptions`` and ``member-name-exceptions`` let -you violate naming rules. Use for new code is strongly discouraged. See -`Pragma directives`_ for details. +you violate naming rules. Use for new code is strongly discouraged. +See `Pragma directives`_ for details. Downstream extensions @@ -1011,7 +1013,7 @@ like this:: document the success and the error response, respectively. "Errors" sections should be formatted as an rST list, each entry -detailing a relevant error condition. For example:: +detailing a relevant error condition. For example:: # Errors: # - If @device does not exist, DeviceNotFound @@ -1024,31 +1026,28 @@ definition. QMP). 
In other sections, the text is formatted, and rST markup can be used. -QMP Examples can be added by using the ``.. qmp-example::`` -directive. In its simplest form, this can be used to contain a single -QMP code block which accepts standard JSON syntax with additional server -directionality indicators (``->`` and ``<-``), and elisions (``...``). +QMP Examples can be added by using the ``.. qmp-example::`` directive. +In its simplest form, this can be used to contain a single QMP code +block which accepts standard JSON syntax with additional server +directionality indicators (``->`` and ``<-``), and elisions. An +elision is commonly ``...``, but it can also be or a pair of ``...`` +with text in between. Optionally, a plaintext title may be provided by using the ``:title:`` -directive option. If the title is omitted, the example title will +directive option. If the title is omitted, the example title will default to "Example:". A simple QMP example:: # .. qmp-example:: - # :title: Using query-block # - # -> { "execute": "query-block" } - # <- { ... } + # -> { "execute": "query-name" } + # <- { "return": { "name": "Fred" } } -More complex or multi-step examples where exposition is needed before or -between QMP code blocks can be created by using the ``:annotated:`` -directive option. When using this option, nested QMP code blocks must be -entered explicitly with rST's ``::`` syntax. - -Highlighting in non-QMP languages can be accomplished by using the -``.. code-block:: lang`` directive, and non-highlighted text can be -achieved by omitting the language argument. +More complex or multi-step examples where exposition is needed before +or between QMP code blocks can be created by using the ``:annotated:`` +directive option. When using this option, nested QMP code blocks must +be entered explicitly with rST's ``::`` syntax. For example:: @@ -1059,11 +1058,21 @@ For example:: # This is a more complex example that can use # ``arbitrary rST syntax`` in its exposition:: # - # -> { "execute": "query-block" } - # <- { ... } + # -> { "execute": "query-block" } + # <- { "return": [ + # { + # "device": "ide0-hd0", + # ... + # } + # ... more ... + # ] } # # Above, lengthy output has been omitted for brevity. +Highlighting in non-QMP languages can be accomplished by using the +``.. code-block:: lang`` directive, and non-highlighted text can be +achieved by omitting the language argument. + Examples of complete definition documentation:: @@ -1464,7 +1473,9 @@ As an example, we'll use the following schema, which describes a single complex user-defined type, along with command which takes a list of that type as a parameter, and returns a single element of that type. The user is responsible for writing the implementation of -qmp_my_command(); everything else is produced by the generator. :: +qmp_my_command(); everything else is produced by the generator. 
+ +:: $ cat example-schema.json { 'struct': 'UserDefOne', @@ -1854,7 +1865,7 @@ Example:: #ifndef EXAMPLE_QAPI_INIT_COMMANDS_H #define EXAMPLE_QAPI_INIT_COMMANDS_H - #include "qapi/qmp/dispatch.h" + #include "qapi/qmp-registry.h" void example_qmp_init_marshal(QmpCommandList *cmds); @@ -1985,7 +1996,7 @@ Example:: #ifndef EXAMPLE_QAPI_INTROSPECT_H #define EXAMPLE_QAPI_INTROSPECT_H - #include "qapi/qmp/qlit.h" + #include "qobject/qlit.h" extern const QLitObject example_qmp_schema_qlit; diff --git a/docs/devel/qapi-domain.rst b/docs/devel/qapi-domain.rst new file mode 100644 index 0000000..1123872 --- /dev/null +++ b/docs/devel/qapi-domain.rst @@ -0,0 +1,716 @@ +====================== +The Sphinx QAPI Domain +====================== + +An extension to the `rST syntax +<https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html>`_ +in Sphinx is provided by the QAPI Domain, located in +``docs/sphinx/qapi_domain.py``. This extension is analogous to the +`Python Domain +<https://www.sphinx-doc.org/en/master/usage/domains/python.html>`_ +included with Sphinx, but provides special directives and roles +speciically for annotating and documenting QAPI definitions +specifically. + +A `Domain +<https://www.sphinx-doc.org/en/master/usage/domains/index.html>`_ +provides a set of special rST directives and cross-referencing roles to +Sphinx for understanding rST markup written to document a specific +language. By itself, this QAPI extension is only sufficient to parse rST +markup written by hand; the `autodoc +<https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html>`_ +functionality is provided elsewhere, in ``docs/sphinx/qapidoc.py``, by +the "Transmogrifier". + +It is not expected that any developer nor documentation writer would +never need to write *nor* read these special rST forms. However, in the +event that something needs to be debugged, knowing the syntax of the +domain is quite handy. This reference may also be useful as a guide for +understanding the QAPI Domain extension code itself. Although most of +these forms will not be needed for documentation writing purposes, +understanding the cross-referencing syntax *will* be helpful when +writing rST documentation elsewhere, or for enriching the body of +QAPIDoc blocks themselves. + + +Concepts +======== + +The QAPI Domain itself provides no mechanisms for reading the QAPI +Schema or generating documentation from code that exists. It is merely +the rST syntax used to describe things. For instance, the Sphinx Python +domain adds syntax like ``:py:func:`` for describing Python functions in +documentation, but it's the autodoc module that is responsible for +reading Python code and generating such syntax. QAPI is analogous here: +qapidoc.py is responsible for reading the QAPI Schema and generating rST +syntax, and qapi_domain.py is responsible for translating that special +syntax and providing APIs for Sphinx internals. + +In other words: + +qapi_domain.py adds syntax like ``.. qapi:command::`` to Sphinx, and +qapidoc.py transforms the documentation in ``qapi/*.json`` into rST +using directives defined by the domain. + +Or even shorter: + +``:py:`` is to ``:qapi:`` as *autodoc* is to *qapidoc*. + + +Info Field Lists +================ + +`Field lists +<https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#field-lists>`_ +are a standard syntax in reStructuredText. 
Sphinx `extends that syntax +<https://www.sphinx-doc.org/en/master/usage/domains/python.html#info-field-lists>`_ +to give certain field list entries special meaning and parsing to, for +example, add cross-references. The QAPI Domain takes advantage of this +field list extension to document things like Arguments, Members, Values, +and so on. + +The special parsing and handling of info field lists in Sphinx is provided by +three main classes; Field, GroupedField, and TypedField. The behavior +and formatting for each configured field list entry in the domain +changes depending on which class is used. + +Field: + * Creates an ungrouped field: i.e., each entry will create its own + section and they will not be combined. + * May *optionally* support an argument. + * May apply cross-reference roles to *either* the argument *or* the + content body, both, or neither. + +This is used primarily for entries which are not expected to be +repeated, i.e., items that may only show up at most once. The QAPI +domain uses this class for "Errors" section. + +GroupedField: + * Creates a grouped field: i.e. multiple adjacent entries will be + merged into one section, and the content will form a bulleted list. + * *Must* take an argument. + * May optionally apply a cross-reference role to the argument, but not + the body. + * Can be configured to remove the bulleted list if there is only a + single entry. + * All items will be generated with the form: "argument -- body" + +This is used for entries which are expected to be repeated, but aren't +expected to have two arguments, i.e. types without names, or names +without types. The QAPI domain uses this class for features, returns, +and enum values. + +TypedField: + * Creates a grouped, typed field. Multiple adjacent entres will be + merged into one section, and the content will form a bulleted list. + * *Must* take at least one argument, but supports up to two - + nominally, a name and a type. + * May optionally apply a cross-reference role to the type or the name + argument, but not the body. + * Can be configured to remove the bulleted list if there is only a + single entry. + * All items will be generated with the form "name (type) -- body" + +This is used for entries that are expected to be repeated and will have +a name, a type, and a description. The QAPI domain uses this class for +arguments, alternatives, and members. Wherever type names are referenced +below, They must be a valid, documented type that will be +cross-referenced in the HTML output; or one of the built-in JSON types +(string, number, int, boolean, null, value, q_empty). + + +``:feat:`` +---------- + +Document a feature attached to a QAPI definition. + +:availability: This field list is available in the body of Command, + Event, Enum, Object and Alternate directives. +:syntax: ``:feat name: Lorem ipsum, dolor sit amet...`` +:type: `sphinx.util.docfields.GroupedField + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.GroupedField.html?private=1>`_ + +Example:: + + .. qapi:object:: BlockdevOptionsVirtioBlkVhostVdpa + :since: 7.2 + :ifcond: CONFIG_BLKIO + + Driver specific block device options for the virtio-blk-vhost-vdpa + backend. + + :memb string path: path to the vhost-vdpa character device. + :feat fdset: Member ``path`` supports the special "/dev/fdset/N" path + (since 8.1) + + +``:arg:`` +--------- + +Document an argument to a QAPI command. + +:availability: This field list is only available in the body of the + Command directive. 
+:syntax: ``:arg type name: description`` +:type: `sphinx.util.docfields.TypedField + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.TypedField.html?private=1>`_ + + +Example:: + + .. qapi:command:: job-pause + :since: 3.0 + + Pause an active job. + + This command returns immediately after marking the active job for + pausing. Pausing an already paused job is an error. + + The job will pause as soon as possible, which means transitioning + into the PAUSED state if it was RUNNING, or into STANDBY if it was + READY. The corresponding JOB_STATUS_CHANGE event will be emitted. + + Cancelling a paused job automatically resumes it. + + :arg string id: The job identifier. + + +``:error:`` +----------- + +Document the error condition(s) of a QAPI command. + +:availability: This field list is only available in the body of the + Command directive. +:syntax: ``:error: Lorem ipsum dolor sit amet ...`` +:type: `sphinx.util.docfields.Field + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.Field.html?private=1>`_ + +The format of the :errors: field list description is free-form rST. The +alternative spelling ":errors:" is also permitted, but strictly +analogous. + +Example:: + + .. qapi:command:: block-job-set-speed + :since: 1.1 + + Set maximum speed for a background block operation. + + This command can only be issued when there is an active block job. + + Throttling can be disabled by setting the speed to 0. + + :arg string device: The job identifier. This used to be a device + name (hence the name of the parameter), but since QEMU 2.7 it + can have other values. + :arg int speed: the maximum speed, in bytes per second, or 0 for + unlimited. Defaults to 0. + :error: + - If no background operation is active on this device, + DeviceNotActive + + +``:return:`` +------------- + +Document the return type(s) and value(s) of a QAPI command. + +:availability: This field list is only available in the body of the + Command directive. +:syntax: ``:return type: Lorem ipsum dolor sit amet ...`` +:type: `sphinx.util.docfields.GroupedField + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.GroupedField.html?private=1>`_ + + +Example:: + + .. qapi:command:: query-replay + :since: 5.2 + + Retrieve the record/replay information. It includes current + instruction count which may be used for ``replay-break`` and + ``replay-seek`` commands. + + :return ReplayInfo: record/replay information. + + .. qmp-example:: + + -> { "execute": "query-replay" } + <- { "return": { + "mode": "play", "filename": "log.rr", "icount": 220414 } + } + + +``:value:`` +----------- + +Document a possible value for a QAPI enum. + +:availability: This field list is only available in the body of the Enum + directive. +:syntax: ``:value name: Lorem ipsum, dolor sit amet ...`` +:type: `sphinx.util.docfields.GroupedField + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.GroupedField.html?private=1>`_ + +Example:: + + .. qapi:enum:: QapiErrorClass + :since: 1.2 + + QEMU error classes + + :value GenericError: this is used for errors that don't require a specific + error class. This should be the default case for most errors + :value CommandNotFound: the requested command has not been found + :value DeviceNotActive: a device has failed to be become active + :value DeviceNotFound: the requested device has not been found + :value KVMMissingCap: the requested operation can't be fulfilled because a + required KVM capability is missing + + +``:alt:`` +------------ + +Document a possible branch for a QAPI alternate. 
+ +:availability: This field list is only available in the body of the + Alternate directive. +:syntax: ``:alt type name: Lorem ipsum, dolor sit amet ...`` +:type: `sphinx.util.docfields.TypedField + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.TypedField.html?private=1>`_ + +As a limitation of Sphinx, we must document the "name" of the branch in +addition to the type, even though this information is not visible on the +wire in the QMP protocol format. This limitation *may* be lifted at a +future date. + +Example:: + + .. qapi:alternate:: StrOrNull + :since: 2.10 + + This is a string value or the explicit lack of a string (null + pointer in C). Intended for cases when 'optional absent' already + has a different meaning. + + :alt string s: the string value + :alt null n: no string value + + +``:memb:`` +---------- + +Document a member of an Event or Object. + +:availability: This field list is available in the body of Event or + Object directives. +:syntax: ``:memb type name: Lorem ipsum, dolor sit amet ...`` +:type: `sphinx.util.docfields.TypedField + <https://pydoc.dev/sphinx/latest/sphinx.util.docfields.TypedField.html?private=1>`_ + +This is fundamentally the same as ``:arg:`` and ``:alt:``, but uses the +"Members" phrasing for Events and Objects (Structs and Unions). + +Example:: + + .. qapi:event:: JOB_STATUS_CHANGE + :since: 3.0 + + Emitted when a job transitions to a different status. + + :memb string id: The job identifier + :memb JobStatus status: The new job status + + +Arbitrary field lists +--------------------- + +Other field list names, while valid rST syntax, are prohibited inside of +QAPI directives to help prevent accidental misspellings of info field +list names. If you want to add a new arbitrary "non-value-added" field +list to QAPI documentation, you must add the field name to the allow +list in ``docs/conf.py`` + +For example:: + + qapi_allowed_fields = { + "see also", + } + +Will allow you to add arbitrary field lists in QAPI directives:: + + .. qapi:command:: x-fake-command + + :see also: Lorem ipsum, dolor sit amet ... + + +Cross-references +================ + +Cross-reference `roles +<https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html>`_ +in the QAPI domain are modeled closely after the `Python +cross-referencing syntax +<https://www.sphinx-doc.org/en/master/usage/domains/python.html#cross-referencing-python-objects>`_. + +QAPI definitions can be referenced using the standard `any +<https://www.sphinx-doc.org/en/master/usage/referencing.html#role-any>`_ +role cross-reference syntax, such as with ```query-blockstats```. In +the event that disambiguation is needed, cross-references can also be +written using a number of explicit cross-reference roles: + +* ``:qapi:mod:`block-core``` -- Reference a QAPI module. The link will + take you to the beginning of that section in the documentation. +* ``:qapi:cmd:`query-block``` -- Reference a QAPI command. +* ``:qapi:event:`JOB_STATUS_CHANGE``` -- Reference a QAPI event. +* ``:qapi:enum:`QapiErrorClass``` -- Reference a QAPI enum. +* ``:qapi:obj:`BlockdevOptionsVirtioBlkVhostVdpa`` -- Reference a QAPI + object (struct or union) +* ``:qapi:alt:`StrOrNull``` -- Reference a QAPI alternate. +* ``:qapi:type:`BlockDirtyInfo``` -- Reference *any* QAPI type; this + excludes modules, commands, and events. +* ``:qapi:any:`block-job-set-speed``` -- Reference absolutely any QAPI entity. + +Type arguments in info field lists are converted into references as if +you had used the ``:qapi:type:`` role. 
All of the special syntax below +applies to both info field lists and standalone explicit +cross-references. + + +Type decorations +---------------- + +Type names in references can be surrounded by brackets, like +``[typename]``, to indicate an array of that type. The cross-reference +will apply only to the type name between the brackets. For example; +``:qapi:type:`[Qcow2BitmapInfoFlags]``` renders to: +:qapi:type:`[QMP:Qcow2BitmapInfoFlags]` + +To indicate an optional argument/member in a field list, the type name +can be suffixed with ``?``. The cross-reference will be transformed to +"type, Optional" with the link applying only to the type name. For +example; ``:qapi:type:`BitmapSyncMode?``` renders to: +:qapi:type:`QMP:BitmapSyncMode?` + + +Namespaces +---------- + +Mimicking the `Python domain target specification syntax +<https://www.sphinx-doc.org/en/master/usage/domains/python.html#target-specification>`_, +QAPI allows you to specify the fully qualified path for a data +type. + +* A namespace can be explicitly provided; + e.g. ``:qapi:type:`QMP:BitmapSyncMode`` +* A module can be explicitly provided; + ``:qapi:type:`QMP:block-core.BitmapSyncMode``` will render to: + :qapi:type:`QMP:block-core.BitmapSyncMode` +* If you don't want to display the "fully qualified" name, it can be + prefixed with a tilde; ``:qapi:type:`~QMP:block-core.BitmapSyncMode``` + will render to: :qapi:type:`~QMP:block-core.BitmapSyncMode` + + +Target resolution +----------------- + +Any cross-reference to a QAPI type, whether using the ```any``` style of +reference or the more explicit ```:qapi:any:`target``` syntax, allows +for the presence or absence of either the namespace or module +information. + +When absent, their value will be inferred from context by the presence +of any ``qapi:namespace`` or ``qapi:module`` directives preceding the +cross-reference. + +If no results are found when using the inferred values, other +namespaces/modules will be searched as a last resort; but any explicitly +provided values must always match in order to succeed. + +This allows for efficient cross-referencing with a minimum of syntax in +the large majority of cases, but additional context or namespace markup +may be required outside of the QAPI reference documents when linking to +items that share a name across multiple documented QAPI schema. + + +Custom link text +---------------- + +The name of a cross-reference link can be explicitly overridden like +`most stock Sphinx references +<https://www.sphinx-doc.org/en/master/usage/referencing.html#syntax>`_ +using the ``custom text <target>`` syntax. + +For example, ``:qapi:cmd:`Merge dirty bitmaps +<block-dirty-bitmap-merge>``` will render as: :qapi:cmd:`Merge dirty +bitmaps <QMP:block-dirty-bitmap-merge>` + + +Directives +========== + +The QAPI domain adds a number of custom directives for documenting +various QAPI/QMP entities. The syntax is plain rST, and follows this +general format:: + + .. qapi:directive:: argument + :option: + :another-option: with an argument + + Content body, arbitrary rST is allowed here. + + +Sphinx standard options +----------------------- + +All QAPI directives inherit a number of `standard options +<https://www.sphinx-doc.org/en/master/usage/domains/index.html#basic-markup>`_ +from Sphinx's ObjectDescription class. + +The dashed spellings of the below options were added in Sphinx 7.2, the +undashed spellings are currently retained as aliases, but will be +removed in a future version. 
+ +* ``:no-index:`` and ``:noindex:`` -- Do not add this item into the + Index, and do not make it available for cross-referencing. +* ``no-index-entry:`` and ``:noindexentry:`` -- Do not add this item + into the Index, but allow it to be cross-referenced. +* ``no-contents-entry`` and ``:nocontentsentry:`` -- Exclude this item + from the Table of Contents. +* ``no-typesetting`` -- Create TOC, Index and cross-referencing + entities, but don't actually display the content. + + +QAPI standard options +--------------------- + +All QAPI directives -- *except* for namespace and module -- support +these common options. + +* ``:namespace: name`` -- This option allows you to override the + namespace association of a given definition. +* ``:module: modname`` -- Borrowed from the Python domain, this option allows + you to override the module association of a given definition. +* ``:since: x.y`` -- Allows the documenting of "Since" information, which is + displayed in the signature bar. +* ``:ifcond: CONDITION`` -- Allows the documenting of conditional availability + information, which is displayed in an eyecatch just below the + signature bar. +* ``:deprecated:`` -- Adds an eyecatch just below the signature bar that + advertises that this definition is deprecated and should be avoided. +* ``:unstable:`` -- Adds an eyecatch just below the signature bar that + advertises that this definition is unstable and should not be used in + production code. + + +qapi:namespace +-------------- + +The ``qapi:namespace`` directive marks the start of a QAPI namespace. It +does not take a content body, nor any options. All subsequent QAPI +directives are associated with the most recent namespace. This affects +the definition's "fully qualified name", allowing two different +namespaces to create an otherwise identically named definition. + +This directive also influences how reference resolution works for any +references that do not explicitly specify a namespace, so this directive +can be used to nudge references into preferring targets from within that +namespace. + +Example:: + + .. qapi:namespace:: QMP + + +This directive has no visible effect. + + +qapi:module +----------- + +The ``qapi:module`` directive marks the start of a QAPI module. It may have +a content body, but it can be omitted. All subsequent QAPI directives +are associated with the most recent module; this effects their "fully +qualified" name, but has no other effect. + +Example:: + + .. qapi:module:: block-core + + Welcome to the block-core module! + +Will be rendered as: + +.. qapi:module:: block-core + :noindex: + + Welcome to the block-core module! + + +qapi:command +------------ + +This directive documents a QMP command. It may use any of the standard +Sphinx or QAPI options, and the documentation body may contain +``:arg:``, ``:feat:``, ``:error:``, or ``:return:`` info field list +entries. + +Example:: + + .. qapi:command:: x-fake-command + :since: 42.0 + :unstable: + + This command is fake, so it can't hurt you! + + :arg int foo: Your favorite number. + :arg string? bar: Your favorite season. + :return [string]: A lovely computer-written poem for you. + + +Will be rendered as: + + .. qapi:command:: x-fake-command + :noindex: + :since: 42.0 + :unstable: + + This command is fake, so it can't hurt you! + + :arg int foo: Your favorite number. + :arg string? bar: Your favorite season. + :return [string]: A lovely computer-written poem for you. + + +qapi:event +---------- + +This directive documents a QMP event. 
It may use any of the standard +Sphinx or QAPI options, and the documentation body may contain +``:memb:`` or ``:feat:`` info field list entries. + +Example:: + + .. qapi:event:: COMPUTER_IS_RUINED + :since: 0.1 + :deprecated: + + This event is emitted when your computer is *extremely* ruined. + + :memb string reason: Diagnostics as to what caused your computer to + be ruined. + :feat sadness: When present, the diagnostic message will also + explain how sad the computer is as a result of your wrongdoings. + +Will be rendered as: + +.. qapi:event:: COMPUTER_IS_RUINED + :noindex: + :since: 0.1 + :deprecated: + + This event is emitted when your computer is *extremely* ruined. + + :memb string reason: Diagnostics as to what caused your computer to + be ruined. + :feat sadness: When present, the diagnostic message will also explain + how sad the computer is as a result of your wrongdoings. + + +qapi:enum +--------- + +This directive documents a QAPI enum. It may use any of the standard +Sphinx or QAPI options, and the documentation body may contain +``:value:`` or ``:feat:`` info field list entries. + +Example:: + + .. qapi:enum:: Mood + :ifcond: LIB_PERSONALITY + + This enum represents your virtual machine's current mood! + + :value Happy: Your VM is content and well-fed. + :value Hungry: Your VM needs food. + :value Melancholic: Your VM is experiencing existential angst. + :value Petulant: Your VM is throwing a temper tantrum. + +Will be rendered as: + +.. qapi:enum:: Mood + :noindex: + :ifcond: LIB_PERSONALITY + + This enum represents your virtual machine's current mood! + + :value Happy: Your VM is content and well-fed. + :value Hungry: Your VM needs food. + :value Melancholic: Your VM is experiencing existential angst. + :value Petulant: Your VM is throwing a temper tantrum. + + +qapi:object +----------- + +This directive documents a QAPI structure or union and represents a QMP +object. It may use any of the standard Sphinx or QAPI options, and the +documentation body may contain ``:memb:`` or ``:feat:`` info field list +entries. + +Example:: + + .. qapi:object:: BigBlobOfStuff + + This object has a bunch of disparate and unrelated things in it. + + :memb int Birthday: Your birthday, represented in seconds since the + UNIX epoch. + :memb [string] Fav-Foods: A list of your favorite foods. + :memb boolean? Bizarre-Docs: True if the documentation reference + should be strange. + +Will be rendered as: + +.. qapi:object:: BigBlobOfStuff + :noindex: + + This object has a bunch of disparate and unrelated things in it. + + :memb int Birthday: Your birthday, represented in seconds since the + UNIX epoch. + :memb [string] Fav-Foods: A list of your favorite foods. + :memb boolean? Bizarre-Docs: True if the documentation reference + should be strange. + + +qapi:alternate +-------------- + +This directive documents a QAPI alternate. It may use any of the +standard Sphinx or QAPI options, and the documentation body may contain +``:alt:`` or ``:feat:`` info field list entries. + +Example:: + + .. qapi:alternate:: ErrorCode + + This alternate represents an Error Code from the VM. + + :alt int ec: An error code, like the type you're used to. + :alt string em: An expletive-laced error message, if your + computer is feeling particularly cranky and tired of your + antics. + +Will be rendered as: + +.. qapi:alternate:: ErrorCode + :noindex: + + This alternate represents an Error Code from the VM. + + :alt int ec: An error code, like the type you're used to. 
+ :alt string em: An expletive-laced error message, if your + computer is feeling particularly cranky and tired of your + antics. diff --git a/docs/devel/qom.rst b/docs/devel/qom.rst index 0889ca9..5870745 100644 --- a/docs/devel/qom.rst +++ b/docs/devel/qom.rst @@ -147,7 +147,7 @@ to introduce an overridden virtual function: #include "qdev.h" - void my_device_class_init(ObjectClass *klass, void *class_data) + void my_device_class_init(ObjectClass *klass, const void *class_data) { DeviceClass *dc = DEVICE_CLASS(klass); dc->reset = my_device_reset; @@ -249,7 +249,7 @@ class, which someone might choose to change at some point. // do something } - static void my_class_init(ObjectClass *oc, void *data) + static void my_class_init(ObjectClass *oc, const void *data) { MyClass *mc = MY_CLASS(oc); @@ -279,7 +279,7 @@ class, which someone might choose to change at some point. // do something else here } - static void derived_class_init(ObjectClass *oc, void *data) + static void derived_class_init(ObjectClass *oc, const void *data) { MyClass *mc = MY_CLASS(oc); DerivedClass *dc = DERIVED_CLASS(oc); @@ -363,7 +363,7 @@ This is equivalent to the following: :caption: Expansion from defining a simple type static void my_device_finalize(Object *obj); - static void my_device_class_init(ObjectClass *oc, void *data); + static void my_device_class_init(ObjectClass *oc, const void *data); static void my_device_init(Object *obj); static const TypeInfo my_device_info = { diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.rst index 2e6cc60..dd07c1d 100644 --- a/docs/devel/rcu.txt +++ b/docs/devel/rcu.rst @@ -20,7 +20,7 @@ for the execution of all *currently running* critical sections before proceeding, or before asynchronously executing a callback. The key point here is that only the currently running critical sections -are waited for; critical sections that are started _after_ the beginning +are waited for; critical sections that are started **after** the beginning of the wait do not extend the wait, despite running concurrently with the updater. This is the reason why RCU is more scalable than, for example, reader-writer locks. It is so much more scalable that @@ -37,7 +37,7 @@ do not matter; as soon as all previous critical sections have finished, there cannot be any readers who hold references to the data structure, and these can now be safely reclaimed (e.g., freed or unref'ed). -Here is a picture: +Here is a picture:: thread 1 thread 2 thread 3 ------------------- ------------------------ ------------------- @@ -58,43 +58,38 @@ that critical section. RCU API -======= +------- The core RCU API is small: - void rcu_read_lock(void); - +``void rcu_read_lock(void);`` Used by a reader to inform the reclaimer that the reader is entering an RCU read-side critical section. - void rcu_read_unlock(void); - +``void rcu_read_unlock(void);`` Used by a reader to inform the reclaimer that the reader is exiting an RCU read-side critical section. Note that RCU read-side critical sections may be nested and/or overlapping. - void synchronize_rcu(void); - +``void synchronize_rcu(void);`` Blocks until all pre-existing RCU read-side critical sections on all threads have completed. This marks the end of the removal phase and the beginning of reclamation phase. Note that it would be valid for another update to come while - synchronize_rcu is running. Because of this, it is better that + ``synchronize_rcu`` is running. Because of this, it is better that the updater releases any locks it may hold before calling - synchronize_rcu. 
If this is not possible (for example, because - the updater is protected by the BQL), you can use call_rcu. + ``synchronize_rcu``. If this is not possible (for example, because + the updater is protected by the BQL), you can use ``call_rcu``. - void call_rcu1(struct rcu_head * head, - void (*func)(struct rcu_head *head)); - - This function invokes func(head) after all pre-existing RCU +``void call_rcu1(struct rcu_head * head, void (*func)(struct rcu_head *head));`` + This function invokes ``func(head)`` after all pre-existing RCU read-side critical sections on all threads have completed. This marks the end of the removal phase, with func taking care asynchronously of the reclamation phase. - The foo struct needs to have an rcu_head structure added, - perhaps as follows: + The ``foo`` struct needs to have an ``rcu_head`` structure added, + perhaps as follows:: struct foo { struct rcu_head rcu; @@ -103,8 +98,8 @@ The core RCU API is small: long c; }; - so that the reclaimer function can fetch the struct foo address - and free it: + so that the reclaimer function can fetch the ``struct foo`` address + and free it:: call_rcu1(&foo.rcu, foo_reclaim); @@ -114,29 +109,27 @@ The core RCU API is small: g_free(fp); } - For the common case where the rcu_head member is the first of the - struct, you can use the following macro. + ``call_rcu1`` is typically used via either the ``call_rcu`` or + ``g_free_rcu`` macros, which handle the common case where the + ``rcu_head`` member is the first of the struct. - void call_rcu(T *p, - void (*func)(T *p), - field-name); - void g_free_rcu(T *p, - field-name); +``void call_rcu(T *p, void (*func)(T *p), field-name);`` + If the ``struct rcu_head`` is the first field in the struct, you can + use this macro instead of ``call_rcu1``. - call_rcu1 is typically used through these macro, in the common case - where the "struct rcu_head" is the first field in the struct. If - the callback function is g_free, in particular, g_free_rcu can be - used. In the above case, one could have written simply: +``void g_free_rcu(T *p, field-name);`` + This is a special-case version of ``call_rcu`` where the callback + function is ``g_free``. + In the example given in ``call_rcu1``, one could have written simply:: g_free_rcu(&foo, rcu); - typeof(*p) qatomic_rcu_read(p); - - qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes - some assumptions on the code that calls it. This allows a more - optimized implementation. +``typeof(*p) qatomic_rcu_read(p);`` + ``qatomic_rcu_read()`` is similar to ``qatomic_load_acquire()``, but + it makes some assumptions on the code that calls it. This allows a + more optimized implementation. - qatomic_rcu_read assumes that whenever a single RCU critical + ``qatomic_rcu_read`` assumes that whenever a single RCU critical section reads multiple shared data, these reads are either data-dependent or need no ordering. This is almost always the case when using RCU, because read-side critical sections typically @@ -144,7 +137,7 @@ The core RCU API is small: every update) until reaching a data structure of interest, and then read from there. - RCU read-side critical sections must use qatomic_rcu_read() to + RCU read-side critical sections must use ``qatomic_rcu_read()`` to read data, unless concurrent writes are prevented by another synchronization mechanism. @@ -152,18 +145,17 @@ The core RCU API is small: data structure in a single direction, opposite to the direction in which the updater initializes it. 
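
As an illustration of the read side (the ``struct config`` type and the
``current_config`` pointer are made-up names for this sketch, not part of
the QEMU API), a reader brackets the data-dependent load and the accesses
made through it with ``rcu_read_lock()``/``rcu_read_unlock()``::

    struct config {
        int a;
        int b;
    };

    static struct config *current_config;

    static int get_config_a(void)
    {
        struct config *p;
        int val;

        rcu_read_lock();
        /* Data-dependent load: reads through p are ordered after it. */
        p = qatomic_rcu_read(&current_config);
        val = p->a;
        rcu_read_unlock();
        return val;
    }

The matching write side publishes a fully-initialized object with
``qatomic_rcu_set()``, described next.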
- void qatomic_rcu_set(p, typeof(*p) v); - - qatomic_rcu_set() is similar to qatomic_store_release(), though it also - makes assumptions on the code that calls it in order to allow a more - optimized implementation. +``void qatomic_rcu_set(p, typeof(*p) v);`` + ``qatomic_rcu_set()`` is similar to ``qatomic_store_release()``, + though it also makes assumptions on the code that calls it in + order to allow a more optimized implementation. - In particular, qatomic_rcu_set() suffices for synchronization + In particular, ``qatomic_rcu_set()`` suffices for synchronization with readers, if the updater never mutates a field within a data item that is already accessible to readers. This is the case when initializing a new copy of the RCU-protected data - structure; just ensure that initialization of *p is carried out - before qatomic_rcu_set() makes the data item visible to readers. + structure; just ensure that initialization of ``*p`` is carried out + before ``qatomic_rcu_set()`` makes the data item visible to readers. If this rule is observed, writes will happen in the opposite order as reads in the RCU read-side critical sections (or if there is just one update), and there will be no need for other @@ -171,58 +163,54 @@ The core RCU API is small: The following APIs must be used before RCU is used in a thread: - void rcu_register_thread(void); - +``void rcu_register_thread(void);`` Mark a thread as taking part in the RCU mechanism. Such a thread will have to report quiescent points regularly, either manually - or through the QemuCond/QemuSemaphore/QemuEvent APIs. - - void rcu_unregister_thread(void); + or through the ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs. +``void rcu_unregister_thread(void);`` Mark a thread as not taking part anymore in the RCU mechanism. It is not a problem if such a thread reports quiescent points, - either manually or by using the QemuCond/QemuSemaphore/QemuEvent - APIs. + either manually or by using the + ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs. -Note that these APIs are relatively heavyweight, and should _not_ be +Note that these APIs are relatively heavyweight, and should **not** be nested. Convenience macros -================== +------------------ Two macros are provided that automatically release the read lock at the end of the scope. - RCU_READ_LOCK_GUARD() - +``RCU_READ_LOCK_GUARD()`` Takes the lock and will release it at the end of the block it's used in. - WITH_RCU_READ_LOCK_GUARD() { code } - +``WITH_RCU_READ_LOCK_GUARD() { code }`` Is used at the head of a block to protect the code within the block. -Note that 'goto'ing out of the guarded block will also drop the lock. +Note that a ``goto`` out of the guarded block will also drop the lock. -DIFFERENCES WITH LINUX -====================== +Differences with Linux +---------------------- - Waiting on a mutex is possible, though discouraged, within an RCU critical section. This is because spinlocks are rarely (if ever) used in userspace programming; not allowing this would prevent upgrading an RCU read-side critical section to become an updater. -- qatomic_rcu_read and qatomic_rcu_set replace rcu_dereference and - rcu_assign_pointer. They take a _pointer_ to the variable being accessed. +- ``qatomic_rcu_read`` and ``qatomic_rcu_set`` replace ``rcu_dereference`` and + ``rcu_assign_pointer``. They take a **pointer** to the variable being accessed. 
-- call_rcu is a macro that has an extra argument (the name of the first - field in the struct, which must be a struct rcu_head), and expects the +- ``call_rcu`` is a macro that has an extra argument (the name of the first + field in the struct, which must be a struct ``rcu_head``), and expects the type of the callback's argument to be the type of the first argument. - call_rcu1 is the same as Linux's call_rcu. + ``call_rcu1`` is the same as Linux's ``call_rcu``. -RCU PATTERNS -============ +RCU Patterns +------------ Many patterns using read-writer locks translate directly to RCU, with the advantages of higher scalability and deadlock immunity. @@ -243,28 +231,28 @@ Here are some frequently-used RCU idioms that are worth noting. RCU list processing -------------------- +^^^^^^^^^^^^^^^^^^^ TBD (not yet used in QEMU) RCU reference counting ----------------------- +^^^^^^^^^^^^^^^^^^^^^^ Because grace periods are not allowed to complete while there is an RCU read-side critical section in progress, the RCU read-side primitives may be used as a restricted reference-counting mechanism. For example, -consider the following code fragment: +consider the following code fragment:: rcu_read_lock(); p = qatomic_rcu_read(&foo); /* do something with p. */ rcu_read_unlock(); -The RCU read-side critical section ensures that the value of "p" remains -valid until after the rcu_read_unlock(). In some sense, it is acquiring -a reference to p that is later released when the critical section ends. -The write side looks simply like this (with appropriate locking): +The RCU read-side critical section ensures that the value of ``p`` remains +valid until after the ``rcu_read_unlock()``. In some sense, it is acquiring +a reference to ``p`` that is later released when the critical section ends. +The write side looks simply like this (with appropriate locking):: qemu_mutex_lock(&foo_mutex); old = foo; @@ -274,7 +262,7 @@ The write side looks simply like this (with appropriate locking): free(old); If the processing cannot be done purely within the critical section, it -is possible to combine this idiom with a "real" reference count: +is possible to combine this idiom with a "real" reference count:: rcu_read_lock(); p = qatomic_rcu_read(&foo); @@ -283,7 +271,7 @@ is possible to combine this idiom with a "real" reference count: /* do something with p. */ foo_unref(p); -The write side can be like this: +The write side can be like this:: qemu_mutex_lock(&foo_mutex); old = foo; @@ -292,7 +280,7 @@ The write side can be like this: synchronize_rcu(); foo_unref(old); -or with call_rcu: +or with ``call_rcu``:: qemu_mutex_lock(&foo_mutex); old = foo; @@ -301,10 +289,10 @@ or with call_rcu: call_rcu(foo_unref, old, rcu); In both cases, the write side only performs removal. Reclamation -happens when the last reference to a "foo" object is dropped. -Using synchronize_rcu() is undesirably expensive, because the +happens when the last reference to a ``foo`` object is dropped. +Using ``synchronize_rcu()`` is undesirably expensive, because the last reference may be dropped on the read side. 
Hence you can -use call_rcu() instead: +use ``call_rcu()`` instead:: foo_unref(struct foo *p) { if (qatomic_fetch_dec(&p->refcount) == 1) { @@ -314,7 +302,7 @@ use call_rcu() instead: Note that the same idioms would be possible with reader/writer -locks: +locks:: read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock); p = foo; p = foo; @@ -334,15 +322,15 @@ locks: foo_unref(p); read_unlock(&foo_rwlock); -foo_unref could use a mechanism such as bottom halves to move deallocation +``foo_unref`` could use a mechanism such as bottom halves to move deallocation out of the write-side critical section. RCU resizable arrays --------------------- +^^^^^^^^^^^^^^^^^^^^ Resizable arrays can be used with RCU. The expensive RCU synchronization -(or call_rcu) only needs to take place when the array is resized. +(or ``call_rcu``) only needs to take place when the array is resized. The two items to take care of are: - ensuring that the old version of the array is available between removal @@ -351,10 +339,10 @@ The two items to take care of are: - avoiding mismatches in the read side between the array data and the array size. -The first problem is avoided simply by not using realloc. Instead, +The first problem is avoided simply by not using ``realloc``. Instead, each resize will allocate a new array and copy the old data into it. The second problem would arise if the size and the data pointers were -two members of a larger struct: +two members of a larger struct:: struct mystuff { ... @@ -364,7 +352,7 @@ two members of a larger struct: ... }; -Instead, we store the size of the array with the array itself: +Instead, we store the size of the array with the array itself:: struct arr { int size; @@ -400,7 +388,7 @@ Instead, we store the size of the array with the array itself: } -SOURCES -======= +References +---------- -* Documentation/RCU/ from the Linux kernel +* The `Linux kernel RCU documentation <https://docs.kernel.org/RCU/>`__ diff --git a/docs/devel/replay.rst b/docs/devel/replay.rst index effd856..40f58d9 100644 --- a/docs/devel/replay.rst +++ b/docs/devel/replay.rst @@ -202,6 +202,9 @@ into the log. Saving/restoring the VM state ----------------------------- +Record/replay relies on VM state save and restore being complete and +deterministic. + All fields in the device state structure (including virtual timers) should be restored by loadvm to the same values they had before savevm. diff --git a/docs/devel/reset.rst b/docs/devel/reset.rst index 9746a4e..c02fe0a 100644 --- a/docs/devel/reset.rst +++ b/docs/devel/reset.rst @@ -44,6 +44,26 @@ The Resettable interface handles reset types with an enum ``ResetType``: value on each cold reset, such as RNG seed information, and which they must not reinitialize on a snapshot-load reset. +``RESET_TYPE_WAKEUP`` + If the machine supports waking up from a suspended state and needs to reset + its devices during wake-up (from the ``MachineClass::wakeup()`` method), this + reset type should be used for such a request. Devices can utilize this reset + type to differentiate the reset requested during machine wake-up from other + reset requests. For example, RAM content must not be lost during wake-up, and + memory devices like virtio-mem that provide additional RAM must not reset + such state during wake-ups, but might do so during cold resets. However, this + reset type should not be used for wake-up detection, as not every machine + type issues a device reset request during wake-up. 
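
As a sketch of how a device might honour this (``MyDevState``, ``MYDEV`` and
``mydev_reset_registers()`` are made-up names; the Resettable phase methods
themselves are described later in this document)::

    static void mydev_reset_hold(Object *obj, ResetType type)
    {
        MyDevState *s = MYDEV(obj);

        if (type == RESET_TYPE_WAKEUP) {
            /* Waking from suspend: guest-visible RAM-backed state survives. */
            return;
        }

        /* Cold reset, and any unknown reset type, reinitializes everything. */
        mydev_reset_registers(s);
    }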
+ +``RESET_TYPE_S390_CPU_NORMAL`` + This is only used for S390 CPU objects; it clears interrupts, stops + processing, and clears the TLB, but does not touch register contents. + +``RESET_TYPE_S390_CPU_INITIAL`` + This is only used for S390 CPU objects; it does everything + ``RESET_TYPE_S390_CPU_NORMAL`` does and also clears the PSW, prefix, + FPC, timer and control registers. It does not touch gprs, fprs or acrs. + Devices which implement reset methods must treat any unknown ``ResetType`` as equivalent to ``RESET_TYPE_COLD``; this will reduce the amount of existing code we need to change if we add more types in future. @@ -123,6 +143,11 @@ The *exit* phase is executed only when the last reset operation ends. Therefore the object does not need to care how many of reset controllers it has and how many of them have started a reset. +DMA capable devices are expected to cancel all outstanding DMA operations +during either 'enter' or 'hold' phases. IOMMUs are expected to reset during +the 'exit' phase and this sequencing makes sure no outstanding DMA request +will fault. + Handling reset in a resettable object ------------------------------------- @@ -191,7 +216,7 @@ in reset. ResettablePhases parent_phases; } MyDevClass; - static void mydev_class_init(ObjectClass *class, void *data) + static void mydev_class_init(ObjectClass *class, const void *data) { MyDevClass *myclass = MYDEV_CLASS(class); ResettableClass *rc = RESETTABLE_CLASS(class); @@ -266,8 +291,8 @@ every reset child of the given resettable object. All children must be resettable too. Additional parameters (a reset type and an opaque pointer) must be passed to the callback too. -In ``DeviceClass`` and ``BusClass`` the ``ResettableState`` is located -``DeviceState`` and ``BusState`` structure. ``child_foreach()`` is implemented +In ``DeviceClass`` and ``BusClass`` the ``ResettableState`` is located in the +``DeviceState`` and ``BusState`` structures. ``child_foreach()`` is implemented to follow the bus hierarchy; for a bus, it calls the function on every child device; for a device, it calls the function on every bus child. When we reset the main system bus, we reset the whole machine bus tree. diff --git a/docs/devel/rust.rst b/docs/devel/rust.rst new file mode 100644 index 0000000..dc8c441 --- /dev/null +++ b/docs/devel/rust.rst @@ -0,0 +1,478 @@ +.. |msrv| replace:: 1.63.0 + +Rust in QEMU +============ + +Rust in QEMU is a project to enable using the Rust programming language +to add new functionality to QEMU. + +Right now, the focus is on making it possible to write devices that inherit +from ``SysBusDevice`` in `*safe*`__ Rust. Later, it may become possible +to write other kinds of devices (e.g. PCI devices that can do DMA), +complete boards, or backends (e.g. block device formats). + +__ https://doc.rust-lang.org/nomicon/meet-safe-and-unsafe.html + +Building the Rust in QEMU code +------------------------------ + +The Rust in QEMU code is included in the emulators via Meson. Meson +invokes rustc directly, building static libraries that are then linked +together with the C code. This is completely automatic when you run +``make`` or ``ninja``. + +However, QEMU's build system also tries to be easy to use for people who +are accustomed to the more "normal" Cargo-based development workflow. 
+In particular: + +* the set of warnings and lints that are used to build QEMU always + comes from the ``rust/Cargo.toml`` workspace file + +* it is also possible to use ``cargo`` for common Rust-specific coding + tasks, in particular to invoke ``clippy``, ``rustfmt`` and ``rustdoc``. + +To this end, QEMU includes a ``build.rs`` build script that picks up +generated sources from QEMU's build directory and puts it in Cargo's +output directory (typically ``rust/target/``). A vanilla invocation +of Cargo will complain that it cannot find the generated sources, +which can be fixed in different ways: + +* by using Makefile targets, provided by Meson, that run ``clippy`` or + ``rustdoc``: + + make clippy + make rustdoc + +A target for ``rustfmt`` is also declared in ``rust/meson.build``: + + make rustfmt + +* by invoking ``cargo`` through the Meson `development environment`__ + feature:: + + pyvenv/bin/meson devenv -w ../rust cargo clippy --tests + pyvenv/bin/meson devenv -w ../rust cargo fmt + + If you are going to use ``cargo`` repeatedly, ``pyvenv/bin/meson devenv`` + will enter a shell where commands like ``cargo fmt`` just work. + +__ https://mesonbuild.com/Commands.html#devenv + +* by pointing the ``MESON_BUILD_ROOT`` to the top of your QEMU build + tree. This third method is useful if you are using ``rust-analyzer``; + you can set the environment variable through the + ``rust-analyzer.cargo.extraEnv`` setting. + +As shown above, you can use the ``--tests`` option as usual to operate on test +code. Note however that you cannot *build* or run tests via ``cargo``, because +they need support C code from QEMU that Cargo does not know about. Tests can +be run via ``meson test`` or ``make``:: + + make check-rust + +Note that doctests require all ``.o`` files from the build to be available. + +Supported tools +''''''''''''''' + +QEMU supports rustc version 1.77.0 and newer. Notably, the following features +are missing: + +* inline const expression (stable in 1.79.0), currently worked around with + associated constants in the ``FnCall`` trait. + +* associated constants have to be explicitly marked ``'static`` (`changed in + 1.81.0`__) + +* ``&raw`` (stable in 1.82.0). Use ``addr_of!`` and ``addr_of_mut!`` instead, + though hopefully the need for raw pointers will go down over time. + +* ``new_uninit`` (stable in 1.82.0). This is used internally by the ``pinned_init`` + crate, which is planned for inclusion in QEMU, but it can be easily patched + out. + +* referencing statics in constants (stable in 1.83.0). For now use a const + function; this is an important limitation for QEMU's migration stream + architecture (VMState). Right now, VMState lacks type safety because + it is hard to place the ``VMStateField`` definitions in traits. + +* NUL-terminated file names with ``#[track_caller]`` are scheduled for + inclusion as ``#![feature(location_file_nul)]``, but it will be a while + before QEMU can use them. For now, there is special code in + ``util/error.c`` to support non-NUL-terminated file names. + +* associated const equality would be nice to have for some users of + ``callbacks::FnCall``, but is still experimental. ``ASSERT_IS_SOME`` + replaces it. + +__ https://github.com/rust-lang/rust/pull/125258 + +QEMU also supports version 0.60.x of bindgen, which is missing option +``--generate-cstr``. This option requires version 0.66.x and will +be adopted as soon as supporting these older versions is not necessary +anymore. 
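
As a small illustration of the ``&raw`` workaround mentioned above (the
``Registers`` struct and its fields are invented for this sketch), a raw
pointer to a field can be formed with ``addr_of_mut!`` instead of going
through an intermediate ``&mut`` reference::

    use std::ptr::addr_of_mut;

    #[repr(C)]
    struct Registers {
        ctrl: u32,
        status: u32,
    }

    /// Return a raw pointer to `status` without materializing a reference;
    /// on rustc >= 1.82 this could also be written `&raw mut (*regs).status`.
    fn status_ptr(regs: *mut Registers) -> *mut u32 {
        unsafe { addr_of_mut!((*regs).status) }
    }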
+ +Writing Rust code in QEMU +------------------------- + +QEMU includes four crates: + +* ``qemu_api`` for bindings to C code and useful functionality + +* ``qemu_api_macros`` defines several procedural macros that are useful when + writing C code + +* ``pl011`` (under ``rust/hw/char/pl011``) and ``hpet`` (under ``rust/hw/timer/hpet``) + are sample devices that demonstrate ``qemu_api`` and ``qemu_api_macros``, and are + used to further develop them. These two crates are functional\ [#issues]_ replacements + for the ``hw/char/pl011.c`` and ``hw/timer/hpet.c`` files. + +.. [#issues] The ``pl011`` crate is synchronized with ``hw/char/pl011.c`` + as of commit 3e0f118f82. The ``hpet`` crate is synchronized as of + commit 1433e38cc8. Both are lacking tracing functionality. + +This section explains how to work with them. + +Status +'''''' + +Modules of ``qemu_api`` can be defined as: + +- *complete*: ready for use in new devices; if applicable, the API supports the + full functionality available in C + +- *stable*: ready for production use, the API is safe and should not undergo + major changes + +- *proof of concept*: the API is subject to change but allows working with safe + Rust + +- *initial*: the API is in its initial stages; it requires large amount of + unsafe code; it might have soundness or type-safety issues + +The status of the modules is as follows: + +================ ====================== +module status +================ ====================== +``assertions`` stable +``bitops`` complete +``callbacks`` complete +``cell`` stable +``errno`` complete +``error`` stable +``irq`` complete +``log`` proof of concept +``memory`` stable +``module`` complete +``qdev`` stable +``qom`` stable +``sysbus`` stable +``timer`` stable +``vmstate`` proof of concept +``zeroable`` stable +================ ====================== + +.. note:: + API stability is not a promise, if anything because the C APIs are not a stable + interface either. Also, ``unsafe`` interfaces may be replaced by safe interfaces + later. + +Naming convention +''''''''''''''''' + +C function names usually are prefixed according to the data type that they +apply to, for example ``timer_mod`` or ``sysbus_connect_irq``. Furthermore, +both function and structs sometimes have a ``qemu_`` or ``QEMU`` prefix. +Generally speaking, these are all removed in the corresponding Rust functions: +``QEMUTimer`` becomes ``timer::Timer``, ``timer_mod`` becomes ``Timer::modify``, +``sysbus_connect_irq`` becomes ``SysBusDeviceMethods::connect_irq``. + +Sometimes however a name appears multiple times in the QOM class hierarchy, +and the only difference is in the prefix. An example is ``qdev_realize`` and +``sysbus_realize``. In such cases, whenever a name is not unique in +the hierarchy, always add the prefix to the classes that are lower in +the hierarchy; for the top class, decide on a case by case basis. + +For example: + +========================== ========================================= +``device_cold_reset()`` ``DeviceMethods::cold_reset()`` +``pci_device_reset()`` ``PciDeviceMethods::pci_device_reset()`` +``pci_bridge_reset()`` ``PciBridgeMethods::pci_bridge_reset()`` +========================== ========================================= + +Here, the name is not exactly the same, but nevertheless ``PciDeviceMethods`` +adds the prefix to avoid confusion, because the functionality of +``device_cold_reset()`` and ``pci_device_reset()`` is subtly different. 
+ +In this case, however, no prefix is needed: + +========================== ========================================= +``device_realize()`` ``DeviceMethods::realize()`` +``sysbus_realize()`` ``SysbusDeviceMethods::sysbus_realize()`` +``pci_realize()`` ``PciDeviceMethods::pci_realize()`` +========================== ========================================= + +Here, the lower classes do not add any functionality, and mostly +provide extra compile-time checking; the basic *realize* functionality +is the same for all devices. Therefore, ``DeviceMethods`` does not +add the prefix. + +Whenever a name is unique in the hierarchy, instead, you should +always remove the class name prefix. + +Common pitfalls +''''''''''''''' + +Rust has very strict rules with respect to how you get an exclusive (``&mut``) +reference; failure to respect those rules is a source of undefined behavior. +In particular, even if a value is loaded from a raw mutable pointer (``*mut``), +it *cannot* be casted to ``&mut`` unless the value was stored to the ``*mut`` +from a mutable reference. Furthermore, it is undefined behavior if any +shared reference was created between the store to the ``*mut`` and the load:: + + let mut p: u32 = 42; + let p_mut = &mut p; // 1 + let p_raw = p_mut as *mut u32; // 2 + + // p_raw keeps the mutable reference "alive" + + let p_shared = &p; // 3 + println!("access from &u32: {}", *p_shared); + + // Bring back the mutable reference, its lifetime overlaps + // with that of a shared reference. + let p_mut = unsafe { &mut *p_raw }; // 4 + println!("access from &mut 32: {}", *p_mut); + + println!("access from &u32: {}", *p_shared); // 5 + +These rules can be tested with `MIRI`__, for example. + +__ https://github.com/rust-lang/miri + +Almost all Rust code in QEMU will involve QOM objects, and pointers to these +objects are *shared*, for example because they are part of the QOM composition +tree. This creates exactly the above scenario: + +1. a QOM object is created + +2. a ``*mut`` is created, for example as the opaque value for a ``MemoryRegion`` + +3. the QOM object is placed in the composition tree + +4. a memory access dereferences the opaque value to a ``&mut`` + +5. but the shared reference is still present in the composition tree + +Because of this, QOM objects should almost always use ``&self`` instead +of ``&mut self``; access to internal fields must use *interior mutability* +to go from a shared reference to a ``&mut``. + +Whenever C code provides you with an opaque ``void *``, avoid converting it +to a Rust mutable reference, and use a shared reference instead. The +``qemu_api::cell`` module provides wrappers that can be used to tell the +Rust compiler about interior mutability, and optionally to enforce locking +rules for the "Big QEMU Lock". In the future, similar cell types might +also be provided for ``AioContext``-based locking as well. + +In particular, device code will usually rely on the ``BqlRefCell`` and +``BqlCell`` type to ensure that data is accessed correctly under the +"Big QEMU Lock". These cell types are also known to the ``vmstate`` +crate, which is able to "look inside" them when building an in-memory +representation of a ``struct``'s layout. Note that the same is not true +of a ``RefCell`` or ``Mutex``. + +Bindings code instead will usually use the ``Opaque`` type, which hides +the contents of the underlying struct and can be easily converted to +a raw pointer, for use in calls to C functions. 
It can be used for +example as follows:: + + #[repr(transparent)] + #[derive(Debug, qemu_api_macros::Wrapper)] + pub struct Object(Opaque<bindings::Object>); + +where the special ``derive`` macro provides useful methods such as +``from_raw``, ``as_ptr`, ``as_mut_ptr`` and ``raw_get``. The bindings will +then manually check for the big QEMU lock with assertions, which allows +the wrapper to be declared thread-safe:: + + unsafe impl Send for Object {} + unsafe impl Sync for Object {} + +Writing bindings to C code +'''''''''''''''''''''''''' + +Here are some things to keep in mind when working on the ``qemu_api`` crate. + +**Look at existing code** + Very often, similar idioms in C code correspond to similar tricks in + Rust bindings. If the C code uses ``offsetof``, look at qdev properties + or ``vmstate``. If the C code has a complex const struct, look at + ``MemoryRegion``. Reuse existing patterns for handling lifetimes; + for example use ``&T`` for QOM objects that do not need a reference + count (including those that can be embedded in other objects) and + ``Owned<T>`` for those that need it. + +**Use the type system** + Bindings often will need access information that is specific to a type + (either a builtin one or a user-defined one) in order to pass it to C + functions. Put them in a trait and access it through generic parameters. + The ``vmstate`` module has examples of how to retrieve type information + for the fields of a Rust ``struct``. + +**Prefer unsafe traits to unsafe functions** + Unsafe traits are much easier to prove correct than unsafe functions. + They are an excellent place to store metadata that can later be accessed + by generic functions. C code usually places metadata in global variables; + in Rust, they can be stored in traits and then turned into ``static`` + variables. Often, unsafe traits can be generated by procedural macros. + +**Document limitations due to old Rust versions** + If you need to settle for an inferior solution because of the currently + supported set of Rust versions, document it in the source and in this + file. This ensures that it can be fixed when the minimum supported + version is bumped. + +**Keep locking in mind**. + When marking a type ``Sync``, be careful of whether it needs the big + QEMU lock. Use ``BqlCell`` and ``BqlRefCell`` for interior data, + or assert ``bql_locked()``. + +**Don't be afraid of complexity, but document and isolate it** + It's okay to be tricky; device code is written more often than bindings + code and it's important that it is idiomatic. However, you should strive + to isolate any tricks in a place (for example a ``struct``, a trait + or a macro) where it can be documented and tested. If needed, include + toy versions of the code in the documentation. + +Writing procedural macros +''''''''''''''''''''''''' + +By conventions, procedural macros are split in two functions, one +returning ``Result<proc_macro2::TokenStream, MacroError>`` with the body of +the procedural macro, and the second returning ``proc_macro::TokenStream`` +which is the actual procedural macro. The former's name is the same as +the latter with the ``_or_error`` suffix. 
The code for the latter is more +or less fixed; it follows the following template, which is fixed apart +from the type after ``as`` in the invocation of ``parse_macro_input!``:: + + #[proc_macro_derive(Object)] + pub fn derive_object(input: TokenStream) -> TokenStream { + let input = parse_macro_input!(input as DeriveInput); + let expanded = derive_object_or_error(input).unwrap_or_else(Into::into); + + TokenStream::from(expanded) + } + +The ``qemu_api_macros`` crate has utility functions to examine a +``DeriveInput`` and perform common checks (e.g. looking for a struct +with named fields). These functions return ``Result<..., MacroError>`` +and can be used easily in the procedural macro function:: + + fn derive_object_or_error(input: DeriveInput) -> + Result<proc_macro2::TokenStream, MacroError> + { + is_c_repr(&input, "#[derive(Object)]")?; + + let name = &input.ident; + let parent = &get_fields(&input, "#[derive(Object)]")?[0].ident; + ... + } + +Use procedural macros with care. They are mostly useful for two purposes: + +* Performing consistency checks; for example ``#[derive(Object)]`` checks + that the structure has ``#[repr[C])`` and that the type of the first field + is consistent with the ``ObjectType`` declaration. + +* Extracting information from Rust source code into traits, typically based + on types and attributes. For example, ``#[derive(TryInto)]`` builds an + implementation of ``TryFrom``, and it uses the ``#[repr(...)]`` attribute + as the ``TryFrom`` source and error types. + +Procedural macros can be hard to debug and test; if the code generation +exceeds a few lines of code, it may be worthwhile to delegate work to +"regular" declarative (``macro_rules!``) macros and write unit tests for +those instead. + + +Coding style +'''''''''''' + +Code should pass clippy and be formatted with rustfmt. + +Right now, only the nightly version of ``rustfmt`` is supported. This +might change in the future. While CI checks for correct formatting via +``cargo fmt --check``, maintainers can fix this for you when applying patches. + +It is expected that ``qemu_api`` provides full ``rustdoc`` documentation for +bindings that are in their final shape or close. + +Adding dependencies +------------------- + +Generally, the set of dependent crates is kept small. Think twice before +adding a new external crate, especially if it comes with a large set of +dependencies itself. Sometimes QEMU only needs a small subset of the +functionality; see for example QEMU's ``assertions`` module. + +On top of this recommendation, adding external crates to QEMU is a +slightly complicated process, mostly due to the need to teach Meson how +to build them. While Meson has initial support for parsing ``Cargo.lock`` +files, it is still highly experimental and is therefore not used. + +Therefore, external crates must be added as subprojects for Meson to +learn how to build them, as well as to the relevant ``Cargo.toml`` files. +The versions specified in ``rust/Cargo.lock`` must be the same as the +subprojects; note that the ``rust/`` directory forms a Cargo `workspace`__, +and therefore there is a single lock file for the whole build. + +__ https://doc.rust-lang.org/cargo/reference/workspaces.html#virtual-workspace + +Choose a version of the crate that works with QEMU's minimum supported +Rust version (|msrv|). + +Second, a new ``wrap`` file must be added to teach Meson how to download the +crate. 
The wrap file must be named ``NAME-SEMVER-rs.wrap``, where ``NAME`` +is the name of the crate and ``SEMVER`` is the version up to and including the +first non-zero number. For example, a crate with version ``0.2.3`` will use +``0.2`` for its ``SEMVER``, while a crate with version ``1.0.84`` will use ``1``. + +Third, the Meson rules to build the crate must be added at +``subprojects/NAME-SEMVER-rs/meson.build``. Generally this includes: + +* ``subproject`` and ``dependency`` lines for all dependent crates + +* a ``static_library`` or ``rust.proc_macro`` line to perform the actual build + +* ``declare_dependency`` and a ``meson.override_dependency`` lines to expose + the result to QEMU and to other subprojects + +Remember to add ``native: true`` to ``dependency``, ``static_library`` and +``meson.override_dependency`` for dependencies of procedural macros. +If a crate is needed in both procedural macros and QEMU binaries, everything +apart from ``subproject`` must be duplicated to build both native and +non-native versions of the crate. + +It's important to specify the right compiler options. These include: + +* the language edition (which can be found in the ``Cargo.toml`` file) + +* the ``--cfg`` (which have to be "reverse engineered" from the ``build.rs`` + file of the crate). + +* usually, a ``--cap-lints allow`` argument to hide warnings from rustc + or clippy. + +After every change to the ``meson.build`` file you have to update the patched +version with ``meson subprojects update --reset ``NAME-SEMVER-rs``. This might +be automated in the future. + +Also, after every change to the ``meson.build`` file it is strongly suggested to +do a dummy change to the ``.wrap`` file (for example adding a comment like +``# version 2``), which will help Meson notice that the subproject is out of date. + +As a last step, add the new subproject to ``scripts/archive-source.sh``, +``scripts/make-release`` and ``subprojects/.gitignore``. diff --git a/docs/devel/style.rst b/docs/devel/style.rst index 2f68b50..d025933 100644 --- a/docs/devel/style.rst +++ b/docs/devel/style.rst @@ -416,6 +416,26 @@ definitions instead of typedefs in headers and function prototypes; this avoids problems with duplicated typedefs and reduces the need to include headers from other headers. +Bitfields +--------- + +C bitfields can be a cause of non-portability issues, especially under windows +where `MSVC has a different way to lay them out than GCC +<https://gcc.gnu.org/onlinedocs/gcc/x86-Type-Attributes.html>`_, or where +endianness matters. + +For this reason, we disallow usage of bitfields in packed structures and in any +structures which are supposed to exactly match a specific layout in guest +memory. Some existing code may use it, and we carefully ensured the layout was +the one expected. + +We also suggest avoiding bitfields even in structures where the exact +layout does not matter, unless you can show that they provide a significant +usability benefit. + +We encourage the usage of ``include/hw/registerfields.h`` as a safe replacement +for bitfields. 
+ Reserved namespaces in C and POSIX ---------------------------------- diff --git a/docs/devel/submitting-a-patch.rst b/docs/devel/submitting-a-patch.rst index 83e9092..f7917b8 100644 --- a/docs/devel/submitting-a-patch.rst +++ b/docs/devel/submitting-a-patch.rst @@ -18,7 +18,7 @@ one-shot fix, the bare minimum we ask is that: * - Check - Reason - * - Patches contain Signed-off-by: Real Name <author@email> + * - Patches contain Signed-off-by: Your Name <author@email> - States you are legally able to contribute the code. See :ref:`patch_emails_must_include_a_signed_off_by_line` * - Sent as patch emails to ``qemu-devel@nongnu.org`` - The project uses an email list based workflow. See :ref:`submitting_your_patches` @@ -235,6 +235,31 @@ to another list.) ``git send-email`` (`step-by-step setup guide works best for delivering the patch without mangling it, but attachments can be used as a last resort on a first-time submission. +.. _use_git_publish: + +Use git-publish +~~~~~~~~~~~~~~~ + +If you already configured git send-email, you can simply use `git-publish +<https://github.com/stefanha/git-publish>`__ to send series. + +:: + + $ git checkout master -b my-feature + $ # work on new commits, add your 'Signed-off-by' lines to each + $ git publish + $ ... more work, rebase on master, ... + $ git publish # will send a v2 + +Each time you post a series, git-publish will create a local tag with the format +``<branchname>-v<version>`` to record the patch series. + +When sending patch emails, 'git publish' will consult the output of +'scripts/get_maintainers.pl' and automatically CC anyone listed as maintainers +of the affected code. Generally you should accept the suggested CC list, but +there may sometimes be scenarios where it is appropriate to cut it down (eg on +certain large tree-wide cleanups), or augment it with other interested people. + .. _if_you_cannot_send_patch_emails: If you cannot send patch emails @@ -252,10 +277,7 @@ patches to the QEMU mailing list by following these steps: #. Send your patches to the QEMU mailing list using the web-based ``git-send-email`` UI at https://git.sr.ht/~USERNAME/qemu/send-email -`This video -<https://spacepub.space/videos/watch/ad258d23-0ac6-488c-83fc-2bacf578de3a>`__ -shows the web-based ``git-send-email`` workflow. Documentation is -available `here +Documentation for sourcehut is available `here <https://man.sr.ht/git.sr.ht/#sending-patches-upstream>`__. .. _cc_the_relevant_maintainer: @@ -322,23 +344,9 @@ Patch emails must include a ``Signed-off-by:`` line Your patches **must** include a Signed-off-by: line. This is a hard requirement because it's how you say "I'm legally okay to contribute -this and happy for it to go into QEMU". The process is modelled after -the `Linux kernel -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__ -policy. - -If you wrote the patch, make sure your "From:" and "Signed-off-by:" -lines use the same spelling. It's okay if you subscribe or contribute to -the list via more than one address, but using multiple addresses in one -commit just confuses things. If someone else wrote the patch, git will -include a "From:" line in the body of the email (different from your -envelope From:) that will give credit to the correct author; but again, -that author's Signed-off-by: line is mandatory, with the same spelling. 
- -There are various tooling options for automatically adding these tags -include using ``git commit -s`` or ``git format-patch -s``. For more -information see `SubmittingPatches 1.12 -<http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/SubmittingPatches?id=f6f94e2ab1b33f0082ac22d71f66385a60d8157f#n297>`__. +this and happy for it to go into QEMU". For full guidance, read the +:ref:`code-provenance` documentation. + .. _include_a_meaningful_cover_letter: @@ -406,6 +414,20 @@ For more details on how QEMU's stable process works, refer to the .. _participating_in_code_review: +Retrieve an existing series +--------------------------- + +If you want to apply an existing series on top of your tree, you can simply use +`b4 <https://github.com/mricon/b4>`__. + +:: + + b4 shazam $msg-id + +The message id is related to the patch series that has been sent to the mailing +list. You need to retrieve the "Message-Id:" header from one of the patches. Any +of them can be used and b4 will apply the whole series. + Participating in Code Review ---------------------------- diff --git a/docs/devel/tcg-ops.rst b/docs/devel/tcg-ops.rst index d46b625..f26b837 100644 --- a/docs/devel/tcg-ops.rst +++ b/docs/devel/tcg-ops.rst @@ -239,7 +239,7 @@ Jumps/Labels - | Jump to label. - * - brcond_i32/i64 *t0*, *t1*, *cond*, *label* + * - brcond *t0*, *t1*, *cond*, *label* - | Conditional jump if *t0* *cond* *t1* is true. *cond* can be: | @@ -261,98 +261,117 @@ Arithmetic .. list-table:: - * - add_i32/i64 *t0*, *t1*, *t2* + * - add *t0*, *t1*, *t2* - | *t0* = *t1* + *t2* - * - sub_i32/i64 *t0*, *t1*, *t2* + * - sub *t0*, *t1*, *t2* - | *t0* = *t1* - *t2* - * - neg_i32/i64 *t0*, *t1* + * - neg *t0*, *t1* - | *t0* = -*t1* (two's complement) - * - mul_i32/i64 *t0*, *t1*, *t2* + * - mul *t0*, *t1*, *t2* - | *t0* = *t1* * *t2* - * - div_i32/i64 *t0*, *t1*, *t2* + * - divs *t0*, *t1*, *t2* - | *t0* = *t1* / *t2* (signed) | Undefined behavior if division by zero or overflow. - * - divu_i32/i64 *t0*, *t1*, *t2* + * - divu *t0*, *t1*, *t2* - | *t0* = *t1* / *t2* (unsigned) | Undefined behavior if division by zero. - * - rem_i32/i64 *t0*, *t1*, *t2* + * - rems *t0*, *t1*, *t2* - | *t0* = *t1* % *t2* (signed) | Undefined behavior if division by zero or overflow. - * - remu_i32/i64 *t0*, *t1*, *t2* + * - remu *t0*, *t1*, *t2* - | *t0* = *t1* % *t2* (unsigned) | Undefined behavior if division by zero. + * - divs2 *q*, *r*, *nl*, *nh*, *d* + + - | *q* = *nh:nl* / *d* (signed) + | *r* = *nh:nl* % *d* + | Undefined behaviour if division by zero, or the double-word + numerator divided by the single-word divisor does not fit + within the single-word quotient. The code generator will + pass *nh* as a simple sign-extension of *nl*, so the only + overflow should be *INT_MIN* / -1. + + * - divu2 *q*, *r*, *nl*, *nh*, *d* + + - | *q* = *nh:nl* / *d* (unsigned) + | *r* = *nh:nl* % *d* + | Undefined behaviour if division by zero, or the double-word + numerator divided by the single-word divisor does not fit + within the single-word quotient. The code generator will + pass 0 to *nh* to make a simple zero-extension of *nl*, + so overflow should never occur. Logical ------- .. 
list-table:: - * - and_i32/i64 *t0*, *t1*, *t2* + * - and *t0*, *t1*, *t2* - | *t0* = *t1* & *t2* - * - or_i32/i64 *t0*, *t1*, *t2* + * - or *t0*, *t1*, *t2* - | *t0* = *t1* | *t2* - * - xor_i32/i64 *t0*, *t1*, *t2* + * - xor *t0*, *t1*, *t2* - | *t0* = *t1* ^ *t2* - * - not_i32/i64 *t0*, *t1* + * - not *t0*, *t1* - | *t0* = ~\ *t1* - * - andc_i32/i64 *t0*, *t1*, *t2* + * - andc *t0*, *t1*, *t2* - | *t0* = *t1* & ~\ *t2* - * - eqv_i32/i64 *t0*, *t1*, *t2* + * - eqv *t0*, *t1*, *t2* - | *t0* = ~(*t1* ^ *t2*), or equivalently, *t0* = *t1* ^ ~\ *t2* - * - nand_i32/i64 *t0*, *t1*, *t2* + * - nand *t0*, *t1*, *t2* - | *t0* = ~(*t1* & *t2*) - * - nor_i32/i64 *t0*, *t1*, *t2* + * - nor *t0*, *t1*, *t2* - | *t0* = ~(*t1* | *t2*) - * - orc_i32/i64 *t0*, *t1*, *t2* + * - orc *t0*, *t1*, *t2* - | *t0* = *t1* | ~\ *t2* - * - clz_i32/i64 *t0*, *t1*, *t2* + * - clz *t0*, *t1*, *t2* - | *t0* = *t1* ? clz(*t1*) : *t2* - * - ctz_i32/i64 *t0*, *t1*, *t2* + * - ctz *t0*, *t1*, *t2* - | *t0* = *t1* ? ctz(*t1*) : *t2* - * - ctpop_i32/i64 *t0*, *t1* + * - ctpop *t0*, *t1* - | *t0* = number of bits set in *t1* | - | With *ctpop* short for "count population", matching - | the function name used in ``include/qemu/host-utils.h``. + | The name *ctpop* is short for "count population", and matches + the function name used in ``include/qemu/host-utils.h``. Shifts/Rotates @@ -360,30 +379,30 @@ Shifts/Rotates .. list-table:: - * - shl_i32/i64 *t0*, *t1*, *t2* + * - shl *t0*, *t1*, *t2* - | *t0* = *t1* << *t2* - | Unspecified behavior if *t2* < 0 or *t2* >= 32 (resp 64) + | Unspecified behavior for negative or out-of-range shifts. - * - shr_i32/i64 *t0*, *t1*, *t2* + * - shr *t0*, *t1*, *t2* - | *t0* = *t1* >> *t2* (unsigned) - | Unspecified behavior if *t2* < 0 or *t2* >= 32 (resp 64) + | Unspecified behavior for negative or out-of-range shifts. - * - sar_i32/i64 *t0*, *t1*, *t2* + * - sar *t0*, *t1*, *t2* - | *t0* = *t1* >> *t2* (signed) - | Unspecified behavior if *t2* < 0 or *t2* >= 32 (resp 64) + | Unspecified behavior for negative or out-of-range shifts. - * - rotl_i32/i64 *t0*, *t1*, *t2* + * - rotl *t0*, *t1*, *t2* - | Rotation of *t2* bits to the left - | Unspecified behavior if *t2* < 0 or *t2* >= 32 (resp 64) + | Unspecified behavior for negative or out-of-range shifts. - * - rotr_i32/i64 *t0*, *t1*, *t2* + * - rotr *t0*, *t1*, *t2* - | Rotation of *t2* bits to the right. - | Unspecified behavior if *t2* < 0 or *t2* >= 32 (resp 64) + | Unspecified behavior for negative or out-of-range shifts. Misc @@ -391,26 +410,12 @@ Misc .. list-table:: - * - mov_i32/i64 *t0*, *t1* + * - mov *t0*, *t1* - | *t0* = *t1* - | Move *t1* to *t0* (both operands must have the same type). - - * - ext8s_i32/i64 *t0*, *t1* - - ext8u_i32/i64 *t0*, *t1* - - ext16s_i32/i64 *t0*, *t1* - - ext16u_i32/i64 *t0*, *t1* + | Move *t1* to *t0*. - ext32s_i64 *t0*, *t1* - - ext32u_i64 *t0*, *t1* - - - | 8, 16 or 32 bit sign/zero extension (both operands must have the same type) - - * - bswap16_i32/i64 *t0*, *t1*, *flags* + * - bswap16 *t0*, *t1*, *flags* - | 16 bit byte swap on the low bits of a 32/64 bit input. | @@ -420,24 +425,24 @@ Misc | | If neither ``TCG_BSWAP_OZ`` nor ``TCG_BSWAP_OS`` are set, then the bits of *t0* above bit 15 may contain any value. - * - bswap32_i64 *t0*, *t1*, *flags* - - - | 32 bit byte swap on a 64-bit value. The flags are the same as for bswap16, - except they apply from bit 31 instead of bit 15. + * - bswap32 *t0*, *t1*, *flags* - * - bswap32_i32 *t0*, *t1*, *flags* + - | 32 bit byte swap. 
The flags are the same as for bswap16, except + they apply from bit 31 instead of bit 15. On TCG_TYPE_I32, the + flags should be zero. - bswap64_i64 *t0*, *t1*, *flags* + * - bswap64 *t0*, *t1*, *flags* - - | 32/64 bit byte swap. The flags are ignored, but still present - for consistency with the other bswap opcodes. + - | 64 bit byte swap. The flags are ignored, but still present + for consistency with the other bswap opcodes. For future + compatibility, the flags should be zero. * - discard_i32/i64 *t0* - | Indicate that the value of *t0* won't be used later. It is useful to force dead code elimination. - * - deposit_i32/i64 *dest*, *t1*, *t2*, *pos*, *len* + * - deposit *dest*, *t1*, *t2*, *pos*, *len* - | Deposit *t2* as a bitfield into *t1*, placing the result in *dest*. | @@ -446,14 +451,16 @@ Misc | *len* - the length of the bitfield | *pos* - the position of the first bit, counting from the LSB | - | For example, "deposit_i32 dest, t1, t2, 8, 4" indicates a 4-bit field + | For example, "deposit dest, t1, t2, 8, 4" indicates a 4-bit field at bit 8. This operation would be equivalent to | | *dest* = (*t1* & ~0x0f00) | ((*t2* << 8) & 0x0f00) + | + | on TCG_TYPE_I32. - * - extract_i32/i64 *dest*, *t1*, *pos*, *len* + * - extract *dest*, *t1*, *pos*, *len* - sextract_i32/i64 *dest*, *t1*, *pos*, *len* + sextract *dest*, *t1*, *pos*, *len* - | Extract a bitfield from *t1*, placing the result in *dest*. | @@ -462,16 +469,16 @@ Misc to the left with zeros; for sextract_*, the result will be extended to the left with copies of the bitfield sign bit at *pos* + *len* - 1. | - | For example, "sextract_i32 dest, t1, 8, 4" indicates a 4-bit field + | For example, "sextract dest, t1, 8, 4" indicates a 4-bit field at bit 8. This operation would be equivalent to | | *dest* = (*t1* << 20) >> 28 | - | (using an arithmetic right shift). + | (using an arithmetic right shift) on TCG_TYPE_I32. - * - extract2_i32/i64 *dest*, *t1*, *t2*, *pos* + * - extract2 *dest*, *t1*, *t2*, *pos* - - | For N = {32,64}, extract an N-bit quantity from the concatenation + - | For TCG_TYPE_I{N}, extract an N-bit quantity from the concatenation of *t2*:*t1*, beginning at *pos*. The tcg_gen_extract2_{i32,i64} expander accepts 0 <= *pos* <= N as inputs. The backend code generator will not see either 0 or N as inputs for these opcodes. @@ -494,19 +501,19 @@ Conditional moves .. list-table:: - * - setcond_i32/i64 *dest*, *t1*, *t2*, *cond* + * - setcond *dest*, *t1*, *t2*, *cond* - | *dest* = (*t1* *cond* *t2*) | | Set *dest* to 1 if (*t1* *cond* *t2*) is true, otherwise set to 0. - * - negsetcond_i32/i64 *dest*, *t1*, *t2*, *cond* + * - negsetcond *dest*, *t1*, *t2*, *cond* - | *dest* = -(*t1* *cond* *t2*) | | Set *dest* to -1 if (*t1* *cond* *t2*) is true, otherwise set to 0. - * - movcond_i32/i64 *dest*, *c1*, *c2*, *v1*, *v2*, *cond* + * - movcond *dest*, *c1*, *c2*, *v1*, *v2*, *cond* - | *dest* = (*c1* *cond* *c2* ? *v1* : *v2*) | @@ -586,26 +593,79 @@ Multiword arithmetic support .. list-table:: - * - add2_i32/i64 *t0_low*, *t0_high*, *t1_low*, *t1_high*, *t2_low*, *t2_high* + * - addco *t0*, *t1*, *t2* + + - | Compute *t0* = *t1* + *t2* and in addition output to the + carry bit provided by the host architecture. + + * - addci *t0, *t1*, *t2* - sub2_i32/i64 *t0_low*, *t0_high*, *t1_low*, *t1_high*, *t2_low*, *t2_high* + - | Compute *t0* = *t1* + *t2* + *C*, where *C* is the + input carry bit provided by the host architecture. + The output carry bit need not be computed. 
- - | Similar to add/sub, except that the double-word inputs *t1* and *t2* are - formed from two single-word arguments, and the double-word output *t0* - is returned in two single-word outputs. + * - addcio *t0, *t1*, *t2* - * - mulu2_i32/i64 *t0_low*, *t0_high*, *t1*, *t2* + - | Compute *t0* = *t1* + *t2* + *C*, where *C* is the + input carry bit provided by the host architecture, + and also compute the output carry bit. + + * - addc1o *t0, *t1*, *t2* + + - | Compute *t0* = *t1* + *t2* + 1, and in addition output to the + carry bit provided by the host architecture. This is akin to + *addcio* with a fixed carry-in value of 1. + | This is intended to be used by the optimization pass, + intermediate to complete folding of the addition chain. + In some cases complete folding is not possible and this + opcode will remain until output. If this happens, the + code generator will use ``tcg_out_set_carry`` and then + the output routine for *addcio*. + + * - subbo *t0*, *t1*, *t2* + + - | Compute *t0* = *t1* - *t2* and in addition output to the + borrow bit provided by the host architecture. + | Depending on the host architecture, the carry bit may or may not be + identical to the borrow bit. Thus the addc\* and subb\* + opcodes must not be mixed. + + * - subbi *t0, *t1*, *t2* + + - | Compute *t0* = *t1* - *t2* - *B*, where *B* is the + input borrow bit provided by the host architecture. + The output borrow bit need not be computed. + + * - subbio *t0, *t1*, *t2* + + - | Compute *t0* = *t1* - *t2* - *B*, where *B* is the + input borrow bit provided by the host architecture, + and also compute the output borrow bit. + + * - subb1o *t0, *t1*, *t2* + + - | Compute *t0* = *t1* - *t2* - 1, and in addition output to the + borrow bit provided by the host architecture. This is akin to + *subbio* with a fixed borrow-in value of 1. + | This is intended to be used by the optimization pass, + intermediate to complete folding of the subtraction chain. + In some cases complete folding is not possible and this + opcode will remain until output. If this happens, the + code generator will use ``tcg_out_set_borrow`` and then + the output routine for *subbio*. + + * - mulu2 *t0_low*, *t0_high*, *t1*, *t2* - | Similar to mul, except two unsigned inputs *t1* and *t2* yielding the full double-word product *t0*. The latter is returned in two single-word outputs. - * - muls2_i32/i64 *t0_low*, *t0_high*, *t1*, *t2* + * - muls2 *t0_low*, *t0_high*, *t1*, *t2* - | Similar to mulu2, except the two inputs *t1* and *t2* are signed. - * - mulsh_i32/i64 *t0*, *t1*, *t2* + * - mulsh *t0*, *t1*, *t2* - muluh_i32/i64 *t0*, *t1*, *t2* + muluh *t0*, *t1*, *t2* - | Provide the high part of a signed or unsigned multiply, respectively. | @@ -684,8 +744,6 @@ QEMU specific operations qemu_st_i32/i64/i128 *t0*, *t1*, *flags*, *memidx* - qemu_st8_i32 *t0*, *t1*, *flags*, *memidx* - - | Load data at the guest address *t1* into *t0*, or store data in *t0* at guest address *t1*. The _i32/_i64/_i128 size applies to the size of the input/output register *t0* only. The address *t1* is always sized according to the guest, @@ -703,19 +761,14 @@ QEMU specific operations 64-bit memory access specified in *flags*. | | For qemu_ld/st_i128, these are only supported for a 64-bit host. - | - | For i386, qemu_st8_i32 is exactly like qemu_st_i32, except the size of - the memory operation is known to be 8-bit. This allows the backend to - provide a different set of register constraints. 
Host vector operations ---------------------- -All of the vector ops have two parameters, ``TCGOP_VECL`` & ``TCGOP_VECE``. -The former specifies the length of the vector in log2 64-bit units; the -latter specifies the length of the element (if applicable) in log2 8-bit units. -E.g. VECL = 1 -> 64 << 1 -> v128, and VECE = 2 -> 1 << 2 -> i32. +All of the vector ops have two parameters, ``TCGOP_TYPE`` & ``TCGOP_VECE``. +The former specifies the length of the vector as a TCGType; the latter +specifies the length of the element (if applicable) in log2 8-bit units. .. list-table:: @@ -729,7 +782,7 @@ E.g. VECL = 1 -> 64 << 1 -> v128, and VECE = 2 -> 1 << 2 -> i32. * - dup_vec *v0*, *r1* - - | Duplicate the low N bits of *r1* into VECL/VECE copies across *v0*. + - | Duplicate the low N bits of *r1* into TYPE/VECE copies across *v0*. * - dupi_vec *v0*, *c* @@ -738,7 +791,7 @@ E.g. VECL = 1 -> 64 << 1 -> v128, and VECE = 2 -> 1 << 2 -> i32. * - dup2_vec *v0*, *r1*, *r2* - - | Duplicate *r2*:*r1* into VECL/64 copies across *v0*. This opcode is + - | Duplicate *r2*:*r1* into TYPE/64 copies across *v0*. This opcode is only present for 32-bit hosts. * - add_vec *v0*, *v1*, *v2* @@ -810,7 +863,7 @@ E.g. VECL = 1 -> 64 << 1 -> v128, and VECE = 2 -> 1 << 2 -> i32. .. code-block:: c - for (i = 0; i < VECL/VECE; ++i) { + for (i = 0; i < TYPE/VECE; ++i) { v0[i] = v1[i] << s2; } @@ -832,7 +885,7 @@ E.g. VECL = 1 -> 64 << 1 -> v128, and VECE = 2 -> 1 << 2 -> i32. .. code-block:: c - for (i = 0; i < VECL/VECE; ++i) { + for (i = 0; i < TYPE/VECE; ++i) { v0[i] = v1[i] << v2[i]; } @@ -885,9 +938,9 @@ Assumptions The target word size (``TCG_TARGET_REG_BITS``) is expected to be 32 bit or 64 bit. It is expected that the pointer has the same size as the word. -On a 32 bit target, all 64 bit operations are converted to 32 bits. A -few specific operations must be implemented to allow it (see add2_i32, -sub2_i32, brcond2_i32). +On a 32 bit target, all 64 bit operations are converted to 32 bits. +A few specific operations must be implemented to allow it +(see brcond2_i32, setcond2_i32). On a 64 bit target, the values are transferred between 32 and 64-bit registers using the following ops: @@ -928,7 +981,9 @@ operation uses a constant input constraint which does not allow all constants, it must also accept registers in order to have a fallback. The constraint '``i``' is defined generically to accept any constant. The constraint '``r``' is not defined generically, but is consistently -used by each backend to indicate all registers. +used by each backend to indicate all registers. If ``TCG_REG_ZERO`` +is defined by the backend, the constraint '``z``' is defined generically +to map constant 0 to the hardware zero register. The movi_i32 and movi_i64 operations must accept any constants. diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst index f7d7b9e..9463692 100644 --- a/docs/devel/tcg-plugins.rst +++ b/docs/devel/tcg-plugins.rst @@ -8,38 +8,6 @@ QEMU TCG Plugins ================ -QEMU TCG plugins provide a way for users to run experiments taking -advantage of the total system control emulation can have over a guest. -It provides a mechanism for plugins to subscribe to events during -translation and execution and optionally callback into the plugin -during these events. TCG plugins are unable to change the system state -only monitor it passively. However they can do this down to an -individual instruction granularity including potentially subscribing -to all load and store operations. 
- -Usage ------ - -Any QEMU binary with TCG support has plugins enabled by default. -Earlier releases needed to be explicitly enabled with:: - - configure --enable-plugins - -Once built a program can be run with multiple plugins loaded each with -their own arguments:: - - $QEMU $OTHER_QEMU_ARGS \ - -plugin contrib/plugin/libhowvec.so,inline=on,count=hint \ - -plugin contrib/plugin/libhotblocks.so - -Arguments are plugin specific and can be used to modify their -behaviour. In this case the howvec plugin is being asked to use inline -ops to count and break down the hint instructions by type. - -Linux user-mode emulation also evaluates the environment variable -``QEMU_PLUGIN``:: - - QEMU_PLUGIN="file=contrib/plugins/libhowvec.so,inline=on,count=hint" $QEMU Writing plugins --------------- @@ -93,11 +61,14 @@ translation event the plugin has an option to enumerate the instructions in a block of instructions and optionally register callbacks to some or all instructions when they are executed. -There is also a facility to add an inline event where code to -increment a counter can be directly inlined with the translation. -Currently only a simple increment is supported. This is not atomic so -can miss counts. If you want absolute precision you should use a -callback which can then ensure atomicity itself. +There is also a facility to add inline instructions doing various operations, +like adding or storing an immediate value. It is also possible to execute a +callback conditionally, with condition being evaluated inline. All those inline +operations are associated to a ``scoreboard``, which is a thread-local storage +automatically expanded when new cores/threads are created and that can be +accessed/modified in a thread-safe way without any lock needed. Combining inline +operations and conditional callbacks offer a more efficient way to instrument +binaries, compared to classic callbacks. Finally when QEMU exits all the registered *atexit* callbacks are invoked. @@ -191,457 +162,6 @@ which means callbacks may still occur after the uninstall operation is requested. The plugin isn't completely uninstalled until the safe work has executed while all vCPUs are quiescent. -Example Plugins -=============== - -There are a number of plugins included with QEMU and you are -encouraged to contribute your own plugins plugins upstream. There is a -``contrib/plugins`` directory where they can go. There are also some -basic plugins that are used to test and exercise the API during the -``make check-tcg`` target in ``tests\plugins``. - -- tests/plugins/empty.c - -Purely a test plugin for measuring the overhead of the plugins system -itself. Does no instrumentation. - -- tests/plugins/bb.c - -A very basic plugin which will measure execution in course terms as -each basic block is executed. By default the results are shown once -execution finishes:: - - $ qemu-aarch64 -plugin tests/plugin/libbb.so \ - -d plugin ./tests/tcg/aarch64-linux-user/sha1 - SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 - bb's: 2277338, insns: 158483046 - -Behaviour can be tweaked with the following arguments: - - * inline=true|false - - Use faster inline addition of a single counter. Not per-cpu and not - thread safe. 
- - * idle=true|false - - Dump the current execution stats whenever the guest vCPU idles - -- tests/plugins/insn.c - -This is a basic instruction level instrumentation which can count the -number of instructions executed on each core/thread:: - - $ qemu-aarch64 -plugin tests/plugin/libinsn.so \ - -d plugin ./tests/tcg/aarch64-linux-user/threadcount - Created 10 threads - Done - cpu 0 insns: 46765 - cpu 1 insns: 3694 - cpu 2 insns: 3694 - cpu 3 insns: 2994 - cpu 4 insns: 1497 - cpu 5 insns: 1497 - cpu 6 insns: 1497 - cpu 7 insns: 1497 - total insns: 63135 - -Behaviour can be tweaked with the following arguments: - - * inline=true|false - - Use faster inline addition of a single counter. Not per-cpu and not - thread safe. - - * sizes=true|false - - Give a summary of the instruction sizes for the execution - - * match=<string> - - Only instrument instructions matching the string prefix. Will show - some basic stats including how many instructions have executed since - the last execution. For example:: - - $ qemu-aarch64 -plugin tests/plugin/libinsn.so,match=bl \ - -d plugin ./tests/tcg/aarch64-linux-user/sha512-vector - ... - 0x40069c, 'bl #0x4002b0', 10 hits, 1093 match hits, Δ+1257 since last match, 98 avg insns/match - 0x4006ac, 'bl #0x403690', 10 hits, 1094 match hits, Δ+47 since last match, 98 avg insns/match - 0x4037fc, 'bl #0x4002b0', 18 hits, 1095 match hits, Δ+22 since last match, 98 avg insns/match - 0x400720, 'bl #0x403690', 10 hits, 1096 match hits, Δ+58 since last match, 98 avg insns/match - 0x4037fc, 'bl #0x4002b0', 19 hits, 1097 match hits, Δ+22 since last match, 98 avg insns/match - 0x400730, 'bl #0x403690', 10 hits, 1098 match hits, Δ+33 since last match, 98 avg insns/match - 0x4037ac, 'bl #0x4002b0', 12 hits, 1099 match hits, Δ+20 since last match, 98 avg insns/match - ... - -For more detailed execution tracing see the ``execlog`` plugin for -other options. - -- tests/plugins/mem.c - -Basic instruction level memory instrumentation:: - - $ qemu-aarch64 -plugin tests/plugin/libmem.so,inline=true \ - -d plugin ./tests/tcg/aarch64-linux-user/sha1 - SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 - inline mem accesses: 79525013 - -Behaviour can be tweaked with the following arguments: - - * inline=true|false - - Use faster inline addition of a single counter. Not per-cpu and not - thread safe. - - * callback=true|false - - Use callbacks on each memory instrumentation. - - * hwaddr=true|false - - Count IO accesses (only for system emulation) - -- tests/plugins/syscall.c - -A basic syscall tracing plugin. This only works for user-mode. By -default it will give a summary of syscall stats at the end of the -run:: - - $ qemu-aarch64 -plugin tests/plugin/libsyscall \ - -d plugin ./tests/tcg/aarch64-linux-user/threadcount - Created 10 threads - Done - syscall no. calls errors - 226 12 0 - 99 11 11 - 115 11 0 - 222 11 0 - 93 10 0 - 220 10 0 - 233 10 0 - 215 8 0 - 214 4 0 - 134 2 0 - 64 2 0 - 96 1 0 - 94 1 0 - 80 1 0 - 261 1 0 - 78 1 0 - 160 1 0 - 135 1 0 - -- contrib/plugins/hotblocks.c - -The hotblocks plugin allows you to examine the where hot paths of -execution are in your program. Once the program has finished you will -get a sorted list of blocks reporting the starting PC, translation -count, number of instructions and execution count. This will work best -with linux-user execution as system emulation tends to generate -re-translations as blocks from different programs get swapped in and -out of system memory. 
- -If your program is single-threaded you can use the ``inline`` option for -slightly faster (but not thread safe) counters. - -Example:: - - $ qemu-aarch64 \ - -plugin contrib/plugins/libhotblocks.so -d plugin \ - ./tests/tcg/aarch64-linux-user/sha1 - SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 - collected 903 entries in the hash table - pc, tcount, icount, ecount - 0x0000000041ed10, 1, 5, 66087 - 0x000000004002b0, 1, 4, 66087 - ... - -- contrib/plugins/hotpages.c - -Similar to hotblocks but this time tracks memory accesses:: - - $ qemu-aarch64 \ - -plugin contrib/plugins/libhotpages.so -d plugin \ - ./tests/tcg/aarch64-linux-user/sha1 - SHA1=15dd99a1991e0b3826fede3deffc1feba42278e6 - Addr, RCPUs, Reads, WCPUs, Writes - 0x000055007fe000, 0x0001, 31747952, 0x0001, 8835161 - 0x000055007ff000, 0x0001, 29001054, 0x0001, 8780625 - 0x00005500800000, 0x0001, 687465, 0x0001, 335857 - 0x0000000048b000, 0x0001, 130594, 0x0001, 355 - 0x0000000048a000, 0x0001, 1826, 0x0001, 11 - -The hotpages plugin can be configured using the following arguments: - - * sortby=reads|writes|address - - Log the data sorted by either the number of reads, the number of writes, or - memory address. (Default: entries are sorted by the sum of reads and writes) - - * io=on - - Track IO addresses. Only relevant to full system emulation. (Default: off) - - * pagesize=N - - The page size used. (Default: N = 4096) - -- contrib/plugins/howvec.c - -This is an instruction classifier so can be used to count different -types of instructions. It has a number of options to refine which get -counted. You can give a value to the ``count`` argument for a class of -instructions to break it down fully, so for example to see all the system -registers accesses:: - - $ qemu-system-aarch64 $(QEMU_ARGS) \ - -append "root=/dev/sda2 systemd.unit=benchmark.service" \ - -smp 4 -plugin ./contrib/plugins/libhowvec.so,count=sreg -d plugin - -which will lead to a sorted list after the class breakdown:: - - Instruction Classes: - Class: UDEF not counted - Class: SVE (68 hits) - Class: PCrel addr (47789483 hits) - Class: Add/Sub (imm) (192817388 hits) - Class: Logical (imm) (93852565 hits) - Class: Move Wide (imm) (76398116 hits) - Class: Bitfield (44706084 hits) - Class: Extract (5499257 hits) - Class: Cond Branch (imm) (147202932 hits) - Class: Exception Gen (193581 hits) - Class: NOP not counted - Class: Hints (6652291 hits) - Class: Barriers (8001661 hits) - Class: PSTATE (1801695 hits) - Class: System Insn (6385349 hits) - Class: System Reg counted individually - Class: Branch (reg) (69497127 hits) - Class: Branch (imm) (84393665 hits) - Class: Cmp & Branch (110929659 hits) - Class: Tst & Branch (44681442 hits) - Class: AdvSimd ldstmult (736 hits) - Class: ldst excl (9098783 hits) - Class: Load Reg (lit) (87189424 hits) - Class: ldst noalloc pair (3264433 hits) - Class: ldst pair (412526434 hits) - Class: ldst reg (imm) (314734576 hits) - Class: Loads & Stores (2117774 hits) - Class: Data Proc Reg (223519077 hits) - Class: Scalar FP (31657954 hits) - Individual Instructions: - Instr: mrs x0, sp_el0 (2682661 hits) (op=0xd5384100/ System Reg) - Instr: mrs x1, tpidr_el2 (1789339 hits) (op=0xd53cd041/ System Reg) - Instr: mrs x2, tpidr_el2 (1513494 hits) (op=0xd53cd042/ System Reg) - Instr: mrs x0, tpidr_el2 (1490823 hits) (op=0xd53cd040/ System Reg) - Instr: mrs x1, sp_el0 (933793 hits) (op=0xd5384101/ System Reg) - Instr: mrs x2, sp_el0 (699516 hits) (op=0xd5384102/ System Reg) - Instr: mrs x4, tpidr_el2 (528437 hits) (op=0xd53cd044/ System Reg) - 
Instr: mrs x30, ttbr1_el1 (480776 hits) (op=0xd538203e/ System Reg) - Instr: msr ttbr1_el1, x30 (480713 hits) (op=0xd518203e/ System Reg) - Instr: msr vbar_el1, x30 (480671 hits) (op=0xd518c01e/ System Reg) - ... - -To find the argument shorthand for the class you need to examine the -source code of the plugin at the moment, specifically the ``*opt`` -argument in the InsnClassExecCount tables. - -- contrib/plugins/lockstep.c - -This is a debugging tool for developers who want to find out when and -where execution diverges after a subtle change to TCG code generation. -It is not an exact science and results are likely to be mixed once -asynchronous events are introduced. While the use of -icount can -introduce determinism to the execution flow it doesn't always follow -the translation sequence will be exactly the same. Typically this is -caused by a timer firing to service the GUI causing a block to end -early. However in some cases it has proved to be useful in pointing -people at roughly where execution diverges. The only argument you need -for the plugin is a path for the socket the two instances will -communicate over:: - - - $ qemu-system-sparc -monitor none -parallel none \ - -net none -M SS-20 -m 256 -kernel day11/zImage.elf \ - -plugin ./contrib/plugins/liblockstep.so,sockpath=lockstep-sparc.sock \ - -d plugin,nochain - -which will eventually report:: - - qemu-system-sparc: warning: nic lance.0 has no peer - @ 0x000000ffd06678 vs 0x000000ffd001e0 (2/1 since last) - @ 0x000000ffd07d9c vs 0x000000ffd06678 (3/1 since last) - Δ insn_count @ 0x000000ffd07d9c (809900609) vs 0x000000ffd06678 (809900612) - previously @ 0x000000ffd06678/10 (809900609 insns) - previously @ 0x000000ffd001e0/4 (809900599 insns) - previously @ 0x000000ffd080ac/2 (809900595 insns) - previously @ 0x000000ffd08098/5 (809900593 insns) - previously @ 0x000000ffd080c0/1 (809900588 insns) - -- contrib/plugins/hwprofile.c - -The hwprofile tool can only be used with system emulation and allows -the user to see what hardware is accessed how often. It has a number of options: - - * track=read or track=write - - By default the plugin tracks both reads and writes. You can use one - of these options to limit the tracking to just one class of accesses. - - * source - - Will include a detailed break down of what the guest PC that made the - access was. Not compatible with the pattern option. Example output:: - - cirrus-low-memory @ 0xfffffd00000a0000 - pc:fffffc0000005cdc, 1, 256 - pc:fffffc0000005ce8, 1, 256 - pc:fffffc0000005cec, 1, 256 - - * pattern - - Instead break down the accesses based on the offset into the HW - region. This can be useful for seeing the most used registers of a - device. Example output:: - - pci0-conf @ 0xfffffd01fe000000 - off:00000004, 1, 1 - off:00000010, 1, 3 - off:00000014, 1, 3 - off:00000018, 1, 2 - off:0000001c, 1, 2 - off:00000020, 1, 2 - ... - -- contrib/plugins/execlog.c - -The execlog tool traces executed instructions with memory access. It can be used -for debugging and security analysis purposes. -Please be aware that this will generate a lot of output. - -The plugin needs default argument:: - - $ qemu-system-arm $(QEMU_ARGS) \ - -plugin ./contrib/plugins/libexeclog.so -d plugin - -which will output an execution trace following this structure:: - - # vCPU, vAddr, opcode, disassembly[, load/store, memory addr, device]... 
- 0, 0xa12, 0xf8012400, "movs r4, #0" - 0, 0xa14, 0xf87f42b4, "cmp r4, r6" - 0, 0xa16, 0xd206, "bhs #0xa26" - 0, 0xa18, 0xfff94803, "ldr r0, [pc, #0xc]", load, 0x00010a28, RAM - 0, 0xa1a, 0xf989f000, "bl #0xd30" - 0, 0xd30, 0xfff9b510, "push {r4, lr}", store, 0x20003ee0, RAM, store, 0x20003ee4, RAM - 0, 0xd32, 0xf9893014, "adds r0, #0x14" - 0, 0xd34, 0xf9c8f000, "bl #0x10c8" - 0, 0x10c8, 0xfff96c43, "ldr r3, [r0, #0x44]", load, 0x200000e4, RAM - -Please note that you need to configure QEMU with Capstone support to get disassembly. - -The output can be filtered to only track certain instructions or -addresses using the ``ifilter`` or ``afilter`` options. You can stack the -arguments if required:: - - $ qemu-system-arm $(QEMU_ARGS) \ - -plugin ./contrib/plugins/libexeclog.so,ifilter=st1w,afilter=0x40001808 -d plugin - -This plugin can also dump registers when they change value. Specify the name of the -registers with multiple ``reg`` options. You can also use glob style matching if you wish:: - - $ qemu-system-arm $(QEMU_ARGS) \ - -plugin ./contrib/plugins/libexeclog.so,reg=\*_el2,reg=sp -d plugin - -Be aware that each additional register to check will slow down -execution quite considerably. You can optimise the number of register -checks done by using the rdisas option. This will only instrument -instructions that mention the registers in question in disassembly. -This is not foolproof as some instructions implicitly change -instructions. You can use the ifilter to catch these cases: - - $ qemu-system-arm $(QEMU_ARGS) \ - -plugin ./contrib/plugins/libexeclog.so,ifilter=msr,ifilter=blr,reg=x30,reg=\*_el1,rdisas=on - -- contrib/plugins/cache.c - -Cache modelling plugin that measures the performance of a given L1 cache -configuration, and optionally a unified L2 per-core cache when a given working -set is run:: - - $ qemu-x86_64 -plugin ./contrib/plugins/libcache.so \ - -d plugin -D cache.log ./tests/tcg/x86_64-linux-user/float_convs - -will report the following:: - - core #, data accesses, data misses, dmiss rate, insn accesses, insn misses, imiss rate - 0 996695 508 0.0510% 2642799 18617 0.7044% - - address, data misses, instruction - 0x424f1e (_int_malloc), 109, movq %rax, 8(%rcx) - 0x41f395 (_IO_default_xsputn), 49, movb %dl, (%rdi, %rax) - 0x42584d (ptmalloc_init.part.0), 33, movaps %xmm0, (%rax) - 0x454d48 (__tunables_init), 20, cmpb $0, (%r8) - ... - - address, fetch misses, instruction - 0x4160a0 (__vfprintf_internal), 744, movl $1, %ebx - 0x41f0a0 (_IO_setb), 744, endbr64 - 0x415882 (__vfprintf_internal), 744, movq %r12, %rdi - 0x4268a0 (__malloc), 696, andq $0xfffffffffffffff0, %rax - ... - -The plugin has a number of arguments, all of them are optional: - - * limit=N - - Print top N icache and dcache thrashing instructions along with their - address, number of misses, and its disassembly. (default: 32) - - * icachesize=N - * iblksize=B - * iassoc=A - - Instruction cache configuration arguments. They specify the cache size, block - size, and associativity of the instruction cache, respectively. - (default: N = 16384, B = 64, A = 8) - - * dcachesize=N - * dblksize=B - * dassoc=A - - Data cache configuration arguments. They specify the cache size, block size, - and associativity of the data cache, respectively. - (default: N = 16384, B = 64, A = 8) - - * evict=POLICY - - Sets the eviction policy to POLICY. Available policies are: :code:`lru`, - :code:`fifo`, and :code:`rand`. The plugin will use the specified policy for - both instruction and data caches. 
(default: POLICY = :code:`lru`) - - * cores=N - - Sets the number of cores for which we maintain separate icache and dcache. - (default: for linux-user, N = 1, for full system emulation: N = cores - available to guest) - - * l2=on - - Simulates a unified L2 cache (stores blocks for both instructions and data) - using the default L2 configuration (cache size = 2MB, associativity = 16-way, - block size = 64B). - - * l2cachesize=N - * l2blksize=B - * l2assoc=A - - L2 cache configuration arguments. They specify the cache size, block size, and - associativity of the L2 cache, respectively. Setting any of the L2 - configuration arguments implies ``l2=on``. - (default: N = 2097152 (2MB), B = 64, A = 16) - Plugin API ========== diff --git a/docs/devel/acpi-bits.rst b/docs/devel/testing/acpi-bits.rst index 1ec394f..9a4d716 100644 --- a/docs/devel/acpi-bits.rst +++ b/docs/devel/testing/acpi-bits.rst @@ -1,6 +1,6 @@ -============================================================================= -ACPI/SMBIOS avocado tests using biosbits -============================================================================= +================================== +ACPI/SMBIOS testing using biosbits +================================== ************ Introduction ************ @@ -30,26 +30,31 @@ OS modules are generally written using low level languages such as C and low level assembly machine language. Writing test routines in a low level language makes things more cumbersome. These and other reasons makes using bios-bits very attractive for testing bioses. More details on the inspiration -for developing biosbits and its real life uses can be found in [#a]_ and [#b]_. +for developing biosbits and its real life uses were presented `at Plumbers +in 2011 <Plumbers_>`__ and `at Linux.conf.au in 2012 <Linux.conf.au_>`__. -For QEMU, we maintain a fork of bios bits in gitlab along with all the -dependent submodules `here <https://gitlab.com/qemu-project/biosbits-bits>`__. -This fork contains numerous fixes, a newer acpica and changes specific to -running this avocado QEMU tests using bits. The author of this document -is the sole maintainer of the QEMU fork of bios bits repository. For more -information, please see author's `FOSDEM talk on this bios-bits based test -framework <https://fosdem.org/2024/schedule/event/fosdem-2024-2262-exercising-qemu-generated-acpi-smbios-tables-using-biosbits-from-within-a-guest-vm-/>`__. +For QEMU, we maintain a fork of bios bits in `gitlab`_, along with all +the dependent submodules. This fork contains numerous fixes, a newer +acpica and changes specific to running these functional QEMU tests using +bits. The author of this document is the current maintainer of the QEMU +fork of bios bits repository. For more information, please see `the +author's FOSDEM presentation <FOSDEM_>`__ on this bios-bits based test framework. + +.. _Plumbers: https://blog.linuxplumbersconf.org/2011/ocw/system/presentations/867/original/bits.pdf +.. _Linux.conf.au: https://www.youtube.com/watch?v=36QIepyUuhg +.. _gitlab: https://gitlab.com/qemu-project/biosbits-bits +.. _FOSDEM: https://fosdem.org/2024/schedule/event/fosdem-2024-2262-exercising-qemu-generated-acpi-smbios-tables-using-biosbits-from-within-a-guest-vm-/ ********************************* Description of the test framework ********************************* -Under the directory ``tests/avocado/``, ``acpi-bits.py`` is a QEMU avocado -test that drives all this. 
+Under the directory ``tests/functional/``, ``test_acpi_bits.py`` is a QEMU +functional test that drives all this. A brief description of the various test files follows. -Under ``tests/avocado/`` as the root we have: +Under ``tests/functional/`` as the root we have: :: @@ -60,12 +65,12 @@ Under ``tests/avocado/`` as the root we have: │ ├── smbios.py2 │ ├── testacpi.py2 │ └── testcpuid.py2 - ├── acpi-bits.py + ├── test_acpi_bits.py -* ``tests/avocado``: +* ``tests/functional``: - ``acpi-bits.py``: - This is the main python avocado test script that generates a + ``test_acpi_bits.py``: + This is the main python functional test script that generates a biosbits iso. It then spawns a QEMU VM with it, collects the log and reports test failures. This is the script one would be interested in if they wanted to add or change some component of the log parsing, add a new command line @@ -79,35 +84,22 @@ Under ``tests/avocado/`` as the root we have: you to inspect and run the specific commands manually. In order to run this test, please perform the following steps from the QEMU - build directory: - :: - - $ make check-venv (needed only the first time to create the venv) - $ ./pyvenv/bin/avocado run -t acpi tests/avocado - - The above will run all acpi avocado tests including this one. - In order to run the individual tests, perform the following: + build directory (assuming that the sources are in ".."): :: - $ ./pyvenv/bin/avocado run tests/avocado/acpi-bits.py --tap - - - The above will produce output in tap format. You can omit "--tap -" in the - end and it will produce output like the following: - :: + $ export PYTHONPATH=../python:../tests/functional + $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64 + $ python3 ../tests/functional/test_acpi_bits.py - $ ./pyvenv/bin/avocado run tests/avocado/acpi-bits.py - Fetching asset from tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits - JOB ID : eab225724da7b64c012c65705dc2fa14ab1defef - JOB LOG : /home/anisinha/avocado/job-results/job-2022-10-10T17.58-eab2257/job.log - (1/1) tests/avocado/acpi-bits.py:AcpiBitsTest.test_acpi_smbios_bits: PASS (33.09 s) - RESULTS : PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | CANCEL 0 - JOB TIME : 39.22 s + The above will run all acpi-bits functional tests (producing output in + tap format). - You can inspect the log file for more information about the run or in order - to diagnoze issues. If you pass V=1 in the environment, more diagnostic logs - would be found in the test log. + You can inspect the log files in tests/functional/x86_64/test_acpi_bits.*/ + for more information about the run or in order to diagnoze issues. + If you pass V=1 in the environment, more diagnostic logs will be put into + the test log. -* ``tests/avocado/acpi-bits/bits-config``: +* ``tests/functional/acpi-bits/bits-config``: This location contains biosbits configuration files that determine how the software runs the tests. @@ -117,7 +109,7 @@ Under ``tests/avocado/`` as the root we have: or actions are performed by bits. The description of the config options are provided in the file itself. -* ``tests/avocado/acpi-bits/bits-tests``: +* ``tests/functional/acpi-bits/bits-tests``: This directory contains biosbits python based tests that are run from within the biosbits environment in the spawned VM. New additions of test cases can @@ -155,13 +147,9 @@ Under ``tests/avocado/`` as the root we have: (a) They are python2.7 based scripts and not python 3 scripts. 
(b) They are run from within the bios bits VM and is not subjected to QEMU build/test python script maintenance and dependency resolutions. - (c) They need not be loaded by avocado framework when running tests. + (c) They need not be loaded by the test framework by accident when running + tests. Author: Ani Sinha <anisinha@redhat.com> -References: ------------ -.. [#a] https://blog.linuxplumbersconf.org/2011/ocw/system/presentations/867/original/bits.pdf -.. [#b] https://www.youtube.com/watch?v=36QIepyUuhg -.. [#c] https://fosdem.org/2024/schedule/event/fosdem-2024-2262-exercising-qemu-generated-acpi-smbios-tables-using-biosbits-from-within-a-guest-vm-/ diff --git a/docs/devel/testing/blkdebug.rst b/docs/devel/testing/blkdebug.rst new file mode 100644 index 0000000..63887c9 --- /dev/null +++ b/docs/devel/testing/blkdebug.rst @@ -0,0 +1,177 @@ +Block I/O error injection using ``blkdebug`` +============================================ + +.. + Copyright (C) 2014-2015 Red Hat Inc + + This work is licensed under the terms of the GNU GPL, version 2 or later. See + the COPYING file in the top-level directory. + +The ``blkdebug`` block driver is a rule-based error injection engine. It can be +used to exercise error code paths in block drivers including ``ENOSPC`` (out of +space) and ``EIO``. + +This document gives an overview of the features available in ``blkdebug``. + +Background +---------- +Block drivers have many error code paths that handle I/O errors. Image formats +are especially complex since metadata I/O errors during cluster allocation or +while updating tables happen halfway through request processing and require +discipline to keep image files consistent. + +Error injection allows test cases to trigger I/O errors at specific points. +This way, all error paths can be tested to make sure they are correct. + +Rules +----- +The ``blkdebug`` block driver takes a list of "rules" that tell the error injection +engine when to fail an I/O request. + +Each I/O request is evaluated against the rules. If a rule matches the request +then its "action" is executed. + +Rules can be placed in a configuration file; the configuration file +follows the same .ini-like format used by QEMU's ``-readconfig`` option, and +each section of the file represents a rule. + +The following configuration file defines a single rule:: + + $ cat blkdebug.conf + [inject-error] + event = "read_aio" + errno = "28" + +This rule fails all aio read requests with ``ENOSPC`` (28). Note that the errno +value depends on the host. On Linux, see +``/usr/include/asm-generic/errno-base.h`` for errno values. + +Invoke QEMU as follows:: + + $ qemu-system-x86_64 + -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \ + -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0 + +Rules support the following attributes: + +``event`` + which type of operation to match (e.g. ``read_aio``, ``write_aio``, + ``flush_to_os``, ``flush_to_disk``). See `Events`_ for + information on events. + +``state`` + (optional) the engine must be in this state number in order for this + rule to match. See `State transitions`_ for information + on states. + +``errno`` + the numeric errno value to return when a request matches this rule. + The errno values depend on the host since the numeric values are not + standardized in the POSIX specification. 
+ +``sector`` + (optional) a sector number that the request must overlap in order to + match this rule + +``once`` + (optional, default ``off``) only execute this action on the first + matching request + +``immediately`` + (optional, default ``off``) return a NULL ``BlockAIOCB`` + pointer and fail without an errno instead. This + exercises the code path where ``BlockAIOCB`` fails and the + caller's ``BlockCompletionFunc`` is not invoked. + +Events +------ +Block drivers provide information about the type of I/O request they are about +to make so rules can match specific types of requests. For example, the ``qcow2`` +block driver tells ``blkdebug`` when it accesses the L1 table so rules can match +only L1 table accesses and not other metadata or guest data requests. + +The core events are: + +``read_aio`` + guest data read + +``write_aio`` + guest data write + +``flush_to_os`` + write out unwritten block driver state (e.g. cached metadata) + +``flush_to_disk`` + flush the host block device's disk cache + +See ``qapi/block-core.json:BlkdebugEvent`` for the full list of events. +You may need to grep block driver source code to understand the +meaning of specific events. + +State transitions +----------------- +There are cases where more power is needed to match a particular I/O request in +a longer sequence of requests. For example:: + + write_aio + flush_to_disk + write_aio + +How do we match the 2nd ``write_aio`` but not the first? This is where state +transitions come in. + +The error injection engine has an integer called the "state" that always starts +initialized to 1. The state integer is internal to ``blkdebug`` and cannot be +observed from outside but rules can interact with it for powerful matching +behavior. + +Rules can be conditional on the current state and they can transition to a new +state. + +When a rule's "state" attribute is non-zero then the current state must equal +the attribute in order for the rule to match. + +For example, to match the 2nd write_aio:: + + [set-state] + event = "write_aio" + state = "1" + new_state = "2" + + [inject-error] + event = "write_aio" + state = "2" + errno = "5" + +The first ``write_aio`` request matches the ``set-state`` rule and transitions from +state 1 to state 2. Once state 2 has been entered, the ``set-state`` rule no +longer matches since it requires state 1. But the ``inject-error`` rule now +matches the next ``write_aio`` request and injects ``EIO`` (5). + +State transition rules support the following attributes: + +``event`` + which type of operation to match (e.g. ``read_aio``, ``write_aio``, + ``flush_to_os`, ``flush_to_disk``). See `Events`_ for + information on events. + +``state`` + (optional) the engine must be in this state number in order for this + rule to match + +``new_state`` + transition to this state number + +Suspend and resume +------------------ +Exercising code paths in block drivers may require specific ordering amongst +concurrent requests. The "breakpoint" feature allows requests to be halted on +a ``blkdebug`` event and resumed later. This makes it possible to achieve +deterministic ordering when multiple requests are in flight. + +Breakpoints on ``blkdebug`` events are associated with a user-defined ``tag`` string. +This tag serves as an identifier by which the request can be resumed at a later +point. + +See the ``qemu-io(1)`` ``break``, ``resume``, ``remove_break``, and ``wait_break`` +commands for details. 
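The suspend/resume mechanism can be illustrated with an interactive ``qemu-io``
session. The sketch below is an editorial illustration rather than part of the
patch above: it assumes an existing ``test.img``, opens it through ``blkdebug``
with an empty rule file (``blkdebug::test.img``), and the tag name ``bp0`` is
arbitrary; see ``qemu-io(1)`` for the authoritative command syntax::

    $ ./qemu-io blkdebug::test.img
    qemu-io> break write_aio bp0
    qemu-io> aio_write 0 512
    qemu-io> wait_break bp0
    qemu-io> resume bp0
    qemu-io> remove_break write_aio bp0

Here ``break`` arms a breakpoint on the ``write_aio`` event and tags suspended
requests as ``bp0``. The asynchronous ``aio_write`` then halts on that event,
``wait_break`` blocks until the suspension has actually happened, and
``resume`` lets the request complete; ``remove_break`` disarms the breakpoint
again. Because the request is issued asynchronously, other commands or
requests can be interleaved while it is held, which is what makes the ordering
deterministic.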
diff --git a/docs/devel/blkverify.txt b/docs/devel/testing/blkverify.rst index aca826c..2a71778 100644 --- a/docs/devel/blkverify.txt +++ b/docs/devel/testing/blkverify.rst @@ -1,8 +1,10 @@ -= Block driver correctness testing with blkverify = +Block driver correctness testing with ``blkverify`` +=================================================== -== Introduction == +Introduction +------------ -This document describes how to use the blkverify protocol to test that a block +This document describes how to use the ``blkverify`` protocol to test that a block driver is operating correctly. It is difficult to test and debug block drivers against real guests. Often @@ -11,12 +13,13 @@ of the executable. Other times obscure errors are raised by a program inside the guest. These issues are extremely hard to trace back to bugs in the block driver. -Blkverify solves this problem by catching data corruption inside QEMU the first +``blkverify`` solves this problem by catching data corruption inside QEMU the first time bad data is read and reporting the disk sector that is corrupted. -== How it works == +How it works +------------ -The blkverify protocol has two child block devices, the "test" device and the +The ``blkverify`` protocol has two child block devices, the "test" device and the "raw" device. Read/write operations are mirrored to both devices so their state should always be in sync. @@ -25,13 +28,14 @@ contents to the "test" image. The idea is that the "raw" device will handle read/write operations correctly and not corrupt data. It can be used as a reference for comparison against the "test" device. -After a mirrored read operation completes, blkverify will compare the data and +After a mirrored read operation completes, ``blkverify`` will compare the data and raise an error if it is not identical. This makes it possible to catch the first instance where corrupt data is read. -== Example == +Example +------- -Imagine raw.img has 0xcd repeated throughout its first sector: +Imagine raw.img has 0xcd repeated throughout its first sector:: $ ./qemu-io -c 'read -v 0 512' raw.img 00000000: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................ @@ -42,7 +46,7 @@ Imagine raw.img has 0xcd repeated throughout its first sector: read 512/512 bytes at offset 0 512.000000 bytes, 1 ops; 0.0000 sec (97.656 MiB/sec and 200000.0000 ops/sec) -And test.img is corrupt, its first sector is zeroed when it shouldn't be: +And test.img is corrupt, its first sector is zeroed when it shouldn't be:: $ ./qemu-io -c 'read -v 0 512' test.img 00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 
@@ -53,17 +57,17 @@ And test.img is corrupt, its first sector is zeroed when it shouldn't be: read 512/512 bytes at offset 0 512.000000 bytes, 1 ops; 0.0000 sec (81.380 MiB/sec and 166666.6667 ops/sec) -This error is caught by blkverify: +This error is caught by ``blkverify``:: $ ./qemu-io -c 'read 0 512' blkverify:a.img:b.img blkverify: read sector_num=0 nb_sectors=4 contents mismatch in sector 0 -A more realistic scenario is verifying the installation of a guest OS: +A more realistic scenario is verifying the installation of a guest OS:: $ ./qemu-img create raw.img 16G $ ./qemu-img create -f qcow2 test.qcow2 16G $ ./qemu-system-x86_64 -cdrom debian.iso \ -drive file=blkverify:raw.img:test.qcow2 -If the installation is aborted when blkverify detects corruption, use qemu-io +If the installation is aborted when ``blkverify`` detects corruption, use ``qemu-io`` to explore the contents of the disk image at the sector in question. diff --git a/docs/devel/ci-jobs.rst.inc b/docs/devel/testing/ci-jobs.rst.inc index 3756bbe..f1c541c 100644 --- a/docs/devel/ci-jobs.rst.inc +++ b/docs/devel/testing/ci-jobs.rst.inc @@ -126,10 +126,10 @@ QEMU_JOB_PUBLISH The job is for publishing content after a branch has been merged into the upstream default branch. -QEMU_JOB_AVOCADO -~~~~~~~~~~~~~~~~ +QEMU_JOB_FUNCTIONAL +~~~~~~~~~~~~~~~~~~~ -The job runs the Avocado integration test suite +The job runs the functional test suite Contributor controlled runtime variables ---------------------------------------- @@ -149,13 +149,12 @@ the jobs to be manually started from the UI Set this variable to 2 to create the pipelines and run all the jobs immediately, as was the historical behaviour -QEMU_CI_AVOCADO_TESTING -~~~~~~~~~~~~~~~~~~~~~~~ -By default, tests using the Avocado framework are not run automatically in -the pipelines (because multiple artifacts have to be downloaded, and if -these artifacts are not already cached, downloading them make the jobs -reach the timeout limit). Set this variable to have the tests using the -Avocado framework run automatically. +QEMU_CI_FUNCTIONAL +~~~~~~~~~~~~~~~~~~ +By default, tests using the functional framework are not run automatically +in the pipelines (because multiple artifacts have to be downloaded, which +might cause a lot of network traffic). Set this variable to have the tests +using the functional framework run automatically. Other misc variables -------------------- diff --git a/docs/devel/ci-runners.rst.inc b/docs/devel/testing/ci-runners.rst.inc index 67b23d3..67b23d3 100644 --- a/docs/devel/ci-runners.rst.inc +++ b/docs/devel/testing/ci-runners.rst.inc diff --git a/docs/devel/testing/ci.rst b/docs/devel/testing/ci.rst new file mode 100644 index 0000000..e21d39d --- /dev/null +++ b/docs/devel/testing/ci.rst @@ -0,0 +1,34 @@ +.. _ci: + +Continuous Integration (CI) +=========================== + +Continuous integration (CI) requires the builds of the entire application and +the execution of a comprehensive set of automated tests every time there is a +need to commit any set of changes [1]_. The automated tests are composed +of unit, functional and other tests. + +Most of QEMU's CI is run on GitLab's infrastructure although a number +of other CI services are used for specialised purposes. The most up to +date information about them and their status can be found on the +`project wiki testing page <https://wiki.qemu.org/Testing/CI>`_. + +These tests are also used as gating tests before merging pull requests. 
+A gating test restricts the move of code from one stage to another on a +test/deployment pipeline. The step move is granted with approval. The approval +can be a manual intervention or a set of tests succeeding [2]_. + +On QEMU, the gating process happens during the pull request. The approval is +done by the project leader running its own set of tests. The pull request gets +merged when the tests succeed. + +.. include:: ci-jobs.rst.inc +.. include:: ci-runners.rst.inc + +References +---------- + +.. [1] Humble, Jez & Farley, David (2010). Continuous Delivery: + Reliable Software Releases Through Build, Test, and Deployment, p. 55. +.. [2] Humble, Jez & Farley, David (2010). Continuous Delivery: + Reliable Software Releases Through Build, Test, and Deployment, p. 122. diff --git a/docs/devel/testing/functional.rst b/docs/devel/testing/functional.rst new file mode 100644 index 0000000..9e56dd1 --- /dev/null +++ b/docs/devel/testing/functional.rst @@ -0,0 +1,380 @@ +.. _checkfunctional-ref: + +Functional testing with Python +============================== + +The ``tests/functional`` directory hosts functional tests written in +Python. They are usually higher level tests, and may interact with +external resources and with various guest operating systems. + +The tests should be written in the style of the Python `unittest`_ framework, +using stdio for the TAP protocol. The folder ``tests/functional/qemu_test`` +provides classes (e.g. the ``QemuBaseTest``, ``QemuUserTest`` and the +``QemuSystemTest`` classes) and utility functions that help to get your test +into the right shape, e.g. by replacing the 'stdout' python object to redirect +the normal output of your test to stderr instead. + +Note that if you don't use one of the QemuBaseTest based classes for your +test, or if you spawn subprocesses from your test, you have to make sure +that there is no TAP-incompatible output written to stdio, e.g. either by +prefixing every line with a "# " to mark the output as a TAP comment, or +e.g. by capturing the stdout output of subprocesses (redirecting it to +stderr is OK). + +Tests based on ``qemu_test.QemuSystemTest`` can easily: + + * Customize the command line arguments given to the convenience + ``self.vm`` attribute (a QEMUMachine instance) + + * Interact with the QEMU monitor, send QMP commands and check + their results + + * Interact with the guest OS, using the convenience console device + (which may be useful to assert the effectiveness and correctness of + command line arguments or QMP commands) + + * Download (and cache) remote data files, such as firmware and kernel + images + +Running tests +------------- + +You can run the functional tests simply by executing: + +.. code:: + + make check-functional + +It is also possible to run tests for a certain target only, for example +the following line will only run the tests for the x86_64 target: + +.. code:: + + make check-functional-x86_64 + +To run a single test file without the meson test runner, you can also +execute the file directly by specifying two environment variables first, +the PYTHONPATH that has to include the python folder and the tests/functional +folder of the source tree, and QEMU_TEST_QEMU_BINARY that has to point +to the QEMU binary that should be used for the test. The current working +directory should be your build folder. 
For example::
+
+    $ export PYTHONPATH=../python:../tests/functional
+    $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
+    $ pyvenv/bin/python3 ../tests/functional/test_file.py
+
+The test framework will automatically purge any scratch files created during
+the tests. If you need to debug a failed test, it is possible to keep these
+files around on disk by setting ``QEMU_TEST_KEEP_SCRATCH=1`` as an env
+variable. Any preserved files will be deleted the next time the test is run
+without this variable set.
+
+Logging
+-------
+
+The framework collects log files for each test in the build directory
+in the following subfolder::
+
+    <builddir>/tests/functional/<arch>/<fileid>.<classid>.<testname>/
+
+There are usually three log files:
+
+* ``base.log`` contains the generic logging information that is written
+  by the calls to the logging functions in the test code (e.g. by calling
+  the ``self.log.info()`` or ``self.log.debug()`` functions).
+* ``console.log`` contains the output of the serial console of the guest.
+* ``default.log`` contains the output of QEMU. This file could be named
+  differently if the test chooses to use a different identifier for
+  the guest VM (e.g. when the test spins up multiple VMs).
+
+Introduction to writing tests
+-----------------------------
+
+The ``tests/functional/qemu_test`` directory provides the ``qemu_test``
+Python module, containing the ``qemu_test.QemuSystemTest`` class.
+Here is a simple usage example:
+
+.. code::
+
+  #!/usr/bin/env python3
+
+  from qemu_test import QemuSystemTest
+
+  class Version(QemuSystemTest):
+
+      def test_qmp_human_info_version(self):
+          self.vm.launch()
+          res = self.vm.cmd('human-monitor-command',
+                            command_line='info version')
+          self.assertRegex(res, r'^(\d+\.\d+\.\d)')
+
+  if __name__ == '__main__':
+      QemuSystemTest.main()
+
+By providing the "hash bang" line at the beginning of the script, marking
+the file as executable and by calling into QemuSystemTest.main(), the test
+can also be run stand-alone, without a test runner. On the other hand, when
+run via a test runner, the QemuSystemTest.main() function takes care of
+running the test functions in the right fashion (e.g. with TAP output that
+is required by the meson test runner).
+
+The ``qemu_test.QemuSystemTest`` base test class
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``qemu_test.QemuSystemTest`` class has a number of characteristics
+that are worth mentioning.
+
+First of all, it attempts to give each test a ready to use QEMUMachine
+instance, available at ``self.vm``. Because many tests will tweak the
+QEMU command line, launching the QEMUMachine (by using ``self.vm.launch()``)
+is left to the test writer.
+
+The base test class also has support for tests with more than one
+QEMUMachine. The way to get machines is through the ``self.get_vm()``
+method which will return a QEMUMachine instance. The ``self.get_vm()``
+method accepts arguments that will be passed to the QEMUMachine creation
+and also an optional ``name`` attribute so you can identify a specific
+machine and get it more than once through the test methods. A simple
+and hypothetical example follows:
+
+..
code:: + + from qemu_test import QemuSystemTest + + class MultipleMachines(QemuSystemTest): + def test_multiple_machines(self): + first_machine = self.get_vm() + second_machine = self.get_vm() + self.get_vm(name='third_machine').launch() + + first_machine.launch() + second_machine.launch() + + first_res = first_machine.cmd( + 'human-monitor-command', + command_line='info version') + + second_res = second_machine.cmd( + 'human-monitor-command', + command_line='info version') + + third_res = self.get_vm(name='third_machine').cmd( + 'human-monitor-command', + command_line='info version') + + self.assertEqual(first_res, second_res, third_res) + +At test "tear down", ``qemu_test.QemuSystemTest`` handles all the QEMUMachines +shutdown. + +QEMUMachine +----------- + +The QEMUMachine API is already widely used in the Python iotests, +device-crash-test and other Python scripts. It's a wrapper around the +execution of a QEMU binary, giving its users: + + * the ability to set command line arguments to be given to the QEMU + binary + + * a ready to use QMP connection and interface, which can be used to + send commands and inspect its results, as well as asynchronous + events + + * convenience methods to set commonly used command line arguments in + a more succinct and intuitive way + +QEMU binary selection +^^^^^^^^^^^^^^^^^^^^^ + +The QEMU binary used for the ``self.vm`` QEMUMachine instance will +primarily depend on the value of the ``qemu_bin`` instance attribute. +If it is not explicitly set by the test code, its default value will +be the result the QEMU_TEST_QEMU_BINARY environment variable. + +Debugging hung QEMU +^^^^^^^^^^^^^^^^^^^ + +When test cases go wrong it may be helpful to debug a stalled QEMU +process. While the QEMUMachine class owns the primary QMP monitor +socket, it is possible to request a second QMP monitor be created +by setting the ``QEMU_TEST_QMP_BACKDOOR`` env variable to refer +to a UNIX socket name. The ``qmp-shell`` command can then be +attached to the stalled QEMU to examine its live state. + +Attribute reference +------------------- + +QemuBaseTest +^^^^^^^^^^^^ + +The following attributes are available on any ``qemu_test.QemuBaseTest`` +instance. + +arch +"""" + +The target architecture of the QEMU binary. + +Tests are also free to use this attribute value, for their own needs. +A test may, for instance, use this value when selecting the architecture +of a kernel or disk image to boot a VM with. + +qemu_bin +"""""""" + +The preserved value of the ``QEMU_TEST_QEMU_BINARY`` environment +variable. + +QemuUserTest +^^^^^^^^^^^^ + +The QemuUserTest class can be used for running an executable via the +usermode emulation binaries. + +QemuSystemTest +^^^^^^^^^^^^^^ + +The QemuSystemTest class can be used for running tests via one of the +qemu-system-* binaries. + +vm +"" + +A QEMUMachine instance, initially configured according to the given +``qemu_bin`` parameter. + +cpu +""" + +The cpu model that will be set to all QEMUMachine instances created +by the test. + +machine +""""""" + +The machine type that will be set to all QEMUMachine instances created +by the test. By using the set_machine() function of the QemuSystemTest +class to set this attribute, you can automatically check whether the +machine is available to skip the test in case it is not built into the +QEMU binary. + +Asset handling +-------------- + +Many functional tests download assets (e.g. Linux kernels, initrds, +firmware images, etc.) from the internet to be able to run tests with +them. 
This imposes additional challenges to the test framework. + +First there is the problem that some people might not have an +unconstrained internet connection, so such tests should not be run by +default when running ``make check``. To accomplish this situation, +the tests that download files should only be added to the "thorough" +speed mode in the meson.build file, while the "quick" speed mode is +fine for functional tests that can be run without downloading files. +``make check`` then only runs the quick functional tests along with +the other quick tests from the other test suites. If you choose to +run only ``make check-functional``, the "thorough" tests will be +executed, too. And to run all functional tests along with the others, +you can use something like:: + + make -j$(nproc) check SPEED=thorough + +The second problem with downloading files from the internet are time +constraints. The time for downloading files should not be taken into +account when the test is running and the timeout of the test is ticking +(since downloading can be very slow, depending on the network bandwidth). +This problem is solved by downloading the assets ahead of time, before +the tests are run. This pre-caching is done with the qemu_test.Asset +class. To use it in your test, declare an asset in your test class with +its URL and SHA256 checksum like this:: + + from qemu_test import Asset + + ASSET_somename = Asset( + ('https://www.qemu.org/assets/images/qemu_head_200.png'), + '34b74cad46ea28a2966c1d04e102510daf1fd73e6582b6b74523940d5da029dd') + +In your test function, you can then get the file name of the cached +asset like this:: + + def test_function(self): + file_path = self.ASSET_somename.fetch() + +The pre-caching will be done automatically when running +``make check-functional`` (but not when running e.g. +``make check-functional-<target>``). In case you just want to download +the assets without running the tests, you can do so by running:: + + make precache-functional + +The cache is populated in the ``~/.cache/qemu/download`` directory by +default, but the location can be changed by setting the +``QEMU_TEST_CACHE_DIR`` environment variable. + +Skipping tests +-------------- + +Since the test framework is based on the common Python unittest framework, +you can use the usual Python decorators which allow for easily skipping +tests running under certain conditions, for example, on the lack of a binary +on the test system or when the running environment is a CI system. For further +information about those decorators, please refer to: + + https://docs.python.org/3/library/unittest.html#skipping-tests-and-expected-failures + +While the conditions for skipping tests are often specifics of each one, there +are recurring scenarios identified by the QEMU developers and the use of +environment variables became a kind of standard way to enable/disable tests. + +Here is a list of the most used variables: + +QEMU_TEST_ALLOW_LARGE_STORAGE +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Tests which are going to fetch or produce assets considered *large* are not +going to run unless that ``QEMU_TEST_ALLOW_LARGE_STORAGE=1`` is exported on +the environment. + +The definition of *large* is a bit arbitrary here, but it usually means an +asset which occupies at least 1GB of size on disk when uncompressed. + +QEMU_TEST_ALLOW_UNTRUSTED_CODE +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +There are tests which will boot a kernel image or firmware that can be +considered not safe to run on the developer's workstation, thus they are +skipped by default. 
The definition of *not safe* is also arbitrary but +usually it means a blob which either its source or build process aren't +public available. + +You should export ``QEMU_TEST_ALLOW_UNTRUSTED_CODE=1`` on the environment in +order to allow tests which make use of those kind of assets. + +QEMU_TEST_FLAKY_TESTS +^^^^^^^^^^^^^^^^^^^^^ +Some tests are not working reliably and thus are disabled by default. +This includes tests that don't run reliably on GitLab's CI which +usually expose real issues that are rarely seen on developer machines +due to the constraints of the CI environment. If you encounter a +similar situation then raise a bug and then mark the test as shown on +the code snippet below: + +.. code:: + + # See https://gitlab.com/qemu-project/qemu/-/issues/nnnn + @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') + def test(self): + do_something() + +Tests should not live in this state forever and should either be fixed +or eventually removed. + +QEMU_TEST_ALLOW_SLOW +^^^^^^^^^^^^^^^^^^^^ +Tests that have a very long runtime and might run into timeout issues +e.g. if the QEMU binary has been compiled with debugging options enabled. +To avoid these timeout issues by default and to save some precious CPU +cycles during normal testing, such tests are disabled by default unless +the QEMU_TEST_ALLOW_SLOW environment variable has been set. + + +.. _unittest: https://docs.python.org/3/library/unittest.html diff --git a/docs/devel/fuzzing.rst b/docs/devel/testing/fuzzing.rst index 3bfcb33..c3ac084 100644 --- a/docs/devel/fuzzing.rst +++ b/docs/devel/testing/fuzzing.rst @@ -21,11 +21,12 @@ Building the fuzzers To build the fuzzers, install a recent version of clang: Configure with (substitute the clang binaries with the version you installed). -Here, enable-sanitizers, is optional but it allows us to reliably detect bugs -such as out-of-bounds accesses, use-after-frees, double-frees etc.:: +Here, enable-asan and enable-ubsan are optional but they allow us to reliably +detect bugs such as out-of-bounds accesses, uses-after-free, double-frees +etc.:: - CC=clang-8 CXX=clang++-8 /path/to/configure --enable-fuzzing \ - --enable-sanitizers + CC=clang-8 CXX=clang++-8 /path/to/configure \ + --enable-fuzzing --enable-asan --enable-ubsan Fuzz targets are built similarly to system targets:: diff --git a/docs/devel/testing/index.rst b/docs/devel/testing/index.rst new file mode 100644 index 0000000..ccc2fc6 --- /dev/null +++ b/docs/devel/testing/index.rst @@ -0,0 +1,17 @@ +Testing QEMU +------------ + +Details about how to test QEMU and how it is integrated into our CI +testing infrastructure. + +.. toctree:: + :maxdepth: 3 + + main + qtest + functional + acpi-bits + ci + fuzzing + blkdebug + blkverify diff --git a/docs/devel/testing.rst b/docs/devel/testing/main.rst index 23d3f44..6b18ed8 100644 --- a/docs/devel/testing.rst +++ b/docs/devel/testing/main.rst @@ -3,13 +3,41 @@ Testing in QEMU =============== -This document describes the testing infrastructure in QEMU. +QEMU's testing infrastructure is fairly complex as it covers +everything from unit testing and exercising specific sub-systems all +the way to full blown functional tests. To get an overview of the +tests you can run ``make check-help`` from either the source or build +tree. 
+ +Most (but not all) tests are also integrated as an automated test into +the meson build system so can be run directly from the build tree, +for example:: + + [./pyvenv/bin/]meson test --suite qemu:softfloat + +will run just the softfloat tests. + +An automated test is written with one of the test frameworks using its +generic test functions/classes. The test framework can run the tests and +report their success or failure [1]_. + +An automated test has essentially three parts: + +1. The test initialization of the parameters, where the expected parameters, + like inputs and expected results, are set up; +2. The call to the code that should be tested; +3. An assertion, comparing the result from the previous call with the expected + result set during the initialization of the parameters. If the result + matches the expected result, the test has been successful; otherwise, it has + failed. + +The rest of this document will cover the details for specific test +groups. Testing with "make check" ------------------------- -The "make check" testing family includes most of the C based tests in QEMU. For -a quick help, run ``make check-help`` from the source tree. +The "make check" testing family includes most of the C based tests in QEMU. The usual way to run these tests is: @@ -24,12 +52,22 @@ Before running tests, it is best to build QEMU programs first. Some tests expect the executables to exist and will fail with obscure messages if they cannot find them. +.. _unit-tests: + Unit tests ~~~~~~~~~~ -Unit tests, which can be invoked with ``make check-unit``, are simple C tests -that typically link to individual QEMU object files and exercise them by -calling exported functions. +A unit test is responsible for exercising individual software components as a +unit, like interfaces, data structures, and functionality, uncovering errors +within the boundaries of a component. The verification effort is in the +smallest software unit and focuses on the internal processing logic and data +structures. A test case of unit tests should be designed to uncover errors +due to erroneous computations, incorrect comparisons, or improper control +flow [2]_. + +In QEMU, unit tests can be invoked with ``make check-unit``. They are +simple C tests that typically link to individual QEMU object files and +exercise them by calling exported functions. If you are writing new code in QEMU, consider adding a unit test, especially for utility modules that are relatively stateless or have few dependencies. To @@ -111,6 +149,8 @@ successfully on various hosts. The following list shows some best practices: #ifdef in the codes. If the whole test suite cannot run on Windows, disable the build in the meson.build file. +.. _qapi-tests: + QAPI schema tests ~~~~~~~~~~~~~~~~~ @@ -145,6 +185,8 @@ check-block are in the "auto" group). See the "QEMU iotests" section below for more information. +.. _qemu-iotests: + QEMU iotests ------------ @@ -485,12 +527,6 @@ first to contribute the mapping to the ``libvirt-ci`` project: `CI <https://www.qemu.org/docs/master/devel/ci.html>`__ documentation page on how to trigger gitlab CI pipelines on your change. - * Please also trigger gitlab container generation pipelines on your change - for as many OS distros as practical to make sure that there are no - obvious breakages when adding the new pre-requisite. Please see - `CI <https://www.qemu.org/docs/master/devel/ci.html>`__ documentation - page on how to trigger gitlab CI pipelines on your change. 
- For enterprise distros that default to old, end-of-life versions of the Python runtime, QEMU uses a separate set of mappings that work with more recent versions. These can be found in ``tests/lcitool/mappings.yml``. @@ -619,20 +655,38 @@ Building and Testing with TSan It is possible to build and test with TSan, with a few additional steps. These steps are normally done automatically in the docker. -There is a one time patch needed in clang-9 or clang-10 at this time: +TSan is supported for clang and gcc. +One particularity of sanitizers is that all the code, including shared objects +dependencies, should be built with it. +In the case of TSan, any synchronization primitive from glib (GMutex for +instance) will not be recognized, and will lead to false positives. + +To build a tsan version of glib: .. code:: - sed -i 's/^const/static const/g' \ - /usr/lib/llvm-10/lib/clang/10.0.0/include/sanitizer/tsan_interface.h + $ git clone --depth=1 --branch=2.81.0 https://github.com/GNOME/glib.git + $ cd glib + $ CFLAGS="-O2 -g -fsanitize=thread" meson build + $ ninja -C build To configure the build for TSan: .. code:: - ../configure --enable-tsan --cc=clang-10 --cxx=clang++-10 \ + ../configure --enable-tsan \ --disable-werror --extra-cflags="-O0" +When executing qemu, don't forget to point to tsan glib: + +.. code:: + + $ glib_dir=/path/to/glib + $ export LD_LIBRARY_PATH=$glib_dir/build/gio:$glib_dir/build/glib:$glib_dir/build/gmodule:$glib_dir/build/gobject:$glib_dir/build/gthread + # check correct version is used + $ ldd build/qemu-x86_64 | grep glib + $ qemu-system-x86_64 ... + The runtime behavior of TSAN is controlled by the TSAN_OPTIONS environment variable. @@ -652,6 +706,8 @@ The above exitcode=0 has TSan continue without error if any warnings are found. This allows for running the test and then checking the warnings afterwards. If you want TSan to stop and exit with error on warnings, use exitcode=66. +.. _tsan-suppressions: + TSan Suppressions ~~~~~~~~~~~~~~~~~ Keep in mind that for any data race warning, although there might be a data race @@ -847,584 +903,21 @@ supported. To start the fuzzer, run Alternatively, some command different from ``qemu-img info`` can be tested, by changing the ``-c`` option. -Integration tests using the Avocado Framework ---------------------------------------------- - -The ``tests/avocado`` directory hosts integration tests. They're usually -higher level tests, and may interact with external resources and with -various guest operating systems. - -These tests are written using the Avocado Testing Framework (which must -be installed separately) in conjunction with a the ``avocado_qemu.Test`` -class, implemented at ``tests/avocado/avocado_qemu``. 
- -Tests based on ``avocado_qemu.Test`` can easily: - - * Customize the command line arguments given to the convenience - ``self.vm`` attribute (a QEMUMachine instance) - - * Interact with the QEMU monitor, send QMP commands and check - their results - - * Interact with the guest OS, using the convenience console device - (which may be useful to assert the effectiveness and correctness of - command line arguments or QMP commands) - - * Interact with external data files that accompany the test itself - (see ``self.get_data()``) - - * Download (and cache) remote data files, such as firmware and kernel - images - - * Have access to a library of guest OS images (by means of the - ``avocado.utils.vmimage`` library) - - * Make use of various other test related utilities available at the - test class itself and at the utility library: - - - http://avocado-framework.readthedocs.io/en/latest/api/test/avocado.html#avocado.Test - - http://avocado-framework.readthedocs.io/en/latest/api/utils/avocado.utils.html - -Running tests -~~~~~~~~~~~~~ - -You can run the avocado tests simply by executing: - -.. code:: - - make check-avocado - -This involves the automatic installation, from PyPI, of all the -necessary avocado-framework dependencies into the QEMU venv within the -build tree (at ``./pyvenv``). Test results are also saved within the -build tree (at ``tests/results``). - -Note: the build environment must be using a Python 3 stack, and have -the ``venv`` and ``pip`` packages installed. If necessary, make sure -``configure`` is called with ``--python=`` and that those modules are -available. On Debian and Ubuntu based systems, depending on the -specific version, they may be on packages named ``python3-venv`` and -``python3-pip``. - -It is also possible to run tests based on tags using the -``make check-avocado`` command and the ``AVOCADO_TAGS`` environment -variable: - -.. code:: - - make check-avocado AVOCADO_TAGS=quick - -Note that tags separated with commas have an AND behavior, while tags -separated by spaces have an OR behavior. For more information on Avocado -tags, see: - - https://avocado-framework.readthedocs.io/en/latest/guides/user/chapters/tags.html - -To run a single test file, a couple of them, or a test within a file -using the ``make check-avocado`` command, set the ``AVOCADO_TESTS`` -environment variable with the test files or test names. To run all -tests from a single file, use: - - .. code:: - - make check-avocado AVOCADO_TESTS=$FILEPATH - -The same is valid to run tests from multiple test files: - - .. code:: - - make check-avocado AVOCADO_TESTS='$FILEPATH1 $FILEPATH2' - -To run a single test within a file, use: - - .. code:: - - make check-avocado AVOCADO_TESTS=$FILEPATH:$TESTCLASS.$TESTNAME - -The same is valid to run single tests from multiple test files: - - .. code:: - - make check-avocado AVOCADO_TESTS='$FILEPATH1:$TESTCLASS1.$TESTNAME1 $FILEPATH2:$TESTCLASS2.$TESTNAME2' - -The scripts installed inside the virtual environment may be used -without an "activation". For instance, the Avocado test runner -may be invoked by running: - - .. code:: - - pyvenv/bin/avocado run $OPTION1 $OPTION2 tests/avocado/ - -Note that if ``make check-avocado`` was not executed before, it is -possible to create the Python virtual environment with the dependencies -needed running: - - .. code:: - - make check-venv - -It is also possible to run tests from a single file or a single test within -a test file. To run tests from a single file within the build tree, use: - - .. 
code:: - - pyvenv/bin/avocado run tests/avocado/$TESTFILE - -To run a single test within a test file, use: - - .. code:: - - pyvenv/bin/avocado run tests/avocado/$TESTFILE:$TESTCLASS.$TESTNAME - -Valid test names are visible in the output from any previous execution -of Avocado or ``make check-avocado``, and can also be queried using: - - .. code:: - - pyvenv/bin/avocado list tests/avocado - -Manual Installation -~~~~~~~~~~~~~~~~~~~ - -To manually install Avocado and its dependencies, run: - -.. code:: - - pip install --user avocado-framework - -Alternatively, follow the instructions on this link: - - https://avocado-framework.readthedocs.io/en/latest/guides/user/chapters/installing.html - -Overview -~~~~~~~~ - -The ``tests/avocado/avocado_qemu`` directory provides the -``avocado_qemu`` Python module, containing the ``avocado_qemu.Test`` -class. Here's a simple usage example: - -.. code:: - - from avocado_qemu import QemuSystemTest - - - class Version(QemuSystemTest): - """ - :avocado: tags=quick - """ - def test_qmp_human_info_version(self): - self.vm.launch() - res = self.vm.cmd('human-monitor-command', - command_line='info version') - self.assertRegex(res, r'^(\d+\.\d+\.\d)') - -To execute your test, run: - -.. code:: - - avocado run version.py - -Tests may be classified according to a convention by using docstring -directives such as ``:avocado: tags=TAG1,TAG2``. To run all tests -in the current directory, tagged as "quick", run: - -.. code:: - - avocado run -t quick . - -The ``avocado_qemu.Test`` base test class -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The ``avocado_qemu.Test`` class has a number of characteristics that -are worth being mentioned right away. - -First of all, it attempts to give each test a ready to use QEMUMachine -instance, available at ``self.vm``. Because many tests will tweak the -QEMU command line, launching the QEMUMachine (by using ``self.vm.launch()``) -is left to the test writer. - -The base test class has also support for tests with more than one -QEMUMachine. The way to get machines is through the ``self.get_vm()`` -method which will return a QEMUMachine instance. The ``self.get_vm()`` -method accepts arguments that will be passed to the QEMUMachine creation -and also an optional ``name`` attribute so you can identify a specific -machine and get it more than once through the tests methods. A simple -and hypothetical example follows: - -.. code:: - - from avocado_qemu import QemuSystemTest - - - class MultipleMachines(QemuSystemTest): - def test_multiple_machines(self): - first_machine = self.get_vm() - second_machine = self.get_vm() - self.get_vm(name='third_machine').launch() - - first_machine.launch() - second_machine.launch() - - first_res = first_machine.cmd( - 'human-monitor-command', - command_line='info version') - - second_res = second_machine.cmd( - 'human-monitor-command', - command_line='info version') - - third_res = self.get_vm(name='third_machine').cmd( - 'human-monitor-command', - command_line='info version') - - self.assertEqual(first_res, second_res, third_res) - -At test "tear down", ``avocado_qemu.Test`` handles all the QEMUMachines -shutdown. - -The ``avocado_qemu.LinuxTest`` base test class -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The ``avocado_qemu.LinuxTest`` is further specialization of the -``avocado_qemu.Test`` class, so it contains all the characteristics of -the later plus some extra features. 
- -First of all, this base class is intended for tests that need to -interact with a fully booted and operational Linux guest. At this -time, it uses a Fedora 31 guest image. The most basic example looks -like this: - -.. code:: - - from avocado_qemu import LinuxTest - - - class SomeTest(LinuxTest): - - def test(self): - self.launch_and_wait() - self.ssh_command('some_command_to_be_run_in_the_guest') - -Please refer to tests that use ``avocado_qemu.LinuxTest`` under -``tests/avocado`` for more examples. - -QEMUMachine -~~~~~~~~~~~ - -The QEMUMachine API is already widely used in the Python iotests, -device-crash-test and other Python scripts. It's a wrapper around the -execution of a QEMU binary, giving its users: - - * the ability to set command line arguments to be given to the QEMU - binary - - * a ready to use QMP connection and interface, which can be used to - send commands and inspect its results, as well as asynchronous - events - - * convenience methods to set commonly used command line arguments in - a more succinct and intuitive way - -QEMU binary selection -^^^^^^^^^^^^^^^^^^^^^ - -The QEMU binary used for the ``self.vm`` QEMUMachine instance will -primarily depend on the value of the ``qemu_bin`` parameter. If it's -not explicitly set, its default value will be the result of a dynamic -probe in the same source tree. A suitable binary will be one that -targets the architecture matching host machine. - -Based on this description, test writers will usually rely on one of -the following approaches: - -1) Set ``qemu_bin``, and use the given binary - -2) Do not set ``qemu_bin``, and use a QEMU binary named like - "qemu-system-${arch}", either in the current - working directory, or in the current source tree. - -The resulting ``qemu_bin`` value will be preserved in the -``avocado_qemu.Test`` as an attribute with the same name. - -Attribute reference -~~~~~~~~~~~~~~~~~~~ - -Test -^^^^ - -Besides the attributes and methods that are part of the base -``avocado.Test`` class, the following attributes are available on any -``avocado_qemu.Test`` instance. - -vm -'' - -A QEMUMachine instance, initially configured according to the given -``qemu_bin`` parameter. - -arch -'''' - -The architecture can be used on different levels of the stack, e.g. by -the framework or by the test itself. At the framework level, it will -currently influence the selection of a QEMU binary (when one is not -explicitly given). - -Tests are also free to use this attribute value, for their own needs. -A test may, for instance, use the same value when selecting the -architecture of a kernel or disk image to boot a VM with. - -The ``arch`` attribute will be set to the test parameter of the same -name. If one is not given explicitly, it will either be set to -``None``, or, if the test is tagged with one (and only one) -``:avocado: tags=arch:VALUE`` tag, it will be set to ``VALUE``. - -cpu -''' - -The cpu model that will be set to all QEMUMachine instances created -by the test. - -The ``cpu`` attribute will be set to the test parameter of the same -name. If one is not given explicitly, it will either be set to -``None ``, or, if the test is tagged with one (and only one) -``:avocado: tags=cpu:VALUE`` tag, it will be set to ``VALUE``. - -machine -''''''' - -The machine type that will be set to all QEMUMachine instances created -by the test. - -The ``machine`` attribute will be set to the test parameter of the same -name. 
If one is not given explicitly, it will either be set to -``None``, or, if the test is tagged with one (and only one) -``:avocado: tags=machine:VALUE`` tag, it will be set to ``VALUE``. - -qemu_bin -'''''''' - -The preserved value of the ``qemu_bin`` parameter or the result of the -dynamic probe for a QEMU binary in the current working directory or -source tree. - -LinuxTest -^^^^^^^^^ - -Besides the attributes present on the ``avocado_qemu.Test`` base -class, the ``avocado_qemu.LinuxTest`` adds the following attributes: - -distro -'''''' - -The name of the Linux distribution used as the guest image for the -test. The name should match the **Provider** column on the list -of images supported by the avocado.utils.vmimage library: - -https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images - -distro_version -'''''''''''''' - -The version of the Linux distribution as the guest image for the -test. The name should match the **Version** column on the list -of images supported by the avocado.utils.vmimage library: - -https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images - -distro_checksum -''''''''''''''' - -The sha256 hash of the guest image file used for the test. - -If this value is not set in the code or by a test parameter (with the -same name), no validation on the integrity of the image will be -performed. - -Parameter reference -~~~~~~~~~~~~~~~~~~~ - -To understand how Avocado parameters are accessed by tests, and how -they can be passed to tests, please refer to:: - - https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#accessing-test-parameters - -Parameter values can be easily seen in the log files, and will look -like the following: - -.. code:: - - PARAMS (key=qemu_bin, path=*, default=./qemu-system-x86_64) => './qemu-system-x86_64 - -Test -^^^^ - -arch -'''' - -The architecture that will influence the selection of a QEMU binary -(when one is not explicitly given). - -Tests are also free to use this parameter value, for their own needs. -A test may, for instance, use the same value when selecting the -architecture of a kernel or disk image to boot a VM with. - -This parameter has a direct relation with the ``arch`` attribute. If -not given, it will default to None. - -cpu -''' - -The cpu model that will be set to all QEMUMachine instances created -by the test. - -machine -''''''' - -The machine type that will be set to all QEMUMachine instances created -by the test. - -qemu_bin -'''''''' - -The exact QEMU binary to be used on QEMUMachine. - -LinuxTest -^^^^^^^^^ - -Besides the parameters present on the ``avocado_qemu.Test`` base -class, the ``avocado_qemu.LinuxTest`` adds the following parameters: - -distro -'''''' - -The name of the Linux distribution used as the guest image for the -test. The name should match the **Provider** column on the list -of images supported by the avocado.utils.vmimage library: - -https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images - -distro_version -'''''''''''''' - -The version of the Linux distribution as the guest image for the -test. The name should match the **Version** column on the list -of images supported by the avocado.utils.vmimage library: - -https://avocado-framework.readthedocs.io/en/latest/guides/writer/libs/vmimage.html#supported-images - -distro_checksum -''''''''''''''' - -The sha256 hash of the guest image file used for the test. 
- -If this value is not set in the code or by this parameter no -validation on the integrity of the image will be performed. - -Skipping tests -~~~~~~~~~~~~~~ - -The Avocado framework provides Python decorators which allow for easily skip -tests running under certain conditions. For example, on the lack of a binary -on the test system or when the running environment is a CI system. For further -information about those decorators, please refer to:: - - https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#skipping-tests - -While the conditions for skipping tests are often specifics of each one, there -are recurring scenarios identified by the QEMU developers and the use of -environment variables became a kind of standard way to enable/disable tests. - -Here is a list of the most used variables: - -AVOCADO_ALLOW_LARGE_STORAGE -^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Tests which are going to fetch or produce assets considered *large* are not -going to run unless that ``AVOCADO_ALLOW_LARGE_STORAGE=1`` is exported on -the environment. - -The definition of *large* is a bit arbitrary here, but it usually means an -asset which occupies at least 1GB of size on disk when uncompressed. - -SPEED -^^^^^ -Tests which have a long runtime will not be run unless ``SPEED=slow`` is -exported on the environment. - -The definition of *long* is a bit arbitrary here, and it depends on the -usefulness of the test too. A unique test is worth spending more time on, -small variations on existing tests perhaps less so. As a rough guide, -a test or set of similar tests which take more than 100 seconds to -complete. - -AVOCADO_ALLOW_UNTRUSTED_CODE -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -There are tests which will boot a kernel image or firmware that can be -considered not safe to run on the developer's workstation, thus they are -skipped by default. The definition of *not safe* is also arbitrary but -usually it means a blob which either its source or build process aren't -public available. - -You should export ``AVOCADO_ALLOW_UNTRUSTED_CODE=1`` on the environment in -order to allow tests which make use of those kind of assets. - -AVOCADO_TIMEOUT_EXPECTED -^^^^^^^^^^^^^^^^^^^^^^^^ -The Avocado framework has a timeout mechanism which interrupts tests to avoid the -test suite of getting stuck. The timeout value can be set via test parameter or -property defined in the test class, for further details:: - - https://avocado-framework.readthedocs.io/en/latest/guides/writer/chapters/writing.html#setting-a-test-timeout - -Even though the timeout can be set by the test developer, there are some tests -that may not have a well-defined limit of time to finish under certain -conditions. For example, tests that take longer to execute when QEMU is -compiled with debug flags. Therefore, the ``AVOCADO_TIMEOUT_EXPECTED`` variable -has been used to determine whether those tests should run or not. - -QEMU_TEST_FLAKY_TESTS -^^^^^^^^^^^^^^^^^^^^^ -Some tests are not working reliably and thus are disabled by default. -This includes tests that don't run reliably on GitLab's CI which -usually expose real issues that are rarely seen on developer machines -due to the constraints of the CI environment. If you encounter a -similar situation then raise a bug and then mark the test as shown on -the code snippet below: - -.. 
code:: +Functional tests using Python +----------------------------- - # See https://gitlab.com/qemu-project/qemu/-/issues/nnnn - @skipUnless(os.getenv('QEMU_TEST_FLAKY_TESTS'), 'Test is unstable on GitLab') - def test(self): - do_something() +A functional test focuses on the functional requirement of the software, +attempting to find errors like incorrect functions, interface errors, +behavior errors, and initialization and termination errors [3]_. -You can also add ``:avocado: tags=flaky`` to the test meta-data so -only the flaky tests can be run as a group: +The ``tests/functional`` directory hosts functional tests written in +Python. You can run the functional tests simply by executing: .. code:: - env QEMU_TEST_FLAKY_TESTS=1 ./pyvenv/bin/avocado \ - run tests/avocado -filter-by-tags=flaky - -Tests should not live in this state forever and should either be fixed -or eventually removed. - - -Uninstalling Avocado -~~~~~~~~~~~~~~~~~~~~ - -If you've followed the manual installation instructions above, you can -easily uninstall Avocado. Start by listing the packages you have -installed:: + make check-functional - pip list --user - -And remove any package you want with:: - - pip uninstall <package_name> - -If you've used ``make check-avocado``, the Python virtual environment where -Avocado is installed will be cleaned up as part of ``make check-clean``. +See :ref:`checkfunctional-ref` for more details. .. _checktcg-ref: @@ -1475,6 +968,19 @@ And run with:: Adding ``V=1`` to the invocation will show the details of how to invoke QEMU for the test which is useful for debugging tests. +Running individual tests +~~~~~~~~~~~~~~~~~~~~~~~~ + +Tests can also be run directly from the test build directory. If you +run ``make help`` from the test build directory you will get a list of +all the tests that can be run. Please note that same binaries are used +in multiple tests, for example:: + + make run-plugin-test-mmap-with-libinline.so + +will run the mmap test with the ``libinline.so`` TCG plugin. The +gdbstub tests also re-use the test binaries but while exercising gdb. + TCG test dependencies ~~~~~~~~~~~~~~~~~~~~~ @@ -1527,3 +1033,27 @@ coverage-html`` which will create Further analysis can be conducted by running the ``gcov`` command directly on the various .gcda output files. Please read the ``gcov`` documentation for more information. + +Flaky tests +----------- + +A flaky test is defined as a test that exhibits both a passing and a failing +result with the same code on different runs. Some usual reasons for an +intermittent/flaky test are async wait, concurrency, and test order dependency +[4]_. + +In QEMU, tests that are identified to be flaky are normally disabled by +default. Set the QEMU_TEST_FLAKY_TESTS environment variable before running +the tests to enable them. + +References +---------- + +.. [1] Sommerville, Ian (2016). Software Engineering. p. 233. +.. [2] Pressman, Roger S. & Maxim, Bruce R. (2020). Software Engineering, + A Practitioner’s Approach. p. 48, 376, 378, 381. +.. [3] Pressman, Roger S. & Maxim, Bruce R. (2020). Software Engineering, + A Practitioner’s Approach. p. 388. +.. [4] Luo, Qingzhou, et al. An empirical analysis of flaky tests. + Proceedings of the 22nd ACM SIGSOFT International Symposium on + Foundations of Software Engineering. 2014. 
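As a concrete illustration of the flaky-test handling described above (this
example is not part of the patch; it reuses the stand-alone invocation shown
in the functional testing documentation, is run from the build directory, and
``test_example.py`` is only a placeholder for whichever flaky test you want to
exercise), a single flaky test can be opted into like this::

    $ export PYTHONPATH=../python:../tests/functional
    $ export QEMU_TEST_QEMU_BINARY=$PWD/qemu-system-x86_64
    $ QEMU_TEST_FLAKY_TESTS=1 python3 ../tests/functional/test_example.py

Leaving ``QEMU_TEST_FLAKY_TESTS`` unset runs the same command with the flaky
cases skipped, so setting the variable only ever widens the set of tests that
execute.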
diff --git a/docs/devel/qgraph.rst b/docs/devel/testing/qgraph.rst index 43342d9..43342d9 100644 --- a/docs/devel/qgraph.rst +++ b/docs/devel/testing/qgraph.rst diff --git a/docs/devel/qtest.rst b/docs/devel/testing/qtest.rst index c5b8546..73ef770 100644 --- a/docs/devel/qtest.rst +++ b/docs/devel/testing/qtest.rst @@ -1,3 +1,5 @@ +.. _qtest: + ======================================== QTest Device Emulation Testing Framework ======================================== diff --git a/docs/devel/uefi-vars.rst b/docs/devel/uefi-vars.rst new file mode 100644 index 0000000..0151a26 --- /dev/null +++ b/docs/devel/uefi-vars.rst @@ -0,0 +1,68 @@ +============== +UEFI variables +============== + +Guest UEFI variable management +============================== + +The traditional approach for UEFI Variable storage in qemu guests is +to work as close as possible to physical hardware. That means +providing pflash as storage and leaving the management of variables +and flash to the guest. + +Secure boot support comes with the requirement that the UEFI variable +storage must be protected against direct access by the OS. All update +requests must pass the sanity checks. (Parts of) the firmware must +run with a higher privilege level than the OS so this can be enforced +by the firmware. On x86 this has been implemented using System +Management Mode (SMM) in qemu and kvm, which again is the same +approach taken by physical hardware. Only privileged code running in +SMM mode is allowed to access flash storage. + +Communication with the firmware code running in SMM mode works by +serializing the requests to a shared buffer, then trapping into SMM +mode via SMI. The SMM code processes the request, stores the reply in +the same buffer and returns. + +Host UEFI variable service +========================== + +Instead of running the privileged code inside the guest we can run it +on the host. The serialization protocol can be reused. The +communication with the host uses a virtual device, which essentially +configures the shared buffer location and size, and traps to the host +to process the requests. + +The ``uefi-vars`` device implements the UEFI virtual device. It comes +in ``uefi-vars-x86`` and ``uefi-vars-sysbus`` flavours. The device +reimplements the handlers needed, specifically +``EfiSmmVariableProtocol`` and ``VarCheckPolicyLibMmiHandler``. It +also consumes events (``EfiEndOfDxeEventGroup``, +``EfiEventReadyToBoot`` and ``EfiEventExitBootServices``). + +The advantage of the approach is that we do not need a special +privilege level for the firmware to protect itself, i.e. it does not +depend on SMM emulation on x64, which allows the removal of a bunch of +complex code for SMM emulation from the linux kernel +(CONFIG_KVM_SMM=n). It also allows support for secure boot on arm +without implementing secure world (el3) emulation in kvm. + +Of course there are also downsides. The added device increases the +attack surface of the host, and we are adding some code duplication +because we have to reimplement some edk2 functionality in qemu. + +usage on x86_64 +--------------- + +.. code:: + + qemu-system-x86_64 \ + -device uefi-vars-x86,jsonfile=/path/to/vars.json + +usage on aarch64 +---------------- + +.. 
code:: + + qemu-system-aarch64 -M virt \ + -device uefi-vars-sysbus,jsonfile=/path/to/vars.json diff --git a/docs/devel/virtio-backends.rst b/docs/devel/virtio-backends.rst index 9ff092e..ebddc3b 100644 --- a/docs/devel/virtio-backends.rst +++ b/docs/devel/virtio-backends.rst @@ -101,13 +101,12 @@ manually instantiated: VirtIOBlock vdev; }; - static Property virtio_blk_pci_properties[] = { + static const Property virtio_blk_pci_properties[] = { DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0), DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true), DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, DEV_NVECTORS_UNSPECIFIED), - DEFINE_PROP_END_OF_LIST(), }; static void virtio_blk_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp) @@ -120,7 +119,7 @@ manually instantiated: qdev_realize(vdev, BUS(&vpci_dev->bus), errp); } - static void virtio_blk_pci_class_init(ObjectClass *klass, void *data) + static void virtio_blk_pci_class_init(ObjectClass *klass, const void *data) { DeviceClass *dc = DEVICE_CLASS(klass); VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass); |