author     Peter Maydell <peter.maydell@linaro.org>   2024-10-15 15:18:22 +0100
committer  Peter Maydell <peter.maydell@linaro.org>   2024-10-15 15:18:22 +0100
commit     f774a677507966222624a9b2859f06ede7608100
tree       0d6a7f482982a20ef8609b798e15d5974cd2db85 /docs
parent     c155d13167c6ace099e351e28125f9eb3694ae27
parent     f160a4f8d0ef322377db3519c0aa088ccd99edf1
Merge tag 'pull-target-arm-20241015-1' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
 * hw/arm/omap1: Remove unused omap_uwire_attach() method
 * stm32f405: Add RCC device to stm32f405 SoC
 * arm/gicv3: add missing casts
 * hw/misc: Create STM32L4x5 SYSCFG clock
 * hw/arm: Add SPI to Allwinner A10
 * hw/intc/omap_intc: Remove now-unnecessary abstract base class
 * hw/char/pl011: Use correct masks for IBRD and FBRD
 * docs/devel: Convert txt files to rST
 * Remove MAX111X, MAX7310, DSCM-1XXXX, pcmcia devices (used only by now-removed omap/pxa2xx boards)
 * vl.c: Remove pxa2xx-specific -portrait and -rotate options
 * dma: Fix function names in documentation
 * hw/arm/xilinx_zynq: Add various missing unimplemented devices

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmcOeWEZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3jCMD/482mpT1s+mrEJFWSJJXU4G
# 8kr4Zj6+NafbayJ0vHTkpSbkEbPxuvDiUqmlnbI+3o11i+Z3IyiaGZbba7dyNnKl
# 02MdQavL0dB+eMrcFNofRRvwvsposuj2ixgwTQe6L32HSFdHerVVwuhHM/wfwyCh
# DKt7gPRovD/7CtwDOSpyW7cK64WK1IUlE8VEsbFdQbCPkopm55LQ2sLT4TshadpG
# A6xcxyLN0x/lHgCmvijB1T09LSc1nQpUEQNIokC4f1Rmy6HNgGDYY1G7GAJf99mT
# nWhATuuhZThiYfRbN5KQoS9tGEUduxtkGhHiOgpdXpgc3cS7RusCHoqAnibpsVh3
# TgAkaRAX1d/jQ2KYR2h2jI3nh66ObhrFRT3dkzRZrIvmK9zeWUKmS9lzZ94aVfPH
# +MtBPwsO5OhzEABs8WpMY9V1nYaYDsFATMc1akUSaSLn1Er9Uz66NIk+J4Lob4P0
# 78IPvTmwvAIITiqQvkISsc37n5a2/toeaffU2hPKtQLlhyilWynEZA5YItrXSTuk
# gYIBxyZSbzGj/ofZ9T9C0GDLbhJp9ksNIpIqRUiHOH3z9b85r7HVZORp+COw/ZXR
# UGak6rpJ+XVOxVL/cPRTvZB0RbUHIZh7WLNH2G7Tfv4E4llqL81iuImHXVh/2CXO
# 9GWr9qbDLDYQ+BI7ipLAYg==
# =n2CA
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 15 Oct 2024 15:17:05 BST
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [ultimate]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [ultimate]
# gpg:                 aka "Peter Maydell <peter@archaic.org.uk>" [ultimate]
# Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE

* tag 'pull-target-arm-20241015-1' of https://git.linaro.org/people/pmaydell/qemu-arm: (28 commits)
  hw/arm/xilinx_zynq: Add various missing unimplemented devices
  dma: Fix function names in documentation
  vl.c: Remove pxa2xx-specific -portrait and -rotate options
  hw/block: Remove ecc
  hw: Remove PCMCIA subsystem
  hw/ide: Remove DSCM-1XXXX microdrive device model
  hw/gpio: Remove MAX7310 device
  hw/adc: Remove MAX111X device
  docs/devel/lockcnt: Include kernel-doc API documentation
  include: Move QemuLockCnt APIs to their own header
  docs/devel/rcu: Convert to rST format
  docs/devel/multiple-iothreads: Convert to rST format
  docs/devel/lockcnt: Convert to rST format
  docs/devel/blkverify: Convert to rST format
  docs/devel/blkdebug: Convert to rST format
  hw/char/pl011: Use correct masks for IBRD and FBRD
  hw/intc/omap_intc: Remove now-unnecessary abstract base class
  hw/arm: Add SPI to Allwinner A10
  hw/ssi: Allwinner A10 SPI emulation
  tests/qtest: Check STM32L4x5 clock connections
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Diffstat (limited to 'docs')
-rw-r--r--  docs/about/removed-features.rst                                             23
-rw-r--r--  docs/devel/blkdebug.txt                                                    162
-rw-r--r--  docs/devel/clocks.rst                                                        6
-rw-r--r--  docs/devel/index-api.rst                                                     1
-rw-r--r--  docs/devel/index-internals.rst                                               2
-rw-r--r--  docs/devel/lockcnt.rst (renamed from docs/devel/lockcnt.txt)                89
-rw-r--r--  docs/devel/multiple-iothreads.rst                                           139
-rw-r--r--  docs/devel/multiple-iothreads.txt                                           130
-rw-r--r--  docs/devel/rcu.rst (renamed from docs/devel/rcu.txt)                        172
-rw-r--r--  docs/devel/testing/blkdebug.rst                                             177
-rw-r--r--  docs/devel/testing/blkverify.rst (renamed from docs/devel/blkverify.txt)     30
-rw-r--r--  docs/devel/testing/index.rst                                                  2
-rw-r--r--  docs/system/arm/cubieboard.rst                                                1
-rw-r--r--  docs/system/arm/stm32.rst                                                     3
14 files changed, 495 insertions, 442 deletions
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c76b91a..912e0a1 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -532,6 +532,29 @@ security model option, or switch to ``virtiofs``. The virtiofs daemon
``virtiofsd`` uses vhost to eliminate the high latency costs of the 9p
``proxy`` backend.
+``-portrait`` and ``-rotate`` (since 9.2)
+'''''''''''''''''''''''''''''''''''''''''
+
+The ``-portrait`` and ``-rotate`` options were documented as only
+working with the PXA LCD device, and all the machine types using
+that display device were removed in 9.2, so these options also
+have been dropped.
+
+These options were intended to simulate a mobile device being
+rotated by the user, and had three effects:
+
+* the display output was rotated by 90, 180 or 270 degrees
+* the mouse/trackpad input was rotated the opposite way
+* the machine model would signal to the guest about its
+ orientation
+
+Of these three things, the input-rotation was coded without being
+restricted to boards which supported the full set of device-rotation
+handling, so in theory the options were usable on other machine models
+to produce an odd effect (rotating input but not display output). But
+this was never intended or documented behaviour, so we have dropped
+the options along with the machine models they were intended for.
+
User-mode emulator command line arguments
-----------------------------------------
diff --git a/docs/devel/blkdebug.txt b/docs/devel/blkdebug.txt
deleted file mode 100644
index 0b0c128..0000000
--- a/docs/devel/blkdebug.txt
+++ /dev/null
@@ -1,162 +0,0 @@
-Block I/O error injection using blkdebug
-----------------------------------------
-Copyright (C) 2014-2015 Red Hat Inc
-
-This work is licensed under the terms of the GNU GPL, version 2 or later. See
-the COPYING file in the top-level directory.
-
-The blkdebug block driver is a rule-based error injection engine. It can be
-used to exercise error code paths in block drivers including ENOSPC (out of
-space) and EIO.
-
-This document gives an overview of the features available in blkdebug.
-
-Background
-----------
-Block drivers have many error code paths that handle I/O errors. Image formats
-are especially complex since metadata I/O errors during cluster allocation or
-while updating tables happen halfway through request processing and require
-discipline to keep image files consistent.
-
-Error injection allows test cases to trigger I/O errors at specific points.
-This way, all error paths can be tested to make sure they are correct.
-
-Rules
------
-The blkdebug block driver takes a list of "rules" that tell the error injection
-engine when to fail an I/O request.
-
-Each I/O request is evaluated against the rules. If a rule matches the request
-then its "action" is executed.
-
-Rules can be placed in a configuration file; the configuration file
-follows the same .ini-like format used by QEMU's -readconfig option, and
-each section of the file represents a rule.
-
-The following configuration file defines a single rule:
-
- $ cat blkdebug.conf
- [inject-error]
- event = "read_aio"
- errno = "28"
-
-This rule fails all aio read requests with ENOSPC (28). Note that the errno
-value depends on the host. On Linux, see
-/usr/include/asm-generic/errno-base.h for errno values.
-
-Invoke QEMU as follows:
-
- $ qemu-system-x86_64
- -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
- -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
-
-Rules support the following attributes:
-
- event - which type of operation to match (e.g. read_aio, write_aio,
- flush_to_os, flush_to_disk). See the "Events" section for
- information on events.
-
- state - (optional) the engine must be in this state number in order for this
- rule to match. See the "State transitions" section for information
- on states.
-
- errno - the numeric errno value to return when a request matches this rule.
- The errno values depend on the host since the numeric values are not
- standardized in the POSIX specification.
-
- sector - (optional) a sector number that the request must overlap in order to
- match this rule
-
- once - (optional, default "off") only execute this action on the first
- matching request
-
- immediately - (optional, default "off") return a NULL BlockAIOCB
- pointer and fail without an errno instead. This
- exercises the code path where BlockAIOCB fails and the
- caller's BlockCompletionFunc is not invoked.
-
-Events
-------
-Block drivers provide information about the type of I/O request they are about
-to make so rules can match specific types of requests. For example, the qcow2
-block driver tells blkdebug when it accesses the L1 table so rules can match
-only L1 table accesses and not other metadata or guest data requests.
-
-The core events are:
-
- read_aio - guest data read
-
- write_aio - guest data write
-
- flush_to_os - write out unwritten block driver state (e.g. cached metadata)
-
- flush_to_disk - flush the host block device's disk cache
-
-See qapi/block-core.json:BlkdebugEvent for the full list of events.
-You may need to grep block driver source code to understand the
-meaning of specific events.
-
-State transitions
------------------
-There are cases where more power is needed to match a particular I/O request in
-a longer sequence of requests. For example:
-
- write_aio
- flush_to_disk
- write_aio
-
-How do we match the 2nd write_aio but not the first? This is where state
-transitions come in.
-
-The error injection engine has an integer called the "state" that always starts
-initialized to 1. The state integer is internal to blkdebug and cannot be
-observed from outside but rules can interact with it for powerful matching
-behavior.
-
-Rules can be conditional on the current state and they can transition to a new
-state.
-
-When a rule's "state" attribute is non-zero then the current state must equal
-the attribute in order for the rule to match.
-
-For example, to match the 2nd write_aio:
-
- [set-state]
- event = "write_aio"
- state = "1"
- new_state = "2"
-
- [inject-error]
- event = "write_aio"
- state = "2"
- errno = "5"
-
-The first write_aio request matches the set-state rule and transitions from
-state 1 to state 2. Once state 2 has been entered, the set-state rule no
-longer matches since it requires state 1. But the inject-error rule now
-matches the next write_aio request and injects EIO (5).
-
-State transition rules support the following attributes:
-
- event - which type of operation to match (e.g. read_aio, write_aio,
- flush_to_os, flush_to_disk). See the "Events" section for
- information on events.
-
- state - (optional) the engine must be in this state number in order for this
- rule to match
-
- new_state - transition to this state number
-
-Suspend and resume
-------------------
-Exercising code paths in block drivers may require specific ordering amongst
-concurrent requests. The "breakpoint" feature allows requests to be halted on
-a blkdebug event and resumed later. This makes it possible to achieve
-deterministic ordering when multiple requests are in flight.
-
-Breakpoints on blkdebug events are associated with a user-defined "tag" string.
-This tag serves as an identifier by which the request can be resumed at a later
-point.
-
-See the qemu-io(1) break, resume, remove_break, and wait_break commands for
-details.
diff --git a/docs/devel/clocks.rst b/docs/devel/clocks.rst
index 177ee1c..3f744f2 100644
--- a/docs/devel/clocks.rst
+++ b/docs/devel/clocks.rst
@@ -358,6 +358,12 @@ humans (for instance in debugging), use ``clock_display_freq()``,
which returns a prettified string-representation, e.g. "33.3 MHz".
The caller must free the string with g_free() after use.
+It's also possible to retrieve the clock period from a QTest by
+accessing QOM property ``qtest-clock-period`` using a QMP command.
+This property is only present when the device is being run under
+the ``qtest`` accelerator; it is not available when QEMU is
+being run normally.
+
Calculating expiry deadlines
----------------------------
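
For illustration, here is a rough sketch of how a qtest could read the ``qtest-clock-period`` property described above via the ``qom-get`` QMP command. It assumes the libqtest helpers; the helper name and the caller-supplied QOM path are invented for the example:

    /* Sketch: read a clock's period from a qtest via QMP 'qom-get'.
     * The helper name and the QOM path argument are illustrative only.
     */
    #include "qemu/osdep.h"
    #include "libqtest.h"
    #include "qapi/qmp/qdict.h"

    static uint64_t read_clock_period(QTestState *qts, const char *path)
    {
        QDict *r;
        uint64_t period;

        r = qtest_qmp(qts, "{ 'execute': 'qom-get', 'arguments': "
                           "{ 'path': %s, 'property': 'qtest-clock-period' } }",
                      path);
        g_assert_false(qdict_haskey(r, "error"));
        period = qdict_get_int(r, "return");
        qobject_unref(r);
        return period;
    }

A test would call this with the QOM path of whichever clock it wants to check on the device under test.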
diff --git a/docs/devel/index-api.rst b/docs/devel/index-api.rst
index fe01b2b..1c487c1 100644
--- a/docs/devel/index-api.rst
+++ b/docs/devel/index-api.rst
@@ -9,6 +9,7 @@ generated from in-code annotations to function prototypes.
bitops
loads-stores
+ lockcnt
memory
modules
pci
diff --git a/docs/devel/index-internals.rst b/docs/devel/index-internals.rst
index 4ac7725..ab9fbc4 100644
--- a/docs/devel/index-internals.rst
+++ b/docs/devel/index-internals.rst
@@ -8,6 +8,7 @@ Details about QEMU's various subsystems including how to add features to them.
qom
atomics
+ rcu
block-coroutine-wrapper
clocks
ebpf_rss
@@ -21,3 +22,4 @@ Details about QEMU's various subsystems including how to add features to them.
writing-monitor-commands
virtio-backends
crypto
+ multiple-iothreads
diff --git a/docs/devel/lockcnt.txt b/docs/devel/lockcnt.rst
index a3fb3bc..8b43578 100644
--- a/docs/devel/lockcnt.txt
+++ b/docs/devel/lockcnt.rst
@@ -1,9 +1,9 @@
-DOCUMENTATION FOR LOCKED COUNTERS (aka QemuLockCnt)
-===================================================
+Locked Counters (aka ``QemuLockCnt``)
+=====================================
QEMU often uses reference counts to track data structures that are being
accessed and should not be freed. For example, a loop that invokes
-callbacks like this is not safe:
+callbacks like this is not safe::
QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
if (ioh->revents & G_IO_OUT) {
@@ -11,11 +11,11 @@ callbacks like this is not safe:
}
}
-QLIST_FOREACH_SAFE protects against deletion of the current node (ioh)
-by stashing away its "next" pointer. However, ioh->fd_write could
+``QLIST_FOREACH_SAFE`` protects against deletion of the current node (``ioh``)
+by stashing away its ``next`` pointer. However, ``ioh->fd_write`` could
actually delete the next node from the list. The simplest way to
avoid this is to mark the node as deleted, and remove it from the
-list in the above loop:
+list in the above loop::
QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
if (ioh->deleted) {
@@ -29,7 +29,7 @@ list in the above loop:
}
If however this loop must also be reentrant, i.e. it is possible that
-ioh->fd_write invokes the loop again, some kind of counting is needed:
+``ioh->fd_write`` invokes the loop again, some kind of counting is needed::
walking_handlers++;
QLIST_FOREACH_SAFE(ioh, &io_handlers, next, pioh) {
@@ -46,8 +46,8 @@ ioh->fd_write invokes the loop again, some kind of counting is needed:
}
walking_handlers--;
-One may think of using the RCU primitives, rcu_read_lock() and
-rcu_read_unlock(); effectively, the RCU nesting count would take
+One may think of using the RCU primitives, ``rcu_read_lock()`` and
+``rcu_read_unlock()``; effectively, the RCU nesting count would take
the place of the walking_handlers global variable. Indeed,
reference counting and RCU have similar purposes, but their usage in
general is complementary:
@@ -70,14 +70,14 @@ general is complementary:
this can improve performance, but also delay reclamation undesirably.
With reference counting, reclamation is deterministic.
-This file documents QemuLockCnt, an abstraction for using reference
+This file documents ``QemuLockCnt``, an abstraction for using reference
counting in code that has to be both thread-safe and reentrant.
-QemuLockCnt concepts
---------------------
+``QemuLockCnt`` concepts
+------------------------
-A QemuLockCnt comprises both a counter and a mutex; it has primitives
+A ``QemuLockCnt`` comprises both a counter and a mutex; it has primitives
to increment and decrement the counter, and to take and release the
mutex. The counter notes how many visits to the data structures are
taking place (the visits could be from different threads, or there could
@@ -95,13 +95,14 @@ not just frees, though there could be cases where this is not necessary.
Reads, instead, can be done without taking the mutex, as long as the
readers and writers use the same macros that are used for RCU, for
-example qatomic_rcu_read, qatomic_rcu_set, QLIST_FOREACH_RCU, etc. This is
-because the reads are done outside a lock and a set or QLIST_INSERT_HEAD
+example ``qatomic_rcu_read``, ``qatomic_rcu_set``, ``QLIST_FOREACH_RCU``,
+etc. This is because the reads are done outside a lock and a set
+or ``QLIST_INSERT_HEAD``
can happen concurrently with the read. The RCU API ensures that the
processor and the compiler see all required memory barriers.
This could be implemented simply by protecting the counter with the
-mutex, for example:
+mutex, for example::
// (1)
qemu_mutex_lock(&walking_handlers_mutex);
@@ -125,33 +126,33 @@ mutex, for example:
Here, no frees can happen in the code represented by the ellipsis.
If another thread is executing critical section (2), that part of
the code cannot be entered, because the thread will not be able
-to increment the walking_handlers variable. And of course
+to increment the ``walking_handlers`` variable. And of course
during the visit any other thread will see a nonzero value for
-walking_handlers, as in the single-threaded code.
+``walking_handlers``, as in the single-threaded code.
Note that it is possible for multiple concurrent accesses to delay
-the cleanup arbitrarily; in other words, for the walking_handlers
+the cleanup arbitrarily; in other words, for the ``walking_handlers``
counter to never become zero. For this reason, this technique is
more easily applicable if concurrent access to the structure is rare.
However, critical sections are easy to forget since you have to do
-them for each modification of the counter. QemuLockCnt ensures that
+them for each modification of the counter. ``QemuLockCnt`` ensures that
all modifications of the counter take the lock appropriately, and it
can also be more efficient in two ways:
- it avoids taking the lock for many operations (for example
incrementing the counter while it is non-zero);
-- on some platforms, one can implement QemuLockCnt to hold the lock
+- on some platforms, one can implement ``QemuLockCnt`` to hold the lock
and the mutex in a single word, making the fast path no more expensive
than simply managing a counter using atomic operations (see
- docs/devel/atomics.rst). This can be very helpful if concurrent access to
+ :doc:`atomics`). This can be very helpful if concurrent access to
the data structure is expected to be rare.
Using the same mutex for frees and writes can still incur some small
inefficiencies; for example, a visit can never start if the counter is
-zero and the mutex is taken---even if the mutex is taken by a write,
+zero and the mutex is taken -- even if the mutex is taken by a write,
which in principle need not block a visit of the data structure.
However, these are usually not a problem if any of the following
assumptions are valid:
@@ -163,27 +164,27 @@ assumptions are valid:
- writes are frequent, but this kind of write (e.g. appending to a
list) has a very small critical section.
-For example, QEMU uses QemuLockCnt to manage an AioContext's list of
+For example, QEMU uses ``QemuLockCnt`` to manage an ``AioContext``'s list of
bottom halves and file descriptor handlers. Modifications to the list
of file descriptor handlers are rare. Creation of a new bottom half is
frequent and can happen on a fast path; however: 1) it is almost never
concurrent with a visit to the list of bottom halves; 2) it only has
-three instructions in the critical path, two assignments and a smp_wmb().
+three instructions in the critical path, two assignments and a ``smp_wmb()``.
-QemuLockCnt API
----------------
+``QemuLockCnt`` API
+-------------------
-The QemuLockCnt API is described in include/qemu/thread.h.
+.. kernel-doc:: include/qemu/lockcnt.h
-QemuLockCnt usage
------------------
+``QemuLockCnt`` usage
+---------------------
-This section explains the typical usage patterns for QemuLockCnt functions.
+This section explains the typical usage patterns for ``QemuLockCnt`` functions.
Setting a variable to a non-NULL value can be done between
-qemu_lockcnt_lock and qemu_lockcnt_unlock:
+``qemu_lockcnt_lock`` and ``qemu_lockcnt_unlock``::
qemu_lockcnt_lock(&xyz_lockcnt);
if (!xyz) {
@@ -193,8 +194,8 @@ qemu_lockcnt_lock and qemu_lockcnt_unlock:
}
qemu_lockcnt_unlock(&xyz_lockcnt);
-Accessing the value can be done between qemu_lockcnt_inc and
-qemu_lockcnt_dec:
+Accessing the value can be done between ``qemu_lockcnt_inc`` and
+``qemu_lockcnt_dec``::
qemu_lockcnt_inc(&xyz_lockcnt);
if (xyz) {
@@ -204,11 +205,11 @@ qemu_lockcnt_dec:
}
qemu_lockcnt_dec(&xyz_lockcnt);
-Freeing the object can similarly use qemu_lockcnt_lock and
-qemu_lockcnt_unlock, but you also need to ensure that the count
-is zero (i.e. there is no concurrent visit). Because qemu_lockcnt_inc
-takes the QemuLockCnt's lock, the count cannot become non-zero while
-the object is being freed. Freeing an object looks like this:
+Freeing the object can similarly use ``qemu_lockcnt_lock`` and
+``qemu_lockcnt_unlock``, but you also need to ensure that the count
+is zero (i.e. there is no concurrent visit). Because ``qemu_lockcnt_inc``
+takes the ``QemuLockCnt``'s lock, the count cannot become non-zero while
+the object is being freed. Freeing an object looks like this::
qemu_lockcnt_lock(&xyz_lockcnt);
if (!qemu_lockcnt_count(&xyz_lockcnt)) {
@@ -218,7 +219,7 @@ the object is being freed. Freeing an object looks like this:
qemu_lockcnt_unlock(&xyz_lockcnt);
If an object has to be freed right after a visit, you can combine
-the decrement, the locking and the check on count as follows:
+the decrement, the locking and the check on count as follows::
qemu_lockcnt_inc(&xyz_lockcnt);
if (xyz) {
@@ -232,7 +233,7 @@ the decrement, the locking and the check on count as follows:
qemu_lockcnt_unlock(&xyz_lockcnt);
}
-QemuLockCnt can also be used to access a list as follows:
+``QemuLockCnt`` can also be used to access a list as follows::
qemu_lockcnt_inc(&io_handlers_lockcnt);
QLIST_FOREACH_RCU(ioh, &io_handlers, pioh) {
@@ -252,10 +253,10 @@ QemuLockCnt can also be used to access a list as follows:
}
Again, the RCU primitives are used because new items can be added to the
-list during the walk. QLIST_FOREACH_RCU ensures that the processor and
+list during the walk. ``QLIST_FOREACH_RCU`` ensures that the processor and
the compiler see the appropriate memory barriers.
-An alternative pattern uses qemu_lockcnt_dec_if_lock:
+An alternative pattern uses ``qemu_lockcnt_dec_if_lock``::
qemu_lockcnt_inc(&io_handlers_lockcnt);
QLIST_FOREACH_SAFE_RCU(ioh, &io_handlers, next, pioh) {
@@ -273,5 +274,5 @@ An alternative pattern uses qemu_lockcnt_dec_if_lock:
}
qemu_lockcnt_dec(&io_handlers_lockcnt);
-Here you can use qemu_lockcnt_dec instead of qemu_lockcnt_dec_and_lock,
+Here you can use ``qemu_lockcnt_dec`` instead of ``qemu_lockcnt_dec_and_lock``,
because there is no special task to do if the count goes from 1 to 0.
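
To tie the ``QemuLockCnt`` pieces above together, here is a rough, self-contained sketch of the pattern; every name in it (``Item``, ``items_lockcnt`` and the functions) is invented for the example, but real users such as ``AioContext`` follow the same shape:

    /* Sketch of the QemuLockCnt pattern described above; all names here
     * are invented for the example.
     */
    #include "qemu/osdep.h"
    #include "qemu/lockcnt.h"
    #include "qemu/queue.h"
    #include "qemu/rcu_queue.h"

    typedef struct Item {
        QLIST_ENTRY(Item) next;
        bool deleted;
        int value;
    } Item;

    static QLIST_HEAD(, Item) items = QLIST_HEAD_INITIALIZER(items);
    static QemuLockCnt items_lockcnt;  /* qemu_lockcnt_init() it once at startup */

    /* Visit: count the reader, then walk with the RCU-friendly list macros. */
    static int items_sum(void)
    {
        Item *it;
        int sum = 0;

        qemu_lockcnt_inc(&items_lockcnt);
        QLIST_FOREACH_RCU(it, &items, next) {
            if (!it->deleted) {
                sum += it->value;
            }
        }
        qemu_lockcnt_dec(&items_lockcnt);
        return sum;
    }

    /* Write: insertions take the lock so they are serialized with frees. */
    static void items_add(int value)
    {
        Item *it = g_new0(Item, 1);

        it->value = value;
        qemu_lockcnt_lock(&items_lockcnt);
        QLIST_INSERT_HEAD_RCU(&items, it, next);
        qemu_lockcnt_unlock(&items_lockcnt);
    }

    /* Reclaim: free marked elements only if no visit is in flight. */
    static void items_gc(void)
    {
        Item *it, *tmp;

        qemu_lockcnt_lock(&items_lockcnt);
        if (!qemu_lockcnt_count(&items_lockcnt)) {
            QLIST_FOREACH_SAFE_RCU(it, &items, next, tmp) {
                if (it->deleted) {
                    QLIST_REMOVE_RCU(it, next);
                    g_free(it);
                }
            }
        }
        qemu_lockcnt_unlock(&items_lockcnt);
    }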
diff --git a/docs/devel/multiple-iothreads.rst b/docs/devel/multiple-iothreads.rst
new file mode 100644
index 0000000..d1f3fc4
--- /dev/null
+++ b/docs/devel/multiple-iothreads.rst
@@ -0,0 +1,139 @@
+Using Multiple ``IOThread``\ s
+==============================
+
+..
+ Copyright (c) 2014-2017 Red Hat Inc.
+
+ This work is licensed under the terms of the GNU GPL, version 2 or later. See
+ the COPYING file in the top-level directory.
+
+
+This document explains the ``IOThread`` feature and how to write code that runs
+outside the BQL.
+
+The main loop and ``IOThread``\ s
+---------------------------------
+QEMU is an event-driven program that can do several things at once using an
+event loop. The VNC server and the QMP monitor are both processed from the
+same event loop, which monitors their file descriptors until they become
+readable and then invokes a callback.
+
+The default event loop is called the main loop (see ``main-loop.c``). It is
+possible to create additional event loop threads using
+``-object iothread,id=my-iothread``.
+
+Side note: The main loop and ``IOThread`` are both event loops but their code is
+not shared completely. Sometimes it is useful to remember that although they
+are conceptually similar they are currently not interchangeable.
+
+Why ``IOThread``\ s are useful
+------------------------------
+``IOThread``\ s allow the user to control the placement of work. The main loop is a
+scalability bottleneck on hosts with many CPUs. Work can be spread across
+several ``IOThread``\ s instead of just one main loop. When set up correctly this
+can improve I/O latency and reduce jitter seen by the guest.
+
+The main loop is also deeply associated with the BQL, which is a
+scalability bottleneck in itself. vCPU threads and the main loop use the BQL
+to serialize execution of QEMU code. This mutex is necessary because a lot of
+QEMU's code historically was not thread-safe.
+
+The fact that all I/O processing is done in a single main loop and that the
+BQL is contended by all vCPU threads and the main loop explains
+why it is desirable to place work into ``IOThread``\ s.
+
+The experimental ``virtio-blk`` data-plane implementation has been benchmarked and
+shows these effects:
+ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
+
+.. _how-to-program:
+
+How to program for ``IOThread``\ s
+----------------------------------
+The main difference between legacy code and new code that can run in an
+``IOThread`` is dealing explicitly with the event loop object, ``AioContext``
+(see ``include/block/aio.h``). Code that only works in the main loop
+implicitly uses the main loop's ``AioContext``. Code that supports running
+in ``IOThread``\ s must be aware of its ``AioContext``.
+
+AioContext supports the following services:
+ * File descriptor monitoring (read/write/error on POSIX hosts)
+ * Event notifiers (inter-thread signalling)
+ * Timers
+ * Bottom Halves (BH) deferred callbacks
+
+There are several old APIs that use the main loop AioContext:
+ * LEGACY ``qemu_aio_set_fd_handler()`` - monitor a file descriptor
+ * LEGACY ``qemu_aio_set_event_notifier()`` - monitor an event notifier
+ * LEGACY ``timer_new_ms()`` - create a timer
+ * LEGACY ``qemu_bh_new()`` - create a BH
+ * LEGACY ``qemu_bh_new_guarded()`` - create a BH with a device re-entrancy guard
+ * LEGACY ``qemu_aio_wait()`` - run an event loop iteration
+
+Since they implicitly work on the main loop they cannot be used in code that
+runs in an ``IOThread``. They might cause a crash or deadlock if called from an
+``IOThread`` since the BQL is not held.
+
+Instead, use the ``AioContext`` functions directly (see ``include/block/aio.h``):
+ * ``aio_set_fd_handler()`` - monitor a file descriptor
+ * ``aio_set_event_notifier()`` - monitor an event notifier
+ * ``aio_timer_new()`` - create a timer
+ * ``aio_bh_new()`` - create a BH
+ * ``aio_bh_new_guarded()`` - create a BH with a device re-entrancy guard
+ * ``aio_poll()`` - run an event loop iteration
+
+The ``qemu_bh_new_guarded``/``aio_bh_new_guarded`` APIs accept a
+``MemReentrancyGuard``
+argument, which is used to check for and prevent re-entrancy problems. For
+BHs associated with devices, the reentrancy-guard is contained in the
+corresponding ``DeviceState`` and named ``mem_reentrancy_guard``.
+
+The ``AioContext`` can be obtained from the ``IOThread`` using
+``iothread_get_aio_context()`` or for the main loop using
+``qemu_get_aio_context()``. Code that takes an ``AioContext`` argument
+works in both ``IOThread``\ s and the main loop, depending on which ``AioContext``
+instance the caller passes in.
+
+How to synchronize with an ``IOThread``
+---------------------------------------
+Variables that can be accessed by multiple threads require some form of
+synchronization such as ``qemu_mutex_lock()``, ``rcu_read_lock()``, etc.
+
+``AioContext`` functions like ``aio_set_fd_handler()``,
+``aio_set_event_notifier()``, ``aio_bh_new()``, and ``aio_timer_new()``
+are thread-safe. They can be used to trigger activity in an ``IOThread``.
+
+Side note: the best way to schedule a function call across threads is to call
+``aio_bh_schedule_oneshot()``.
+
+The main loop thread can wait synchronously for a condition using
+``AIO_WAIT_WHILE()``.
+
+``AioContext`` and the block layer
+----------------------------------
+The ``AioContext`` originates from the QEMU block layer, even though nowadays
+``AioContext`` is a generic event loop that can be used by any QEMU subsystem.
+
+The block layer has support for ``AioContext`` integrated. Each
+``BlockDriverState`` is associated with an ``AioContext`` using
+``bdrv_try_change_aio_context()`` and ``bdrv_get_aio_context()``.
+This allows block layer code to process I/O inside the
+right ``AioContext``. Other subsystems may wish to follow a similar approach.
+
+Block layer code must therefore expect to run in an ``IOThread`` and avoid using
+old APIs that implicitly use the main loop. See
+`How to program for IOThreads`_ for information on how to do that.
+
+Code running in the monitor typically needs to ensure that past
+requests from the guest are completed. When a block device is running
+in an ``IOThread``, the ``IOThread`` can also process requests from the guest
+(via ioeventfd). To achieve both objectives, wrap the code between
+``bdrv_drained_begin()`` and ``bdrv_drained_end()``, thus creating a "drained
+section".
+
+Long-running jobs (usually in the form of coroutines) are often scheduled in
+the ``BlockDriverState``'s ``AioContext``. The functions
+``bdrv_add``/``remove_aio_context_notifier``, or alternatively
+``blk_add``/``remove_aio_context_notifier`` if you use ``BlockBackends``,
+can be used to get a notification whenever ``bdrv_try_change_aio_context()``
+moves a ``BlockDriverState`` to a different ``AioContext``.
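
As a brief sketch of the thread-safe scheduling described above (the iothread id and the callback are invented for the example, and the in-tree ``iothread_create()`` helper is used instead of ``-object iothread,id=...``):

    /* Sketch: create an IOThread and run a callback in its AioContext.
     * The id "my-iothread" and the callback are invented for this example.
     */
    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "block/aio.h"
    #include "sysemu/iothread.h"

    static void say_hello(void *opaque)
    {
        /* Runs inside the IOThread's event loop, not in the main loop. */
        printf("hello from %s\n", (const char *)opaque);
    }

    static void example(void)
    {
        IOThread *iothread = iothread_create("my-iothread", &error_abort);
        AioContext *ctx = iothread_get_aio_context(iothread);

        /* aio_bh_schedule_oneshot() is thread-safe, so this can be called
         * from the main loop (or another IOThread) to hand work over.
         */
        aio_bh_schedule_oneshot(ctx, say_hello, (void *)"my-iothread");
    }

On the command line the equivalent event loop thread is created with ``-object iothread,id=my-iothread``, as noted above.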
diff --git a/docs/devel/multiple-iothreads.txt b/docs/devel/multiple-iothreads.txt
deleted file mode 100644
index de85767..0000000
--- a/docs/devel/multiple-iothreads.txt
+++ /dev/null
@@ -1,130 +0,0 @@
-Copyright (c) 2014-2017 Red Hat Inc.
-
-This work is licensed under the terms of the GNU GPL, version 2 or later. See
-the COPYING file in the top-level directory.
-
-
-This document explains the IOThread feature and how to write code that runs
-outside the BQL.
-
-The main loop and IOThreads
----------------------------
-QEMU is an event-driven program that can do several things at once using an
-event loop. The VNC server and the QMP monitor are both processed from the
-same event loop, which monitors their file descriptors until they become
-readable and then invokes a callback.
-
-The default event loop is called the main loop (see main-loop.c). It is
-possible to create additional event loop threads using -object
-iothread,id=my-iothread.
-
-Side note: The main loop and IOThread are both event loops but their code is
-not shared completely. Sometimes it is useful to remember that although they
-are conceptually similar they are currently not interchangeable.
-
-Why IOThreads are useful
-------------------------
-IOThreads allow the user to control the placement of work. The main loop is a
-scalability bottleneck on hosts with many CPUs. Work can be spread across
-several IOThreads instead of just one main loop. When set up correctly this
-can improve I/O latency and reduce jitter seen by the guest.
-
-The main loop is also deeply associated with the BQL, which is a
-scalability bottleneck in itself. vCPU threads and the main loop use the BQL
-to serialize execution of QEMU code. This mutex is necessary because a lot of
-QEMU's code historically was not thread-safe.
-
-The fact that all I/O processing is done in a single main loop and that the
-BQL is contended by all vCPU threads and the main loop explain
-why it is desirable to place work into IOThreads.
-
-The experimental virtio-blk data-plane implementation has been benchmarked and
-shows these effects:
-ftp://public.dhe.ibm.com/linux/pdfs/KVM_Virtualized_IO_Performance_Paper.pdf
-
-How to program for IOThreads
-----------------------------
-The main difference between legacy code and new code that can run in an
-IOThread is dealing explicitly with the event loop object, AioContext
-(see include/block/aio.h). Code that only works in the main loop
-implicitly uses the main loop's AioContext. Code that supports running
-in IOThreads must be aware of its AioContext.
-
-AioContext supports the following services:
- * File descriptor monitoring (read/write/error on POSIX hosts)
- * Event notifiers (inter-thread signalling)
- * Timers
- * Bottom Halves (BH) deferred callbacks
-
-There are several old APIs that use the main loop AioContext:
- * LEGACY qemu_aio_set_fd_handler() - monitor a file descriptor
- * LEGACY qemu_aio_set_event_notifier() - monitor an event notifier
- * LEGACY timer_new_ms() - create a timer
- * LEGACY qemu_bh_new() - create a BH
- * LEGACY qemu_bh_new_guarded() - create a BH with a device re-entrancy guard
- * LEGACY qemu_aio_wait() - run an event loop iteration
-
-Since they implicitly work on the main loop they cannot be used in code that
-runs in an IOThread. They might cause a crash or deadlock if called from an
-IOThread since the BQL is not held.
-
-Instead, use the AioContext functions directly (see include/block/aio.h):
- * aio_set_fd_handler() - monitor a file descriptor
- * aio_set_event_notifier() - monitor an event notifier
- * aio_timer_new() - create a timer
- * aio_bh_new() - create a BH
- * aio_bh_new_guarded() - create a BH with a device re-entrancy guard
- * aio_poll() - run an event loop iteration
-
-The qemu_bh_new_guarded/aio_bh_new_guarded APIs accept a "MemReentrancyGuard"
-argument, which is used to check for and prevent re-entrancy problems. For
-BHs associated with devices, the reentrancy-guard is contained in the
-corresponding DeviceState and named "mem_reentrancy_guard".
-
-The AioContext can be obtained from the IOThread using
-iothread_get_aio_context() or for the main loop using qemu_get_aio_context().
-Code that takes an AioContext argument works both in IOThreads or the main
-loop, depending on which AioContext instance the caller passes in.
-
-How to synchronize with an IOThread
------------------------------------
-Variables that can be accessed by multiple threads require some form of
-synchronization such as qemu_mutex_lock(), rcu_read_lock(), etc.
-
-AioContext functions like aio_set_fd_handler(), aio_set_event_notifier(),
-aio_bh_new(), and aio_timer_new() are thread-safe. They can be used to trigger
-activity in an IOThread.
-
-Side note: the best way to schedule a function call across threads is to call
-aio_bh_schedule_oneshot().
-
-The main loop thread can wait synchronously for a condition using
-AIO_WAIT_WHILE().
-
-AioContext and the block layer
-------------------------------
-The AioContext originates from the QEMU block layer, even though nowadays
-AioContext is a generic event loop that can be used by any QEMU subsystem.
-
-The block layer has support for AioContext integrated. Each BlockDriverState
-is associated with an AioContext using bdrv_try_change_aio_context() and
-bdrv_get_aio_context(). This allows block layer code to process I/O inside the
-right AioContext. Other subsystems may wish to follow a similar approach.
-
-Block layer code must therefore expect to run in an IOThread and avoid using
-old APIs that implicitly use the main loop. See the "How to program for
-IOThreads" above for information on how to do that.
-
-Code running in the monitor typically needs to ensure that past
-requests from the guest are completed. When a block device is running
-in an IOThread, the IOThread can also process requests from the guest
-(via ioeventfd). To achieve both objects, wrap the code between
-bdrv_drained_begin() and bdrv_drained_end(), thus creating a "drained
-section".
-
-Long-running jobs (usually in the form of coroutines) are often scheduled in
-the BlockDriverState's AioContext. The functions
-bdrv_add/remove_aio_context_notifier, or alternatively
-blk_add/remove_aio_context_notifier if you use BlockBackends, can be used to
-get a notification whenever bdrv_try_change_aio_context() moves a
-BlockDriverState to a different AioContext.
diff --git a/docs/devel/rcu.txt b/docs/devel/rcu.rst
index 2e6cc60..dd07c1d 100644
--- a/docs/devel/rcu.txt
+++ b/docs/devel/rcu.rst
@@ -20,7 +20,7 @@ for the execution of all *currently running* critical sections before
proceeding, or before asynchronously executing a callback.
The key point here is that only the currently running critical sections
-are waited for; critical sections that are started _after_ the beginning
+are waited for; critical sections that are started **after** the beginning
of the wait do not extend the wait, despite running concurrently with
the updater. This is the reason why RCU is more scalable than,
for example, reader-writer locks. It is so much more scalable that
@@ -37,7 +37,7 @@ do not matter; as soon as all previous critical sections have finished,
there cannot be any readers who hold references to the data structure,
and these can now be safely reclaimed (e.g., freed or unref'ed).
-Here is a picture:
+Here is a picture::
thread 1 thread 2 thread 3
------------------- ------------------------ -------------------
@@ -58,43 +58,38 @@ that critical section.
RCU API
-=======
+-------
The core RCU API is small:
- void rcu_read_lock(void);
-
+``void rcu_read_lock(void);``
Used by a reader to inform the reclaimer that the reader is
entering an RCU read-side critical section.
- void rcu_read_unlock(void);
-
+``void rcu_read_unlock(void);``
Used by a reader to inform the reclaimer that the reader is
exiting an RCU read-side critical section. Note that RCU
read-side critical sections may be nested and/or overlapping.
- void synchronize_rcu(void);
-
+``void synchronize_rcu(void);``
Blocks until all pre-existing RCU read-side critical sections
on all threads have completed. This marks the end of the removal
phase and the beginning of reclamation phase.
Note that it would be valid for another update to come while
- synchronize_rcu is running. Because of this, it is better that
+ ``synchronize_rcu`` is running. Because of this, it is better that
the updater releases any locks it may hold before calling
- synchronize_rcu. If this is not possible (for example, because
- the updater is protected by the BQL), you can use call_rcu.
+ ``synchronize_rcu``. If this is not possible (for example, because
+ the updater is protected by the BQL), you can use ``call_rcu``.
- void call_rcu1(struct rcu_head * head,
- void (*func)(struct rcu_head *head));
-
- This function invokes func(head) after all pre-existing RCU
+``void call_rcu1(struct rcu_head * head, void (*func)(struct rcu_head *head));``
+ This function invokes ``func(head)`` after all pre-existing RCU
read-side critical sections on all threads have completed. This
marks the end of the removal phase, with func taking care
asynchronously of the reclamation phase.
- The foo struct needs to have an rcu_head structure added,
- perhaps as follows:
+ The ``foo`` struct needs to have an ``rcu_head`` structure added,
+ perhaps as follows::
struct foo {
struct rcu_head rcu;
@@ -103,8 +98,8 @@ The core RCU API is small:
long c;
};
- so that the reclaimer function can fetch the struct foo address
- and free it:
+ so that the reclaimer function can fetch the ``struct foo`` address
+ and free it::
call_rcu1(&foo.rcu, foo_reclaim);
@@ -114,29 +109,27 @@ The core RCU API is small:
g_free(fp);
}
- For the common case where the rcu_head member is the first of the
- struct, you can use the following macro.
+ ``call_rcu1`` is typically used via either the ``call_rcu`` or
+ ``g_free_rcu`` macros, which handle the common case where the
+   ``rcu_head`` member is the first field of the struct.
- void call_rcu(T *p,
- void (*func)(T *p),
- field-name);
- void g_free_rcu(T *p,
- field-name);
+``void call_rcu(T *p, void (*func)(T *p), field-name);``
+ If the ``struct rcu_head`` is the first field in the struct, you can
+ use this macro instead of ``call_rcu1``.
- call_rcu1 is typically used through these macro, in the common case
- where the "struct rcu_head" is the first field in the struct. If
- the callback function is g_free, in particular, g_free_rcu can be
- used. In the above case, one could have written simply:
+``void g_free_rcu(T *p, field-name);``
+ This is a special-case version of ``call_rcu`` where the callback
+ function is ``g_free``.
+ In the example given in ``call_rcu1``, one could have written simply::
g_free_rcu(&foo, rcu);
- typeof(*p) qatomic_rcu_read(p);
-
- qatomic_rcu_read() is similar to qatomic_load_acquire(), but it makes
- some assumptions on the code that calls it. This allows a more
- optimized implementation.
+``typeof(*p) qatomic_rcu_read(p);``
+ ``qatomic_rcu_read()`` is similar to ``qatomic_load_acquire()``, but
+ it makes some assumptions on the code that calls it. This allows a
+ more optimized implementation.
- qatomic_rcu_read assumes that whenever a single RCU critical
+ ``qatomic_rcu_read`` assumes that whenever a single RCU critical
section reads multiple shared data, these reads are either
data-dependent or need no ordering. This is almost always the
case when using RCU, because read-side critical sections typically
@@ -144,7 +137,7 @@ The core RCU API is small:
every update) until reaching a data structure of interest,
and then read from there.
- RCU read-side critical sections must use qatomic_rcu_read() to
+ RCU read-side critical sections must use ``qatomic_rcu_read()`` to
read data, unless concurrent writes are prevented by another
synchronization mechanism.
@@ -152,18 +145,17 @@ The core RCU API is small:
data structure in a single direction, opposite to the direction
in which the updater initializes it.
- void qatomic_rcu_set(p, typeof(*p) v);
-
- qatomic_rcu_set() is similar to qatomic_store_release(), though it also
- makes assumptions on the code that calls it in order to allow a more
- optimized implementation.
+``void qatomic_rcu_set(p, typeof(*p) v);``
+ ``qatomic_rcu_set()`` is similar to ``qatomic_store_release()``,
+ though it also makes assumptions on the code that calls it in
+ order to allow a more optimized implementation.
- In particular, qatomic_rcu_set() suffices for synchronization
+ In particular, ``qatomic_rcu_set()`` suffices for synchronization
with readers, if the updater never mutates a field within a
data item that is already accessible to readers. This is the
case when initializing a new copy of the RCU-protected data
- structure; just ensure that initialization of *p is carried out
- before qatomic_rcu_set() makes the data item visible to readers.
+ structure; just ensure that initialization of ``*p`` is carried out
+ before ``qatomic_rcu_set()`` makes the data item visible to readers.
If this rule is observed, writes will happen in the opposite
order as reads in the RCU read-side critical sections (or if
there is just one update), and there will be no need for other
@@ -171,58 +163,54 @@ The core RCU API is small:
The following APIs must be used before RCU is used in a thread:
- void rcu_register_thread(void);
-
+``void rcu_register_thread(void);``
Mark a thread as taking part in the RCU mechanism. Such a thread
will have to report quiescent points regularly, either manually
- or through the QemuCond/QemuSemaphore/QemuEvent APIs.
-
- void rcu_unregister_thread(void);
+ or through the ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs.
+``void rcu_unregister_thread(void);``
Mark a thread as not taking part anymore in the RCU mechanism.
It is not a problem if such a thread reports quiescent points,
- either manually or by using the QemuCond/QemuSemaphore/QemuEvent
- APIs.
+ either manually or by using the
+ ``QemuCond``/``QemuSemaphore``/``QemuEvent`` APIs.
-Note that these APIs are relatively heavyweight, and should _not_ be
+Note that these APIs are relatively heavyweight, and should **not** be
nested.
Convenience macros
-==================
+------------------
Two macros are provided that automatically release the read lock at the
end of the scope.
- RCU_READ_LOCK_GUARD()
-
+``RCU_READ_LOCK_GUARD()``
Takes the lock and will release it at the end of the block it's
used in.
- WITH_RCU_READ_LOCK_GUARD() { code }
-
+``WITH_RCU_READ_LOCK_GUARD() { code }``
Is used at the head of a block to protect the code within the block.
-Note that 'goto'ing out of the guarded block will also drop the lock.
+Note that a ``goto`` out of the guarded block will also drop the lock.
-DIFFERENCES WITH LINUX
-======================
+Differences with Linux
+----------------------
- Waiting on a mutex is possible, though discouraged, within an RCU critical
section. This is because spinlocks are rarely (if ever) used in userspace
programming; not allowing this would prevent upgrading an RCU read-side
critical section to become an updater.
-- qatomic_rcu_read and qatomic_rcu_set replace rcu_dereference and
- rcu_assign_pointer. They take a _pointer_ to the variable being accessed.
+- ``qatomic_rcu_read`` and ``qatomic_rcu_set`` replace ``rcu_dereference`` and
+ ``rcu_assign_pointer``. They take a **pointer** to the variable being accessed.
-- call_rcu is a macro that has an extra argument (the name of the first
- field in the struct, which must be a struct rcu_head), and expects the
+- ``call_rcu`` is a macro that has an extra argument (the name of the first
+ field in the struct, which must be a struct ``rcu_head``), and expects the
type of the callback's argument to be the type of the first argument.
- call_rcu1 is the same as Linux's call_rcu.
+ ``call_rcu1`` is the same as Linux's ``call_rcu``.
-RCU PATTERNS
-============
+RCU Patterns
+------------
Many patterns using read-writer locks translate directly to RCU, with
the advantages of higher scalability and deadlock immunity.
@@ -243,28 +231,28 @@ Here are some frequently-used RCU idioms that are worth noting.
RCU list processing
--------------------
+^^^^^^^^^^^^^^^^^^^
TBD (not yet used in QEMU)
RCU reference counting
-----------------------
+^^^^^^^^^^^^^^^^^^^^^^
Because grace periods are not allowed to complete while there is an RCU
read-side critical section in progress, the RCU read-side primitives
may be used as a restricted reference-counting mechanism. For example,
-consider the following code fragment:
+consider the following code fragment::
rcu_read_lock();
p = qatomic_rcu_read(&foo);
/* do something with p. */
rcu_read_unlock();
-The RCU read-side critical section ensures that the value of "p" remains
-valid until after the rcu_read_unlock(). In some sense, it is acquiring
-a reference to p that is later released when the critical section ends.
-The write side looks simply like this (with appropriate locking):
+The RCU read-side critical section ensures that the value of ``p`` remains
+valid until after the ``rcu_read_unlock()``. In some sense, it is acquiring
+a reference to ``p`` that is later released when the critical section ends.
+The write side looks simply like this (with appropriate locking)::
qemu_mutex_lock(&foo_mutex);
old = foo;
@@ -274,7 +262,7 @@ The write side looks simply like this (with appropriate locking):
free(old);
If the processing cannot be done purely within the critical section, it
-is possible to combine this idiom with a "real" reference count:
+is possible to combine this idiom with a "real" reference count::
rcu_read_lock();
p = qatomic_rcu_read(&foo);
@@ -283,7 +271,7 @@ is possible to combine this idiom with a "real" reference count:
/* do something with p. */
foo_unref(p);
-The write side can be like this:
+The write side can be like this::
qemu_mutex_lock(&foo_mutex);
old = foo;
@@ -292,7 +280,7 @@ The write side can be like this:
synchronize_rcu();
foo_unref(old);
-or with call_rcu:
+or with ``call_rcu``::
qemu_mutex_lock(&foo_mutex);
old = foo;
@@ -301,10 +289,10 @@ or with call_rcu:
call_rcu(foo_unref, old, rcu);
In both cases, the write side only performs removal. Reclamation
-happens when the last reference to a "foo" object is dropped.
-Using synchronize_rcu() is undesirably expensive, because the
+happens when the last reference to a ``foo`` object is dropped.
+Using ``synchronize_rcu()`` is undesirably expensive, because the
last reference may be dropped on the read side. Hence you can
-use call_rcu() instead:
+use ``call_rcu()`` instead::
foo_unref(struct foo *p) {
if (qatomic_fetch_dec(&p->refcount) == 1) {
@@ -314,7 +302,7 @@ use call_rcu() instead:
Note that the same idioms would be possible with reader/writer
-locks:
+locks::
read_lock(&foo_rwlock); write_mutex_lock(&foo_rwlock);
p = foo; p = foo;
@@ -334,15 +322,15 @@ locks:
foo_unref(p);
read_unlock(&foo_rwlock);
-foo_unref could use a mechanism such as bottom halves to move deallocation
+``foo_unref`` could use a mechanism such as bottom halves to move deallocation
out of the write-side critical section.
RCU resizable arrays
---------------------
+^^^^^^^^^^^^^^^^^^^^
Resizable arrays can be used with RCU. The expensive RCU synchronization
-(or call_rcu) only needs to take place when the array is resized.
+(or ``call_rcu``) only needs to take place when the array is resized.
The two items to take care of are:
- ensuring that the old version of the array is available between removal
@@ -351,10 +339,10 @@ The two items to take care of are:
- avoiding mismatches in the read side between the array data and the
array size.
-The first problem is avoided simply by not using realloc. Instead,
+The first problem is avoided simply by not using ``realloc``. Instead,
each resize will allocate a new array and copy the old data into it.
The second problem would arise if the size and the data pointers were
-two members of a larger struct:
+two members of a larger struct::
struct mystuff {
...
@@ -364,7 +352,7 @@ two members of a larger struct:
...
};
-Instead, we store the size of the array with the array itself:
+Instead, we store the size of the array with the array itself::
struct arr {
int size;
@@ -400,7 +388,7 @@ Instead, we store the size of the array with the array itself:
}
-SOURCES
-=======
+References
+----------
-* Documentation/RCU/ from the Linux kernel
+* The `Linux kernel RCU documentation <https://docs.kernel.org/RCU/>`__
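
For illustration, here is a small sketch of the two guard macros in use; ``global_foo`` and both functions are invented for the example, reusing the ``struct foo`` shape from the text above:

    /* Sketch: scoped RCU read-side critical sections via the guard macros.
     * Threads are assumed to have called rcu_register_thread() already.
     */
    #include "qemu/osdep.h"
    #include "qemu/rcu.h"
    #include "qemu/atomic.h"

    struct foo {
        struct rcu_head rcu;
        int a;
    };

    static struct foo *global_foo;

    static int read_a(void)
    {
        struct foo *p;

        /* Released automatically when the function scope ends. */
        RCU_READ_LOCK_GUARD();
        p = qatomic_rcu_read(&global_foo);
        return p ? p->a : -1;
    }

    static void print_a(void)
    {
        WITH_RCU_READ_LOCK_GUARD() {
            struct foo *p = qatomic_rcu_read(&global_foo);

            if (p) {
                printf("a = %d\n", p->a);
            }
        }   /* the read lock is dropped when the block is left */
    }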
diff --git a/docs/devel/testing/blkdebug.rst b/docs/devel/testing/blkdebug.rst
new file mode 100644
index 0000000..63887c9
--- /dev/null
+++ b/docs/devel/testing/blkdebug.rst
@@ -0,0 +1,177 @@
+Block I/O error injection using ``blkdebug``
+============================================
+
+..
+ Copyright (C) 2014-2015 Red Hat Inc
+
+ This work is licensed under the terms of the GNU GPL, version 2 or later. See
+ the COPYING file in the top-level directory.
+
+The ``blkdebug`` block driver is a rule-based error injection engine. It can be
+used to exercise error code paths in block drivers including ``ENOSPC`` (out of
+space) and ``EIO``.
+
+This document gives an overview of the features available in ``blkdebug``.
+
+Background
+----------
+Block drivers have many error code paths that handle I/O errors. Image formats
+are especially complex since metadata I/O errors during cluster allocation or
+while updating tables happen halfway through request processing and require
+discipline to keep image files consistent.
+
+Error injection allows test cases to trigger I/O errors at specific points.
+This way, all error paths can be tested to make sure they are correct.
+
+Rules
+-----
+The ``blkdebug`` block driver takes a list of "rules" that tell the error injection
+engine when to fail an I/O request.
+
+Each I/O request is evaluated against the rules. If a rule matches the request
+then its "action" is executed.
+
+Rules can be placed in a configuration file; the configuration file
+follows the same .ini-like format used by QEMU's ``-readconfig`` option, and
+each section of the file represents a rule.
+
+The following configuration file defines a single rule::
+
+ $ cat blkdebug.conf
+ [inject-error]
+ event = "read_aio"
+ errno = "28"
+
+This rule fails all aio read requests with ``ENOSPC`` (28). Note that the errno
+value depends on the host. On Linux, see
+``/usr/include/asm-generic/errno-base.h`` for errno values.
+
+Invoke QEMU as follows::
+
+ $ qemu-system-x86_64
+ -drive if=none,cache=none,file=blkdebug:blkdebug.conf:test.img,id=drive0 \
+ -device virtio-blk-pci,drive=drive0,id=virtio-blk-pci0
+
+Rules support the following attributes:
+
+``event``
+ which type of operation to match (e.g. ``read_aio``, ``write_aio``,
+ ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
+ information on events.
+
+``state``
+ (optional) the engine must be in this state number in order for this
+ rule to match. See `State transitions`_ for information
+ on states.
+
+``errno``
+ the numeric errno value to return when a request matches this rule.
+ The errno values depend on the host since the numeric values are not
+ standardized in the POSIX specification.
+
+``sector``
+ (optional) a sector number that the request must overlap in order to
+ match this rule
+
+``once``
+ (optional, default ``off``) only execute this action on the first
+ matching request
+
+``immediately``
+ (optional, default ``off``) return a NULL ``BlockAIOCB``
+ pointer and fail without an errno instead. This
+ exercises the code path where ``BlockAIOCB`` fails and the
+ caller's ``BlockCompletionFunc`` is not invoked.
+
+Events
+------
+Block drivers provide information about the type of I/O request they are about
+to make so rules can match specific types of requests. For example, the ``qcow2``
+block driver tells ``blkdebug`` when it accesses the L1 table so rules can match
+only L1 table accesses and not other metadata or guest data requests.
+
+The core events are:
+
+``read_aio``
+ guest data read
+
+``write_aio``
+ guest data write
+
+``flush_to_os``
+ write out unwritten block driver state (e.g. cached metadata)
+
+``flush_to_disk``
+ flush the host block device's disk cache
+
+See ``qapi/block-core.json:BlkdebugEvent`` for the full list of events.
+You may need to grep block driver source code to understand the
+meaning of specific events.
+
+State transitions
+-----------------
+There are cases where more power is needed to match a particular I/O request in
+a longer sequence of requests. For example::
+
+ write_aio
+ flush_to_disk
+ write_aio
+
+How do we match the 2nd ``write_aio`` but not the first? This is where state
+transitions come in.
+
+The error injection engine has an integer called the "state" that always starts
+initialized to 1. The state integer is internal to ``blkdebug`` and cannot be
+observed from outside but rules can interact with it for powerful matching
+behavior.
+
+Rules can be conditional on the current state and they can transition to a new
+state.
+
+When a rule's "state" attribute is non-zero then the current state must equal
+the attribute in order for the rule to match.
+
+For example, to match the 2nd ``write_aio``::
+
+ [set-state]
+ event = "write_aio"
+ state = "1"
+ new_state = "2"
+
+ [inject-error]
+ event = "write_aio"
+ state = "2"
+ errno = "5"
+
+The first ``write_aio`` request matches the ``set-state`` rule and transitions from
+state 1 to state 2. Once state 2 has been entered, the ``set-state`` rule no
+longer matches since it requires state 1. But the ``inject-error`` rule now
+matches the next ``write_aio`` request and injects ``EIO`` (5).
+
+State transition rules support the following attributes:
+
+``event``
+ which type of operation to match (e.g. ``read_aio``, ``write_aio``,
+ ``flush_to_os``, ``flush_to_disk``). See `Events`_ for
+ information on events.
+
+``state``
+ (optional) the engine must be in this state number in order for this
+ rule to match
+
+``new_state``
+ transition to this state number
+
+Suspend and resume
+------------------
+Exercising code paths in block drivers may require specific ordering amongst
+concurrent requests. The "breakpoint" feature allows requests to be halted on
+a ``blkdebug`` event and resumed later. This makes it possible to achieve
+deterministic ordering when multiple requests are in flight.
+
+Breakpoints on ``blkdebug`` events are associated with a user-defined ``tag`` string.
+This tag serves as an identifier by which the request can be resumed at a later
+point.
+
+See the ``qemu-io(1)`` ``break``, ``resume``, ``remove_break``, and ``wait_break``
+commands for details.
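
As an illustrative configuration sketch combining several of the attributes documented above (the event names come from the text; the errno, sector and state values are arbitrary):

    # Sketch: fail only the first write_aio that overlaps sector 0 while the
    # engine is in state 1, then move to state 2 so no further errors are
    # injected. Values are arbitrary; see the attribute list above.
    [inject-error]
    event = "write_aio"
    state = "1"
    errno = "5"
    sector = "0"
    once = "on"

    [set-state]
    event = "write_aio"
    state = "1"
    new_state = "2"

Such a file is passed to QEMU exactly as in the invocation shown earlier, via ``file=blkdebug:blkdebug.conf:test.img``.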
diff --git a/docs/devel/blkverify.txt b/docs/devel/testing/blkverify.rst
index aca826c..2a71778 100644
--- a/docs/devel/blkverify.txt
+++ b/docs/devel/testing/blkverify.rst
@@ -1,8 +1,10 @@
-= Block driver correctness testing with blkverify =
+Block driver correctness testing with ``blkverify``
+===================================================
-== Introduction ==
+Introduction
+------------
-This document describes how to use the blkverify protocol to test that a block
+This document describes how to use the ``blkverify`` protocol to test that a block
driver is operating correctly.
It is difficult to test and debug block drivers against real guests. Often
@@ -11,12 +13,13 @@ of the executable. Other times obscure errors are raised by a program inside
the guest. These issues are extremely hard to trace back to bugs in the block
driver.
-Blkverify solves this problem by catching data corruption inside QEMU the first
+``blkverify`` solves this problem by catching data corruption inside QEMU the first
time bad data is read and reporting the disk sector that is corrupted.
-== How it works ==
+How it works
+------------
-The blkverify protocol has two child block devices, the "test" device and the
+The ``blkverify`` protocol has two child block devices, the "test" device and the
"raw" device. Read/write operations are mirrored to both devices so their
state should always be in sync.
@@ -25,13 +28,14 @@ contents to the "test" image. The idea is that the "raw" device will handle
read/write operations correctly and not corrupt data. It can be used as a
reference for comparison against the "test" device.
-After a mirrored read operation completes, blkverify will compare the data and
+After a mirrored read operation completes, ``blkverify`` will compare the data and
raise an error if it is not identical. This makes it possible to catch the
first instance where corrupt data is read.
-== Example ==
+Example
+-------
-Imagine raw.img has 0xcd repeated throughout its first sector:
+Imagine raw.img has 0xcd repeated throughout its first sector::
$ ./qemu-io -c 'read -v 0 512' raw.img
00000000: cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd ................
@@ -42,7 +46,7 @@ Imagine raw.img has 0xcd repeated throughout its first sector:
read 512/512 bytes at offset 0
512.000000 bytes, 1 ops; 0.0000 sec (97.656 MiB/sec and 200000.0000 ops/sec)
-And test.img is corrupt, its first sector is zeroed when it shouldn't be:
+And test.img is corrupt, its first sector is zeroed when it shouldn't be::
$ ./qemu-io -c 'read -v 0 512' test.img
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
@@ -53,17 +57,17 @@ And test.img is corrupt, its first sector is zeroed when it shouldn't be:
read 512/512 bytes at offset 0
512.000000 bytes, 1 ops; 0.0000 sec (81.380 MiB/sec and 166666.6667 ops/sec)
-This error is caught by blkverify:
+This error is caught by ``blkverify``::
$ ./qemu-io -c 'read 0 512' blkverify:a.img:b.img
blkverify: read sector_num=0 nb_sectors=4 contents mismatch in sector 0
-A more realistic scenario is verifying the installation of a guest OS:
+A more realistic scenario is verifying the installation of a guest OS::
$ ./qemu-img create raw.img 16G
$ ./qemu-img create -f qcow2 test.qcow2 16G
$ ./qemu-system-x86_64 -cdrom debian.iso \
-drive file=blkverify:raw.img:test.qcow2
-If the installation is aborted when blkverify detects corruption, use qemu-io
+If the installation is aborted when ``blkverify`` detects corruption, use ``qemu-io``
to explore the contents of the disk image at the sector in question.
diff --git a/docs/devel/testing/index.rst b/docs/devel/testing/index.rst
index 45eb4a7..1171f7d 100644
--- a/docs/devel/testing/index.rst
+++ b/docs/devel/testing/index.rst
@@ -14,3 +14,5 @@ testing infrastructure.
acpi-bits
ci
fuzzing
+ blkdebug
+ blkverify
diff --git a/docs/system/arm/cubieboard.rst b/docs/system/arm/cubieboard.rst
index 58c4a2d..90d24c7 100644
--- a/docs/system/arm/cubieboard.rst
+++ b/docs/system/arm/cubieboard.rst
@@ -15,4 +15,5 @@ Emulated devices:
- USB controller
- SATA controller
- TWI (I2C) controller
+- SPI controller
- Watchdog timer
diff --git a/docs/system/arm/stm32.rst b/docs/system/arm/stm32.rst
index 3b640f3..ca7a558 100644
--- a/docs/system/arm/stm32.rst
+++ b/docs/system/arm/stm32.rst
@@ -36,6 +36,7 @@ Supported devices
* SPI controller
* System configuration (SYSCFG)
* Timer controller (TIMER)
+ * Reset and Clock Controller (RCC) (STM32F4 only, reset and enable only)
Missing devices
---------------
@@ -53,7 +54,7 @@ Missing devices
* Power supply configuration (PWR)
* Random Number Generator (RNG)
* Real-Time Clock (RTC) controller
- * Reset and Clock Controller (RCC)
+ * Reset and Clock Controller (RCC) (features other than reset and enable)
* Secure Digital Input/Output (SDIO) interface
* USB OTG
* Watchdog controller (IWDG, WWDG)