aboutsummaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
Diffstat (limited to 'docs')
-rw-r--r--docs/about/deprecated.rst66
-rw-r--r--docs/about/removed-features.rst78
-rw-r--r--docs/interop/bitmaps.rst2
-rw-r--r--docs/interop/index.rst1
-rw-r--r--docs/interop/qcow2.rst (renamed from docs/interop/qcow2.txt)187
-rw-r--r--docs/qcow2-cache.txt2
-rw-r--r--docs/system/confidential-guest-support.rst1
-rw-r--r--docs/system/i386/tdx.rst161
-rw-r--r--docs/system/target-i386.rst1
9 files changed, 355 insertions, 144 deletions
diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 44d3427..4203713 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -148,6 +148,14 @@ options are removed in favor of using explicit ``blockdev-create`` and
``blockdev-add`` calls. See :doc:`/interop/live-block-operations` for
details.
+``query-migrationthreads`` (since 9.2)
+''''''''''''''''''''''''''''''''''''''
+
+To be removed with no replacement, as it reports only a limited set of
+threads (for example, it only reports source side of multifd threads,
+without reporting any destination threads, or non-multifd source threads).
+For debugging purpose, please use ``-name $VM,debug-threads=on`` instead.
+
``block-job-pause`` (since 10.1)
''''''''''''''''''''''''''''''''
@@ -179,27 +187,10 @@ Use ``job-dismiss`` instead.
Use ``job-finalize`` instead.
-``query-migrationthreads`` (since 9.2)
-''''''''''''''''''''''''''''''''''''''
+``migrate`` argument ``detach`` (since 10.1)
+''''''''''''''''''''''''''''''''''''''''''''
-To be removed with no replacement, as it reports only a limited set of
-threads (for example, it only reports source side of multifd threads,
-without reporting any destination threads, or non-multifd source threads).
-For debugging purpose, please use ``-name $VM,debug-threads=on`` instead.
-
-Incorrectly typed ``device_add`` arguments (since 6.2)
-''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Due to shortcomings in the internal implementation of ``device_add``, QEMU
-incorrectly accepts certain invalid arguments: Any object or list arguments are
-silently ignored. Other argument types are not checked, but an implicit
-conversion happens, so that e.g. string values can be assigned to integer
-device properties or vice versa.
-
-This is a bug in QEMU that will be fixed in the future so that previously
-accepted incorrect commands will return an error. Users should make sure that
-all arguments passed to ``device_add`` are consistent with the documented
-property types.
+This argument has always been ignored.
Host Architectures
------------------
@@ -324,12 +315,6 @@ deprecated; use the new name ``dtb-randomness`` instead. The new name
better reflects the way this property affects all random data within
the device tree blob, not just the ``kaslr-seed`` node.
-Big-Endian variants of MicroBlaze ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` machines (since 9.2)
-''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-Both ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` were added for little endian
-CPUs. Big endian support is not tested.
-
Mips ``mipssim`` machine (since 10.0)
'''''''''''''''''''''''''''''''''''''
@@ -360,6 +345,19 @@ machine must ensure that they're setting the ``spike`` machine in the
command line (``-M spike``).
+System emulator binaries
+------------------------
+
+``qemu-system-microblazeel`` (since 10.1)
+'''''''''''''''''''''''''''''''''''''''''
+
+The ``qemu-system-microblaze`` binary can emulate little-endian machines
+now, too, so the separate binary ``qemu-system-microblazeel`` (with the
+``el`` suffix) for little-endian targets is not required anymore. The
+``petalogix-s3adsp1800`` machine can now be switched to little endian by
+setting its ``endianness`` property to ``little``.
+
+
Backend options
---------------
@@ -531,14 +529,6 @@ PCIe passthrough shall be the mainline solution.
CPU device properties
'''''''''''''''''''''
-``pcommit`` on x86 (since 9.1)
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-The PCOMMIT instruction was never included in any physical processor.
-It was implemented as a no-op instruction in TCG up to QEMU 9.0, but
-only with ``-cpu max`` (which does not guarantee migration compatibility
-across versions).
-
``pmu-num=n`` on RISC-V CPUs (since 8.2)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -548,6 +538,14 @@ be calculated with ``((2 ^ n) - 1) << 3``. The least significant three bits
must be left clear.
+``pcommit`` on x86 (since 9.1)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The PCOMMIT instruction was never included in any physical processor.
+It was implemented as a no-op instruction in TCG up to QEMU 9.0, but
+only with ``-cpu max`` (which does not guarantee migration compatibility
+across versions).
+
Backwards compatibility
-----------------------
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 063284d..d7c2113 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -162,6 +162,12 @@ specified with ``-mem-path`` can actually provide the guest RAM configured with
The ``name`` parameter of the ``-net`` option was a synonym
for the ``id`` parameter, which should now be used instead.
+RISC-V firmware not booted by default (removed in 5.1)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+QEMU 5.1 changes the default behaviour from ``-bios none`` to ``-bios default``
+for the RISC-V ``virt`` machine and ``sifive_u`` machine.
+
``-numa node,mem=...`` (removed in 5.1)
'''''''''''''''''''''''''''''''''''''''
@@ -324,12 +330,6 @@ devices. Drives the board doesn't pick up can no longer be used with
This option was undocumented and not used in the field.
Use ``-device usb-ccid`` instead.
-RISC-V firmware not booted by default (removed in 5.1)
-''''''''''''''''''''''''''''''''''''''''''''''''''''''
-
-QEMU 5.1 changes the default behaviour from ``-bios none`` to ``-bios default``
-for the RISC-V ``virt`` machine and ``sifive_u`` machine.
-
``-no-quit`` (removed in 7.0)
'''''''''''''''''''''''''''''
@@ -722,6 +722,15 @@ Use ``multifd-channels`` instead.
Use ``multifd-compression`` instead.
+Incorrectly typed ``device_add`` arguments (since 9.2)
+''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Due to shortcomings in the internal implementation of ``device_add``,
+QEMU used to incorrectly accept certain invalid arguments. Any object
+or list arguments were silently ignored. Other argument types were not
+checked, but an implicit conversion happened, so that e.g. string
+values could be assigned to integer device properties or vice versa.
+
QEMU Machine Protocol (QMP) events
----------------------------------
@@ -902,14 +911,6 @@ The RISC-V no MMU cpus have been removed. The two CPUs: ``rv32imacu-nommu`` and
``rv64imacu-nommu`` can no longer be used. Instead the MMU status can be specified
via the CPU ``mmu`` option when using the ``rv32`` or ``rv64`` CPUs.
-RISC-V 'any' CPU type ``-cpu any`` (removed in 9.2)
-'''''''''''''''''''''''''''''''''''''''''''''''''''
-
-The 'any' CPU type was introduced back in 2018 and was around since the
-initial RISC-V QEMU port. Its usage was always been unclear: users don't know
-what to expect from a CPU called 'any', and in fact the CPU does not do anything
-special that isn't already done by the default CPUs rv32/rv64.
-
``compat`` property of server class POWER CPUs (removed in 6.0)
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
@@ -956,6 +957,14 @@ The CRIS architecture was pulled from Linux in 4.17 and the compiler
was no longer packaged in any distro making it harder to run the
``check-tcg`` tests.
+RISC-V 'any' CPU type ``-cpu any`` (removed in 9.2)
+'''''''''''''''''''''''''''''''''''''''''''''''''''
+
+The 'any' CPU type was introduced back in 2018 and was around since the
+initial RISC-V QEMU port. Its usage was always been unclear: users don't know
+what to expect from a CPU called 'any', and in fact the CPU does not do anything
+special that isn't already done by the default CPUs rv32/rv64.
+
System accelerators
-------------------
@@ -966,18 +975,18 @@ Userspace local APIC with KVM (x86, removed in 8.0)
a local APIC. The ``split`` setting is supported, as is using ``-M
kernel-irqchip=off`` when the CPU does not have a local APIC.
-HAXM (``-accel hax``) (removed in 8.2)
-''''''''''''''''''''''''''''''''''''''
-
-The HAXM project has been retired (see https://github.com/intel/haxm#status).
-Use "whpx" (on Windows) or "hvf" (on macOS) instead.
-
MIPS "Trap-and-Emulate" KVM support (removed in 8.0)
''''''''''''''''''''''''''''''''''''''''''''''''''''
The MIPS "Trap-and-Emulate" KVM host and guest support was removed
from Linux in 2021, and is not supported anymore by QEMU either.
+HAXM (``-accel hax``) (removed in 8.2)
+''''''''''''''''''''''''''''''''''''''
+
+The HAXM project has been retired (see https://github.com/intel/haxm#status).
+Use "whpx" (on Windows) or "hvf" (on macOS) instead.
+
System emulator machines
------------------------
@@ -1035,16 +1044,6 @@ Aspeed ``swift-bmc`` machine (removed in 7.0)
This machine was removed because it was unused. Alternative AST2500 based
OpenPOWER machines are ``witherspoon-bmc`` and ``romulus-bmc``.
-Aspeed ``tacoma-bmc`` machine (removed in 10.0)
-'''''''''''''''''''''''''''''''''''''''''''''''
-
-The ``tacoma-bmc`` machine was removed because it didn't bring much
-compared to the ``rainier-bmc`` machine. Also, the ``tacoma-bmc`` was
-a board used for bring up of the AST2600 SoC that never left the
-labs. It can be easily replaced by the ``rainier-bmc`` machine, which
-was the actual final product, or by the ``ast2600-evb`` with some
-tweaks.
-
ppc ``taihu`` machine (removed in 7.2)
'''''''''''''''''''''''''''''''''''''''''''''
@@ -1075,6 +1074,16 @@ for all machine types using the PXA2xx and OMAP2 SoCs. We are also
dropping the ``cheetah`` OMAP1 board, because we don't have any
test images for it and don't know of anybody who does.
+Aspeed ``tacoma-bmc`` machine (removed in 10.0)
+'''''''''''''''''''''''''''''''''''''''''''''''
+
+The ``tacoma-bmc`` machine was removed because it didn't bring much
+compared to the ``rainier-bmc`` machine. Also, the ``tacoma-bmc`` was
+a board used for bring up of the AST2600 SoC that never left the
+labs. It can be easily replaced by the ``rainier-bmc`` machine, which
+was the actual final product, or by the ``ast2600-evb`` with some
+tweaks.
+
ppc ``ref405ep`` machine (removed in 10.0)
''''''''''''''''''''''''''''''''''''''''''
@@ -1082,6 +1091,15 @@ This machine was removed because PPC 405 CPU have no known users,
firmware images are not available, OpenWRT dropped support in 2019,
U-Boot in 2017, and Linux in 2024.
+Big-Endian variants of ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` machines (removed in 10.1)
+'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
+
+Both the MicroBlaze ``petalogix-ml605`` and ``xlnx-zynqmp-pmu`` machines
+were added for little endian CPUs. Big endian support was never tested
+and likely never worked. Starting with QEMU v10.1, the machines are now
+only available as little-endian machines.
+
+
linux-user mode CPUs
--------------------
diff --git a/docs/interop/bitmaps.rst b/docs/interop/bitmaps.rst
index ddf8947..7536f0b 100644
--- a/docs/interop/bitmaps.rst
+++ b/docs/interop/bitmaps.rst
@@ -97,7 +97,7 @@ time.
- Persistent storage formats may impose their own requirements on bitmap names
and namespaces. Presently, only qcow2 supports persistent bitmaps. See
- docs/interop/qcow2.txt for more details on restrictions. Notably:
+ :doc:`qcow2` for more details on restrictions. Notably:
- qcow2 bitmap names are limited to between 1 and 1023 bytes long.
diff --git a/docs/interop/index.rst b/docs/interop/index.rst
index 999e44e..4b951ae 100644
--- a/docs/interop/index.rst
+++ b/docs/interop/index.rst
@@ -17,6 +17,7 @@ are useful for making QEMU interoperate with other software.
nbd
parallels
prl-xml
+ qcow2
pr-helper
qmp-spec
qemu-ga
diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.rst
index 2c46183..5948591 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.rst
@@ -1,6 +1,8 @@
-== General ==
+=======================
+Qcow2 Image File Format
+=======================
-A qcow2 image file is organized in units of constant size, which are called
+A ``qcow2`` image file is organized in units of constant size, which are called
(host) clusters. A cluster is the unit in which all allocations are done,
both for actual guest data and for image metadata.
@@ -9,10 +11,10 @@ clusters of the same size.
All numbers in qcow2 are stored in Big Endian byte order.
+Header
+------
-== Header ==
-
-The first cluster of a qcow2 image contains the file header:
+The first cluster of a qcow2 image contains the file header::
Byte 0 - 3: magic
QCOW magic string ("QFI\xfb")
@@ -38,7 +40,7 @@ The first cluster of a qcow2 image contains the file header:
within a cluster (1 << cluster_bits is the cluster size).
Must not be less than 9 (i.e. 512 byte clusters).
- Note: qemu as of today has an implementation limit of 2 MB
+ Note: QEMU as of today has an implementation limit of 2 MB
as the maximum cluster size and won't be able to open images
with larger cluster sizes.
@@ -48,7 +50,7 @@ The first cluster of a qcow2 image contains the file header:
24 - 31: size
Virtual disk size in bytes.
- Note: qemu has an implementation limit of 32 MB as
+ Note: QEMU has an implementation limit of 32 MB as
the maximum L1 table size. With a 2 MB cluster
size, it is unable to populate a virtual cluster
beyond 2 EB (61 bits); with a 512 byte cluster
@@ -87,7 +89,8 @@ The first cluster of a qcow2 image contains the file header:
For version 2, the header is exactly 72 bytes in length, and finishes here.
For version 3 or higher, the header length is at least 104 bytes, including
-the next fields through header_length.
+the next fields through ``header_length``.
+::
72 - 79: incompatible_features
Bitmask of incompatible features. An implementation must
@@ -185,7 +188,8 @@ the next fields through header_length.
of 8.
-=== Additional fields (version 3 and higher) ===
+Additional fields (version 3 and higher)
+----------------------------------------
In general, these fields are optional and may be safely ignored by the software,
as well as filled by zeros (which is equal to field absence), if software needs
@@ -193,21 +197,25 @@ to set field B, but does not care about field A which precedes B. More
formally, additional fields have the following compatibility rules:
1. If the value of the additional field must not be ignored for correct
-handling of the file, it will be accompanied by a corresponding incompatible
-feature bit.
+ handling of the file, it will be accompanied by a corresponding incompatible
+ feature bit.
2. If there are no unrecognized incompatible feature bits set, an unknown
-additional field may be safely ignored other than preserving its value when
-rewriting the image header.
+ additional field may be safely ignored other than preserving its value when
+ rewriting the image header.
+
+.. _ref_rules_3:
3. An explicit value of 0 will have the same behavior as when the field is not
-present*, if not altered by a specific incompatible bit.
+ present*, if not altered by a specific incompatible bit.
-*. A field is considered not present when header_length is less than or equal
+(*) A field is considered not present when ``header_length`` is less than or equal
to the field's offset. Also, all additional fields are not present for
version 2.
- 104: compression_type
+::
+
+ 104: compression_type
Defines the compression method used for compressed clusters.
All compressed clusters in an image use the same compression
@@ -219,8 +227,8 @@ version 2.
or must be zero (which means deflate).
Available compression type values:
- 0: deflate <https://www.ietf.org/rfc/rfc1951.txt>
- 1: zstd <http://github.com/facebook/zstd>
+ - 0: deflate <https://www.ietf.org/rfc/rfc1951.txt>
+ - 1: zstd <http://github.com/facebook/zstd>
The deflate compression type is called "zlib"
<https://www.zlib.net/> in QEMU. However, clusters with the
@@ -228,19 +236,21 @@ version 2.
105 - 111: Padding, contents defined below.
-=== Header padding ===
+Header padding
+--------------
-@header_length must be a multiple of 8, which means that if the end of the last
+``header_length`` must be a multiple of 8, which means that if the end of the last
additional field is not aligned, some padding is needed. This padding must be
zeroed, so that if some existing (or future) additional field will fall into
-the padding, it will be interpreted accordingly to point [3.] of the previous
+the padding, it will be interpreted accordingly to point `[3.] <#ref_rules_3>`_ of the previous
paragraph, i.e. in the same manner as when this field is not present.
-=== Header extensions ===
+Header extensions
+-----------------
Directly after the image header, optional sections called header extensions can
-be stored. Each extension has a structure like the following:
+be stored. Each extension has a structure like the following::
Byte 0 - 3: Header extension type:
0x00000000 - End of the header extension area
@@ -270,17 +280,19 @@ data of compatible features that it doesn't support. Compatible features that
need space for additional data can use a header extension.
-== String header extensions ==
+String header extensions
+------------------------
Some header extensions (such as the backing file format name and the external
data file name) are just a single string. In this case, the header extension
-length is the string length and the string is not '\0' terminated. (The header
-extension padding can make it look like a string is '\0' terminated, but
+length is the string length and the string is not ``\0`` terminated. (The header
+extension padding can make it look like a string is ``\0`` terminated, but
neither is padding always necessary nor is there a guarantee that zero bytes
are used for padding.)
-== Feature name table ==
+Feature name table
+------------------
The feature name table is an optional header extension that contains the name
for features used by the image. It can be used by applications that don't know
@@ -288,7 +300,7 @@ the respective feature (e.g. because the feature was introduced only later) to
display a useful error message.
The number of entries in the feature name table is determined by the length of
-the header extension data. Each entry look like this:
+the header extension data. Each entry looks like this::
Byte 0: Type of feature (select feature bitmap)
0: Incompatible feature
@@ -302,7 +314,8 @@ the header extension data. Each entry look like this:
terminated if it has full length)
-== Bitmaps extension ==
+Bitmaps extension
+-----------------
The bitmaps extension is an optional header extension. It provides the ability
to store bitmaps related to a virtual disk. For now, there is only one bitmap
@@ -310,9 +323,9 @@ type: the dirty tracking bitmap, which tracks virtual disk changes from some
point in time.
The data of the extension should be considered consistent only if the
-corresponding auto-clear feature bit is set, see autoclear_features above.
+corresponding auto-clear feature bit is set, see ``autoclear_features`` above.
-The fields of the bitmaps extension are:
+The fields of the bitmaps extension are::
Byte 0 - 3: nb_bitmaps
The number of bitmaps contained in the image. Must be
@@ -331,15 +344,17 @@ The fields of the bitmaps extension are:
Offset into the image file at which the bitmap directory
starts. Must be aligned to a cluster boundary.
-== Full disk encryption header pointer ==
+Full disk encryption header pointer
+-----------------------------------
The full disk encryption header must be present if, and only if, the
-'crypt_method' header requires metadata. Currently this is only true
-of the 'LUKS' crypt method. The header extension must be absent for
+``crypt_method`` header requires metadata. Currently this is only true
+of the ``LUKS`` crypt method. The header extension must be absent for
other methods.
This header provides the offset at which the crypt method can store
its additional data, as well as the length of such data.
+::
Byte 0 - 7: Offset into the image file at which the encryption
header starts in bytes. Must be aligned to a cluster
@@ -357,10 +372,10 @@ The first 592 bytes of the header clusters will contain the LUKS
partition header. This is then followed by the key material data areas.
The size of the key material data areas is determined by the number of
stripes in the key slot and key size. Refer to the LUKS format
-specification ('docs/on-disk-format.pdf' in the cryptsetup source
+specification (``docs/on-disk-format.pdf`` in the cryptsetup source
package) for details of the LUKS partition header format.
-In the LUKS partition header, the "payload-offset" field will be
+In the LUKS partition header, the ``payload-offset`` field will be
calculated as normal for the LUKS spec. ie the size of the LUKS
header, plus key material regions, plus padding, relative to the
start of the LUKS header. This offset value is not required to be
@@ -369,11 +384,12 @@ context of qcow2, since the qcow2 file format itself defines where
the real payload offset is, but none the less a valid payload offset
should always be present.
-In the LUKS key slots header, the "key-material-offset" is relative
+In the LUKS key slots header, the ``key-material-offset`` is relative
to the start of the LUKS header clusters in the qcow2 container,
not the start of the qcow2 file.
Logically the layout looks like
+::
+-----------------------------+
| QCow2 header |
@@ -405,7 +421,8 @@ Logically the layout looks like
| |
+-----------------------------+
-== Data encryption ==
+Data encryption
+---------------
When an encryption method is requested in the header, the image payload
data must be encrypted/decrypted on every write/read. The image headers
@@ -413,7 +430,7 @@ and metadata are never encrypted.
The algorithms used for encryption vary depending on the method
- - AES:
+ - ``AES``:
The AES cipher, in CBC mode, with 256 bit keys.
@@ -425,7 +442,7 @@ The algorithms used for encryption vary depending on the method
supported in the command line tools for the sake of back compatibility
and data liberation.
- - LUKS:
+ - ``LUKS``:
The algorithms are specified in the LUKS header.
@@ -433,7 +450,8 @@ The algorithms used for encryption vary depending on the method
in the LUKS header, with the physical disk sector as the
input tweak.
-== Host cluster management ==
+Host cluster management
+-----------------------
qcow2 manages the allocation of host clusters by maintaining a reference count
for each host cluster. A refcount of 0 means that the cluster is free, 1 means
@@ -453,14 +471,15 @@ Although a large enough refcount table can reserve clusters past 64 PB
large), note that some qcow2 metadata such as L1/L2 tables must point
to clusters prior to that point.
-Note: qemu has an implementation limit of 8 MB as the maximum refcount
-table size. With a 2 MB cluster size and a default refcount_order of
-4, it is unable to reference host resources beyond 2 EB (61 bits); in
-the worst case, with a 512 cluster size and refcount_order of 6, it is
-unable to access beyond 32 GB (35 bits).
+.. note::
+ QEMU has an implementation limit of 8 MB as the maximum refcount
+ table size. With a 2 MB cluster size and a default refcount_order of
+ 4, it is unable to reference host resources beyond 2 EB (61 bits); in
+ the worst case, with a 512 cluster size and refcount_order of 6, it is
+ unable to access beyond 32 GB (35 bits).
Given an offset into the image file, the refcount of its cluster can be
-obtained as follows:
+obtained as follows::
refcount_block_entries = (cluster_size * 8 / refcount_bits)
@@ -470,7 +489,7 @@ obtained as follows:
refcount_block = load_cluster(refcount_table[refcount_table_index]);
return refcount_block[refcount_block_index];
-Refcount table entry:
+Refcount table entry::
Bit 0 - 8: Reserved (set to 0)
@@ -482,14 +501,15 @@ Refcount table entry:
been allocated. All refcounts managed by this refcount block
are 0.
-Refcount block entry (x = refcount_bits - 1):
+Refcount block entry ``(x = refcount_bits - 1)``::
Bit 0 - x: Reference count of the cluster. If refcount_bits implies a
sub-byte width, note that bit 0 means the least significant
bit in this context.
-== Cluster mapping ==
+Cluster mapping
+---------------
Just as for refcounts, qcow2 uses a two-level structure for the mapping of
guest clusters to host clusters. They are called L1 and L2 table.
@@ -509,7 +529,7 @@ compressed clusters to reside below 512 TB (49 bits), and this limit
cannot be relaxed without an incompatible layout change).
Given an offset into the virtual disk, the offset into the image file can be
-obtained as follows:
+obtained as follows::
l2_entries = (cluster_size / sizeof(uint64_t)) [*]
@@ -523,7 +543,7 @@ obtained as follows:
[*] this changes if Extended L2 Entries are enabled, see next section
-L1 table entry:
+L1 table entry::
Bit 0 - 8: Reserved (set to 0)
@@ -538,7 +558,7 @@ L1 table entry:
refcount is exactly one. This information is only accurate
in the active L1 table.
-L2 table entry:
+L2 table entry::
Bit 0 - 61: Cluster descriptor
@@ -555,7 +575,7 @@ L2 table entry:
mapping for guest cluster offsets), so this bit should be 1
for all allocated clusters.
-Standard Cluster Descriptor:
+Standard Cluster Descriptor::
Bit 0: If set to 1, the cluster reads as all zeros. The host
cluster offset can be used to describe a preallocation,
@@ -577,7 +597,7 @@ Standard Cluster Descriptor:
56 - 61: Reserved (set to 0)
-Compressed Clusters Descriptor (x = 62 - (cluster_bits - 8)):
+Compressed Clusters Descriptor ``(x = 62 - (cluster_bits - 8))``::
Bit 0 - x-1: Host cluster offset. This is usually _not_ aligned to a
cluster or sector boundary! If cluster_bits is
@@ -601,7 +621,8 @@ file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
no backing file or the backing file is smaller than the image, they shall read
zeros for all parts that are not covered by the backing file.
-== Extended L2 Entries ==
+Extended L2 Entries
+-------------------
An image uses Extended L2 Entries if bit 4 is set on the incompatible_features
field of the header.
@@ -615,6 +636,8 @@ subclusters so they are treated the same as in images without this feature.
The size of an extended L2 entry is 128 bits so the number of entries per table
is calculated using this formula:
+.. code::
+
l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
The first 64 bits have the same format as the standard L2 table entry described
@@ -623,7 +646,7 @@ descriptor.
The last 64 bits contain a subcluster allocation bitmap with this format:
-Subcluster Allocation Bitmap (for standard clusters):
+Subcluster Allocation Bitmap (for standard clusters)::
Bit 0 - 31: Allocation status (one bit per subcluster)
@@ -647,13 +670,14 @@ Subcluster Allocation Bitmap (for standard clusters):
Bits are assigned starting from the least significant
one (i.e. bit x is used for subcluster x - 32).
-Subcluster Allocation Bitmap (for compressed clusters):
+Subcluster Allocation Bitmap (for compressed clusters)::
Bit 0 - 63: Reserved (set to 0)
Compressed clusters don't have subclusters,
so this field is not used.
-== Snapshots ==
+Snapshots
+---------
qcow2 supports internal snapshots. Their basic principle of operation is to
switch the active L1 table, so that a different set of host clusters are
@@ -672,7 +696,7 @@ in the image file, whose starting offset and length are given by the header
fields snapshots_offset and nb_snapshots. The entries of the snapshot table
have variable length, depending on the length of ID, name and extra data.
-Snapshot table entry:
+Snapshot table entry::
Byte 0 - 7: Offset into the image file at which the L1 table for the
snapshot starts. Must be aligned to a cluster boundary.
@@ -728,7 +752,8 @@ Snapshot table entry:
next multiple of 8.
-== Bitmaps ==
+Bitmaps
+-------
As mentioned above, the bitmaps extension provides the ability to store bitmaps
related to a virtual disk. This section describes how these bitmaps are stored.
@@ -739,20 +764,23 @@ each bitmap size is equal to the virtual disk size.
Each bit of the bitmap is responsible for strictly defined range of the virtual
disk. For bit number bit_nr the corresponding range (in bytes) will be:
+.. code::
+
[bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]
Granularity is a property of the concrete bitmap, see below.
-=== Bitmap directory ===
+Bitmap directory
+----------------
Each bitmap saved in the image is described in a bitmap directory entry. The
bitmap directory is a contiguous area in the image file, whose starting offset
-and length are given by the header extension fields bitmap_directory_offset and
-bitmap_directory_size. The entries of the bitmap directory have variable
+and length are given by the header extension fields ``bitmap_directory_offset`` and
+``bitmap_directory_size``. The entries of the bitmap directory have variable
length, depending on the lengths of the bitmap name and extra data.
-Structure of a bitmap directory entry:
+Structure of a bitmap directory entry::
Byte 0 - 7: bitmap_table_offset
Offset into the image file at which the bitmap table
@@ -833,7 +861,8 @@ Structure of a bitmap directory entry:
next multiple of 8. All bytes of the padding must be zero.
-=== Bitmap table ===
+Bitmap table
+------------
Each bitmap is stored using a one-level structure (as opposed to two-level
structures like for refcounts and guest clusters mapping) for the mapping of
@@ -843,7 +872,7 @@ Each bitmap table has a variable size (stored in the bitmap directory entry)
and may use multiple clusters, however, it must be contiguous in the image
file.
-Structure of a bitmap table entry:
+Structure of a bitmap table entry::
Bit 0: Reserved and must be zero if bits 9 - 55 are non-zero.
If bits 9 - 55 are zero:
@@ -860,11 +889,12 @@ Structure of a bitmap table entry:
56 - 63: Reserved and must be zero.
-=== Bitmap data ===
+Bitmap data
+-----------
As noted above, bitmap data is stored in separate clusters, described by the
bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
-the image file can be obtained as follows:
+the image file can be obtained as follows::
image_offset(bitmap_data_offset) =
bitmap_table[bitmap_data_offset / cluster_size] +
@@ -875,7 +905,7 @@ above).
Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
bit offset into the image file to the corresponding bit of the bitmap can be
-calculated like this:
+calculated like this::
bit_offset(byte_nr) =
image_offset(byte_nr / granularity / 8) * 8 +
@@ -886,21 +916,22 @@ last cluster of the bitmap data contains some unused tail bits. These bits must
be zero.
-=== Dirty tracking bitmaps ===
+Dirty tracking bitmaps
+----------------------
-Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
+Bitmaps with ``type`` field equal to one are dirty tracking bitmaps.
-When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
-'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
+When the virtual disk is in use dirty tracking bitmap may be ``enabled`` or
+``disabled``. While the bitmap is ``enabled``, all writes to the virtual disk
should be reflected in the bitmap. A set bit in the bitmap means that the
corresponding range of the virtual disk (see above) was written to while the
-bitmap was 'enabled'. An unset bit means that this range was not written to.
+bitmap was ``enabled``. An unset bit means that this range was not written to.
The software doesn't have to sync the bitmap in the image file with its
-representation in RAM after each write or metadata change. Flag 'in_use'
+representation in RAM after each write or metadata change. Flag ``in_use``
should be set while the bitmap is not synced.
-In the image file the 'enabled' state is reflected by the 'auto' flag. If this
-flag is set, the software must consider the bitmap as 'enabled' and start
+In the image file the ``enabled`` state is reflected by the ``auto`` flag. If this
+flag is set, the software must consider the bitmap as ``enabled`` and start
tracking virtual disk changes to this bitmap from the first write to the
virtual disk. If this flag is not set then the bitmap is disabled.
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index 5f763aa..204a574 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -15,7 +15,7 @@ not a straightforward operation.
This document attempts to give an overview of the L2 and refcount
caches, and how to configure them.
-Please refer to the docs/interop/qcow2.txt file for an in-depth
+Please refer to the docs/interop/qcow2.rst file for an in-depth
technical description of the qcow2 file format.
diff --git a/docs/system/confidential-guest-support.rst b/docs/system/confidential-guest-support.rst
index 0c490db..66129fb 100644
--- a/docs/system/confidential-guest-support.rst
+++ b/docs/system/confidential-guest-support.rst
@@ -38,6 +38,7 @@ Supported mechanisms
Currently supported confidential guest mechanisms are:
* AMD Secure Encrypted Virtualization (SEV) (see :doc:`i386/amd-memory-encryption`)
+* Intel Trust Domain Extension (TDX) (see :doc:`i386/tdx`)
* POWER Protected Execution Facility (PEF) (see :ref:`power-papr-protected-execution-facility-pef`)
* s390x Protected Virtualization (PV) (see :doc:`s390x/protvirt`)
diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
new file mode 100644
index 0000000..8131750
--- /dev/null
+++ b/docs/system/i386/tdx.rst
@@ -0,0 +1,161 @@
+Intel Trusted Domain eXtension (TDX)
+====================================
+
+Intel Trusted Domain eXtensions (TDX) refers to an Intel technology that extends
+Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
+with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
+in a CPU mode that is designed to protect the confidentiality of its memory
+contents and its CPU state from any other software, including the hosting
+Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.
+
+Prerequisites
+-------------
+
+To run TD, the physical machine needs to have TDX module loaded and initialized
+while KVM hypervisor has TDX support and has TDX enabled. If those requirements
+are met, the ``KVM_CAP_VM_TYPES`` will report the support of ``KVM_X86_TDX_VM``.
+
+Trust Domain Virtual Firmware (TDVF)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Trust Domain Virtual Firmware (TDVF) is required to provide TD services to boot
+TD Guest OS. TDVF needs to be copied to guest private memory and measured before
+the TD boots.
+
+KVM vcpu ioctl ``KVM_TDX_INIT_MEM_REGION`` can be used to populate the TDVF
+content into its private memory.
+
+Since TDX doesn't support readonly memslot, TDVF cannot be mapped as pflash
+device and it actually works as RAM. "-bios" option is chosen to load TDVF.
+
+OVMF is the opensource firmware that implements the TDVF support. Thus the
+command line to specify and load TDVF is ``-bios OVMF.fd``
+
+Feature Configuration
+---------------------
+
+Unlike non-TDX VM, the CPU features (enumerated by CPU or MSR) of a TD are not
+under full control of VMM. VMM can only configure part of features of a TD on
+``KVM_TDX_INIT_VM`` command of VM scope ``MEMORY_ENCRYPT_OP`` ioctl.
+
+The configurable features have three types:
+
+- Attributes:
+ - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to TD,
+ which determines related CPUID bit and CR4 bit;
+ - PERFMON (bit 63) controls whether PMU is exposed to TD.
+
+- XSAVE related features (XFAM):
+ XFAM is a 64b mask, which has the same format as XCR0 or IA32_XSS MSR. It
+ determines the set of extended features available for use by the guest TD.
+
+- CPUID features:
+ Only some bits of some CPUID leaves are directly configurable by VMM.
+
+What features can be configured is reported via TDX capabilities.
+
+TDX capabilities
+~~~~~~~~~~~~~~~~
+
+The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides command ``KVM_TDX_CAPABILITIES``
+to get the TDX capabilities from KVM. It returns a data structure of
+``struct kvm_tdx_capabilities``, which tells the supported configuration of
+attributes, XFAM and CPUIDs.
+
+TD attributes
+~~~~~~~~~~~~~
+
+QEMU supports configuring raw 64-bit TD attributes directly via "attributes"
+property of "tdx-guest" object. Note, it's users' responsibility to provide a
+valid value because some bits may not supported by current QEMU or KVM yet.
+
+QEMU also supports the configuration of individual attribute bits that are
+supported by it, via properties of "tdx-guest" object.
+E.g., "sept-ve-disable" (bit 28).
+
+MSR based features
+~~~~~~~~~~~~~~~~~~
+
+Current KVM doesn't support MSR based feature (e.g., MSR_IA32_ARCH_CAPABILITIES)
+configuration for TDX, and it's a future work to enable it in QEMU when KVM adds
+support of it.
+
+Feature check
+~~~~~~~~~~~~~
+
+QEMU checks if the final (CPU) features, determined by given cpu model and
+explicit feature adjustment of "+featureA/-featureB", can be supported or not.
+It can produce feature not supported warning like
+
+ "warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]"
+
+It can also produce warning like
+
+ "warning: TDX forcibly sets the feature: CPUID.80000007H:EDX.invtsc [bit 8]"
+
+if the fixed-1 feature is requested to be disabled explicitly. This is newly
+added to QEMU for TDX because TDX has fixed-1 features that are forcibly enabled
+by TDX module and VMM cannot disable them.
+
+Launching a TD (TDX VM)
+-----------------------
+
+To launch a TD, the necessary command line options are tdx-guest object and
+split kernel-irqchip, as below:
+
+.. parsed-literal::
+
+ |qemu_system_x86| \\
+ -accel kvm \\
+ -cpu host \\
+ -object tdx-guest,id=tdx0 \\
+ -machine ...,confidential-guest-support=tdx0 \\
+ -bios OVMF.fd \\
+
+Restrictions
+------------
+
+ - kernel-irqchip must be split;
+
+ This is set by default for TDX guest if kernel-irqchip is left on its default
+ 'auto' setting.
+
+ - No readonly support for private memory;
+
+ - No SMM support: SMM support requires manipulating the guest register states
+ which is not allowed;
+
+Debugging
+---------
+
+Bit 0 of TD attributes, is DEBUG bit, which decides if the TD runs in off-TD
+debug mode. When in off-TD debug mode, TD's VCPU state and private memory are
+accessible via given SEAMCALLs. This requires KVM to expose APIs to invoke those
+SEAMCALLs and corresonponding QEMU change.
+
+It's targeted as future work.
+
+TD attestation
+--------------
+
+In TD guest, the attestation process is used to verify the TDX guest
+trustworthiness to other entities before provisioning secrets to the guest.
+
+TD attestation is initiated first by calling TDG.MR.REPORT inside TD to get the
+REPORT. Then the REPORT data needs to be converted into a remotely verifiable
+Quote by SGX Quoting Enclave (QE).
+
+It's a future work in QEMU to add support of TD attestation since it lacks
+support in current KVM.
+
+Live Migration
+--------------
+
+Future work.
+
+References
+----------
+
+- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__
+
+- `SGX QE <https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/QuoteGeneration>`__
diff --git a/docs/system/target-i386.rst b/docs/system/target-i386.rst
index ab7af1a..43b09c7 100644
--- a/docs/system/target-i386.rst
+++ b/docs/system/target-i386.rst
@@ -31,6 +31,7 @@ Architectural features
i386/kvm-pv
i386/sgx
i386/amd-memory-encryption
+ i386/tdx
OS requirements
~~~~~~~~~~~~~~~