aboutsummaryrefslogtreecommitdiff
path: root/docs/interop/qed_spec.rst
diff options
context:
space:
mode:
Diffstat (limited to 'docs/interop/qed_spec.rst')
-rw-r--r--docs/interop/qed_spec.rst219
1 files changed, 219 insertions, 0 deletions
diff --git a/docs/interop/qed_spec.rst b/docs/interop/qed_spec.rst
new file mode 100644
index 0000000..cd6c7d9
--- /dev/null
+++ b/docs/interop/qed_spec.rst
@@ -0,0 +1,219 @@
+===================================
+QED Image File Format Specification
+===================================
+
+The file format looks like this::
+
+ +----------+----------+----------+-----+
+ | cluster0 | cluster1 | cluster2 | ... |
+ +----------+----------+----------+-----+
+
+The first cluster begins with the ``header``. The header contains information
+about where regular clusters start; this allows the header to be extensible and
+store extra information about the image file. A regular cluster may be
+a ``data cluster``, an ``L2``, or an ``L1 table``. L1 and L2 tables are composed
+of one or more contiguous clusters.
+
+Normally the file size will be a multiple of the cluster size. If the file size
+is not a multiple, extra information after the last cluster may not be preserved
+if data is written. Legitimate extra information should use space between the header
+and the first regular cluster.
+
+All fields are little-endian.
+
+Header
+------
+
+::
+
+ Header {
+ uint32_t magic; /* QED\0 */
+
+ uint32_t cluster_size; /* in bytes */
+ uint32_t table_size; /* for L1 and L2 tables, in clusters */
+ uint32_t header_size; /* in clusters */
+
+ uint64_t features; /* format feature bits */
+ uint64_t compat_features; /* compat feature bits */
+ uint64_t autoclear_features; /* self-resetting feature bits */
+
+ uint64_t l1_table_offset; /* in bytes */
+ uint64_t image_size; /* total logical image size, in bytes */
+
+ /* if (features & QED_F_BACKING_FILE) */
+ uint32_t backing_filename_offset; /* in bytes from start of header */
+ uint32_t backing_filename_size; /* in bytes */
+ }
+
+Field descriptions:
+~~~~~~~~~~~~~~~~~~~
+
+- ``cluster_size`` must be a power of 2 in range [2^12, 2^26].
+- ``table_size`` must be a power of 2 in range [1, 16].
+- ``header_size`` is the number of clusters used by the header and any additional
+ information stored before regular clusters.
+- ``features``, ``compat_features``, and ``autoclear_features`` are file format
+ extension bitmaps. They work as follows:
+
+ - An image with unknown ``features`` bits enabled must not be opened. File format
+ changes that are not backwards-compatible must use ``features`` bits.
+ - An image with unknown ``compat_features`` bits enabled can be opened safely.
+ The unknown features are simply ignored and represent backwards-compatible
+ changes to the file format.
+ - An image with unknown ``autoclear_features`` bits enable can be opened safely
+ after clearing the unknown bits. This allows for backwards-compatible changes
+ to the file format which degrade gracefully and can be re-enabled again by a
+ new program later.
+- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image
+ file and must be a multiple of ``cluster_size``.
+- ``image_size`` is the block device size seen by the guest and must be a multiple
+ of 512 bytes.
+- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in
+ (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints.
+ The string must be stored within the first ``header_size`` clusters. The backing filename
+ may be an absolute path or relative to the image file.
+
+Feature bits:
+~~~~~~~~~~~~~
+
+- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file.
+- ``QED_F_NEED_CHECK = 0x02``. The image needs a consistency check before use.
+- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image
+ and no file format autodetection should be attempted. This should be used to
+ ensure that raw backing files are never detected as an image format if they happen
+ to contain magic constants.
+
+There are currently no defined ``compat_features`` or ``autoclear_features`` bits.
+
+Fields predicated on a feature bit are only used when that feature is set.
+The fields always take up header space, regardless of whether or not the feature
+bit is set.
+
+Tables
+------
+
+Tables provide the translation from logical offsets in the block device to cluster
+offsets in the file.
+
+::
+
+ #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
+
+ Table {
+ uint64_t offsets[TABLE_NOFFSETS];
+ }
+
+The tables are organized as follows::
+
+ +----------+
+ | L1 table |
+ +----------+
+ ,------' | '------.
+ +----------+ | +----------+
+ | L2 table | ... | L2 table |
+ +----------+ +----------+
+ ,------' | '------.
+ +----------+ | +----------+
+ | Data | ... | Data |
+ +----------+ +----------+
+
+A table is made up of one or more contiguous clusters. The ``table_size`` header
+field determines table size for an image file. For example, ``cluster_size=64 KB``
+and ``table_size=4`` results in 256 KB tables.
+
+The logical image size must be less than or equal to the maximum possible size of
+clusters rooted by the L1 table:
+
+.. code::
+
+ header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
+
+L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``.
+The following offsets have special meanings:
+
+L2 table offsets
+~~~~~~~~~~~~~~~~
+
+- 0 - unallocated. The L2 table is not yet allocated.
+
+Data cluster offsets
+~~~~~~~~~~~~~~~~~~~~
+
+- 0 - unallocated. The data cluster is not yet allocated.
+- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated.
+
+Future format extensions may wish to store per-offset information. The least
+significant 12 bits of an offset are reserved for this purpose and must be set
+to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits
+which should also be zeroed.
+
+Unallocated L2 tables and data clusters
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Reads to an unallocated area of the image file access the backing file. If there
+is no backing file, then zeroes are produced. The backing file may be smaller
+than the image file and reads of unallocated areas beyond the end of the backing
+file produce zeroes.
+
+Writes to an unallocated area cause a new data clusters to be allocated, and a new
+L2 table if that is also unallocated. The new data cluster is populated with data
+from the backing file (or zeroes if no backing file) and the data being written.
+
+Zero data clusters
+~~~~~~~~~~~~~~~~~~
+
+Zero data clusters are a space-efficient way of storing zeroed regions of the image.
+
+Reads to a zero data cluster produce zeroes.
+
+.. note::
+ The difference between an unallocated and a zero data cluster is that zero data
+ clusters stop the reading of contents from the backing file.
+
+Writes to a zero data cluster cause a new data cluster to be allocated. The new
+data cluster is populated with zeroes and the data being written.
+
+Logical offset translation
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Logical offsets are translated into cluster offsets as follows::
+
+ table_bits table_bits cluster_bits
+ <--------> <--------> <--------------->
+ +----------+----------+-----------------+
+ | L1 index | L2 index | byte offset |
+ +----------+----------+-----------------+
+
+ Structure of a logical offset
+
+ offset_mask = ~(cluster_size - 1) # mask for the image file byte offset
+
+ def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
+ l2_offset = l1_table[l1_index]
+ l2_table = load_table(l2_offset)
+ cluster_offset = l2_table[l2_index] & offset_mask
+ return cluster_offset + byte_offset
+
+Consistency checking
+--------------------
+
+This section is informational and included to provide background on the use
+of the ``QED_F_NEED_CHECK features`` bit.
+
+The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting
+an operation that could leave the image in an inconsistent state if interrupted
+by a crash or power failure. A dirty image must be checked on open because its
+metadata may not be consistent.
+
+Consistency check includes the following invariants:
+
+- Each cluster is referenced once and only once. It is an inconsistency to have
+ a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked
+ if it has no references.
+- Offsets must be within the image file size and must be ``cluster_size`` aligned.
+- Table offsets must at least ``table_size`` * ``cluster_size`` bytes from the end
+ of the image file so that there is space for the entire table.
+
+The consistency check process starts from ``l1_table_offset`` and scans all L2 tables.
+After the check completes with no other errors besides leaks, the ``QED_F_NEED_CHECK``
+bit can be cleared and the image can be accessed.