diff options
Diffstat (limited to 'docs/interop/qed_spec.rst')
-rw-r--r-- | docs/interop/qed_spec.rst | 219 |
1 files changed, 219 insertions, 0 deletions
diff --git a/docs/interop/qed_spec.rst b/docs/interop/qed_spec.rst new file mode 100644 index 0000000..cd6c7d9 --- /dev/null +++ b/docs/interop/qed_spec.rst @@ -0,0 +1,219 @@ +=================================== +QED Image File Format Specification +=================================== + +The file format looks like this:: + + +----------+----------+----------+-----+ + | cluster0 | cluster1 | cluster2 | ... | + +----------+----------+----------+-----+ + +The first cluster begins with the ``header``. The header contains information +about where regular clusters start; this allows the header to be extensible and +store extra information about the image file. A regular cluster may be +a ``data cluster``, an ``L2``, or an ``L1 table``. L1 and L2 tables are composed +of one or more contiguous clusters. + +Normally the file size will be a multiple of the cluster size. If the file size +is not a multiple, extra information after the last cluster may not be preserved +if data is written. Legitimate extra information should use space between the header +and the first regular cluster. + +All fields are little-endian. + +Header +------ + +:: + + Header { + uint32_t magic; /* QED\0 */ + + uint32_t cluster_size; /* in bytes */ + uint32_t table_size; /* for L1 and L2 tables, in clusters */ + uint32_t header_size; /* in clusters */ + + uint64_t features; /* format feature bits */ + uint64_t compat_features; /* compat feature bits */ + uint64_t autoclear_features; /* self-resetting feature bits */ + + uint64_t l1_table_offset; /* in bytes */ + uint64_t image_size; /* total logical image size, in bytes */ + + /* if (features & QED_F_BACKING_FILE) */ + uint32_t backing_filename_offset; /* in bytes from start of header */ + uint32_t backing_filename_size; /* in bytes */ + } + +Field descriptions: +~~~~~~~~~~~~~~~~~~~ + +- ``cluster_size`` must be a power of 2 in range [2^12, 2^26]. +- ``table_size`` must be a power of 2 in range [1, 16]. +- ``header_size`` is the number of clusters used by the header and any additional + information stored before regular clusters. +- ``features``, ``compat_features``, and ``autoclear_features`` are file format + extension bitmaps. They work as follows: + + - An image with unknown ``features`` bits enabled must not be opened. File format + changes that are not backwards-compatible must use ``features`` bits. + - An image with unknown ``compat_features`` bits enabled can be opened safely. + The unknown features are simply ignored and represent backwards-compatible + changes to the file format. + - An image with unknown ``autoclear_features`` bits enable can be opened safely + after clearing the unknown bits. This allows for backwards-compatible changes + to the file format which degrade gracefully and can be re-enabled again by a + new program later. +- ``l1_table_offset`` is the offset of the first byte of the L1 table in the image + file and must be a multiple of ``cluster_size``. +- ``image_size`` is the block device size seen by the guest and must be a multiple + of 512 bytes. +- ``backing_filename_offset`` and ``backing_filename_size`` describe a string in + (byte offset, byte size) form. It is not NUL-terminated and has no alignment constraints. + The string must be stored within the first ``header_size`` clusters. The backing filename + may be an absolute path or relative to the image file. + +Feature bits: +~~~~~~~~~~~~~ + +- ``QED_F_BACKING_FILE = 0x01``. The image uses a backing file. +- ``QED_F_NEED_CHECK = 0x02``. The image needs a consistency check before use. +- ``QED_F_BACKING_FORMAT_NO_PROBE = 0x04``. The backing file is a raw disk image + and no file format autodetection should be attempted. This should be used to + ensure that raw backing files are never detected as an image format if they happen + to contain magic constants. + +There are currently no defined ``compat_features`` or ``autoclear_features`` bits. + +Fields predicated on a feature bit are only used when that feature is set. +The fields always take up header space, regardless of whether or not the feature +bit is set. + +Tables +------ + +Tables provide the translation from logical offsets in the block device to cluster +offsets in the file. + +:: + + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t)) + + Table { + uint64_t offsets[TABLE_NOFFSETS]; + } + +The tables are organized as follows:: + + +----------+ + | L1 table | + +----------+ + ,------' | '------. + +----------+ | +----------+ + | L2 table | ... | L2 table | + +----------+ +----------+ + ,------' | '------. + +----------+ | +----------+ + | Data | ... | Data | + +----------+ +----------+ + +A table is made up of one or more contiguous clusters. The ``table_size`` header +field determines table size for an image file. For example, ``cluster_size=64 KB`` +and ``table_size=4`` results in 256 KB tables. + +The logical image size must be less than or equal to the maximum possible size of +clusters rooted by the L1 table: + +.. code:: + + header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size + +L1, L2, and data cluster offsets must be aligned to ``header.cluster_size``. +The following offsets have special meanings: + +L2 table offsets +~~~~~~~~~~~~~~~~ + +- 0 - unallocated. The L2 table is not yet allocated. + +Data cluster offsets +~~~~~~~~~~~~~~~~~~~~ + +- 0 - unallocated. The data cluster is not yet allocated. +- 1 - zero. The data cluster contents are all zeroes and no cluster is allocated. + +Future format extensions may wish to store per-offset information. The least +significant 12 bits of an offset are reserved for this purpose and must be set +to zero. Image files with ``cluster_size`` > 2^12 will have more unused bits +which should also be zeroed. + +Unallocated L2 tables and data clusters +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Reads to an unallocated area of the image file access the backing file. If there +is no backing file, then zeroes are produced. The backing file may be smaller +than the image file and reads of unallocated areas beyond the end of the backing +file produce zeroes. + +Writes to an unallocated area cause a new data clusters to be allocated, and a new +L2 table if that is also unallocated. The new data cluster is populated with data +from the backing file (or zeroes if no backing file) and the data being written. + +Zero data clusters +~~~~~~~~~~~~~~~~~~ + +Zero data clusters are a space-efficient way of storing zeroed regions of the image. + +Reads to a zero data cluster produce zeroes. + +.. note:: + The difference between an unallocated and a zero data cluster is that zero data + clusters stop the reading of contents from the backing file. + +Writes to a zero data cluster cause a new data cluster to be allocated. The new +data cluster is populated with zeroes and the data being written. + +Logical offset translation +~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Logical offsets are translated into cluster offsets as follows:: + + table_bits table_bits cluster_bits + <--------> <--------> <---------------> + +----------+----------+-----------------+ + | L1 index | L2 index | byte offset | + +----------+----------+-----------------+ + + Structure of a logical offset + + offset_mask = ~(cluster_size - 1) # mask for the image file byte offset + + def logical_to_cluster_offset(l1_index, l2_index, byte_offset): + l2_offset = l1_table[l1_index] + l2_table = load_table(l2_offset) + cluster_offset = l2_table[l2_index] & offset_mask + return cluster_offset + byte_offset + +Consistency checking +-------------------- + +This section is informational and included to provide background on the use +of the ``QED_F_NEED_CHECK features`` bit. + +The ``QED_F_NEED_CHECK`` bit is used to mark an image as dirty before starting +an operation that could leave the image in an inconsistent state if interrupted +by a crash or power failure. A dirty image must be checked on open because its +metadata may not be consistent. + +Consistency check includes the following invariants: + +- Each cluster is referenced once and only once. It is an inconsistency to have + a cluster referenced more than once by L1 or L2 tables. A cluster has been leaked + if it has no references. +- Offsets must be within the image file size and must be ``cluster_size`` aligned. +- Table offsets must at least ``table_size`` * ``cluster_size`` bytes from the end + of the image file so that there is space for the entire table. + +The consistency check process starts from ``l1_table_offset`` and scans all L2 tables. +After the check completes with no other errors besides leaks, the ``QED_F_NEED_CHECK`` +bit can be cleared and the image can be accessed. |