From bca5a8f4624f1859ad1a2b078e6b685456e6befd Mon Sep 17 00:00:00 2001 From: Vladimir Sementsov-Ogievskiy Date: Fri, 5 Feb 2016 11:58:33 +0300 Subject: spec: add qcow2 bitmaps extension specification The new feature for qcow2: storing bitmaps. This patch adds new header extension to qcow2 - Bitmaps Extension. It provides an ability to store virtual disk related bitmaps in a qcow2 image. For now there is only one type of such bitmaps: Dirty Tracking Bitmap, which just tracks virtual disk changes from some moment. Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file, should be stored in this qcow2 file. The size of each bitmap (considering its granularity) is equal to virtual disk size. Signed-off-by: Vladimir Sementsov-Ogievskiy Reviewed-by: Max Reitz Reviewed-by: John Snow Reviewed-by: Fam Zheng Signed-off-by: Kevin Wolf --- docs/specs/qcow2.txt | 221 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 220 insertions(+), 1 deletion(-) (limited to 'docs') diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt index f236d8c..80cdfd0 100644 --- a/docs/specs/qcow2.txt +++ b/docs/specs/qcow2.txt @@ -103,7 +103,18 @@ in the description of a field. write to an image with unknown auto-clear features if it clears the respective bits from this field first. - Bits 0-63: Reserved (set to 0) + Bit 0: Bitmaps extension bit + This bit indicates consistency for the bitmaps + extension data. + + It is an error if this bit is set without the + bitmaps extension present. + + If the bitmaps extension is present but this + bit is unset, the bitmaps extension data must be + considered inconsistent. + + Bits 1-63: Reserved (set to 0) 96 - 99: refcount_order Describes the width of a reference count block entry (width @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following: 0x00000000 - End of the header extension area 0xE2792ACA - Backing file format name 0x6803f857 - Feature name table + 0x23852875 - Bitmaps extension other - Unknown header extension, can be safely ignored @@ -166,6 +178,36 @@ the header extension data. Each entry look like this: terminated if it has full length) +== Bitmaps extension == + +The bitmaps extension is an optional header extension. It provides the ability +to store bitmaps related to a virtual disk. For now, there is only one bitmap +type: the dirty tracking bitmap, which tracks virtual disk changes from some +point in time. + +The data of the extension should be considered consistent only if the +corresponding auto-clear feature bit is set, see autoclear_features above. + +The fields of the bitmaps extension are: + + Byte 0 - 3: nb_bitmaps + The number of bitmaps contained in the image. Must be + greater than or equal to 1. + + Note: Qemu currently only supports up to 65535 bitmaps per + image. + + 4 - 7: Reserved, must be zero. + + 8 - 15: bitmap_directory_size + Size of the bitmap directory in bytes. It is the cumulative + size of all (nb_bitmaps) bitmap headers. + + 16 - 23: bitmap_directory_offset + Offset into the image file at which the bitmap directory + starts. Must be aligned to a cluster boundary. + + == Host cluster management == qcow2 manages the allocation of host clusters by maintaining a reference count @@ -360,3 +402,180 @@ Snapshot table entry: variable: Padding to round up the snapshot table entry size to the next multiple of 8. + + +== Bitmaps == + +As mentioned above, the bitmaps extension provides the ability to store bitmaps +related to a virtual disk. This section describes how these bitmaps are stored. + +All stored bitmaps are related to the virtual disk stored in the same image, so +each bitmap size is equal to the virtual disk size. + +Each bit of the bitmap is responsible for strictly defined range of the virtual +disk. For bit number bit_nr the corresponding range (in bytes) will be: + + [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1] + +Granularity is a property of the concrete bitmap, see below. + + +=== Bitmap directory === + +Each bitmap saved in the image is described in a bitmap directory entry. The +bitmap directory is a contiguous area in the image file, whose starting offset +and length are given by the header extension fields bitmap_directory_offset and +bitmap_directory_size. The entries of the bitmap directory have variable +length, depending on the lengths of the bitmap name and extra data. These +entries are also called bitmap headers. + +Structure of a bitmap directory entry: + + Byte 0 - 7: bitmap_table_offset + Offset into the image file at which the bitmap table + (described below) for the bitmap starts. Must be aligned to + a cluster boundary. + + 8 - 11: bitmap_table_size + Number of entries in the bitmap table of the bitmap. + + 12 - 15: flags + Bit + 0: in_use + The bitmap was not saved correctly and may be + inconsistent. + + 1: auto + The bitmap must reflect all changes of the virtual + disk by any application that would write to this qcow2 + file (including writes, snapshot switching, etc.). The + type of this bitmap must be 'dirty tracking bitmap'. + + 2: extra_data_compatible + This flags is meaningful when the extra data is + unknown to the software (currently any extra data is + unknown to Qemu). + If it is set, the bitmap may be used as expected, extra + data must be left as is. + If it is not set, the bitmap must not be used, but + both it and its extra data be left as is. + + Bits 3 - 31 are reserved and must be 0. + + 16: type + This field describes the sort of the bitmap. + Values: + 1: Dirty tracking bitmap + + Values 0, 2 - 255 are reserved. + + 17: granularity_bits + Granularity bits. Valid values: 0 - 63. + + Note: Qemu currently doesn't support granularity_bits + greater than 31. + + Granularity is calculated as + granularity = 1 << granularity_bits + + A bitmap's granularity is how many bytes of the image + accounts for one bit of the bitmap. + + 18 - 19: name_size + Size of the bitmap name. Must be non-zero. + + Note: Qemu currently doesn't support values greater than + 1023. + + 20 - 23: extra_data_size + Size of type-specific extra data. + + For now, as no extra data is defined, extra_data_size is + reserved and should be zero. If it is non-zero the + behavior is defined by extra_data_compatible flag. + + variable: extra_data + Extra data for the bitmap, occupying extra_data_size bytes. + Extra data must never contain references to clusters or in + some other way allocate additional clusters. + + variable: name + The name of the bitmap (not null terminated), occupying + name_size bytes. Must be unique among all bitmap names + within the bitmaps extension. + + variable: Padding to round up the bitmap directory entry size to the + next multiple of 8. All bytes of the padding must be zero. + + +=== Bitmap table === + +Each bitmap is stored using a one-level structure (as opposed to two-level +structures like for refcounts and guest clusters mapping) for the mapping of +bitmap data to host clusters. This structure is called the bitmap table. + +Each bitmap table has a variable size (stored in the bitmap directory entry) +and may use multiple clusters, however, it must be contiguous in the image +file. + +Structure of a bitmap table entry: + + Bit 0: Reserved and must be zero if bits 9 - 55 are non-zero. + If bits 9 - 55 are zero: + 0: Cluster should be read as all zeros. + 1: Cluster should be read as all ones. + + 1 - 8: Reserved and must be zero. + + 9 - 55: Bits 9 - 55 of the host cluster offset. Must be aligned to + a cluster boundary. If the offset is 0, the cluster is + unallocated; in that case, bit 0 determines how this + cluster should be treated during reads. + + 56 - 63: Reserved and must be zero. + + +=== Bitmap data === + +As noted above, bitmap data is stored in separate clusters, described by the +bitmap table. Given an offset (in bytes) into the bitmap data, the offset into +the image file can be obtained as follows: + + image_offset(bitmap_data_offset) = + bitmap_table[bitmap_data_offset / cluster_size] + + (bitmap_data_offset % cluster_size) + +This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see +above). + +Given an offset byte_nr into the virtual disk and the bitmap's granularity, the +bit offset into the image file to the corresponding bit of the bitmap can be +calculated like this: + + bit_offset(byte_nr) = + image_offset(byte_nr / granularity / 8) * 8 + + (byte_nr / granularity) % 8 + +If the size of the bitmap data is not a multiple of the cluster size then the +last cluster of the bitmap data contains some unused tail bits. These bits must +be zero. + + +=== Dirty tracking bitmaps === + +Bitmaps with 'type' field equal to one are dirty tracking bitmaps. + +When the virtual disk is in use dirty tracking bitmap may be 'enabled' or +'disabled'. While the bitmap is 'enabled', all writes to the virtual disk +should be reflected in the bitmap. A set bit in the bitmap means that the +corresponding range of the virtual disk (see above) was written to while the +bitmap was 'enabled'. An unset bit means that this range was not written to. + +The software doesn't have to sync the bitmap in the image file with its +representation in RAM after each write. Flag 'in_use' should be set while the +bitmap is not synced. + +In the image file the 'enabled' state is reflected by the 'auto' flag. If this +flag is set, the software must consider the bitmap as 'enabled' and start +tracking virtual disk changes to this bitmap from the first write to the +virtual disk. If this flag is not set then the bitmap is disabled. -- cgit v1.1 From 1ffad77cde4ca78696c60dc604042a5d06b23cc3 Mon Sep 17 00:00:00 2001 From: Alberto Garcia Date: Thu, 18 Feb 2016 12:27:09 +0200 Subject: docs: Document the throttling infrastructure Signed-off-by: Alberto Garcia Signed-off-by: Kevin Wolf --- docs/throttle.txt | 252 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 252 insertions(+) create mode 100644 docs/throttle.txt (limited to 'docs') diff --git a/docs/throttle.txt b/docs/throttle.txt new file mode 100644 index 0000000..28204e4 --- /dev/null +++ b/docs/throttle.txt @@ -0,0 +1,252 @@ +The QEMU throttling infrastructure +================================== +Copyright (C) 2016 Igalia, S.L. +Author: Alberto Garcia + +This work is licensed under the terms of the GNU GPL, version 2 or +later. See the COPYING file in the top-level directory. + +Introduction +------------ +QEMU includes a throttling module that can be used to set limits to +I/O operations. The code itself is generic and independent of the I/O +units, but it is currenly used to limit the number of bytes per second +and operations per second (IOPS) when performing disk I/O. + +This document explains how to use the throttling code in QEMU, and how +it works internally. The implementation is in throttle.c. + + +Using throttling to limit disk I/O +---------------------------------- +Two aspects of the disk I/O can be limited: the number of bytes per +second and the number of operations per second (IOPS). For each one of +them the user can set a global limit or separate limits for read and +write operations. This gives us a total of six different parameters. + +I/O limits can be set using the throttling.* parameters of -drive, or +using the QMP 'block_set_io_throttle' command. These are the names of +the parameters for both cases: + +|-----------------------+-----------------------| +| -drive | block_set_io_throttle | +|-----------------------+-----------------------| +| throttling.iops-total | iops | +| throttling.iops-read | iops_rd | +| throttling.iops-write | iops_wr | +| throttling.bps-total | bps | +| throttling.bps-read | bps_rd | +| throttling.bps-write | bps_wr | +|-----------------------+-----------------------| + +It is possible to set limits for both IOPS and bps and the same time, +and for each case we can decide whether to have separate read and +write limits or not, but note that if iops-total is set then neither +iops-read nor iops-write can be set. The same applies to bps-total and +bps-read/write. + +The default value of these parameters is 0, and it means 'unlimited'. + +In its most basic usage, the user can add a drive to QEMU with a limit +of 100 IOPS with the following -drive line: + + -drive file=hd0.qcow2,throttling.iops-total=100 + +We can do the same using QMP. In this case all these parameters are +mandatory, so we must set to 0 the ones that we don't want to limit: + + { "execute": "block_set_io_throttle", + "arguments": { + "device": "virtio0", + "iops": 100, + "iops_rd": 0, + "iops_wr": 0, + "bps": 0, + "bps_rd": 0, + "bps_wr": 0 + } + } + + +I/O bursts +---------- +In addition to the basic limits we have just seen, QEMU allows the +user to do bursts of I/O for a configurable amount of time. A burst is +an amount of I/O that can exceed the basic limit. Bursts are useful to +allow better performance when there are peaks of activity (the OS +boots, a service needs to be restarted) while keeping the average +limits lower the rest of the time. + +Two parameters control bursts: their length and the maximum amount of +I/O they allow. These two can be configured separately for each one of +the six basic parameters described in the previous section, but in +this section we'll use 'iops-total' as an example. + +The I/O limit during bursts is set using 'iops-total-max', and the +maximum length (in seconds) is set with 'iops-total-max-length'. So if +we want to configure a drive with a basic limit of 100 IOPS and allow +bursts of 2000 IOPS for 60 seconds, we would do it like this (the line +is split for clarity): + + -drive file=hd0.qcow2, + throttling.iops-total=100, + throttling.iops-total-max=2000, + throttling.iops-total-max-length=60 + +Or, with QMP: + + { "execute": "block_set_io_throttle", + "arguments": { + "device": "virtio0", + "iops": 100, + "iops_rd": 0, + "iops_wr": 0, + "bps": 0, + "bps_rd": 0, + "bps_wr": 0, + "iops_max": 2000, + "iops_max_length": 60, + } + } + +With this, the user can perform I/O on hd0.qcow2 at a rate of 2000 +IOPS for 1 minute before it's throttled down to 100 IOPS. + +The user will be able to do bursts again if there's a sufficiently +long period of time with unused I/O (see below for details). + +The default value for 'iops-total-max' is 0 and it means that bursts +are not allowed. 'iops-total-max-length' can only be set if +'iops-total-max' is set as well, and its default value is 1 second. + +Here's the complete list of parameters for configuring bursts: + +|----------------------------------+-----------------------| +| -drive | block_set_io_throttle | +|----------------------------------+-----------------------| +| throttling.iops-total-max | iops_max | +| throttling.iops-total-max-length | iops_max_length | +| throttling.iops-read-max | iops_rd_max | +| throttling.iops-read-max-length | iops_rd_max_length | +| throttling.iops-write-max | iops_wr_max | +| throttling.iops-write-max-length | iops_wr_max_length | +| throttling.bps-total-max | bps_max | +| throttling.bps-total-max-length | bps_max_length | +| throttling.bps-read-max | bps_rd_max | +| throttling.bps-read-max-length | bps_rd_max_length | +| throttling.bps-write-max | bps_wr_max | +| throttling.bps-write-max-length | bps_wr_max_length | +|----------------------------------+-----------------------| + + +Controlling the size of I/O operations +-------------------------------------- +When applying IOPS limits all I/O operations are treated equally +regardless of their size. This means that the user can take advantage +of this in order to circumvent the limits and submit one huge I/O +request instead of several smaller ones. + +QEMU provides a setting called throttling.iops-size to prevent this +from happening. This setting specifies the size (in bytes) of an I/O +request for accounting purposes. Larger requests will be counted +proportionally to this size. + +For example, if iops-size is set to 4096 then an 8KB request will be +counted as two, and a 6KB request will be counted as one and a +half. This only applies to requests larger than iops-size: smaller +requests will be always counted as one, no matter their size. + +The default value of iops-size is 0 and it means that the size of the +requests is never taken into account when applying IOPS limits. + + +Applying I/O limits to groups of disks +-------------------------------------- +In all the examples so far we have seen how to apply limits to the I/O +performed on individual drives, but QEMU allows grouping drives so +they all share the same limits. + +The way it works is that each drive with I/O limits is assigned to a +group named using the throttling.group parameter. If this parameter is +not specified, then the device name (i.e. 'virtio0', 'ide0-hd0') will +be used as the group name. + +Limits set using the throttling.* parameters discussed earlier in this +document apply to the combined I/O of all members of a group. + +Consider this example: + + -drive file=hd1.qcow2,throttling.iops-total=6000,throttling.group=foo + -drive file=hd2.qcow2,throttling.iops-total=6000,throttling.group=foo + -drive file=hd3.qcow2,throttling.iops-total=3000,throttling.group=bar + -drive file=hd4.qcow2,throttling.iops-total=6000,throttling.group=foo + -drive file=hd5.qcow2,throttling.iops-total=3000,throttling.group=bar + -drive file=hd6.qcow2,throttling.iops-total=5000 + +Here hd1, hd2 and hd4 are all members of a group named 'foo' with a +combined IOPS limit of 6000, and hd3 and hd5 are members of 'bar'. hd6 +is left alone (technically it is part of a 1-member group). + +Limits are applied in a round-robin fashion so if there are concurrent +I/O requests on several drives of the same group they will be +distributed evenly. + +When I/O limits are applied to an existing drive using the QMP command +'block_set_io_throttle', the following things need to be taken into +account: + + - I/O limits are shared within the same group, so new values will + affect all members and overwrite the previous settings. In other + words: if different limits are applied to members of the same + group, the last one wins. + + - If 'group' is unset it is assumed to be the current group of that + drive. If the drive is not in a group yet, it will be added to a + group named after the device name. + + - If 'group' is set then the drive will be moved to that group if + it was member of a different one. In this case the limits + specified in the parameters will be applied to the new group + only. + + - I/O limits can be disabled by setting all of them to 0. In this + case the device will be removed from its group and the rest of + its members will not be affected. The 'group' parameter is + ignored. + + +The Leaky Bucket algorithm +-------------------------- +I/O limits in QEMU are implemented using the leaky bucket algorithm +(specifically the "Leaky bucket as a meter" variant). + +This algorithm uses the analogy of a bucket that leaks water +constantly. The water that gets into the bucket represents the I/O +that has been performed, and no more I/O is allowed once the bucket is +full. + +To see the way this corresponds to the throttling parameters in QEMU, +consider the following values: + + iops-total=100 + iops-total-max=2000 + iops-total-max-length=60 + + - Water leaks from the bucket at a rate of 100 IOPS. + - Water can be added to the bucket at a rate of 2000 IOPS. + - The size of the bucket is 2000 x 60 = 120000 + - If 'iops-total-max-length' is unset then the bucket size is 100. + +The bucket is initially empty, therefore water can be added until it's +full at a rate of 2000 IOPS (the burst rate). Once the bucket is full +we can only add as much water as it leaks, therefore the I/O rate is +reduced to 100 IOPS. If we add less water than it leaks then the +bucket will start to empty, allowing for bursts again. + +Note that since water is leaking from the bucket even during bursts, +it will take a bit more than 60 seconds at 2000 IOPS to fill it +up. After those 60 seconds the bucket will have leaked 60 x 100 = +6000, allowing for 3 more seconds of I/O at 2000 IOPS. + +Also, due to the way the algorithm works, longer burst can be done at +a lower I/O rate, e.g. 1000 IOPS during 120 seconds. -- cgit v1.1