Diffstat (limited to 'docs/system/i386')

 -rw-r--r--  docs/system/i386/hyperv.rst        |  43
 -rw-r--r--  docs/system/i386/nitro-enclave.rst |  78
 -rw-r--r--  docs/system/i386/tdx.rst           | 161

 3 files changed, 278 insertions(+), 4 deletions(-)
diff --git a/docs/system/i386/hyperv.rst b/docs/system/i386/hyperv.rst
index 2505dc4..1c1de77 100644
--- a/docs/system/i386/hyperv.rst
+++ b/docs/system/i386/hyperv.rst
@@ -262,14 +262,19 @@ Supplementary features
 ``hv-passthrough``
   In some cases (e.g. during development) it may make sense to use QEMU in
   'pass-through' mode and give Windows guests all enlightenments currently
-  supported by KVM. This pass-through mode is enabled by "hv-passthrough" CPU
-  flag.
+  supported by KVM.
 
   Note: the ``hv-passthrough`` flag only enables enlightenments which are known
   to QEMU (have a corresponding 'hv-' flag) and copies the ``hv-spinlocks`` and
   ``hv-vendor-id`` values from KVM to QEMU. ``hv-passthrough`` overrides all
   other 'hv-' settings on
-  the command line. Also, enabling this flag effectively prevents migration as the
-  list of enabled enlightenments may differ between target and destination hosts.
+  the command line.
+
+  Note: ``hv-passthrough`` does not enable ``hv-syndbg``, which can prevent
+  certain Windows guests from booting when used without proper configuration.
+  If needed, ``hv-syndbg`` can be enabled additionally.
+
+  Note: ``hv-passthrough`` effectively prevents migration, as the list of
+  enabled enlightenments may differ between source and destination hosts.
 
 ``hv-enforce-cpuid``
   By default, KVM allows the guest to use all currently supported Hyper-V
@@ -278,6 +283,36 @@ Supplementary features
   feature alters this behavior and only allows the guest to use exposed
   Hyper-V enlightenments.
 
+Recommendations
+---------------
+
+To achieve the best performance of Windows and Hyper-V guests, and unless there
+are specific requirements (e.g. migration to older QEMU/KVM versions, emulating
+a specific Hyper-V version, ...), it is recommended to enable all currently
+implemented Hyper-V enlightenments with the following exceptions:
+
+- ``hv-syndbg``, ``hv-passthrough`` and ``hv-enforce-cpuid`` should not be
+  enabled in production configurations as these are debugging/development
+  features.
+- ``hv-reset`` can be avoided as modern Hyper-V versions don't expose it.
+- ``hv-evmcs`` can (and should) be enabled on Intel CPUs only. While the feature
+  is only used in nested configurations (Hyper-V, WSL2), enabling it for regular
+  Windows guests should not have any negative effects.
+- ``hv-no-nonarch-coresharing`` must only be enabled if vCPUs are properly
+  pinned so that no non-architectural core sharing is possible.
+- ``hv-vendor-id``, ``hv-version-id-build``, ``hv-version-id-major``,
+  ``hv-version-id-minor``, ``hv-version-id-spack``, ``hv-version-id-sbranch``
+  and ``hv-version-id-snumber`` can be left unchanged; guests are not supposed
+  to behave differently when a different Hyper-V version is presented to them.
+- ``hv-crash`` must only be enabled if the crash information is consumed via
+  QAPI by higher levels of the virtualization stack. Enabling this feature
+  effectively prevents Windows from creating dumps upon crashes.
+- ``hv-reenlightenment`` can only be used on hardware which supports TSC
+  scaling, or when guest migration is not needed.
+- ``hv-spinlocks`` should be set to e.g. 0xfff when host CPUs are overcommitted
+  (meaning there are other scheduled tasks or guests) and can be left unchanged
+  from the default value (0xffffffff) otherwise.
+- ``hv-avic``/``hv-apicv`` should not be enabled if the hardware does not
+  support APIC virtualization (Intel APICv, AMD AVIC).
+
 Useful links
 ------------
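A minimal sketch of a command line following the recommendations above (an
illustration only, assuming an Intel host with APICv whose CPUs are
overcommitted and a guest that will not be migrated; trim or extend the flag
set per the notes in the list)::

   $ qemu-system-x86_64 --enable-kvm \
         -cpu host,hv-relaxed,hv-vapic,hv-time,hv-vpindex,hv-runtime,hv-synic,hv-stimer,hv-stimer-direct,hv-frequencies,hv-tlbflush,hv-ipi,hv-reenlightenment,hv-evmcs,hv-spinlocks=0xfff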
diff --git a/docs/system/i386/nitro-enclave.rst b/docs/system/i386/nitro-enclave.rst
new file mode 100644
index 0000000..7317f54
--- /dev/null
+++ b/docs/system/i386/nitro-enclave.rst
@@ -0,0 +1,78 @@

'nitro-enclave' virtual machine (``nitro-enclave``)
===================================================

``nitro-enclave`` is a machine type which emulates an *AWS nitro enclave*
virtual machine. `AWS nitro enclaves`_ is an Amazon EC2 feature that allows
creating isolated execution environments, called enclaves, from Amazon EC2
instances, which are used for processing highly sensitive data. Enclaves have
no persistent storage and no external networking. The enclave VMs are based on
the Firecracker microvm, with a vhost-vsock device for communication with the
parent EC2 instance that spawned them and a Nitro Secure Module (NSM) device
for cryptographic attestation. The parent instance VM always has CID 3, while
the enclave VM gets a dynamic CID. Enclaves boot from an EIF (`Enclave Image
Format`_) file, which contains the necessary kernel, cmdline and ramdisk(s).

In QEMU, ``nitro-enclave`` is a machine type based on ``microvm``, in the same
way that AWS nitro enclaves are based on a `Firecracker`_ microvm. It is useful
for local testing of EIF files with QEMU instead of running real AWS Nitro
Enclaves, which can be difficult to debug precisely because of their
security-first design. The vsock device emulation is done using
vhost-user-vsock, which means another process that does the userspace
emulation, such as `vhost-device-vsock`_ from the rust-vmm crates, must run
alongside nitro-enclave for the vsock communication to work.

``libcbor`` and ``gnutls`` are required dependencies for nitro-enclave machine
support to be included when building QEMU from source.

.. _AWS nitro enclaves: https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html
.. _Enclave Image Format: https://github.com/aws/aws-nitro-enclaves-image-format
.. _vhost-device-vsock: https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock
.. _Firecracker: https://firecracker-microvm.github.io

Using the nitro-enclave machine type
------------------------------------

Machine-specific options
~~~~~~~~~~~~~~~~~~~~~~~~

The machine type supports the following machine-specific options:

- nitro-enclave.vsock=string (required) (id of the chardev from the '-chardev'
  option that the vhost-user-vsock device will use)
- nitro-enclave.id=string (optional) (set the enclave identifier)
- nitro-enclave.parent-role=string (optional) (set the parent instance IAM role ARN)
- nitro-enclave.parent-id=string (optional) (set the parent instance identifier)

Running a nitro-enclave VM
~~~~~~~~~~~~~~~~~~~~~~~~~~

First, run `vhost-device-vsock`__ (or a similar tool that supports
vhost-user-vsock). The ``forward-cid`` option below with value 1 forwards all
connections from the enclave VM to the host machine, and ``forward-listen``
(port numbers separated by '+') is used for forwarding connections from the
host machine to the enclave VM::

   $ vhost-device-vsock \
      --vm guest-cid=4,forward-cid=1,forward-listen=9001+9002,socket=/tmp/vhost4.socket

__ https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock#using-the-vsock-backend

Now run the necessary applications on the host machine so that the vsock
communication of the nitro-enclave VM's applications works. For example, the
nitro-enclave VM's init process connects to CID 3 and sends a single-byte hello
heartbeat (0xB7) to let the parent VM know that it booted, expecting the same
heartbeat byte in response. So you must run an AF_VSOCK server on the host
machine that listens on port 9000 and sends the heartbeat back after receiving
it, for the enclave VM to boot successfully; a minimal responder is sketched
below. In general, you should run all the applications on the host machine that
would typically be running in the parent EC2 VM for successful communication
with the enclave VM.
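For instance, a plain echo server satisfies the handshake, because it sends the
received heartbeat byte straight back. A sketch using socat (assuming a socat
build with AF_VSOCK support, and the ``vsock_loopback`` kernel module loaded so
the host can accept the connections forwarded to CID 1)::

   $ socat VSOCK-LISTEN:9000,fork EXEC:cat

Any equivalent AF_VSOCK server that replies with the 0xB7 byte it receives
works just as well.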
Then run the nitro-enclave VM using the following command, where ``hello.eif``
is an EIF file you would use to spawn a real AWS nitro enclave virtual
machine::

   $ qemu-system-x86_64 -M nitro-enclave,vsock=c,id=hello-world \
      -kernel hello.eif -nographic -m 4G --enable-kvm -cpu host \
      -chardev socket,id=c,path=/tmp/vhost4.socket

In this example, the nitro-enclave VM has CID 4. If there are applications that
connect to the enclave VM, run them on the host machine after the enclave VM
starts. You need to modify the applications to connect to CID 1 (instead of the
enclave VM's CID) and use the ``forward-listen`` (e.g., 9001+9002) option of
vhost-device-vsock to forward the ports they connect to.
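For example (again a sketch assuming a socat build with AF_VSOCK support), an
interactive client that would normally talk to the enclave VM's port 9001 can
be pointed at CID 1, from where vhost-device-vsock forwards the connection into
the enclave::

   $ socat - VSOCK-CONNECT:1:9001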
diff --git a/docs/system/i386/tdx.rst b/docs/system/i386/tdx.rst
new file mode 100644
index 0000000..8131750
--- /dev/null
+++ b/docs/system/i386/tdx.rst
@@ -0,0 +1,161 @@

Intel Trust Domain Extensions (TDX)
===================================

Intel Trust Domain Extensions (TDX) is an Intel technology that extends
Virtual Machine Extensions (VMX) and Multi-Key Total Memory Encryption (MKTME)
with a new kind of virtual machine guest called a Trust Domain (TD). A TD runs
in a CPU mode that is designed to protect the confidentiality of its memory
contents and its CPU state from any other software, including the hosting
Virtual Machine Monitor (VMM), unless explicitly shared by the TD itself.

Prerequisites
-------------

To run a TD, the physical machine needs to have the TDX module loaded and
initialized, and the KVM hypervisor needs to support TDX and have it enabled.
If those requirements are met, ``KVM_CAP_VM_TYPES`` will report support for
``KVM_X86_TDX_VM``.

Trust Domain Virtual Firmware (TDVF)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Trust Domain Virtual Firmware (TDVF) is required to provide TD services and to
boot the TD guest OS. TDVF needs to be copied to guest private memory and
measured before the TD boots.

The KVM vCPU ioctl ``KVM_TDX_INIT_MEM_REGION`` can be used to populate the
TDVF content into the TD's private memory.

Since TDX doesn't support read-only memslots, TDVF cannot be mapped as a
pflash device; it actually works as RAM. The "-bios" option is therefore used
to load TDVF.

OVMF is the open source firmware that implements TDVF support. Thus the
command line to specify and load TDVF is ``-bios OVMF.fd``.

Feature Configuration
---------------------

Unlike for non-TDX VMs, the CPU features (enumerated by CPUID or MSR) of a TD
are not under full control of the VMM. The VMM can only configure a subset of
the features of a TD via the ``KVM_TDX_INIT_VM`` command of the VM scope
``MEMORY_ENCRYPT_OP`` ioctl.

The configurable features are of three types:

- Attributes:

  - PKS (bit 30) controls whether Supervisor Protection Keys is exposed to the
    TD, which determines the related CPUID bit and CR4 bit;
  - PERFMON (bit 63) controls whether the PMU is exposed to the TD.

- XSAVE related features (XFAM):

  XFAM is a 64-bit mask, which has the same format as XCR0 or the IA32_XSS
  MSR. It determines the set of extended features available for use by the
  guest TD.

- CPUID features:

  Only some bits of some CPUID leaves are directly configurable by the VMM.

Which features can be configured is reported via the TDX capabilities.

TDX capabilities
~~~~~~~~~~~~~~~~

The VM scope ``MEMORY_ENCRYPT_OP`` ioctl provides the ``KVM_TDX_CAPABILITIES``
command to get the TDX capabilities from KVM. It returns a
``struct kvm_tdx_capabilities``, which describes the supported configuration
of attributes, XFAM and CPUIDs.

TD attributes
~~~~~~~~~~~~~

QEMU supports configuring the raw 64-bit TD attributes directly via the
"attributes" property of the "tdx-guest" object. Note, it is the user's
responsibility to provide a valid value, because some bits may not be
supported by the current QEMU or KVM yet.

QEMU also supports the configuration of the individual attribute bits it knows
about, via properties of the "tdx-guest" object, e.g. "sept-ve-disable"
(bit 28); see the sketch below.
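As a sketch (assuming QEMU and KVM with TDX support; all other options are
elided), the following two forms are equivalent, since ``sept-ve-disable`` is
attribute bit 28, i.e. the mask 0x10000000::

   $ qemu-system-x86_64 ... -object tdx-guest,id=tdx0,sept-ve-disable=on
   $ qemu-system-x86_64 ... -object tdx-guest,id=tdx0,attributes=0x10000000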
MSR based features
~~~~~~~~~~~~~~~~~~

Current KVM doesn't support MSR based feature (e.g., MSR_IA32_ARCH_CAPABILITIES)
configuration for TDX; enabling it in QEMU is future work, once KVM adds
support for it.

Feature check
~~~~~~~~~~~~~

QEMU checks whether the final (CPU) features, determined by the given CPU
model and explicit feature adjustments of "+featureA/-featureB", can be
supported. It can produce a "feature not supported" warning like::

   warning: host doesn't support requested feature: CPUID.07H:EBX.intel-pt [bit 25]

It can also produce a warning like::

   warning: TDX forcibly sets the feature: CPUID.80000007H:EDX.invtsc [bit 8]

if a fixed-1 feature is explicitly requested to be disabled. The latter
warning is newly added to QEMU for TDX, because TDX has fixed-1 features that
are forcibly enabled by the TDX module and cannot be disabled by the VMM.

Launching a TD (TDX VM)
-----------------------

To launch a TD, the necessary command line options are the tdx-guest object
and split kernel-irqchip, as below:

.. parsed-literal::

    |qemu_system_x86| \\
        -accel kvm \\
        -cpu host \\
        -object tdx-guest,id=tdx0 \\
        -machine ...,confidential-guest-support=tdx0 \\
        -bios OVMF.fd

Restrictions
------------

- kernel-irqchip must be split;

  This is set by default for a TDX guest if kernel-irqchip is left on its
  default 'auto' setting.

- No read-only support for private memory;

- No SMM support: SMM support requires manipulating the guest register states,
  which is not allowed.

Debugging
---------

Bit 0 of the TD attributes is the DEBUG bit, which decides whether the TD runs
in off-TD debug mode. When in off-TD debug mode, the TD's VCPU state and
private memory are accessible via given SEAMCALLs. This requires KVM to expose
APIs to invoke those SEAMCALLs and corresponding QEMU changes.

It's targeted as future work.

TD attestation
--------------

In a TD guest, the attestation process is used to verify the TDX guest's
trustworthiness to other entities before provisioning secrets to the guest.

TD attestation is initiated first by calling TDG.MR.REPORT inside the TD to
get the REPORT. Then the REPORT data needs to be converted into a remotely
verifiable Quote by the SGX Quoting Enclave (QE).

Adding support for TD attestation in QEMU is future work, since it lacks
support in current KVM.

Live Migration
--------------

Future work.

References
----------

- `TDX Homepage <https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html>`__

- `SGX QE <https://github.com/intel/SGXDataCenterAttestationPrimitives/tree/master/QuoteGeneration>`__