aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-11-02linux-headers: Add unistd_64.hBibo Mao1-0/+6
since 6.11, unistd.h includes header file unistd_64.h directly on some platforms, here add unistd_64.h on these platforms. Affected platforms are ARM64, LoongArch64 and Riscv. Otherwise there will be compiling error such as: linux-headers/asm/unistd.h:3:10: fatal error: asm/unistd_64.h: No such file or directory #include <asm/unistd_64.h> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Acked-by: Song Gao <gaosong@loongson.cn> Message-Id: <20241028023809.1554405-2-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2024-11-02target/loongarch/kvm: Implement LoongArch PMU extensionBibo Mao4-1/+63
Implement PMU extension for LoongArch kvm mode. Use OnOffAuto type variable pmu to check the PMU feature. If the PMU Feature is not supported with KVM host, it reports error if there is pmu=on command line. If there is no any command line about pmu parameter, it checks whether KVM host supports the PMU Feature and set the corresponding value in cpucfg. This patch is based on lbt patch located at https://lore.kernel.org/qemu-devel/20240904061859.86615-1-maobibo@loongson.cn Co-developed-by: Song Gao <gaosong@loongson.cn> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20240918082315.2345034-1-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2024-11-02target/loongarch: Implement lbt registers save/restore functionBibo Mao3-0/+98
Six registers scr0 - scr3, eflags and ftop are added in percpu vmstate. And two functions kvm_loongarch_get_lbt/kvm_loongarch_put_lbt are added to save/restore lbt registers. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20240929070405.235200-3-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2024-11-02target/loongarch: Add loongson binary translation featureBibo Mao4-2/+87
Loongson Binary Translation (LBT) is used to accelerate binary translation, which contains 4 scratch registers (scr0 to scr3), x86/ARM eflags (eflags) and x87 fpu stack pointer (ftop). Now LBT feature is added in kvm mode, not supported in TCG mode since it is not emulated. Feature variable lbt is added with OnOffAuto type, If lbt feature is not supported with KVM host, it reports error if there is lbt=on command line. If there is no any command line about lbt parameter, it checks whether KVM host supports lbt feature and set the corresponding value in cpucfg. Signed-off-by: Bibo Mao <maobibo@loongson.cn> Reviewed-by: Song Gao <gaosong@loongson.cn> Message-Id: <20240929070405.235200-2-maobibo@loongson.cn> Signed-off-by: Song Gao <gaosong@loongson.cn>
2024-10-31migration/multifd: Zero p->flags before starting filling a packetMaciej S. Szmigiero1-1/+1
This way there aren't stale flags there. p->flags can't contain SYNC to be sent at the next RAM packet since syncs are now handled separately in multifd_send_thread. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/1c96b6cdb797e6f035eb1a4ad9bfc24f4c7f5df8.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration/ram: Add load start trace eventMaciej S. Szmigiero2-0/+2
There's a RAM load complete trace event but there wasn't its start equivalent. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/94ddfa7ecb83a78f73b82867dd30c8767592d257.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Drop migration_is_idle()Peter Xu5-25/+5
Now with the current migration_is_running(), it will report exactly the opposite of what will be reported by migration_is_idle(). Drop migration_is_idle(), instead use "!migration_is_running()" which should be identical on functionality. In reality, most of the idle check is inverted, so it's even easier to write with "migrate_is_running()" check. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Drop migration_is_setup_or_active()Peter Xu5-40/+9
This helper is mostly the same as migration_is_running(), except that one has COLO reported as true, the other has CANCELLING reported as true. Per my past years experience on the state changes, none of them should matter. To make it slightly safer, report both COLO || CANCELLING to be true in migration_is_running(), then drop the other one. We kept the 1st only because the name is simpler, and clear enough. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Unexport ram_mig_init()Peter Xu2-1/+1
It's only used within migration/. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Unexport dirty_bitmap_mig_init()Peter Xu2-3/+4
It's only used within migration/, so it shouldn't be exported. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Take migration object refcount earlier for threadsPeter Xu1-2/+8
Both migration thread or background snapshot thread will take a refcount of the migration object at the entrace of the thread function. That makes sense, because it protects the object from being freed by the main thread in migration_shutdown() later, but it might still race with it if the thread is scheduled too late. Consider the case right after pthread_create() happened, VM shuts down with the object released, but right after that the migration thread finally got created, referencing MigrationState* in the opaque pointer which is already freed. The only 100% safe way to make sure it won't get freed is taking the refcount right before the thread is created, meanwhile when BQL is held. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Deprecate query-migrationthreads commandPeter Xu2-1/+14
Per previous discussion [1,2], this patch deprecates query-migrationthreads command. To summarize, the major reason of the deprecation is due to no sensible way to consume the API properly: (1) The reported list of threads are incomplete (ignoring destination threads and non-multifd threads). (2) For CPU pinning, there's no way to properly pin the threads with the API if the threads will start running right away after migration threads can be queried, so the threads will always run on the default cores for a short window. (3) For VM debugging, one can use "-name $VM,debug-threads=on" instead, which will provide proper names for all migration threads. [1] https://lore.kernel.org/r/20240930195837.825728-1-peterx@redhat.com [2] https://lore.kernel.org/r/20241011153417.516715-1-peterx@redhat.com Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/20241022194501.1022443-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration/dirtyrate: Silence warning about strcpy() on OpenBSDThomas Huth1-1/+4
The linker on OpenBSD complains: ld: warning: dirtyrate.c:447 (../src/migration/dirtyrate.c:447)(...): warning: strcpy() is almost always misused, please use strlcpy() It's currently not a real problem in this case since both arrays have the same size (256 bytes). But just in case somebody changes the size of the source array in the future, let's better play safe and use g_strlcpy() here instead, with an additional check that the string has been copied as a whole. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Hyman Huang <yong.huang@smartx.com> Link: https://lore.kernel.org/r/20241022063402.184213-1-thuth@redhat.com [peterx: Fix over-80 chars] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31tests/migration: Add case for periodic ramblock dirty syncHyman Huang1-0/+32
Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/cb61504f1a1e9d5f2ca4dac12e518deb076ce9f3.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Support periodic RAMBlock dirty bitmap syncHyman Huang6-4/+98
When VM is configured with huge memory, the current throttle logic doesn't look like to scale, because migration_trigger_throttle() is only called for each iteration, so it won't be invoked for a long time if one iteration can take a long time. The periodic dirty sync aims to fix the above issue by synchronizing the ramblock from remote dirty bitmap and, when necessary, triggering the CPU throttle multiple times during a long iteration. This is a trade-off between synchronization overhead and CPU throttle impact. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com [peterx: make prev_cnt global, and reset for each migration] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Remove "rs" parameter in migration_bitmap_sync_precopyHyman Huang1-5/+6
The global static variable ram_state in fact is referred to by the "rs" parameter in migration_bitmap_sync_precopy. For ease of calling by the callees, use the global variable directly in migration_bitmap_sync_precopy and remove "rs" parameter. The migration_bitmap_sync_precopy will be exported in the next commit. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/283c335d61463bf477160da91b24da45cdaf3e43.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Move cpu-throttle.c from system to migrationHyman Huang7-7/+7
Move cpu-throttle.c from system to migration since it's only used for migration; this makes us avoid exporting the util functions and variables in misc.h but export them in migration.h when implementing the periodic ramblock dirty sync feature in the upcoming commits. Since CPU throttle timers are only used in migration, move their registry to migration_object_init. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/c1b3efaa0cb49e03d422e9da97bdb65cc3d234d1.1729146786.git.yong.huang@smartx.com [peterx: Fix build on MacOS on cocoa.m, not move cpu-throttle.h yet] [peterx: Fix subject spelling, per pm215] Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Stop CPU throttling conditionallyHyman Huang1-1/+3
Since CPU throttling only occurs when auto-converge is on, stop it conditionally. Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/f0c787080bb9ab0c37952f0ca5bfaa525d5ddd14.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31accel/tcg/icount-common: Remove the reference to the unused header fileHyman Huang1-1/+0
Signed-off-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/5e33b423d0b8506e5cb33fff42b50aa301b7731b.1729146786.git.yong.huang@smartx.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Ensure vmstate_save() sets errpHanna Czenczek1-5/+8
migration/savevm.c contains some calls to vmstate_save() that are followed by migrate_set_error() if the integer return value indicates an error. migrate_set_error() requires that the `Error *` object passed to it is set. Therefore, vmstate_save() is assumed to always set *errp on error. Right now, that assumption is not met: vmstate_save_state_v() (called internally by vmstate_save()) will not set *errp if vmstate_subsection_save() or vmsd->post_save() fail. Fix that by adding an *errp parameter to vmstate_subsection_save(), and by generating a generic error in case post_save() fails (as is already done for pre_save()). Without this patch, qemu will crash after vmstate_subsection_save() or post_save() have failed inside of a vmstate_save() call (unless migrate_set_error() then happen to discard the new error because s->error is already set). This happens e.g. when receiving the state from a virtio-fs back-end (virtiofsd) fails. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Link: https://lore.kernel.org/r/20241015170437.310358-1-hreitz@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Put thread names together with macrosPeter Xu7-13/+34
Keep migration thread names together, so it's easier to see a list of all possible migration threads. Still two functional changes below besides the macro defintions: - There's one dirty rate thread that we overlooked before, now we add that too and name it as "mig/dirtyrate" following the old rules. - The old name "mig/src/rp-thr" has "-thr" but it may not be useful if it's a thread name anyway, while "rp" can be slightly hard to read. Taking this chance to rename it to "mig/src/return", hopefully a better name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31migration: Cleanup migrate_fd_cleanup() on accessing to_dst_filePeter Xu1-13/+19
The cleanup function can in many cases needs cleanup on its own. The major thing we want to do here is not referencing to_dst_file when without the file mutex. When at it, touch things elsewhere too to make it look slightly better in general. One thing to mention is, migration_thread has its own "running" boolean, so it doesn't need to rely on to_dst_file being non-NULL. Multifd has a dependency so it needs to be skipped if to_dst_file is not yet set; add a richer comment for such reason. Resolves: Coverity CID 1527402 Reported-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240919163042.116767-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-10-31target/i386: Introduce GraniteRapids-v2 modelTao Su1-0/+17
Update GraniteRapids CPU model to add AVX10 and the missing features(ss, tsc-adjust, cldemote, movdiri, movdir64b). Tested-by: Xuelian Guo <xuelian.guo@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20241028024512.156724-7-tao1.su@linux.intel.com Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241031085233.425388-9-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: Add AVX512 state when AVX10 is supportedTao Su1-1/+9
AVX10 state enumeration in CPUID leaf D and enabling in XCR0 register are identical to AVX512 state regardless of the supported vector lengths. Given that some E-cores will support AVX10 but not support AVX512, add AVX512 state components to guest when AVX10 is enabled. Based on a patch by Tao Su <tao1.su@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20241031085233.425388-8-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: Add feature dependencies for AVX10Tao Su2-0/+20
Since the highest supported vector length for a processor implies that all lesser vector lengths are also supported, add the dependencies of the supported vector lengths. If all vector lengths aren't supported, clear AVX10 enable bit as well. Note that the order of AVX10 related dependencies should be kept as: CPUID_24_0_EBX_AVX10_128 -> CPUID_24_0_EBX_AVX10_256, CPUID_24_0_EBX_AVX10_256 -> CPUID_24_0_EBX_AVX10_512, CPUID_24_0_EBX_AVX10_VL_MASK -> CPUID_7_1_EDX_AVX10, CPUID_7_1_EDX_AVX10 -> CPUID_24_0_EBX, so that prevent user from setting weird CPUID combinations, e.g. 256-bits and 512-bits are supported but 128-bits is not, no vector lengths are supported but AVX10 enable bit is still set. Since AVX10_128 will be reserved as 1, adding these dependencies has the bonus that when user sets -cpu host,-avx10-128, CPUID_7_1_EDX_AVX10 and CPUID_24_0_EBX will be disabled automatically. Tested-by: Xuelian Guo <xuelian.guo@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20241028024512.156724-5-tao1.su@linux.intel.com Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Link: https://lore.kernel.org/r/20241031085233.425388-7-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: add CPUID.24 features for AVX10Tao Su2-0/+23
Introduce features for the supported vector bit lengths. Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Link: https://lore.kernel.org/r/20241031085233.425388-6-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: add AVX10 feature and AVX10 version propertyTao Su3-8/+63
When AVX10 enable bit is set, the 0x24 leaf will be present as "AVX10 Converged Vector ISA leaf" containing fields for the version number and the supported vector bit lengths. Introduce avx10-version property so that avx10 version can be controlled by user and cpu model. Per spec, avx10 version can never be 0, the default value of avx10-version is set to 0 to determine whether it is specified by user. The default can come from the device model or, for the max model, from KVM's reported value. Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20241028024512.156724-3-tao1.su@linux.intel.com Link: https://lore.kernel.org/r/20241028024512.156724-4-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Xuelian Guo <xuelian.guo@intel.com> Link: https://lore.kernel.org/r/20241031085233.425388-5-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: return bool from x86_cpu_filter_featuresPaolo Bonzini1-9/+11
Prepare for filtering non-boolean features such as AVX10 version. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Tao Su <tao1.su@linux.intel.com> Link: https://lore.kernel.org/r/20241031085233.425388-4-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: do not rely on ExtSaveArea for accelerator-supported XCR0 bitsPaolo Bonzini2-6/+31
Right now, QEMU is using the "feature" and "bits" fields of ExtSaveArea to query the accelerator for the support status of extended save areas. This is a problem for AVX10, which attaches two feature bits (AVX512F and AVX10) to the same extended save states. To keep the AVX10 hacks to the minimum, limit usage of esa->features and esa->bits. Instead, just query the accelerator for the 0xD leaf. Do it in common code and clear esa->size if an extended save state is unsupported. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20241031085233.425388-3-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: cpu: set correct supported XCR0 features for TCGPaolo Bonzini1-2/+4
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Link: https://lore.kernel.org/r/20241031085233.425388-2-tao1.su@linux.intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: use + to put flags togetherPaolo Bonzini1-12/+12
This gives greater opportunity for reassociation on x86 targets, since addition can use the LEA instruction. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: use higher-precision arithmetic to compute CFPaolo Bonzini1-0/+37
If the operands of the arithmetic instruction fit within a half-register, it's easiest to use a comparison instruction to compute the carry. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: use compiler builtin to compute PFPaolo Bonzini5-48/+26
This removes the 256 byte parity table from the executable. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: make flag variables unsignedPaolo Bonzini1-23/+23
This makes it easier for the compiler to understand which bits are set, and it also removes "cltq" instructions to canonicalize the output value as 32-bit signed. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: add a note about gen_jcc1Paolo Bonzini1-0/+4
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: add a few more trivial CCPrepare casesPaolo Bonzini1-0/+3
Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: optimize TEST+Jxx sequencesPaolo Bonzini1-0/+22
Mostly used for TEST+JG and TEST+JLE, but it is easy to cover also JBE/JA and JL/JGE; shaves about 0.5% TCG ops. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: optimize computation of ZF from CC_OP_DYNAMICPaolo Bonzini3-3/+21
Most uses of CC_OP_DYNAMIC are for CMP/JB/JE or similar sequences. We can optimize many of them to avoid computation of the flags. This eliminates both TCG ops to set up the new cc_op, and helper instructions because evaluating just ZF is much cheaper. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: Wrap cc_op_live with a validity checkRichard Henderson4-7/+21
Assert that op is known and that cc_op_live_ is populated. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: Introduce cc_op_sizeRichard Henderson3-13/+26
Replace arithmetic on cc_op with a helper function. Assert that the op has a size and that it is valid for the configuration. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Link: https://lore.kernel.org/r/20240701025115.1265117-6-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: Rearrange CCOpRichard Henderson1-6/+8
Give the first few enumerators explicit integer constants, align the BWLQ enumerators. This will be used to simplify ((op - CC_OP_*B) & 3). Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Link: https://lore.kernel.org/r/20240701025115.1265117-4-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: remove CC_OP_CLRPaolo Bonzini5-26/+4
Just use CC_OP_EFLAGS; it is not that likely that the flags computed by CC_OP_CLR survive the end of the basic block, in which case there is no need to spill cc_op_src. cc_op_src now does need spilling if the XOR is followed by a memory operation, but this only costs 0.2% extra TCG ops. They will be recouped by simplifications in how QEMU evaluates ZF at runtime, which are even greater with this change. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: Tidy cc_op_str usageRichard Henderson1-6/+11
Make const. Use the read-only strings directly; do not copy them into an on-stack buffer with snprintf. Allow for holes in the cc_op_str array, now present with CC_OP_POPCNT. Fixes: 460231ad369 ("target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL") Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Link: https://lore.kernel.org/r/20240701025115.1265117-2-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31target/i386: use tcg_gen_ext_tl when applicablePaolo Bonzini1-8/+8
Prefer it to gen_ext_tl in the common case where the destination is known. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31ci: always invoke meson through pyvenvPaolo Bonzini1-1/+1
Do not assume that the distro-installed meson is compatible with the one in the virtual environment. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31docs/nitro-enclave: Documentation for nitro-enclave machine typeDorjoy Chowdhury4-2/+83
Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Reviewed-by: Alexander Graf <graf@amazon.com> Link: https://lore.kernel.org/r/20241008211727.49088-7-dorjoychy111@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31machine/nitro-enclave: New machine type for AWS Nitro EnclavesDorjoy Chowdhury8-1/+437
AWS nitro enclaves[1] is an Amazon EC2[2] feature that allows creating isolated execution environments, called enclaves, from Amazon EC2 instances which are used for processing highly sensitive data. Enclaves have no persistent storage and no external networking. The enclave VMs are based on the Firecracker microvm with a vhost-vsock device for communication with the parent EC2 instance that spawned it and a Nitro Secure Module (NSM) device for cryptographic attestation. The parent instance VM always has CID 3 while the enclave VM gets a dynamic CID. An EIF (Enclave Image Format)[3] file is used to boot an AWS nitro enclave virtual machine. This commit adds support for AWS nitro enclave emulation using a new machine type option '-M nitro-enclave'. This new machine type is based on the 'microvm' machine type, similar to how real nitro enclave VMs are based on Firecracker microvm. For nitro-enclave to boot from an EIF file, the kernel and ramdisk(s) are extracted into a temporary kernel and a temporary initrd file which are then hooked into the regular x86 boot mechanism along with the extracted cmdline. The EIF file path should be provided using the '-kernel' QEMU option. In QEMU, the vsock emulation for nitro enclave is added using vhost-user- vsock as opposed to vhost-vsock. vhost-vsock doesn't support sibling VM communication which is needed for nitro enclaves. So for the vsock communication to CID 3 to work, another process that does the vsock emulation in userspace must be run, for example, vhost-device-vsock[4] from rust-vmm, with necessary vsock communication support in another guest VM with CID 3. Using vhost-user-vsock also enables the possibility to implement some proxying support in the vhost-user-vsock daemon that will forward all the packets to the host machine instead of CID 3 so that users of nitro-enclave can run the necessary applications in their host machine instead of running another whole VM with CID 3. The following mandatory nitro-enclave machine option has been added related to the vhost-user-vsock device. - 'vsock': The chardev id from the '-chardev' option for the vhost-user-vsock device. AWS Nitro Enclaves have built-in Nitro Secure Module (NSM) device which has been added using the virtio-nsm device added in a previous commit. In Nitro Enclaves, all the PCRs start in a known zero state and the first 16 PCRs are locked from boot and reserved. The PCR0, PCR1, PCR2 and PCR8 contain the SHA384 hashes related to the EIF file used to boot the VM for validation. The following optional nitro-enclave machine options have been added related to the NSM device. - 'id': Enclave identifier, reflected in the module-id of the NSM device. If not provided, a default id will be set. - 'parent-role': Parent instance IAM role ARN, reflected in PCR3 of the NSM device. - 'parent-id': Parent instance identifier, reflected in PCR4 of the NSM device. [1] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html [2] https://aws.amazon.com/ec2/ [3] https://github.com/aws/aws-nitro-enclaves-image-format [4] https://github.com/rust-vmm/vhost-device/tree/main/vhost-device-vsock Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Reviewed-by: Alexander Graf <graf@amazon.com> Link: https://lore.kernel.org/r/20241008211727.49088-6-dorjoychy111@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31core/machine: Make create_default_memdev machine a virtual methodDorjoy Chowdhury4-35/+42
This is in preparation for the next commit where the nitro-enclave machine type will need to instead use a memfd backend, for the built-in vhost-user-vsock device to work. Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Reviewed-by: Alexander Graf <graf@amazon.com> Link: https://lore.kernel.org/r/20241008211727.49088-5-dorjoychy111@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31hw/core: Add Enclave Image Format (EIF) related helpersDorjoy Chowdhury7-0/+757
An EIF (Enclave Image Format)[1] file is used to boot an AWS nitro enclave[2] virtual machine. The EIF file contains the necessary kernel, cmdline, ramdisk(s) sections to boot. Some helper functions have been introduced for extracting the necessary sections from an EIF file and then writing them to temporary files as well as computing SHA384 hashes from the section data. These will be used in the following commit to add support for nitro-enclave machine type in QEMU. The files added in this commit are not compiled yet but will be added to the hw/core/meson.build file in the following commit where CONFIG_NITRO_ENCLAVE will be introduced. [1] https://github.com/aws/aws-nitro-enclaves-image-format [2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Reviewed-by: Alexander Graf <graf@amazon.com> Link: https://lore.kernel.org/r/20241008211727.49088-4-dorjoychy111@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-10-31device/virtio-nsm: Support for Nitro Secure Module deviceDorjoy Chowdhury12-0/+2252
Nitro Secure Module (NSM)[1] device is used in AWS Nitro Enclaves[2] for stripped down TPM functionality like cryptographic attestation. The requests to and responses from NSM device are CBOR[3] encoded. This commit adds support for NSM device in QEMU. Although related to AWS Nitro Enclaves, the virito-nsm device is independent and can be used in other machine types as well. The libcbor[4] library has been used for the CBOR encoding and decoding functionalities. [1] https://lists.oasis-open.org/archives/virtio-comment/202310/msg00387.html [2] https://docs.aws.amazon.com/enclaves/latest/user/nitro-enclave.html [3] http://cbor.io/ [4] https://libcbor.readthedocs.io/en/latest/ Signed-off-by: Dorjoy Chowdhury <dorjoychy111@gmail.com> Reviewed-by: Alexander Graf <graf@amazon.com> Link: https://lore.kernel.org/r/20241008211727.49088-3-dorjoychy111@gmail.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>