aboutsummaryrefslogtreecommitdiff
path: root/accel/kvm/kvm-all.c
AgeCommit message (Collapse)AuthorFilesLines
8 dayskvm: Remove unreachable code in kvm_dirty_ring_reaper_thread()Peter Maydell1-5/+1
The code at the tail end of the loop in kvm_dirty_ring_reaper_thread() is unreachable, because there is no way for execution to leave the loop. Replace it with a g_assert_not_reached(). (The code has always been unreachable, right from the start when the function was added in commit b4420f198dd8.) Resolves: Coverity CID 1547687 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20240815131206.3231819-3-peter.maydell@linaro.org
8 dayskvm: Make 'mmap_size' be 'int' in kvm_init_vcpu(), do_kvm_destroy_vcpu()Peter Maydell1-2/+2
In kvm_init_vcpu()and do_kvm_destroy_vcpu(), the return value from kvm_ioctl(..., KVM_GET_VCPU_MMAP_SIZE, ...) is an 'int', but we put it into a 'long' logal variable mmap_size. Coverity then complains that there might be a truncation when we copy that value into the 'int ret' which we use for returning a value in an error-exit codepath. This can't ever actually overflow because the value was in an 'int' to start with, but it makes more sense to use 'int' for mmap_size so we don't do the widen-then-narrow sequence in the first place. Resolves: Coverity CID 1547515 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20240815131206.3231819-2-peter.maydell@linaro.org
2024-09-13kvm: Use 'unsigned long' for request argument in functions wrapping ioctl()Johannes Stoelp1-4/+4
Change the data type of the ioctl _request_ argument from 'int' to 'unsigned long' for the various accel/kvm functions which are essentially wrappers around the ioctl() syscall. The correct type for ioctl()'s 'request' argument is confused: * POSIX defines the request argument as 'int' * glibc uses 'unsigned long' in the prototype in sys/ioctl.h * the glibc info documentation uses 'int' * the Linux manpage uses 'unsigned long' * the Linux implementation of the syscall uses 'unsigned int' If we wrap ioctl() with another function which uses 'int' as the type for the request argument, then requests with the 0x8000_0000 bit set will be sign-extended when the 'int' is cast to 'unsigned long' for the call to ioctl(). On x86_64 one such example is the KVM_IRQ_LINE_STATUS request. Bit requests with the _IOC_READ direction bit set, will have the high bit set. Fortunately the Linux Kernel truncates the upper 32bit of the request on 64bit machines (because it uses 'unsigned int', and see also Linus Torvalds' comments in https://sourceware.org/bugzilla/show_bug.cgi?id=14362 ) so this doesn't cause active problems for us. However it is more consistent to follow the glibc ioctl() prototype when we define functions that are essentially wrappers around ioctl(). This resolves a Coverity issue where it points out that in kvm_get_xsave() we assign a value (KVM_GET_XSAVE or KVM_GET_XSAVE2) to an 'int' variable which can't hold it without overflow. Resolves: Coverity CID 1547759 Signed-off-by: Johannes Stoelp <johannes.stoelp@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 20240815122747.3053871-1-peter.maydell@linaro.org [PMM: Rebased patch, adjusted commit message, included note about Coverity fix, updated the type of the local var in kvm_get_xsave, updated the comment in the KVMState struct definition] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-08-01accel/kvm/kvm-all: Fixes the missing break in vCPU unpark logicSalil Mehta1-0/+1
Loop should exit prematurely on successfully finding out the parked vCPU (struct KVMParkedVcpu) in the 'struct KVMState' maintained 'kvm_parked_vcpus' list of parked vCPUs. Fixes: Coverity CID 1558552 Fixes: 08c3286822 ("accel/kvm: Extract common KVM vCPU {creation,parking} code") Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-id: 20240725145132.99355-1-salil.mehta@huawei.com Suggested-by: Peter Maydell <peter.maydell@linaro.org> Message-ID: <CAFEAcA-3_d1c7XSXWkFubD-LsW5c5i95e6xxV09r2C9yGtzcdA@mail.gmail.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2024-07-26accel/kvm: Introduce kvm_create_and_park_vcpu() helperHarsh Prateek Bora1-0/+12
There are distinct helpers for creating and parking a KVM vCPU. However, there can be cases where a platform needs to create and immediately park the vCPU during early stages of vcpu init which can later be reused when vcpu thread gets initialized. This would help detect failures with kvm_create_vcpu at an early stage. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
2024-07-24Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into stagingRichard Henderson1-0/+27
* target/i386/kvm: support for reading RAPL MSRs using a helper program * hpet: emulation improvements # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmaelL4UHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroMXoQf+K77lNlHLETSgeeP3dr7yZPOmXjjN # qFY/18jiyLw7MK1rZC09fF+n9SoaTH8JDKupt0z9M1R10HKHLIO04f8zDE+dOxaE # Rou3yKnlTgFPGSoPPFr1n1JJfxtYlLZRoUzaAcHUaa4W7JR/OHJX90n1Rb9MXeDk # jV6P0v1FWtIDdM6ERm9qBGoQdYhj6Ra2T4/NZKJFXwIhKEkxgu4yO7WXv8l0dxQz # jE4fKotqAvrkYW1EsiVZm30lw/19duhvGiYeQXoYhk8KKXXjAbJMblLITSNWsCio # 3l6Uud/lOxekkJDAq5nH3H9hCBm0WwvwL+0vRf3Mkr+/xRGvrhtmUdp8NQ== # =00mB # -----END PGP SIGNATURE----- # gpg: Signature made Tue 23 Jul 2024 03:19:58 AM AEST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [full] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [full] * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: hpet: avoid timer storms on periodic timers hpet: store full 64-bit target value of the counter hpet: accept 64-bit reads and writes hpet: place read-only bits directly in "new_val" hpet: remove unnecessary variable "index" hpet: ignore high bits of comparator in 32-bit mode hpet: fix and cleanup persistence of interrupt status Add support for RAPL MSRs in KVM/Qemu tools: build qemu-vmsr-helper qio: add support for SO_PEERCRED for socket channel target/i386: do not crash if microvm guest uses SGX CPUID leaves Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-22accel/kvm: Extract common KVM vCPU {creation,parking} codeSalil Mehta1-32/+63
KVM vCPU creation is done once during the vCPU realization when Qemu vCPU thread is spawned. This is common to all the architectures as of now. Hot-unplug of vCPU results in destruction of the vCPU object in QOM but the corresponding KVM vCPU object in the Host KVM is not destroyed as KVM doesn't support vCPU removal. Therefore, its representative KVM vCPU object/context in Qemu is parked. Refactor architecture common logic so that some APIs could be reused by vCPU Hotplug code of some architectures likes ARM, Loongson etc. Update new/old APIs with trace events. New APIs qemu_{create,park,unpark}_vcpu() can be externally called. No functional change is intended here. Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Tested-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Xianglai Li <lixianglai@loongson.cn> Tested-by: Miguel Luis <miguel.luis@oracle.com> Reviewed-by: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Vishnu Pajjuri <vishnu@os.amperecomputing.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20240716111502.202344-2-salil.mehta@huawei.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-07-22Add support for RAPL MSRs in KVM/QemuAnthony Harivel1-0/+27
Starting with the "Sandy Bridge" generation, Intel CPUs provide a RAPL interface (Running Average Power Limit) for advertising the accumulated energy consumption of various power domains (e.g. CPU packages, DRAM, etc.). The consumption is reported via MSRs (model specific registers) like MSR_PKG_ENERGY_STATUS for the CPU package power domain. These MSRs are 64 bits registers that represent the accumulated energy consumption in micro Joules. They are updated by microcode every ~1ms. For now, KVM always returns 0 when the guest requests the value of these MSRs. Use the KVM MSR filtering mechanism to allow QEMU handle these MSRs dynamically in userspace. To limit the amount of system calls for every MSR call, create a new thread in QEMU that updates the "virtual" MSR values asynchronously. Each vCPU has its own vMSR to reflect the independence of vCPUs. The thread updates the vMSR values with the ratio of energy consumed of the whole physical CPU package the vCPU thread runs on and the thread's utime and stime values. All other non-vCPU threads are also taken into account. Their energy consumption is evenly distributed among all vCPUs threads running on the same physical CPU package. To overcome the problem that reading the RAPL MSR requires priviliged access, a socket communication between QEMU and the qemu-vmsr-helper is mandatory. You can specified the socket path in the parameter. This feature is activated with -accel kvm,rapl=true,path=/path/sock.sock Actual limitation: - Works only on Intel host CPU because AMD CPUs are using different MSR adresses. - Only the Package Power-Plane (MSR_PKG_ENERGY_STATUS) is reported at the moment. Signed-off-by: Anthony Harivel <aharivel@redhat.com> Link: https://lore.kernel.org/r/20240522153453.1230389-4-aharivel@redhat.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-07-17accel/kvm/kvm-all: Fix superfluous trailing semicolonZhao Liu1-1/+1
Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-06-24gdbstub: move enums into separate headerAlex Bennée1-1/+1
This is an experiment to further reduce the amount we throw into the exec headers. It might not be as useful as I initially thought because just under half of the users also need gdbserver_start(). Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20240620152220.2192768-3-alex.bennee@linaro.org>
2024-06-17migration/dirtyrate: Fix segmentation faultMasato Imai1-1/+1
Since the kvm_dirty_ring_enabled function accesses a null kvm_state pointer when the KVM acceleration parameter is not specified, running calc_dirty_rate with the -r or -b option causes a segmentation fault. Signed-off-by: Masato Imai <mii@sfc.wide.ad.jp> Message-ID: <20240507025010.1968881-1-mii@sfc.wide.ad.jp> [Assert kvm_state when kvm_dirty_ring_enabled was called to fix it. - Hyman] Signed-off-by: Hyman Huang <yong.huang@smartx.com>
2024-06-04accel/kvm: Fix two lines with hard-coded tabsPeter Maydell1-2/+2
In kvm-all.c, two lines have been accidentally indented with hard-coded tabs rather than spaces. Normalise to match the rest of the file. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240531170952.505323-1-peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-05-03kvm: move target-dependent interrupt routing out of kvm-all.cPaolo Bonzini1-59/+3
Let hw/hyperv/hyperv.c and hw/intc/s390_flic.c handle (respectively) SynIC and adapter routes, removing the code from target-independent files. This also removes the only occurrence of AdapterInfo outside s390 code, so remove that from typedefs.h. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23kvm/tdx: Ignore memory conversion to shared of unassigned regionIsaku Yamahata1-0/+12
TDX requires vMMIO region to be shared. For KVM, MMIO region is the region which kvm memslot isn't assigned to (except in-kernel emulation). qemu has the memory region for vMMIO at each device level. While OVMF issues MapGPA(to-shared) conservatively on 32bit PCI MMIO region, qemu doesn't find corresponding vMMIO region because it's before PCI device allocation and memory_region_find() finds the device region, not PCI bus region. It's safe to ignore MapGPA(to-shared) because when guest accesses those region they use GPA with shared bit set for vMMIO. Ignore memory conversion request of non-assigned region to shared and return success. Otherwise OVMF is confused and panics there. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-35-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23kvm/tdx: Don't complain when converting vMMIO region to sharedIsaku Yamahata1-3/+16
Because vMMIO region needs to be shared region, guest TD may explicitly convert such region from private to shared. Don't complain such conversion. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-34-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23kvm: handle KVM_EXIT_MEMORY_FAULTChao Peng1-10/+88
Upon an KVM_EXIT_MEMORY_FAULT exit, userspace needs to do the memory conversion on the RAMBlock to turn the memory into desired attribute, switching between private and shared. Currently only KVM_MEMORY_EXIT_FLAG_PRIVATE in flags is valid when KVM_EXIT_MEMORY_FAULT happens. Note, KVM_EXIT_MEMORY_FAULT makes sense only when the RAMBlock has guest_memfd memory backend. Note, KVM_EXIT_MEMORY_FAULT returns with -EFAULT, so special handling is added. When page is converted from shared to private, the original shared memory can be discarded via ram_block_discard_range(). Note, shared memory can be discarded only when it's not back'ed by hugetlb because hugetlb is supposed to be pre-allocated and no need for discarding. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240320083945.991426-13-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23kvm/memory: Make memory type private by default if it has guest memfd backendXiaoyao Li1-0/+10
KVM side leaves the memory to shared by default, which may incur the overhead of paging conversion on the first visit of each page. Because the expectation is that page is likely to private for the VMs that require private memory (has guest memfd). Explicitly set the memory to private when memory region has valid guest memfd backend. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Michael Roth <michael.roth@amd.com> Message-ID: <20240320083945.991426-16-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslotChao Peng1-8/+38
Switch to KVM_SET_USER_MEMORY_REGION2 when supported by KVM. With KVM_SET_USER_MEMORY_REGION2, QEMU can set up memory region that backend'ed both by hva-based shared memory and guest memfd based private memory. Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com> Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240320083945.991426-10-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23RAMBlock: Add support of KVM private guest memfdXiaoyao Li1-0/+28
Add KVM guest_memfd support to RAMBlock so both normal hva based memory and kvm guest memfd based private memory can be associated in one RAMBlock. Introduce new flag RAM_GUEST_MEMFD. When it's set, it calls KVM ioctl to create private guest_memfd during RAMBlock setup. Allocating a new RAM_GUEST_MEMFD flag to instruct the setup of guest memfd is more flexible and extensible than simply relying on the VM type because in the future we may have the case that not all the memory of a VM need guest memfd. As a benefit, it also avoid getting MachineState in memory subsystem. Note, RAM_GUEST_MEMFD is supposed to be set for memory backends of confidential guests, such as TDX VM. How and when to set it for memory backends will be implemented in the following patches. Introduce memory_region_has_guest_memfd() to query if the MemoryRegion has KVM guest_memfd allocated. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-ID: <20240320083945.991426-7-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23kvm: Introduce support for memory_attributesXiaoyao Li1-0/+32
Introduce the helper functions to set the attributes of a range of memory to private or shared. This is necessary to notify KVM the private/shared attribute of each gpa range. KVM needs the information to decide the GPA needs to be mapped at hva-based shared memory or guest_memfd based private memory. Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240320083945.991426-11-michael.roth@amd.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23trace/kvm: Split address space and slot id in trace_kvm_set_user_memory()Xiaoyao Li1-2/+3
The upper 16 bits of kvm_userspace_memory_region::slot are address space id. Parse it separately in trace_kvm_set_user_memory(). Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20240229063726.610065-5-xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23KVM: remove kvm_arch_cpu_check_are_resettablePaolo Bonzini1-5/+0
Board reset requires writing a fresh CPU state. As far as KVM is concerned, the only thing that blocks reset is that CPU state is encrypted; therefore, kvm_cpus_are_resettable() can simply check if that is the case. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-23KVM: track whether guest state is encryptedPaolo Bonzini1-3/+14
So far, KVM has allowed KVM_GET/SET_* ioctls to execute even if the guest state is encrypted, in which case they do nothing. For the new API using VM types, instead, the ioctls will fail which is a safer and more robust approach. The new API will be the only one available for SEV-SNP and TDX, but it is also usable for SEV and SEV-ES. In preparation for that, require architecture-specific KVM code to communicate the point at which guest state is protected (which must be after kvm_cpu_synchronize_post_init(), though that might change in the future in order to suppor migration). From that point, skip reading registers so that cpu->vcpu_dirty is never true: if it ever becomes true, kvm_arch_put_registers() will fail miserably. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-18kvm: use configs/ definition to conditionalize debug supportPaolo Bonzini1-5/+5
If an architecture adds support for KVM_CAP_SET_GUEST_DEBUG but QEMU does not have the necessary code, QEMU will fail to build after updating kernel headers. Avoid this by using a #define in config-target.h instead of KVM_CAP_SET_GUEST_DEBUG. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-04-08kvm: error out of kvm_irqchip_add_msi_route() in case of full route tableIgor Mammedov1-5/+10
subj is calling kvm_add_routing_entry() which simply extends KVMState::irq_routes::entries[] but doesn't check if number of routes goes beyond limit the kernel is willing to accept. Which later leads toi the assert qemu-kvm: ../accel/kvm/kvm-all.c:1833: kvm_irqchip_commit_routes: Assertion `ret == 0' failed typically it happens during guest boot for large enough guest Reproduced with: ./qemu --enable-kvm -m 8G -smp 64 -machine pc \ `for b in {1..2}; do echo -n "-device pci-bridge,id=pci$b,chassis_nr=$b "; for i in {0..31}; do touch /tmp/vblk$b$i; echo -n "-drive file=/tmp/vblk$b$i,if=none,id=drive$b$i,format=raw -device virtio-blk-pci,drive=drive$b$i,bus=pci$b "; done; done` While crash at boot time is bad, the same might happen at hotplug time which is unacceptable. So instead calling kvm_add_routing_entry() unconditionally, check first that number of routes won't exceed KVM_CAP_IRQ_ROUTING. This way virtio device insteads killin qemu, will gracefully fail to initialize device as expected with following warnings on console: virtio-blk failed to set guest notifier (-28), ensure -accel kvm is set. virtio_bus_start_ioeventfd: failed. Fallback to userspace (slower). Signed-off-by: Igor Mammedov <imammedo@redhat.com> Message-ID: <20240408110956.451558-1-imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-05migration: prevent migration when VM has poisoned memoryWilliam Roche1-0/+10
A memory page poisoned from the hypervisor level is no longer readable. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a similar stack trace: Program terminated with signal SIGBUS, Bus error. #0 _mm256_loadu_si256 #1 buffer_zero_avx2 #2 select_accel_fn #3 buffer_is_zero #4 save_zero_page #5 ram_save_target_page_legacy #6 ram_save_host_page #7 ram_find_and_save_block #8 ram_save_iterate #9 qemu_savevm_state_iterate #10 migration_iteration_run #11 migration_thread #12 qemu_thread_start To avoid this VM crash during the migration, prevent the migration when a known hardware poison exists on the VM. Signed-off-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-01-18Add class property to configure KVM device node to useDaan De Meyer1-1/+24
This allows passing the KVM device node to use as a file descriptor via /dev/fdset/XX. Passing the device node to use as a file descriptor allows running qemu unprivileged even when the user running qemu is not in the kvm group on distributions where access to /dev/kvm is gated behind membership of the kvm group (as long as the process invoking qemu is able to open /dev/kvm and passes the file descriptor to qemu). Signed-off-by: Daan De Meyer <daan.j.demeyer@gmail.com> Message-ID: <20231021134015.1119597-1-daan.j.demeyer@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-01-08system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()Stefan Hajnoczi1-11/+11
The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void bql_lock(void) - void bql_unlock(void) - bool bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240102153529.486531-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2023-12-23accel/kvm: Turn DPRINTF macro use into tracepointsJai Arora1-22/+6
Patch removes DPRINTF macro and adds multiple tracepoints to capture different kvm events. We also drop the DPRINTFs that don't add any additional information than trace_kvm_run_exit already does. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1827 Signed-off-by: Jai Arora <arorajai2798@gmail.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2023-12-19accel/kvm: Make kvm_has_guest_debug staticRichard Henderson1-1/+1
This variable is not used or declared outside kvm-all.c. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Gavin Shan <gshan@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Tested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-10-25kvm: i386: require KVM_CAP_SET_VCPU_EVENTS and KVM_CAP_X86_ROBUST_SINGLESTEPPaolo Bonzini1-9/+0
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: i386: require KVM_CAP_DEBUGREGSPaolo Bonzini1-9/+0
This was introduced in KVM in Linux 2.6.35, we can require it unconditionally. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: unify listeners for PIO address spacePaolo Bonzini1-9/+2
Since we now assume that ioeventfds are present, kvm_io_listener is always registered. Merge it with kvm_coalesced_pio_listener in a single listener. Since PIO space does not have KVM memslots attached to it, the priority is irrelevant. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTHPaolo Bonzini1-16/+6
KVM_CAP_IOEVENTFD_ANY_LENGTH was added in Linux 4.4, released in 2016. Assume that it is present. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: assume that many ioeventfds can be createdPaolo Bonzini1-47/+0
NR_IOBUS_DEVS was increased to 200 in Linux 2.6.34. By Linux 3.5 it had increased to 1000 and later ioeventfds were changed to not count against the limit. But the earlier limit of 200 would already be enough for kvm_check_many_ioeventfds() to be true, so remove the check. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: drop reference to KVM_CAP_PCI_2_3Paolo Bonzini1-7/+0
This is a remnant of pre-VFIO device assignment; it is not defined anymore by Linux and not used by QEMU. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_IRQFD for kernel irqchipPaolo Bonzini1-8/+5
KVM_IRQFD was introduced in Linux 2.6.32, and since then it has always been available on architectures that support an in-kernel interrupt controller. We can require it unconditionally. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_CAP_SIGNAL_MSIPaolo Bonzini1-95/+7
This was introduced in KVM in Linux 3.5, we can require it unconditionally in kvm_irqchip_send_msi(). However, not all architectures have to implement it so check it only in x86, the only architecture that ever had MSI injection but not KVM_CAP_SIGNAL_MSI. ARM uses it to detect the presence of the ITS emulation in the kernel, introduced in Linux 4.8. Assume that it's there and possibly fail when realizing the arm-its-kvm device. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-25kvm: require KVM_CAP_INTERNAL_ERROR_DATAPaolo Bonzini1-7/+6
This was introduced in KVM in Linux 2.6.33, we can require it unconditionally. Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-10-12kvm: Add stub for kvm_get_max_memslots()David Hildenbrand1-1/+1
We'll need the stub soon from memory device context. While at it, use "unsigned int" as return value and place the declaration next to kvm_get_free_memslots(). Message-ID: <20230926185738.277351-11-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-10-12kvm: Return number of free memslotsDavid Hildenbrand1-13/+20
Let's return the number of free slots instead of only checking if there is a free slot. While at it, check all address spaces, which will also consider SMM under x86 correctly. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-5-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2023-09-29accel/kvm/kvm-all: Handle register access errorsAkihiko Odaki1-4/+28
A register access error typically means something seriously wrong happened so that anything bad can happen after that and recovery is impossible. Even failing one register access is catastorophic as architecture-specific code are not written so that it torelates such failures. Make sure the VM stop and nothing worse happens if such an error occurs. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-ID: <20221201102728.69751-1-akihiko.odaki@daynix.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2023-09-08arm/kvm: Enable support for KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZEShameer Kolothum1-0/+1
Now that we have Eager Page Split support added for ARM in the kernel, enable it in Qemu. This adds, -eager-split-size to -accel sub-options to set the eager page split chunk size. -enable KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE. The chunk size specifies how many pages to break at a time, using a single allocation. Bigger the chunk size, more pages need to be allocated ahead of time. Reviewed-by: Gavin Shan <gshan@redhat.com> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Message-id: 20230905091246.1931-1-shameerali.kolothum.thodi@huawei.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-08-24accel/kvm: Widen pc/saved_insn for kvm_sw_breakpointAnton Johansson1-2/+1
Widens the pc and saved_insn fields of kvm_sw_breakpoint from target_ulong to vaddr. The pc argument of kvm_find_sw_breakpoint is also widened to match. Signed-off-by: Anton Johansson <anjo@rev.ng> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230807155706.9580-2-anjo@rev.ng> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-22accel/kvm: Make kvm_dirty_ring_reaper_init() voidAkihiko Odaki1-7/+2
The returned value was always zero and had no meaning. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-7-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22accel/kvm: Free as when an error occurredAkihiko Odaki1-0/+1
An error may occur after s->as is allocated, for example if the KVM_CREATE_VM ioctl call fails. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-6-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tweaked commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-08-22accel/kvm: Use negative KVM type for error propagationAkihiko Odaki1-0/+5
On MIPS, kvm_arch_get_default_type() returns a negative value when an error occurred so handle the case. Also, let other machines return negative values when errors occur and declare returning a negative value as the correct way to propagate an error that happened when determining KVM type. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-5-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-08-22kvm: Introduce kvm_arch_get_default_type hookAkihiko Odaki1-1/+3
kvm_arch_get_default_type() returns the default KVM type. This hook is particularly useful to derive a KVM type that is valid for "none" machine model, which is used by libvirt to probe the availability of KVM. For MIPS, the existing mips_kvm_type() is reused. This function ensures the availability of VZ which is mandatory to use KVM on the current QEMU. Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20230727073134.134102-2-akihiko.odaki@daynix.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: added doc comment for new function] Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-07-31kvm: Fix crash due to access uninitialized kvm_stateGavin Shan1-1/+1
Runs into core dump on arm64 and the backtrace extracted from the core dump is shown as below. It's caused by accessing uninitialized @kvm_state in kvm_flush_coalesced_mmio_buffer() due to commit 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()"), where the machine's memory region is added earlier than before. main qemu_init configure_accelerators qemu_opts_foreach do_configure_accelerator accel_init_machine kvm_init virt_kvm_type virt_set_memmap machine_memory_devices_init memory_region_add_subregion memory_region_add_subregion_common memory_region_update_container_subregions memory_region_transaction_begin qemu_flush_coalesced_mmio_buffer kvm_flush_coalesced_mmio_buffer Fix it by bailing early in kvm_flush_coalesced_mmio_buffer() on the uninitialized @kvm_state. With this applied, no crash is observed on arm64. Fixes: 176d073029 ("hw/arm/virt: Use machine_memory_devices_init()") Signed-off-by: Gavin Shan <gshan@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20230731125946.2038742-1-gshan@redhat.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2023-06-28exec/memory: Add symbol for the min value of memory listener priorityIsaku Yamahata1-0/+1
Add MEMORY_LISTNER_PRIORITY_MIN for the symbolic value for the min value of the memory listener instead of the hard-coded magic value 0. Add explicit initialization. No functional change intended. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <29f88477fe82eb774bcfcae7f65ea21995f865f2.1687279702.git.isaku.yamahata@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>