aboutsummaryrefslogtreecommitdiff
path: root/include
diff options
context:
space:
mode:
authorDavid Hildenbrand <david@redhat.com>2022-11-11 10:47:58 -0500
committerPaolo Bonzini <pbonzini@redhat.com>2023-01-11 09:59:39 +0100
commitf39b7d2b96e3e73c01bb678cd096f7baf0b9ab39 (patch)
tree5d75aca0bb322e055f9cf7096ab6279603191c58 /include
parenta27dd2de68f37ba96fe164a42121daa5f0750afc (diff)
downloadqemu-f39b7d2b96e3e73c01bb678cd096f7baf0b9ab39.zip
qemu-f39b7d2b96e3e73c01bb678cd096f7baf0b9ab39.tar.gz
qemu-f39b7d2b96e3e73c01bb678cd096f7baf0b9ab39.tar.bz2
kvm: Atomic memslot updates
If we update an existing memslot (e.g., resize, split), we temporarily remove the memslot to re-add it immediately afterwards. These updates are not atomic, especially not for KVM VCPU threads, such that we can get spurious faults. Let's inhibit most KVM ioctls while performing relevant updates, such that we can perform the update just as if it would happen atomically without additional kernel support. We capture the add/del changes and apply them in the notifier commit stage instead. There, we can check for overlaps and perform the ioctl inhibiting only if really required (-> overlap). To keep things simple we don't perform additional checks that wouldn't actually result in an overlap -- such as !RAM memory regions in some cases (see kvm_set_phys_mem()). To minimize cache-line bouncing, use a separate indicator (in_ioctl_lock) per CPU. Also, make sure to hold the kvm_slots_lock while performing both actions (removing+re-adding). We have to wait until all IOCTLs were exited and block new ones from getting executed. This approach cannot result in a deadlock as long as the inhibitor does not hold any locks that might hinder an IOCTL from getting finished and exited - something fairly unusual. The inhibitor will always hold the BQL. AFAIKs, one possible candidate would be userfaultfd. If a page cannot be placed (e.g., during postcopy), because we're waiting for a lock, or if the userfaultfd thread cannot process a fault, because it is waiting for a lock, there could be a deadlock. However, the BQL is not applicable here, because any other guest memory access while holding the BQL would already result in a deadlock. Nothing else in the kernel should block forever and wait for userspace intervention. Note: pause_all_vcpus()/resume_all_vcpus() or start_exclusive()/end_exclusive() cannot be used, as they either drop the BQL or require to be called without the BQL - something inhibitors cannot handle. We need a low-level locking mechanism that is deadlock-free even when not releasing the BQL. Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Tested-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20221111154758.1372674-4-eesposit@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Diffstat (limited to 'include')
-rw-r--r--include/sysemu/kvm_int.h8
1 files changed, 8 insertions, 0 deletions
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 3b4adcd..60b520a 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -12,6 +12,7 @@
#include "exec/memory.h"
#include "qapi/qapi-types-common.h"
#include "qemu/accel.h"
+#include "qemu/queue.h"
#include "sysemu/kvm.h"
typedef struct KVMSlot
@@ -31,10 +32,17 @@ typedef struct KVMSlot
ram_addr_t ram_start_offset;
} KVMSlot;
+typedef struct KVMMemoryUpdate {
+ QSIMPLEQ_ENTRY(KVMMemoryUpdate) next;
+ MemoryRegionSection section;
+} KVMMemoryUpdate;
+
typedef struct KVMMemoryListener {
MemoryListener listener;
KVMSlot *slots;
int as_id;
+ QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_add;
+ QSIMPLEQ_HEAD(, KVMMemoryUpdate) transaction_del;
} KVMMemoryListener;
#define KVM_MSI_HASHTAB_SIZE 256