virtio: add missing mb() on notification

During normal operation, virtio first writes a used index
and then checks whether it should interrupt the guest
by reading guest avail index/flag values.

The guest does the reverse: it writes the index/flag,
then checks the used ring.

The ordering is important: if the host avail flag read bypasses the used
index write, we could in effect get this timing:

host avail flag read
		guest enable interrupts: avail flag write
		guest check used ring: ring is empty
host used index write

which results in a lost interrupt: the guest will never be notified
about the used ring update.

This can actually happen when using kvm with an io thread,
such that the guest vcpu and qemu run on different host cpus,
and it has been observed in the field
(but only seems to trigger on very specific processor types)
with userspace virtio: vhost has the necessary smp_mb()
in place to prevent the reordering, so the same workload stalls
forever waiting for an interrupt with vhost=off but works
fine with vhost=on.

Insert an smp_mb barrier operation in userspace virtio to
ensure the correct ordering.
Applying this patch fixed the race condition we have observed.
Tested on x86_64. I checked the code generated by the new macro
for i386 and ppc but didn't run virtio.

Note: mb could in theory be implemented by __sync_synchronize, but this
would make us hit old GCC bugs. Besides old GCC
not implementing __sync_synchronize at all, there were bugs
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36793
in this functionality as recently as in 4.3.

As we need asm for rmb, wmb anyway, it's just as well to
use it for mb.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
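
The timing described in the message can be checked with a small single-threaded model. The sketch below is illustrative only and is not QEMU code: `ring_model`, the `step` names, and the helper functions are all hypothetical, and the avail flag is simplified to a boolean "interrupts enabled". It enumerates every interleaving of the two host steps with the two guest steps while keeping each side's own order, and counts the outcomes where the guest neither sees the used entry nor receives an interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the shared state; not the real vring layout. */
struct ring_model {
    bool used_written;   /* host has published the used index */
    bool irq_enabled;    /* guest has asked for interrupts (simplified flag) */
    bool host_notifies;  /* host decided to interrupt the guest */
    bool guest_saw_used; /* guest found the used entry when it checked */
};

enum step { HOST_WRITE_USED, HOST_READ_FLAG, GUEST_ENABLE_IRQ, GUEST_CHECK_USED };

static void run_step(struct ring_model *m, enum step s)
{
    switch (s) {
    case HOST_WRITE_USED:  m->used_written = true; break;
    case HOST_READ_FLAG:   m->host_notifies = m->irq_enabled; break;
    case GUEST_ENABLE_IRQ: m->irq_enabled = true; break;
    case GUEST_CHECK_USED: m->guest_saw_used = m->used_written; break;
    }
}

/* True when the guest neither sees the entry nor gets an interrupt. */
static bool interrupt_lost(const enum step order[4])
{
    struct ring_model m = { false, false, false, false };
    for (size_t i = 0; i < 4; i++)
        run_step(&m, order[i]);
    return !m.host_notifies && !m.guest_saw_used;
}

/* Count lost interrupts over all 6 interleavings that keep each side's order. */
static int count_lost(const enum step host[2], const enum step guest[2])
{
    int lost = 0;
    /* Choose the two slots (a < b) out of four that the host steps occupy. */
    for (int a = 0; a < 4; a++) {
        for (int b = a + 1; b < 4; b++) {
            enum step order[4];
            int h = 0, g = 0;
            for (int i = 0; i < 4; i++)
                order[i] = (i == a || i == b) ? host[h++] : guest[g++];
            assert(h == 2 && g == 2);
            if (interrupt_lost(order))
                lost++;
        }
    }
    return lost;
}

/* Host order as enforced by a full barrier: write used index, then read flag. */
int lost_with_barrier(void)
{
    const enum step host[2]  = { HOST_WRITE_USED, HOST_READ_FLAG };
    const enum step guest[2] = { GUEST_ENABLE_IRQ, GUEST_CHECK_USED };
    return count_lost(host, guest);
}

/* Host order if the flag read bypasses the used index write. */
int lost_with_reordering(void)
{
    const enum step host[2]  = { HOST_READ_FLAG, HOST_WRITE_USED };
    const enum step guest[2] = { GUEST_ENABLE_IRQ, GUEST_CHECK_USED };
    return count_lost(host, guest);
}
```

In this model `lost_with_barrier()` finds no losing interleaving, while `lost_with_reordering()` finds exactly one: the four-step timing listed in the message. The model assumes the guest's own write and read stay in order, which is the guest's responsibility and outside the scope of this patch.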
author: Michael S. Tsirkin <mst@redhat.com> 2012-04-22 16:45:53 +0300
committer: Michael S. Tsirkin <mst@redhat.com> 2012-04-25 10:53:46 +0300
commit: a281ebc11a6917fbc27e1a93bb5772cd14e241fc (patch)
tree: 1efc90c6cab9f458486543a826487dab1458151e
parent: 814cd3ac37be8e71c8ef76234d0da0bbfb2f2fb2 (diff)
download: qemu-a281ebc11a6917fbc27e1a93bb5772cd14e241fc.zip
qemu-a281ebc11a6917fbc27e1a93bb5772cd14e241fc.tar.gz
qemu-a281ebc11a6917fbc27e1a93bb5772cd14e241fc.tar.bz2
2 files changed, 22 insertions, 3 deletions
diff --git a/hw/virtio.c b/hw/virtio.c
index 314abf8..bb994c9 100644
--- a/hw/virtio.c
+++ b/hw/virtio.c
@@ -700,6 +700,8 @@ static bool vring_notify(VirtIODevice *vdev, VirtQueue *vq)
 {
     uint16_t old, new;
     bool v;
+    /* We need to expose used array entries before checking used event. */
+    smp_mb();
     /* Always notify when queue is empty (when feature acknowledge) */
     if (((vdev->guest_features & (1 << VIRTIO_F_NOTIFY_ON_EMPTY)) &&
          !vq->inuse && vring_avail_idx(vq) == vq->last_avail_idx)) {
diff --git a/qemu-barrier.h b/qemu-barrier.h
index c11bb2b..f0b842e 100644
--- a/qemu-barrier.h
+++ b/qemu-barrier.h
@@ -4,7 +4,7 @@
 /* Compiler barrier */
 #define barrier()   asm volatile("" ::: "memory")
 
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__)
 
 /*
  * Because of the strongly ordered x86 storage model, wmb() is a nop
@@ -13,15 +13,31 @@
  * load/stores from C code.
  */
 #define smp_wmb()   barrier()
+/*
+ * We use GCC builtin if it's available, as that can use
+ * mfence on 32 bit as well, e.g. if built with -march=pentium-m.
+ * However, on i386, there seem to be known bugs as recently as 4.3.
+ * */
+#if defined(__GNUC__) && __GNUC__ >= 4 && __GNUC_MINOR__ >= 4
+#define smp_mb() __sync_synchronize()
+#else
+#define smp_mb() asm volatile("lock; addl $0,0(%%esp) " ::: "memory")
+#endif
+
+#elif defined(__x86_64__)
+
+#define smp_wmb()   barrier()
+#define smp_mb() asm volatile("mfence" ::: "memory")
 
 #elif defined(_ARCH_PPC)
 
 /*
- * We use an eieio() for a wmb() on powerpc.  This assumes we don't
+ * We use an eieio() for wmb() on powerpc.  This assumes we don't
  * need to order cacheable and non-cacheable stores with respect to
  * each other
  */
 #define smp_wmb()   asm volatile("eieio" ::: "memory")
+#define smp_mb()   asm volatile("sync" ::: "memory")
 
 #else
 
@@ -29,9 +45,10 @@
  * For (host) platforms we don't have explicit barrier definitions
  * for, we use the gcc __sync_synchronize() primitive to generate a
  * full barrier.  This should be safe on all platforms, though it may
- * be overkill.
+ * be overkill for wmb().
  */
 #define smp_wmb()   __sync_synchronize()
+#define smp_mb()   __sync_synchronize()
 
 #endif
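
For readers without the QEMU tree at hand, the ordering that the new smp_mb() call enforces can be sketched in portable C11. This is a minimal illustration under stated assumptions, not the code in hw/virtio.c: `demo_ring`, `DEMO_AVAIL_F_NO_INTERRUPT`, and `demo_publish_and_should_notify` are hypothetical names, and `atomic_thread_fence(memory_order_seq_cst)` stands in for the per-architecture smp_mb() macros defined above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared ring fields; not QEMU's VRing layout. */
struct demo_ring {
    _Atomic uint16_t used_idx;    /* written by the host, read by the guest */
    _Atomic uint16_t avail_flags; /* written by the guest, read by the host */
};

/* Assumed flag bit for this sketch: set when the guest wants no interrupts. */
#define DEMO_AVAIL_F_NO_INTERRUPT 1u

/* Publish a new used index, then decide whether to interrupt the guest. */
bool demo_publish_and_should_notify(struct demo_ring *r, uint16_t new_used_idx)
{
    assert(r != NULL);

    atomic_store_explicit(&r->used_idx, new_used_idx, memory_order_relaxed);

    /*
     * Full barrier, playing the role of smp_mb(): the store above must be
     * visible before the load below is performed.
     */
    atomic_thread_fence(memory_order_seq_cst);

    uint16_t flags = atomic_load_explicit(&r->avail_flags, memory_order_relaxed);
    return !(flags & DEMO_AVAIL_F_NO_INTERRUPT);
}
```

A caller would invoke `demo_publish_and_should_notify(&ring, idx)` after filling in a used entry and raise the interrupt only when it returns true. The guest side needs the mirror-image sequence (write the flag, full fence, read the used index) for the guarantee to hold.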