From db81b9953761cac71906728fb3dfefce661ab903 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Thu, 21 Sep 2017 14:44:08 +0200
Subject: atomic: update documentation

Signed-off-by: Paolo Bonzini
---
 docs/devel/atomics.txt | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

(limited to 'docs')

diff --git a/docs/devel/atomics.txt b/docs/devel/atomics.txt
index 3ef5d85..048e5f2 100644
--- a/docs/devel/atomics.txt
+++ b/docs/devel/atomics.txt
@@ -63,11 +63,22 @@ operations:
     typeof(*ptr) atomic_fetch_sub(ptr, val)
     typeof(*ptr) atomic_fetch_and(ptr, val)
     typeof(*ptr) atomic_fetch_or(ptr, val)
+    typeof(*ptr) atomic_fetch_xor(ptr, val)
     typeof(*ptr) atomic_xchg(ptr, val)
     typeof(*ptr) atomic_cmpxchg(ptr, old, new)
 
 all of which return the old value of *ptr. These operations are
-polymorphic; they operate on any type that is as wide as an int.
+polymorphic; they operate on any type that is as wide as a pointer.
+
+Similar operations return the new value of *ptr:
+
+    typeof(*ptr) atomic_inc_fetch(ptr)
+    typeof(*ptr) atomic_dec_fetch(ptr)
+    typeof(*ptr) atomic_add_fetch(ptr, val)
+    typeof(*ptr) atomic_sub_fetch(ptr, val)
+    typeof(*ptr) atomic_and_fetch(ptr, val)
+    typeof(*ptr) atomic_or_fetch(ptr, val)
+    typeof(*ptr) atomic_xor_fetch(ptr, val)
 
 Sequentially consistent loads and stores can be done using:
 
--
cgit v1.1


From 447b0d0b9ee8a0ac216c3186e0f3c427a1001f0c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Thu, 21 Sep 2017 14:32:47 +0200
Subject: memory: avoid "resurrection" of dead FlatViews

It's possible for address_space_get_flatview() as it currently stands
to cause a use-after-free for the returned FlatView, if the reference
count is incremented after the FlatView has been replaced by a writer:

   thread 1             thread 2             RCU thread
  -------------------------------------------------------------
   rcu_read_lock
   read as->current_map
                        set as->current_map
                        flatview_unref
                           '--> call_rcu
   flatview_ref
   [ref=1]
   rcu_read_unlock
                                             flatview_destroy

Since FlatViews are not updated
very often, we can just detect the situation using a new atomic op
atomic_fetch_inc_nonzero, similar to Linux's atomic_inc_not_zero, which
performs the refcount increment only if it hasn't already hit zero.
This is similar to Linux commit de09a9771a53 ("CRED: Fix get_task_cred()
and task_state() to not resurrect dead credentials", 2010-07-29).

Signed-off-by: Paolo Bonzini
---
 docs/devel/atomics.txt | 1 +
 1 file changed, 1 insertion(+)

(limited to 'docs')

diff --git a/docs/devel/atomics.txt b/docs/devel/atomics.txt
index 048e5f2..10c5fa3 100644
--- a/docs/devel/atomics.txt
+++ b/docs/devel/atomics.txt
@@ -64,6 +64,7 @@ operations:
     typeof(*ptr) atomic_fetch_and(ptr, val)
     typeof(*ptr) atomic_fetch_or(ptr, val)
     typeof(*ptr) atomic_fetch_xor(ptr, val)
+    typeof(*ptr) atomic_fetch_inc_nonzero(ptr)
     typeof(*ptr) atomic_xchg(ptr, val)
     typeof(*ptr) atomic_cmpxchg(ptr, old, new)
 
--
cgit v1.1


From 7c9e527659c67d4d7b41d9504f93d2d7ee482488 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Mon, 21 Aug 2017 18:58:56 +0200
Subject: scsi, file-posix: add support for persistent reservation management

It is a common requirement for virtual machines to send persistent
reservations, but this currently requires either running QEMU with
CAP_SYS_RAWIO, or using out-of-tree patches that let an unprivileged
QEMU bypass Linux's filter on SG_IO commands.

As an alternative mechanism, the next patches will introduce a
privileged helper to run persistent reservation commands without
expanding QEMU's attack surface unnecessarily.

The helper is invoked through a "pr-manager" QOM object, to which
file-posix.c passes SG_IO requests for PERSISTENT RESERVE OUT and
PERSISTENT RESERVE IN commands.
For example:

  $ qemu-system-x86_64 \
      -device virtio-scsi \
      -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
      -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
      -device scsi-block,drive=hd

or:

  $ qemu-system-x86_64 \
      -device virtio-scsi \
      -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
      -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
      -device scsi-block,drive=hd

Multiple pr-manager implementations are conceivable and possible, though
only one is implemented right now. For example, a pr-manager could:

- talk directly to the multipath daemon from a privileged QEMU
  (i.e. QEMU links to libmpathpersist); this makes reservation work
  properly with multipath, but still requires CAP_SYS_RAWIO

- use the Linux IOC_PR_* ioctls (they require CAP_SYS_ADMIN though)

- more interestingly, implement reservations directly in QEMU
  through file system locks or a shared database (e.g. sqlite)

Signed-off-by: Paolo Bonzini
---
 docs/pr-manager.rst | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)
 create mode 100644 docs/pr-manager.rst

(limited to 'docs')

diff --git a/docs/pr-manager.rst b/docs/pr-manager.rst
new file mode 100644
index 0000000..b6089fb
--- /dev/null
+++ b/docs/pr-manager.rst
@@ -0,0 +1,51 @@
+======================================
+Persistent reservation managers
+======================================
+
+SCSI persistent reservations allow restricting access to block devices
+to specific initiators in a shared storage setup. When implementing
+clustering of virtual machines, it is a common requirement for virtual
+machines to send persistent reservation SCSI commands. However,
+the operating system does not let unprivileged programs send these
+commands, because incorrect usage can disrupt regular operation of the
+storage fabric.
+
+For this reason, QEMU's SCSI passthrough devices, ``scsi-block``
+and ``scsi-generic`` (both are only available on Linux) can delegate
+implementation of persistent reservations to a separate object,
+the "persistent reservation manager". Only PERSISTENT RESERVE OUT and
+PERSISTENT RESERVE IN commands are passed to the persistent reservation
+manager object; other commands are processed by QEMU as usual.
+
+-----------------------------------------
+Defining a persistent reservation manager
+-----------------------------------------
+
+A persistent reservation manager is an instance of a subclass of the
+"pr-manager" QOM class.
+
+Right now only one subclass is defined, ``pr-manager-helper``, which
+forwards the commands to an external privileged helper program
+over Unix sockets. The helper program only allows sending persistent
+reservation commands to devices for which QEMU has a file descriptor,
+so that QEMU will not be able to effect persistent reservations
+unless it has access to both the socket and the device.
+
+``pr-manager-helper`` has a single string property, ``path``, which
+accepts the path to the helper program's Unix socket.
For example,
+the following command line defines a ``pr-manager-helper`` object and
+attaches it to a SCSI passthrough device::
+
+      $ qemu-system-x86_64 \
+          -device virtio-scsi \
+          -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
+          -drive if=none,id=hd,driver=raw,file.filename=/dev/sdb,file.pr-manager=helper0 \
+          -device scsi-block,drive=hd
+
+Alternatively, using ``-blockdev``::
+
+      $ qemu-system-x86_64 \
+          -device virtio-scsi \
+          -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
+          -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
+          -device scsi-block,drive=hd
--
cgit v1.1


From b855f8d175a0a26c9798cbc5962bb8c0d9538231 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Tue, 22 Aug 2017 06:50:18 +0200
Subject: scsi: build qemu-pr-helper

Introduce a privileged helper to run persistent reservation commands.
This lets virtual machines send persistent reservations without using
CAP_SYS_RAWIO or out-of-tree patches. The helper uses Unix permissions
and SCM_RIGHTS to restrict access to processes that can access its socket
and prove that they have an open file descriptor for a raw SCSI device.

The next patch will also correct the usage of persistent reservations
with multipath devices.

It would also be possible to add support for Linux's IOC_PR_* ioctls in
the future, to support NVMe devices. For now, however, only SCSI is
supported.

Signed-off-by: Paolo Bonzini
---
 docs/interop/pr-helper.rst | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 docs/pr-manager.rst        | 33 ++++++++++++++++++
 2 files changed, 116 insertions(+)
 create mode 100644 docs/interop/pr-helper.rst

(limited to 'docs')

diff --git a/docs/interop/pr-helper.rst b/docs/interop/pr-helper.rst
new file mode 100644
index 0000000..9f76d5b
--- /dev/null
+++ b/docs/interop/pr-helper.rst
@@ -0,0 +1,83 @@
+..
+
+======================================
+Persistent reservation helper protocol
+======================================
+
+QEMU's SCSI passthrough devices, ``scsi-block`` and ``scsi-generic``,
+can delegate implementation of persistent reservations to an external
+(and typically privileged) program. Persistent Reservations allow
+restricting access to block devices to specific initiators in a shared
+storage setup.
+
+For a more detailed reference please refer to the SCSI Primary
+Commands standard, specifically the section on Reservations and the
+"PERSISTENT RESERVE IN" and "PERSISTENT RESERVE OUT" commands.
+
+This document describes the socket protocol used between QEMU's
+``pr-manager-helper`` object and the external program.
+
+.. contents::
+
+Connection and initialization
+-----------------------------
+
+All data transmitted on the socket is big-endian.
+
+After QEMU connects to the helper program's socket, the helper starts a simple
+feature negotiation process by writing four bytes corresponding to
+the features it exposes (``supported_features``). QEMU reads them,
+then writes four bytes corresponding to the desired features of the
+helper program (``requested_features``).
+
+If a bit is 1 in ``requested_features`` and 0 in ``supported_features``,
+the corresponding feature is not supported by the helper and the connection
+is closed. On the other hand, it is acceptable for a bit to be 0 in
+``requested_features`` and 1 in ``supported_features``; in this case,
+the helper will not enable the feature.
+
+Right now no feature is defined, so the two parties always write four
+zero bytes.
+
+Command format
+--------------
+
+It is invalid to send multiple commands concurrently on the same
+socket. It is however possible to connect multiple sockets to the
+helper and send multiple commands to the helper for one or more
+file descriptors.
+
+A command consists of a request and a response. A request consists
+of a 16-byte SCSI CDB.
A file descriptor must be passed to the helper
+together with the SCSI CDB using ancillary data.
+
+The CDB has the following limitations:
+
+- the command (stored in the first byte) must be one of 0x5E
+  (PERSISTENT RESERVE IN) or 0x5F (PERSISTENT RESERVE OUT).
+
+- the allocation length (stored in bytes 7-8 of the CDB for PERSISTENT
+  RESERVE IN) or parameter list length (stored in bytes 5-8 of the CDB
+  for PERSISTENT RESERVE OUT) is limited to 8 KiB.
+
+For PERSISTENT RESERVE OUT, the parameter list is sent right after the
+CDB. The length of the parameter list is taken from the CDB itself.
+
+The helper's reply has the following structure:
+
+- 4 bytes for the SCSI status
+
+- 4 bytes for the payload size (nonzero only for PERSISTENT RESERVE IN
+  and only if the SCSI status is 0x00, i.e. GOOD)
+
+- 96 bytes for the SCSI sense data
+
+- if the size is nonzero, the payload follows
+
+The sense data is always sent to keep the protocol simple, even though
+it is only valid if the SCSI status is CHECK CONDITION (0x02).
+
+The payload size is always less than or equal to the allocation length
+specified in the CDB for the PERSISTENT RESERVE IN command.
+
+If the protocol is violated, the helper closes the socket.
diff --git a/docs/pr-manager.rst b/docs/pr-manager.rst
index b6089fb..7107e59 100644
--- a/docs/pr-manager.rst
+++ b/docs/pr-manager.rst
@@ -49,3 +49,36 @@ Alternatively, using ``-blockdev``::
           -object pr-manager-helper,id=helper0,path=/var/run/qemu-pr-helper.sock \
           -blockdev node-name=hd,driver=raw,file.driver=host_device,file.filename=/dev/sdb,file.pr-manager=helper0 \
           -device scsi-block,drive=hd
+
+----------------------------------
+Invoking :program:`qemu-pr-helper`
+----------------------------------
+
+QEMU provides an implementation of the persistent reservation helper,
+called :program:`qemu-pr-helper`.
The helper should be started as a
+system service and supports the following options:
+
+-d, --daemon              run in the background
+-q, --quiet               decrease verbosity
+-f, --pidfile=path        PID file when running as a daemon
+-k, --socket=path         path to the socket
+-T, --trace=trace-opts    tracing options
+
+By default, the socket and PID file are placed in the runtime state
+directory, for example :file:`/var/run/qemu-pr-helper.sock` and
+:file:`/var/run/qemu-pr-helper.pid`. The PID file is not created
+unless :option:`-d` is passed too.
+
+:program:`qemu-pr-helper` can also use the systemd socket activation
+protocol. In this case, the systemd socket unit should specify a
+Unix stream socket, like this::
+
+    [Socket]
+    ListenStream=/var/run/qemu-pr-helper.sock
+
+After connecting to the socket, :program:`qemu-pr-helper` can optionally drop
+root privileges, except for those capabilities that are needed for
+its operation. To do this, add the following options:
+
+-u, --user=user           user to drop privileges to
+-g, --group=group         group to drop privileges to
--
cgit v1.1


From fe8fc5ae5c808e037fa4746cbfeb3c07ffe0af81 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Tue, 22 Aug 2017 06:50:55 +0200
Subject: scsi: add multipath support to qemu-pr-helper

Proper support of persistent reservation for multipath devices requires
communication with the multipath daemon, so that the reservation is
registered and applied when a path comes up. The device mapper
utilities provide a library to do so; this patch makes qemu-pr-helper.c
detect multipath devices and, when one is found, delegate the operation
to libmpathpersist.

Signed-off-by: Paolo Bonzini
---
 docs/pr-manager.rst | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

(limited to 'docs')

diff --git a/docs/pr-manager.rst b/docs/pr-manager.rst
index 7107e59..9b1de19 100644
--- a/docs/pr-manager.rst
+++ b/docs/pr-manager.rst
@@ -60,6 +60,7 @@ system service and supports the following options:
 
 -d, --daemon              run in the background
 -q, --quiet               decrease verbosity
+-v, --verbose             increase verbosity
 -f, --pidfile=path        PID file when running as a daemon
 -k, --socket=path         path to the socket
 -T, --trace=trace-opts    tracing options
@@ -82,3 +83,29 @@ its operation. To do this, add the following options:
 
 -u, --user=user           user to drop privileges to
 -g, --group=group         group to drop privileges to
+
+---------------------------------------------
+Multipath devices and persistent reservations
+---------------------------------------------
+
+Proper support of persistent reservation for multipath devices requires
+communication with the multipath daemon, so that the reservation is
+registered and applied when a path is newly discovered or becomes online
+again. :command:`qemu-pr-helper` can do this if the ``libmpathpersist``
+library was available on the system at build time.
+
+As of August 2017, a reservation key must be specified in ``multipath.conf``
+for ``multipathd`` to check for persistent reservation for newly
+discovered paths or reinstated paths. The attribute can be added
+to the ``defaults`` section or the ``multipaths`` section; for example::
+
+    multipaths {
+        multipath {
+            wwid              XXXXXXXXXXXXXXXX
+            alias             yellow
+            reservation_key   0x123abc
+        }
+    }
+
+Linking :program:`qemu-pr-helper` to ``libmpathpersist`` does not impede
+its usage on regular SCSI devices.
--
cgit v1.1
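
As a reading aid for the helper protocol documented in
docs/interop/pr-helper.rst above, the following Python sketch shows one
way to check the feature negotiation rule and to decode the fixed-size
reply header (4-byte status, 4-byte payload size, 96 bytes of sense
data, all big-endian). It is not taken from QEMU: the names
features_compatible, parse_reply and REPLY_HEADER are hypothetical, the
whole reply is assumed to be already in memory, and passing the file
descriptor as ancillary data is left out.

```python
import struct

# Layout described in pr-helper.rst: SCSI status (4 bytes), payload size
# (4 bytes), sense data (96 bytes); integers are big-endian.
REPLY_HEADER = struct.Struct(">II96s")  # hypothetical name

SCSI_STATUS_GOOD = 0x00
SCSI_STATUS_CHECK_CONDITION = 0x02


def features_compatible(supported: int, requested: int) -> bool:
    """Return True if every requested feature bit is also supported.

    A bit set in requested but clear in supported means the helper
    would close the connection.
    """
    return (requested & ~supported & 0xFFFFFFFF) == 0


def parse_reply(data: bytes):
    """Split a complete reply into (status, sense, payload)."""
    status, size, sense = REPLY_HEADER.unpack_from(data)
    payload = data[REPLY_HEADER.size:REPLY_HEADER.size + size]
    if len(payload) != size:
        raise ValueError("truncated payload")
    # The sense bytes are always sent but only valid for CHECK CONDITION.
    if status != SCSI_STATUS_CHECK_CONDITION:
        sense = None
    return status, sense, payload


# Example: a GOOD reply to PERSISTENT RESERVE IN carrying 8 payload bytes.
reply = struct.pack(">II", SCSI_STATUS_GOOD, 8) + bytes(96) + bytes(range(8))
status, sense, payload = parse_reply(reply)
```

Using a single struct.Struct for the header keeps the 104-byte fixed
part in one place, so a real client could read exactly that many bytes
from the socket before deciding how much payload to read.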