aboutsummaryrefslogtreecommitdiff
path: root/qemu-seccomp.c
AgeCommit message (Collapse)AuthorFilesLines
2019-03-27seccomp: report more useful errors from seccompDaniel P. Berrangé1-7/+13
Most of the seccomp functions return errnos as a negative return value. The code is currently ignoring these and reporting a generic error message for all seccomp failure scenarios making debugging painful. Report a more precise error from each failed call and include errno if it is available. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2019-03-27seccomp: don't kill process for resource control syscallsDaniel P. Berrangé1-7/+25
The Mesa library tries to set process affinity on some of its threads in order to optimize its performance. Currently this results in QEMU being immediately terminated when seccomp is enabled. Mesa doesn't consider failure of the process affinity settings to be fatal to its operation, but our seccomp policy gives it no choice in gracefully handling this denial. It is reasonable to consider that malicious code using the resource control syscalls to be a less serious attack than if they were trying to spawn processes or change UIDs and other such things. Generally speaking changing the resource control setting will "merely" affect quality of service of processes on the host. With this in mind, rather than kill the process, we can relax the policy for these syscalls to return the EPERM errno value. This allows callers to detect that QEMU does not want them to change resource allocations, and apply some reasonable fallback logic. The main downside to this is for code which uses these syscalls but does not check the return value, blindly assuming they will always succeeed. Returning an errno could result in sub-optimal behaviour. Arguably though such code is already broken & needs fixing regardless. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2019-01-22seccomp: Work-around GCC 4.x bug in gnu99 modeThomas Huth1-1/+2
We'd like to compile QEMU with -std=gnu99, but GCC 4.8 currently fails to compile qemu-seccomp.c in this mode: qemu-seccomp.c:45:1: error: initializer element is not constant }; ^ qemu-seccomp.c:45:1: error: (near initialization for ‘sched_setscheduler_arg[0]’) This is due to a compiler bug which has just been fixed in GCC 5.0: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63567 Since we still want to support GCC 4.8 for a while and also want to use gnu99 mode, work-around the issue by expanding the macro manually. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-10-19seccomp: Clean up error reporting in parse_sandbox()Markus Armbruster1-9/+9
Calling error_report() in a function that takes an Error ** argument is suspicious. parse_sandbox() does that, and then fails without setting an error. Its caller main(), via qemu_opts_foreach(), is fine with it, but clean it up anyway. Cc: Eduardo Otubo <otubo@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com> Message-Id: <20181017082702.5581-18-armbru@redhat.com>
2018-09-26seccomp: check TSYNC host capabilityMarc-André Lureau1-1/+18
Remove -sandbox option if the host is not capable of TSYNC, since the sandbox will fail at setup time otherwise. This will help libvirt, for ex, to figure out if -sandbox will work. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Eduardo Otubo <otubo@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-08-23seccomp: set the seccomp filter to all threadsMarc-André Lureau1-0/+5
When using "-seccomp on", the seccomp policy is only applied to the main thread, the vcpu worker thread and other worker threads created after seccomp policy is applied; the seccomp policy is not applied to e.g. the RCU thread because it is created before the seccomp policy is applied and SECCOMP_FILTER_FLAG_TSYNC isn't used. This can be verified with for task in /proc/`pidof qemu`/task/*; do cat $task/status | grep Secc ; done Seccomp: 2 Seccomp: 0 Seccomp: 0 Seccomp: 2 Seccomp: 2 Seccomp: 2 Starting with libseccomp 2.2.0 and kernel >= 3.17, we can use seccomp_attr_set(ctx, > SCMP_FLTATR_CTL_TSYNC, 1) to update the policy on all threads. libseccomp requirement was bumped to 2.2.0 in previous patch. libseccomp should fail to set the filter if it can't honour SCMP_FLTATR_CTL_TSYNC (untested), and thus -sandbox will now fail on kernel < 3.17. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-08-23seccomp: prefer SCMP_ACT_KILL_PROCESS if availableMarc-André Lureau1-1/+30
The upcoming libseccomp release should have SCMP_ACT_KILL_PROCESS action (https://github.com/seccomp/libseccomp/issues/96). SCMP_ACT_KILL_PROCESS is preferable to immediately terminate the offending process, rather than having the SIGSYS handler running. Use SECCOMP_GET_ACTION_AVAIL to check availability of kernel support, as libseccomp will fallback on SCMP_ACT_KILL otherwise, and we still prefer SCMP_ACT_TRAP. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-08-23seccomp: use SIGSYS signal instead of killing the threadMarc-André Lureau1-1/+1
The seccomp action SCMP_ACT_KILL results in immediate termination of the thread that made the bad system call. However, qemu being multi-threaded, it keeps running. There is no easy way for parent process / management layer (libvirt) to know about that situation. Instead, the default SIGSYS handler when invoked with SCMP_ACT_TRAP will terminate the program and core dump. This may not be the most secure solution, but probably better than just killing the offending thread. SCMP_ACT_KILL_PROCESS has been added in Linux 4.14 to improve the situation, which I propose to use by default if available in the next patch. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1594456 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-07-12seccomp: allow sched_setscheduler() with SCHED_IDLE policyMarc-André Lureau1-2/+10
Current and upcoming mesa releases rely on a shader disk cash. It uses a thread job queue with low priority, set with sched_setscheduler(SCHED_IDLE). However, that syscall is rejected by the "resourcecontrol" seccomp qemu filter. Since it should be safe to allow lowering thread priority, let's allow scheduling thread to idle policy. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1594456 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-06-01sandbox: disable -sandbox if CONFIG_SECCOMP undefinedYi Min Zhao1-1/+120
If CONFIG_SECCOMP is undefined, the option 'elevatedprivileges' remains compiled. This would make libvirt set the corresponding capability and then trigger failure during guest startup. This patch moves the code regarding seccomp command line options to qemu-seccomp.c file and wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP. Because parse_sandbox() is moved into qemu-seccomp.c file, change seccomp_start() to static function. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add resourcecontrol argument to command lineEduardo Otubo1-0/+11
This patch adds [,resourcecontrol=deny] to `-sandbox on' option. It blacklists all process affinity and scheduler priority system calls to avoid any bigger of the process. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add spawn argument to command lineEduardo Otubo1-0/+4
This patch adds [,spawn=deny] argument to `-sandbox on' option. It blacklists fork and execve system calls, avoiding Qemu to spawn new threads or processes. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add elevateprivileges argument to command lineEduardo Otubo1-0/+11
This patch introduces the new argument [,elevateprivileges=allow|deny|children] to the `-sandbox on'. It allows or denies Qemu process to elevate its privileges by blacklisting all set*uid|gid system calls. The 'children' option will let forks and execves run unprivileged. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add obsolete argument to command lineEduardo Otubo1-1/+18
This patch introduces the argument [,obsolete=allow] to the `-sandbox on' option. It allows Qemu to run safely on old system that still relies on old system calls. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: changing from whitelist to blacklistEduardo Otubo1-232/+28
This patch changes the default behavior of the seccomp filter from whitelist to blacklist. By default now all system calls are allowed and a small black list of definitely forbidden ones was created. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2016-09-21seccomp: adding getrusage to the whitelistEduardo Otubo1-0/+1
getrusage is used in a number of places throughout the qemu codebase (notably, in crypto/pbkdf.c). Without this syscall being whitelisted, qemu ends up getting killed by the kernel whenever you try to connect to a VNC console. Signed-off-by: Brian Rak <brak@gameservers.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16seccomp: adding sysinfo system call to whitelistMiroslav Rezanina1-0/+1
Newer version of nss-softokn libraries (> 3.16.2.3) use sysinfo call so qemu using rbd image hang after start when run in sandbox mode. To allow using rbd images in sandbox mode we have to whitelist it. Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16seccomp: Whitelist cacheflush since 2.2.0 not 2.2.3James Hogan1-3/+5
The cacheflush system call (found on MIPS and ARM) has been included in the libseccomp header since 2.2.0, so include it back to that version. Previously it was only enabled since 2.2.3 since that is when it was enabled properly for ARM. This will allow seccomp support to be enabled for MIPS back to libseccomp 2.2.0. Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-By: Andrew Jones <drjones@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-02-04all: Clean up includesPeter Maydell1-1/+1
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2015-11-16seccomp: add cacheflush to whitelistAndrew Jones1-1/+12
cacheflush is an arm-specific syscall that qemu built for arm uses. Add it to the whitelist, but only if we're linking with a recent enough libseccomp. Signed-off-by: Andrew Jones <drjones@redhat.com>
2015-10-22seccomp: add memfd_create to whitelistEduardo Otubo1-1/+2
This is used by memfd code. Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-01-23seccomp: add mlockall to whitelistPaolo Bonzini1-0/+1
This is used by "-realtime mlock=on". Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2015-01-05seccomp: add mbind() to the syscall whitelistPaul Moore1-1/+2
The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to set the policy for a memory range. Add the syscall to the seccomp sandbox whitelist. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2014-11-11seccomp: whitelist syscalls fallocate(), fadvise64(), inotify_init1() and ↵Philipp Gesang1-1/+5
inotify_add_watch() fallocate() is needed for snapshotting. If it isn’t whitelisted $ qemu-img create -f qcow2 x.qcow 1G Formatting 'x.qcow', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off $ qemu-kvm -display none -monitor stdio -sandbox on x.qcow QEMU 2.1.50 monitor - type 'help' for more information (qemu) savevm foo (qemu) loadvm foo will fail, as will subsequent savevm commands on the same image. fadvise64(), inotify_init1(), inotify_add_watch() are needed by the SDL display. Without the whitelist entries, qemu-kvm -sandbox on fails immediately. In my tests fadvise64() is called 50--51 times per VM run. That number seems independent of the duration of the run. fallocate(), inotify_init1(), inotify_add_watch() are called once each. Accordingly, they are added to the whitelist at a very low priority. Signed-off-by: Philipp Gesang <philipp.gesang@intra2net.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2014-08-21seccomp: add semctl() to the syscall whitelistPaul Moore1-1/+2
QEMU needs to call semctl() for correct operation. This particular problem was identified on shutdown with the following commandline: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2014-04-25seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelistPaul Moore1-1/+4
Additional testing reveals that PulseAudio requires shmctl() and the mlock()/munlock() syscalls on some systems/configurations. As before, on systems that do require these syscalls, the problem can be seen with the following command line: # qemu -monitor stdio -sandbox on \ -device intel-hda -device hda-duplex Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
2014-04-25seccomp: add timerfd_create and timerfd_settime to the whitelistFelix Geyer1-1/+3
libusb calls timerfd_create() and timerfd_settime() when it's built with timerfd support. Command to reproduce: -device usb-host,hostbus=1,hostaddr=3,id=hostdev0 Log messages: audit(1390730418.924:135): auid=4294967295 uid=121 gid=103 ses=4294967295 pid=5232 comm="qemu-system-x86" sig=31 syscall=283 compat=0 ip=0x7f2b0f4e96a7 code=0x0 audit(1390733100.580:142): auid=4294967295 uid=121 gid=103 ses=4294967295 pid=16909 comm="qemu-system-x86" sig=31 syscall=286 compat=0 ip=0x7f03513a06da code=0x0 Reading a few hundred MB from a USB drive on x86_64 shows this syscall distribution. Therefore the timerfd_settime priority is set to 242. calls syscall --------- ---------------- 5303600 write 2240554 read 2167030 ppoll 2134828 ioctl 704023 timerfd_settime 689105 poll 83122 futex 803 writev 476 rt_sigprocmask 287 recvmsg 178 brk Signed-off-by: Felix Geyer <debfx@fobos.de> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
2014-01-20seccomp: add some basic shared memory syscalls to the whitelistPaul Moore1-1/+4
PulseAudio requires the use of shared memory so add shmget(), shmat(), and shmdt() to the syscall whitelist. Reported-by: xuhan@redhat.com Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-01-20seccomp: add mkdir() and fchmod() to the whitelistPaul Moore1-1/+3
The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on "/run/user/<UID>/pulse" which is currently blocked by the syscall filter; this patch adds the two missing syscalls to the whitelist. You can reproduce this problem with the following command: # qemu -monitor stdio -device intel-hda -device hda-duplex If watched under strace the following syscalls are shown: mkdir("/run/user/0/pulse", 0700) fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse] Reported-by: xuhan@redhat.com Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-12-20seccomp: exit if seccomp_init() failsCorey Bryant1-0/+1
This fixes a bug where we weren't exiting if seccomp_init() failed. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Acked-by: Paul Moore <pmoore@redhat.com>
2013-12-03seccomp: add kill() to the syscall whitelistPaul Moore1-0/+1
The kill() syscall is triggered with the following command: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 The resulting syslog/audit message: # ausearch -m SECCOMP ---- time->Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm="qemu-kvm" sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 # scmp_sys_resolver 62 kill Reported-by: CongLi <coli@redhat.com> Tested-by: CongLi <coli@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
2013-09-24seccomp: fine tuning whitelist by adding times()Eduardo Otubo1-0/+1
This was causing Qemu process to hang when using -sandbox on as discribed on RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Tested-by: Paul Moore <pmoore@redhat.com> Acked-by: Paul Moore <pmoore@redhat.com>
2013-07-29seccomp: add arch_prctl() to the syscall whitelistPaul Moore1-1/+2
It appears that even a very simple /etc/qemu-ifup configuration can require the arch_prctl() syscall, see the example below: #!/bin/sh /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif <switch> $1 Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Message-id: 20130718135703.8247.19213.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29seccomp: add additional asynchronous I/O syscallsPaul Moore1-0/+2
A previous commit, "seccomp: add the asynchronous I/O syscalls to the whitelist", added several asynchronous I/O syscalls but left out the io_submit() and io_cancel() syscalls. This patch corrects this by adding the two missing asynchronous I/O syscalls. Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Message-id: 20130715193201.943.4913.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-26seccomp: removing unused syscalls gtom whitelistEduardo Otubo1-4/+0
v3 update: - reincluding getrlimit(), it is used by Xen. v2 update: - reincluding setrlimit(), it is used by Xen. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1374518017-10424-3-git-send-email-otubo@linux.vnet.ibm.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-26seccomp: no need to check arch in syscall whitelistEduardo Otubo1-13/+0
v2 update: - set libseccomp 2.1.0 as requirement on configure script. Since libseccomp 2.0 there's no need to check the architecture type anymore. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1374518017-10424-2-git-send-email-otubo@linux.vnet.ibm.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-30seccomp: add the asynchronous I/O syscalls to the whitelistPaul Moore1-1/+4
In order to enable the asynchronous I/O functionality when using the seccomp sandbox we need to add the associated syscalls to the whitelist. Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Message-id: 20130529203001.20939.83322.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-12-19softmmu: move include files to include/sysemu/Paolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-30seccomp: adding new syscalls (bugzilla 855162)Eduardo Otubo1-17/+139
According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist when using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 Reported-by: Paul Moore <pmoore@redhat.com> Tested-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-08-16Adding qemu-seccomp.[ch] (v8)Eduardo Otubo1-0/+141
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> --- v1: - I added a syscall struct using priority levels as described in the libseccomp man page. The priority numbers are based to the frequency they appear in a sample strace from a regular qemu guest run under libvirt. Libseccomp generates linear BPF code to filter system calls, those rules are read one after another. The priority system places the most common rules first in order to reduce the overhead when processing them. v1 -> v2: - Fixed some style issues - Removed code from vl.c and created qemu-seccomp.[ch] - Now using ARRAY_SIZE macro - Added more syscalls without priority/frequency set yet v2 -> v3: - Adding copyright and license information - Replacing seccomp_whitelist_count just by ARRAY_SIZE - Adding header protection to qemu-seccomp.h - Moving QemuSeccompSyscall definition to qemu-seccomp.c - Negative return from seccomp_start is fatal now. - Adding open() and execve() to the whitelis v3 -> v4: - Tests revealed a bigger set of syscalls. - seccomp_start() now has an argument to set the mode according to the configure option trap or kill. v4 -> v5: - Tests on x86_64 required a new specific set of system calls. - libseccomp release 1.0.0: part of the API have changed in this last release, had to adapt to the new function signatures.