aboutsummaryrefslogtreecommitdiff
path: root/qemu-seccomp.c
AgeCommit message (Collapse)AuthorFilesLines
2018-07-12seccomp: allow sched_setscheduler() with SCHED_IDLE policyMarc-André Lureau1-2/+10
Current and upcoming mesa releases rely on a shader disk cash. It uses a thread job queue with low priority, set with sched_setscheduler(SCHED_IDLE). However, that syscall is rejected by the "resourcecontrol" seccomp qemu filter. Since it should be safe to allow lowering thread priority, let's allow scheduling thread to idle policy. Related to: https://bugzilla.redhat.com/show_bug.cgi?id=1594456 Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2018-06-01sandbox: disable -sandbox if CONFIG_SECCOMP undefinedYi Min Zhao1-1/+120
If CONFIG_SECCOMP is undefined, the option 'elevatedprivileges' remains compiled. This would make libvirt set the corresponding capability and then trigger failure during guest startup. This patch moves the code regarding seccomp command line options to qemu-seccomp.c file and wraps qemu_opts_foreach finding sandbox option with CONFIG_SECCOMP. Because parse_sandbox() is moved into qemu-seccomp.c file, change seccomp_start() to static function. Signed-off-by: Yi Min Zhao <zyimin@linux.ibm.com> Reviewed-by: Ján Tomko <jtomko@redhat.com> Tested-by: Ján Tomko <jtomko@redhat.com> Acked-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add resourcecontrol argument to command lineEduardo Otubo1-0/+11
This patch adds [,resourcecontrol=deny] to `-sandbox on' option. It blacklists all process affinity and scheduler priority system calls to avoid any bigger of the process. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add spawn argument to command lineEduardo Otubo1-0/+4
This patch adds [,spawn=deny] argument to `-sandbox on' option. It blacklists fork and execve system calls, avoiding Qemu to spawn new threads or processes. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add elevateprivileges argument to command lineEduardo Otubo1-0/+11
This patch introduces the new argument [,elevateprivileges=allow|deny|children] to the `-sandbox on'. It allows or denies Qemu process to elevate its privileges by blacklisting all set*uid|gid system calls. The 'children' option will let forks and execves run unprivileged. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: add obsolete argument to command lineEduardo Otubo1-1/+18
This patch introduces the argument [,obsolete=allow] to the `-sandbox on' option. It allows Qemu to run safely on old system that still relies on old system calls. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2017-09-15seccomp: changing from whitelist to blacklistEduardo Otubo1-232/+28
This patch changes the default behavior of the seccomp filter from whitelist to blacklist. By default now all system calls are allowed and a small black list of definitely forbidden ones was created. Signed-off-by: Eduardo Otubo <otubo@redhat.com>
2016-09-21seccomp: adding getrusage to the whitelistEduardo Otubo1-0/+1
getrusage is used in a number of places throughout the qemu codebase (notably, in crypto/pbkdf.c). Without this syscall being whitelisted, qemu ends up getting killed by the kernel whenever you try to connect to a VNC console. Signed-off-by: Brian Rak <brak@gameservers.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16seccomp: adding sysinfo system call to whitelistMiroslav Rezanina1-0/+1
Newer version of nss-softokn libraries (> 3.16.2.3) use sysinfo call so qemu using rbd image hang after start when run in sandbox mode. To allow using rbd images in sandbox mode we have to whitelist it. Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-04-16seccomp: Whitelist cacheflush since 2.2.0 not 2.2.3James Hogan1-3/+5
The cacheflush system call (found on MIPS and ARM) has been included in the libseccomp header since 2.2.0, so include it back to that version. Previously it was only enabled since 2.2.3 since that is when it was enabled properly for ARM. This will allow seccomp support to be enabled for MIPS back to libseccomp 2.2.0. Signed-off-by: James Hogan <james.hogan@imgtec.com> Reviewed-By: Andrew Jones <drjones@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2016-02-04all: Clean up includesPeter Maydell1-1/+1
Clean up includes so that osdep.h is included first and headers which it implies are not included manually. This commit was created with scripts/clean-includes. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 1454089805-5470-16-git-send-email-peter.maydell@linaro.org
2015-11-16seccomp: add cacheflush to whitelistAndrew Jones1-1/+12
cacheflush is an arm-specific syscall that qemu built for arm uses. Add it to the whitelist, but only if we're linking with a recent enough libseccomp. Signed-off-by: Andrew Jones <drjones@redhat.com>
2015-10-22seccomp: add memfd_create to whitelistEduardo Otubo1-1/+2
This is used by memfd code. Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Thibaut Collet <thibaut.collet@6wind.com>
2015-01-23seccomp: add mlockall to whitelistPaolo Bonzini1-0/+1
This is used by "-realtime mlock=on". Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Amit Shah <amit.shah@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2015-01-05seccomp: add mbind() to the syscall whitelistPaul Moore1-1/+2
The "memory-backend-ram" QOM object utilizes the mbind(2) syscall to set the policy for a memory range. Add the syscall to the seccomp sandbox whitelist. Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Acked-by: Eduardo Otubo <eduardo.otubo@profitbricks.com> Tested-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>
2014-11-11seccomp: whitelist syscalls fallocate(), fadvise64(), inotify_init1() and ↵Philipp Gesang1-1/+5
inotify_add_watch() fallocate() is needed for snapshotting. If it isn’t whitelisted $ qemu-img create -f qcow2 x.qcow 1G Formatting 'x.qcow', fmt=qcow2 size=1073741824 encryption=off cluster_size=65536 lazy_refcounts=off $ qemu-kvm -display none -monitor stdio -sandbox on x.qcow QEMU 2.1.50 monitor - type 'help' for more information (qemu) savevm foo (qemu) loadvm foo will fail, as will subsequent savevm commands on the same image. fadvise64(), inotify_init1(), inotify_add_watch() are needed by the SDL display. Without the whitelist entries, qemu-kvm -sandbox on fails immediately. In my tests fadvise64() is called 50--51 times per VM run. That number seems independent of the duration of the run. fallocate(), inotify_init1(), inotify_add_watch() are called once each. Accordingly, they are added to the whitelist at a very low priority. Signed-off-by: Philipp Gesang <philipp.gesang@intra2net.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2014-08-21seccomp: add semctl() to the syscall whitelistPaul Moore1-1/+2
QEMU needs to call semctl() for correct operation. This particular problem was identified on shutdown with the following commandline: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <eduardo.otubo@profitbricks.com>
2014-04-25seccomp: add shmctl(), mlock(), and munlock() to the syscall whitelistPaul Moore1-1/+4
Additional testing reveals that PulseAudio requires shmctl() and the mlock()/munlock() syscalls on some systems/configurations. As before, on systems that do require these syscalls, the problem can be seen with the following command line: # qemu -monitor stdio -sandbox on \ -device intel-hda -device hda-duplex Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
2014-04-25seccomp: add timerfd_create and timerfd_settime to the whitelistFelix Geyer1-1/+3
libusb calls timerfd_create() and timerfd_settime() when it's built with timerfd support. Command to reproduce: -device usb-host,hostbus=1,hostaddr=3,id=hostdev0 Log messages: audit(1390730418.924:135): auid=4294967295 uid=121 gid=103 ses=4294967295 pid=5232 comm="qemu-system-x86" sig=31 syscall=283 compat=0 ip=0x7f2b0f4e96a7 code=0x0 audit(1390733100.580:142): auid=4294967295 uid=121 gid=103 ses=4294967295 pid=16909 comm="qemu-system-x86" sig=31 syscall=286 compat=0 ip=0x7f03513a06da code=0x0 Reading a few hundred MB from a USB drive on x86_64 shows this syscall distribution. Therefore the timerfd_settime priority is set to 242. calls syscall --------- ---------------- 5303600 write 2240554 read 2167030 ppoll 2134828 ioctl 704023 timerfd_settime 689105 poll 83122 futex 803 writev 476 rt_sigprocmask 287 recvmsg 178 brk Signed-off-by: Felix Geyer <debfx@fobos.de> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
2014-01-20seccomp: add some basic shared memory syscalls to the whitelistPaul Moore1-1/+4
PulseAudio requires the use of shared memory so add shmget(), shmat(), and shmdt() to the syscall whitelist. Reported-by: xuhan@redhat.com Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-01-20seccomp: add mkdir() and fchmod() to the whitelistPaul Moore1-1/+3
The PulseAudio library attempts to do a mkdir(2) and fchmod(2) on "/run/user/<UID>/pulse" which is currently blocked by the syscall filter; this patch adds the two missing syscalls to the whitelist. You can reproduce this problem with the following command: # qemu -monitor stdio -device intel-hda -device hda-duplex If watched under strace the following syscalls are shown: mkdir("/run/user/0/pulse", 0700) fchmod(11, 0700) [NOTE: 11 is the fd for /run/user/0/pulse] Reported-by: xuhan@redhat.com Signed-off-by: Paul Moore <pmoore@redhat.com>
2013-12-20seccomp: exit if seccomp_init() failsCorey Bryant1-0/+1
This fixes a bug where we weren't exiting if seccomp_init() failed. Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Acked-by: Paul Moore <pmoore@redhat.com>
2013-12-03seccomp: add kill() to the syscall whitelistPaul Moore1-0/+1
The kill() syscall is triggered with the following command: # qemu -sandbox on -monitor stdio \ -device intel-hda -device hda-duplex -vnc :0 The resulting syslog/audit message: # ausearch -m SECCOMP ---- time->Wed Nov 20 09:52:08 2013 type=SECCOMP msg=audit(1384912328.482:6656): auid=0 uid=0 gid=0 ses=854 subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 pid=12087 comm="qemu-kvm" sig=31 syscall=62 compat=0 ip=0x7f7a1d2abc67 code=0x0 # scmp_sys_resolver 62 kill Reported-by: CongLi <coli@redhat.com> Tested-by: CongLi <coli@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Eduardo Otubo <otubo@linux.vnet.ibm.com>
2013-09-24seccomp: fine tuning whitelist by adding times()Eduardo Otubo1-0/+1
This was causing Qemu process to hang when using -sandbox on as discribed on RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1004175 Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Tested-by: Paul Moore <pmoore@redhat.com> Acked-by: Paul Moore <pmoore@redhat.com>
2013-07-29seccomp: add arch_prctl() to the syscall whitelistPaul Moore1-1/+2
It appears that even a very simple /etc/qemu-ifup configuration can require the arch_prctl() syscall, see the example below: #!/bin/sh /sbin/ifconfig $1 0.0.0.0 up /usr/sbin/brctl addif <switch> $1 Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Message-id: 20130718135703.8247.19213.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-29seccomp: add additional asynchronous I/O syscallsPaul Moore1-0/+2
A previous commit, "seccomp: add the asynchronous I/O syscalls to the whitelist", added several asynchronous I/O syscalls but left out the io_submit() and io_cancel() syscalls. This patch corrects this by adding the two missing asynchronous I/O syscalls. Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Message-id: 20130715193201.943.4913.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-26seccomp: removing unused syscalls gtom whitelistEduardo Otubo1-4/+0
v3 update: - reincluding getrlimit(), it is used by Xen. v2 update: - reincluding setrlimit(), it is used by Xen. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1374518017-10424-3-git-send-email-otubo@linux.vnet.ibm.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-26seccomp: no need to check arch in syscall whitelistEduardo Otubo1-13/+0
v2 update: - set libseccomp 2.1.0 as requirement on configure script. Since libseccomp 2.0 there's no need to check the architecture type anymore. Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1374518017-10424-2-git-send-email-otubo@linux.vnet.ibm.com Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-05-30seccomp: add the asynchronous I/O syscalls to the whitelistPaul Moore1-1/+4
In order to enable the asynchronous I/O functionality when using the seccomp sandbox we need to add the associated syscalls to the whitelist. Signed-off-by: Paul Moore <pmoore@redhat.com> Reviewed-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Message-id: 20130529203001.20939.83322.stgit@localhost Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-12-19softmmu: move include files to include/sysemu/Paolo Bonzini1-1/+1
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-11-30seccomp: adding new syscalls (bugzilla 855162)Eduardo Otubo1-17/+139
According to the bug 855162[0] - there's the need of adding new syscalls to the whitelist when using Qemu with Libvirt. [0] - https://bugzilla.redhat.com/show_bug.cgi?id=855162 Reported-by: Paul Moore <pmoore@redhat.com> Tested-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Signed-off-by: Corey Bryant <coreyb@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-08-16Adding qemu-seccomp.[ch] (v8)Eduardo Otubo1-0/+141
Signed-off-by: Eduardo Otubo <otubo@linux.vnet.ibm.com> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> --- v1: - I added a syscall struct using priority levels as described in the libseccomp man page. The priority numbers are based to the frequency they appear in a sample strace from a regular qemu guest run under libvirt. Libseccomp generates linear BPF code to filter system calls, those rules are read one after another. The priority system places the most common rules first in order to reduce the overhead when processing them. v1 -> v2: - Fixed some style issues - Removed code from vl.c and created qemu-seccomp.[ch] - Now using ARRAY_SIZE macro - Added more syscalls without priority/frequency set yet v2 -> v3: - Adding copyright and license information - Replacing seccomp_whitelist_count just by ARRAY_SIZE - Adding header protection to qemu-seccomp.h - Moving QemuSeccompSyscall definition to qemu-seccomp.c - Negative return from seccomp_start is fatal now. - Adding open() and execve() to the whitelis v3 -> v4: - Tests revealed a bigger set of syscalls. - seccomp_start() now has an argument to set the mode according to the configure option trap or kill. v4 -> v5: - Tests on x86_64 required a new specific set of system calls. - libseccomp release 1.0.0: part of the API have changed in this last release, had to adapt to the new function signatures.