aboutsummaryrefslogtreecommitdiff
path: root/util/oslib-posix.c
AgeCommit message (Collapse)AuthorFilesLines
2022-02-06util/oslib-posix: Fix missing unlock in the error path of os_mem_prealloc()David Hildenbrand1-0/+1
We're missing an unlock in case installing the signal handler failed. Fortunately, we barely see this error in real life. Fixes: a960d6642d39 ("util/oslib-posix: Support concurrent os_mem_prealloc() invocation") Fixes: CID 1468941 Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta@ionos.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20220111120830.119912-1-david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Forward SIGBUS to MCE handler under LinuxDavid Hildenbrand1-3/+34
Temporarily modifying the SIGBUS handler is really nasty, as we might be unlucky and receive an MCE SIGBUS while having our handler registered. Unfortunately, there is no way around messing with SIGBUS when MADV_POPULATE_WRITE is not applicable or not around. Let's forward SIGBUS that don't belong to us to the already registered handler and document the situation. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-8-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Support concurrent os_mem_prealloc() invocationDavid Hildenbrand1-0/+9
Add a mutex to protect the SIGBUS case, as we cannot mess concurrently with the sigbus handler and we have to manage the global variable sigbus_memset_context. The MADV_POPULATE_WRITE path can run concurrently. Note that page_mutex and page_cond are shared between concurrent invocations, which shouldn't be a problem. This is a preparation for future virtio-mem prealloc code, which will call os_mem_prealloc() asynchronously from an iothread when handling guest requests. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITEDavid Hildenbrand1-0/+8
Let's simplify the case when we only want a single thread and don't have to mess with signal handlers. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-6-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Don't create too many threads with small memory or little ↵David Hildenbrand1-2/+10
pages Let's limit the number of threads to something sane, especially that - We don't have more threads than the number of pages we have - We don't have threads that initialize small (< 64 MiB) memory Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()David Hildenbrand1-26/+47
Let's minimize the number of global variables to prepare for os_mem_prealloc() getting called concurrently and make the code a bit easier to read. The only consumer that really needs a global variable is the sigbus handler, which will require protection via a mutex in the future either way as we cannot concurrently mess with the SIGBUS handler. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()David Hildenbrand1-21/+62
Let's sense support and use it for preallocation. MADV_POPULATE_WRITE does not require a SIGBUS handler, doesn't actually touch page content, and avoids context switches; it is, therefore, faster and easier to handle than our current approach. While MADV_POPULATE_WRITE is, in general, faster than manual prefaulting, and especially faster with 4k pages, there is still value in prefaulting using multiple threads to speed up preallocation. More details on MADV_POPULATE_WRITE can be found in the Linux commits 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)"), and in the man page proposal [1]. This resolves the TODO in do_touch_pages(). In the future, we might want to look into using fallocate(), eventually combined with MADV_POPULATE_READ, when dealing with shared file/fd mappings and not caring about memory bindings. [1] https://lkml.kernel.org/r/20210816081922.5155-1-david@redhat.com Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Let touch_all_pages() return an errorDavid Hildenbrand1-12/+16
Let's prepare touch_all_pages() for returning differing errors. Return an error from the thread and report the last processed error. Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of things (memory error, read error, out of memory). When allocating memory fails via the current SIGBUS-based mechanism, we'll get: os_mem_prealloc: preallocating memory failed: Bad address Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-06-15memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()David Hildenbrand1-2/+4
Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()David Hildenbrand1-1/+2
Let's pass flags instead of bools to prepare for passing other flags and update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_ flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify it. We expose only flags that are currently supported by qemu_ram_mmap(). Maybe, we'll see qemu_mmap() in the future as well that can implement these flags. Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only defined for some systems and we want to always be able to identify these flags reliably inside qemu_ram_mmap() -- for example, to properly warn when some future flags are not available or effective on a system. Also, this way we can simplify PROT_ handling as well. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-8-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04oslib-posix: Remove OpenBSD workaround for fcntl("/dev/null", F_SETFL, ↵Brad Smith1-11/+0
O_NONBLOCK) failure OpenBSD prior to 6.3 required a workaround to utilize fcntl(F_SETFL) on memory devices. Since modern verions of OpenBSD that are only officialy supported and buildable on do not have this issue I am garbage collecting this workaround. Signed-off-by: Brad Smith <brad@comstyle.com> Message-Id: <YGYECGXQhdamEJgC@humpty.home.comstyle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09memory: alloc RAM from file at offsetJagannathan Raman1-1/+1
Allow RAM MemoryRegion to be created from an offset in a file, instead of allocating at offset of 0 by default. This is needed to synchronize RAM between QEMU & remote process. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-02-01memory: add readonly support to memory_region_init_ram_from_file()Stefan Hajnoczi1-1/+1
There is currently no way to open(O_RDONLY) and mmap(PROT_READ) when creating a memory region from a file. This functionality is needed since the underlying host file may not allow writing. Add a bool readonly argument to memory_region_init_ram_from_file() and the APIs it calls. Extend memory_region_init_ram_from_file() rather than introducing a memory_region_init_rom_from_file() API so that callers can easily make a choice between read/write and read-only at runtime without calling different APIs. No new RAMBlock flag is introduced for read-only because it's unclear whether RAMBlocks need to know that they are read-only. Pass a bool readonly argument instead. Both of these design decisions can be changed in the future. It just seemed like the simplest approach to me. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Liam Merwick <liam.merwick@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20210104171320.575838-2-stefanha@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2021-01-07util/oslib: Assert qemu_try_memalign() alignment is a power of 2Philippe Mathieu-Daudé1-0/+2
qemu_try_memalign() expects a power of 2 alignment: - posix_memalign(3): The address of the allocated memory will be a multiple of alignment, which must be a power of two and a multiple of sizeof(void *). - _aligned_malloc() The alignment value, which must be an integer power of 2. Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201021173803.2619054-3-philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2021-01-02cfi: Initial support for cfi-icall in QEMUDaniele Buono1-0/+11
LLVM/Clang, supports runtime checks for forward-edge Control-Flow Integrity (CFI). CFI on indirect function calls (cfi-icall) ensures that, in indirect function calls, the function called is of the right signature for the pointer type defined at compile time. For this check to work, the code must always respect the function signature when using function pointer, the function must be defined at compile time, and be compiled with link-time optimization. This rules out, for example, shared libraries that are dynamically loaded (given that functions are not known at compile time), and code that is dynamically generated at run-time. This patch: 1) Introduces the CONFIG_CFI flag to support cfi in QEMU 2) Introduces a decorator to allow the definition of "sensitive" functions, where a non-instrumented function may be called at runtime through a pointer. The decorator will take care of disabling cfi-icall checks on such functions, when cfi is enabled. 3) Marks functions currently in QEMU that exhibit such behavior, in particular: - The function in TCG that calls pre-compiled TBs - The function in TCI that interprets instructions - Functions in the plugin infrastructures that jump to callbacks - Functions in util that directly call a signal handler Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com> Acked-by: Alex Bennée <alex.bennee@linaro.org Message-Id: <20201204230615.2392-3-dbuono@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30oslib-posix: relocate path to /varPaolo Bonzini1-2/+4
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30oslib-posix: default exec_dir to bindirPaolo Bonzini1-15/+8
If the exec_dir cannot be retrieved, just assume it's the installation directory that was specified at configure time. This makes it simpler to reason about what the callers will do if they get back an empty path. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-30oslib: do not call g_strdup from qemu_get_exec_dirPaolo Bonzini1-3/+5
Just return the directory without requiring the caller to free it. This also removes a bogus check for NULL in os_find_datadir and module_load_one; g_strdup of a static variable cannot return NULL. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-09-16util: rename qemu_open() to qemu_open_old()Daniel P. Berrangé1-1/+1
We want to introduce a new version of qemu_open() that uses an Error object for reporting problems and make this it the preferred interface. Rename the existing method to release the namespace for the new impl. Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
2020-07-27util: add qemu_get_host_physmem utility functionAlex Bennée1-0/+15
This will be used in a future patch. For POSIX systems _SC_PHYS_PAGES isn't standardised but at least appears in the man pages for Open/FreeBSD. The result is advisory so any users of it shouldn't just fail if we can't work it out. The win32 stub currently returns 0 until someone with a Windows system can develop and test a patch. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Cc: BALATON Zoltan <balaton@eik.bme.hu> Cc: Christian Ehrhardt <christian.ehrhardt@canonical.com> Message-Id: <20200724064509.331-5-alex.bennee@linaro.org>
2020-07-20util: Implement qemu_get_thread_id() for OpenBSDDavid CARLIER1-0/+2
Implement qemu_get_thread_id() for OpenBSD hosts, using getthrid(). Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Brad Smith <brad@comstyle.com> Message-id: CA+XhMqxD6gQDBaj8tX0CMEj3si7qYKsM8u1km47e_-U7MC37Pg@mail.gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> [PMM: tidied up commit message] Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-15net: check if the file descriptor is valid before using itLaurent Vivier1-8/+18
qemu_set_nonblock() checks that the file descriptor can be used and, if not, crashes QEMU. An assert() is used for that. The use of assert() is used to detect programming error and the coredump will allow to debug the problem. But in the case of the tap device, this assert() can be triggered by a misconfiguration by the user. At startup, it's not a real problem, but it can also happen during the hot-plug of a new device, and here it's a problem because we can crash a perfectly healthy system. For instance: # ip link add link virbr0 name macvtap0 type macvtap mode bridge # ip link set macvtap0 up # TAP=/dev/tap$(ip -o link show macvtap0 | cut -d: -f1) # qemu-system-x86_64 -machine q35 -device pcie-root-port,id=pcie-root-port-0 -monitor stdio 9<> $TAP (qemu) netdev_add type=tap,id=hostnet0,vhost=on,fd=9 (qemu) device_add driver=virtio-net-pci,netdev=hostnet0,id=net0,bus=pcie-root-port-0 (qemu) device_del net0 (qemu) netdev_del hostnet0 (qemu) netdev_add type=tap,id=hostnet1,vhost=on,fd=9 qemu-system-x86_64: .../util/oslib-posix.c:247: qemu_set_nonblock: Assertion `f != -1' failed. Aborted (core dumped) To avoid that, add a function, qemu_try_set_nonblock(), that allows to report the problem without crashing. In the same way, we also update the function for vhostfd in net_init_tap_one() and for fd in net_init_socket() (both descriptors are provided by the user and can be wrong). Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
2020-07-13util: Introduce qemu_get_host_name()Michal Privoznik1-0/+35
This function offers operating system agnostic way to fetch host name. It is implemented for both POSIX-like and Windows systems. Signed-off-by: Michal Privoznik <mprivozn@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
2020-07-13util/oslib-posix.c: Implement qemu_init_exec_dir() for HaikuDavid CARLIER1-0/+19
The qemu_init_exec_dir() function is inherently non-portable; provide an implementation for Haiku hosts. Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20200703145614.16684-9-peter.maydell@linaro.org [PMM: Expanded commit message] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-07-13osdep.h: Always include <sys/signal.h> if it existsDavid CARLIER1-1/+0
Regularize our handling of <sys/signal.h>: currently we include it in osdep.h, but only for OpenBSD, and we include it without an ifdef guard in a couple of C files. This causes problems for Haiku, which doesn't have that header. Instead, check in configure whether sys/signal.h exists, and if it does then always include it from osdep.h. Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20200703145614.16684-5-peter.maydell@linaro.org [PMM: Expanded commit message; rename to HAVE_SYS_SIGNAL_H] Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-06-23util/oslib-posix : qemu_init_exec_dir implementation for MacDavid CARLIER1-0/+15
From 3025a0ce3fdf7d3559fc35a52c659f635f5c750c Mon Sep 17 00:00:00 2001 From: David Carlier <devnexen@gmail.com> Date: Tue, 26 May 2020 21:35:27 +0100 Subject: [PATCH] util/oslib-posix : qemu_init_exec_dir implementation for Mac Using dyld API to get the full path of the current process. Signed-off-by: David Carlier <devnexen@gmail.com> Message-id: CA+XhMqxwC10XHVs4Z-JfE0-WLAU3ztDuU9QKVi31mjr59HWCxg@mail.gmail.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2020-06-10util/oslib: Returns the real thread identifier on FreeBSD and NetBSDDavid Carlier1-0/+9
getpid is good enough in a mono thread context, however thr_self/_lwp_self reflects the real current thread identifier from a given process. Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Carlier <devnexen@gmail.com>
2020-04-11oslib-posix: take lock before qemu_cond_broadcastBauerchen1-0/+3
In touch_all_pages, if the mutex is not taken around qemu_cond_broadcast, qemu_cond_broadcast may be called before all touch page threads enter qemu_cond_wait. In this case, the touch page threads wait forever for the main thread to wake them up, causing a deadlock. Signed-off-by: Bauerchen <bauerchen@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-03-16oslib-posix: initialize mutex and condition variablePaolo Bonzini1-0/+7
The mutex and condition variable were never initialized, causing -mem-prealloc to abort with an assertion failure. Fixes: 037fb5eb3941c80a2b7c36a843e47207ddb004d4 Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com> Cc: bauerchen <bauerchen@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-25mem-prealloc: optimize large guest startupbauerchen1-8/+24
[desc]: Large memory VM starts slowly when using -mem-prealloc, and there are some areas to optimize in current method; 1、mmap will be used to alloc threads stack during create page clearing threads, and it will attempt mm->mmap_sem for write lock, but clearing threads have hold read lock, this competition will cause threads createion very slow; 2、methods of calcuating pages for per threads is not well;if we use 64 threads to split 160 hugepage,63 threads clear 2page,1 thread clear 34 page,so the entire speed is very slow; to solve the first problem,we add a mutex in thread function,and start all threads when all threads finished createion; and the second problem, we spread remainder to other threads,in situation that 160 hugepage and 64 threads, there are 32 threads clear 3 pages,and 32 threads clear 2 pages. [test]: 320G 84c VM start time can be reduced to 10s 680G 84c VM start time can be reduced to 18s Signed-off-by: bauerchen <bauerchen@tencent.com> Reviewed-by: Pan Rui <ruippan@tencent.com> Reviewed-by: Ivan Ren <ivanren@tencent.com> [Simplify computation of the number of pages per thread. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-10-26core: replace getpagesize() with qemu_real_host_page_sizeWei Yang1-2/+2
There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-09-16memory: fetch pmem size in get_file_size()Stefan Hajnoczi1-54/+0
Neither stat(2) nor lseek(2) report the size of Linux devdax pmem character device nodes. Commit 314aec4a6e06844937f1677f6cba21981005f389 ("hostmem-file: reject invalid pmem file sizes") added code to hostmem-file.c to fetch the size from sysfs and compare against the user-provided size=NUM parameter: if (backend->size > size) { error_setg(errp, "size property %" PRIu64 " is larger than " "pmem file \"%s\" size %" PRIu64, backend->size, fb->mem_path, size); return; } It turns out that exec.c:qemu_ram_alloc_from_fd() already has an equivalent size check but it skips devdax pmem character devices because lseek(2) returns 0: if (file_size > 0 && file_size < size) { error_setg(errp, "backing store %s size 0x%" PRIx64 " does not match 'size' option 0x" RAM_ADDR_FMT, mem_path, file_size, size); return NULL; } This patch moves the devdax pmem file size code into get_file_size() so that we check the memory size in a single place: qemu_ram_alloc_from_fd(). This simplifies the code and makes it more general. This also fixes the problem that hostmem-file only checks the devdax pmem file size when the pmem=on parameter is given. An unchecked size=NUM parameter can lead to SIGBUS in QEMU so we must always fetch the file size for Linux devdax pmem character device nodes. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190830093056.12572-1-stefanha@redhat.com> Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-08-16Include qemu/main-loop.h lessMarkus Armbruster1-0/+1
In my "build everything" tree, changing qemu/main-loop.h triggers a recompile of some 5600 out of 6600 objects (not counting tests and objects that don't depend on qemu/osdep.h). It includes block/aio.h, which in turn includes qemu/event_notifier.h, qemu/notify.h, qemu/processor.h, qemu/qsp.h, qemu/queue.h, qemu/thread-posix.h, qemu/thread.h, qemu/timer.h, and a few more. Include qemu/main-loop.h only where it's needed. Touching it now recompiles only some 1700 objects. For block/aio.h and qemu/event_notifier.h, these numbers drop from 5600 to 2800. For the others, they shrink only slightly. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190812052359.30071-21-armbru@redhat.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Tested-by: Philippe Mathieu-Daudé <philmd@redhat.com>
2019-06-12Include qemu-common.h exactly where neededMarkus Armbruster1-0/+1
No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20190523143508.25387-5-armbru@redhat.com> [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
2019-04-25util/mmap-alloc: Add a 'is_pmem' parameter to qemu_ram_mmapZhang Yi1-1/+1
besides the existing 'shared' flags, we are going to add 'is_pmem' to qemu_ram_mmap(), which indicated the memory backend file is a persist memory. Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Zhang Yi <yi.z.zhang@linux.intel.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Message-Id: <786c46862cfeb253ee0ea2f44d62ffe76edb7fa4.1549555521.git.yi.z.zhang@linux.intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-03-12Merge remote-tracking branch ↵Peter Maydell1-0/+53
'remotes/ehabkost/tags/machine-next-pull-request' into staging Machine queue, 2019-03-11 * memfd fixes (Ilya Maximets) * Move nvdimms state into struct MachineState (Eric Auger) * hostmem-file: reject invalid pmem file sizes (Stefan Hajnoczi) # gpg: Signature made Tue 12 Mar 2019 00:57:41 GMT # gpg: using RSA key 2807936F984DC5A6 # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>" [full] # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF D1AA 2807 936F 984D C5A6 * remotes/ehabkost/tags/machine-next-pull-request: memfd: improve error messages memfd: set up correct errno if not supported memfd: always check for MFD_CLOEXEC hostmem-memfd: disable for systems without sealing support machine: Move nvdimms state into struct MachineState nvdimm: Rename AcpiNVDIMMState into NVDIMMState hostmem-file: reject invalid pmem file sizes Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-03-11oslib-posix: Ignore fcntl("/dev/null", F_SETFL, O_NONBLOCK) failurePhilippe Mathieu-Daudé1-0/+12
Previous to OpenBSD 6.3 [1], fcntl(F_SETFL) is not permitted on memory devices. Trying this call sets errno to ENODEV ("not a memory device"): 19 ENODEV Operation not supported by device. An attempt was made to apply an inappropriate function to a device, for example, trying to read a write-only device such as a printer. Do not assert fcntl failures in this specific case (errno set to ENODEV) on OpenBSD. This fixes: $ lm32-softmmu/qemu-system-lm32 assertion "f != -1" failed: file "util/oslib-posix.c", line 247, function "qemu_set_nonblock" Abort trap (core dumped) [1] The fix seems https://github.com/openbsd/src/commit/c2a35b387f9d3c "fcntl(F_SETFL) invokes the FIONBIO and FIOASYNC ioctls internally, so the memory devices (/dev/null, /dev/zero, etc) need to permit them." Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20190307142822.8531-2-philmd@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-03-11hostmem-file: reject invalid pmem file sizesStefan Hajnoczi1-0/+53
Guests started with NVDIMMs larger than the underlying host file produce confusing errors inside the guest. This happens because the guest accesses pages beyond the end of the file. Check the pmem file size on startup and print a clear error message if the size is invalid. Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1669053 Cc: Wei Yang <richardw.yang@linux.intel.com> Cc: Zhang Yi <yi.z.zhang@linux.intel.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20190214031004.32522-3-stefanha@redhat.com> Reviewed-by: Wei Yang <richardw.yang@linux.intel.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Pankaj Gupta <pagupta@redhat.com> Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
2019-02-04mmap-alloc: fix hugetlbfs misaligned length in ppc64Murilo Opsfelder Araujo1-1/+1
The commit 7197fb4058bcb68986bae2bb2c04d6370f3e7218 ("util/mmap-alloc: fix hugetlb support on ppc64") fixed Huge TLB mappings on ppc64. However, we still need to consider the underlying huge page size during munmap() because it requires that both address and length be a multiple of the underlying huge page size for Huge TLB mappings. Quote from "Huge page (Huge TLB) mappings" paragraph under NOTES section of the munmap(2) manual: "For munmap(), addr and length must both be a multiple of the underlying huge page size." On ppc64, the munmap() in qemu_ram_munmap() does not work for Huge TLB mappings because the mapped segment can be aligned with the underlying huge page size, not aligned with the native system page size, as returned by getpagesize(). This has the side effect of not releasing huge pages back to the pool after a hugetlbfs file-backed memory device is hot-unplugged. This patch fixes the situation in qemu_ram_mmap() and qemu_ram_munmap() by considering the underlying page size on ppc64. After this patch, memory hot-unplug releases huge pages back to the pool. Fixes: 7197fb4058bcb68986bae2bb2c04d6370f3e7218 Signed-off-by: Murilo Opsfelder Araujo <muriloo@linux.ibm.com> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2019-01-14util: check the return value of fcntl in qemu_set_{block, nonblock}Li Qiang1-2/+6
Assert that the return value is not an error. This is like commit 7e6478e7d4f for qemu_set_cloexec. Signed-off-by: Li Qiang <liq3ea@163.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2018-11-06oslib-posix: Use MAP_STACK in qemu_alloc_stack() on OpenBSDBrad Smith1-2/+13
Use MAP_STACK in qemu_alloc_stack() on OpenBSD. Added to our 6.4 release. MAP_STACK Indicate that the mapping is used as a stack. This flag must be used in combination with MAP_ANON and MAP_PRIVATE. Implement MAP_STACK option for mmap(). Synchronous faults (pagefault and syscall) confirm the stack register points at MAP_STACK memory, otherwise SIGSEGV is delivered. sigaltstack() and pthread_attr_setstack() are modified to create a MAP_STACK sub-region which satisfies alignment requirements. Observe that MAP_STACK can only be set/cleared by mmap(), which zeroes the contents of the region -- there is no mprotect() equivalent operation, so there is no MAP_STACK-adding gadget. Signed-off-by: Brad Smith <brad@comstyle.com> Reviewed-by: Kamil Rytarowski <n54@gmx.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-id: 20181019125239.GA13884@humpty.home.comstyle.com Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-10-02util: use fcntl() for qemu_write_pidfile() lockingMarc-André Lureau1-1/+6
Daniel Berrangé suggested to use fcntl() locks rather than lockf(). 'man lockf': On Linux, lockf() is just an interface on top of fcntl(2) locking. Many other systems implement lockf() in this way, but note that POSIX.1 leaves the relationship between lockf() and fcntl(2) locks unspecified. A portable application should probably avoid mixing calls to these interfaces. IOW, if its just a shim around fcntl() on many systems, it is clearer if we just use fcntl() directly, as we then know how fcntl() locks will behave if they're on a network filesystem like NFS. Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180831145314.14736-3-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-10-02util: add qemu_write_pidfile()Marc-André Lureau1-0/+68
There are variants of qemu_create_pidfile() in qemu-pr-helper and qemu-ga. Let's have a common implementation in libqemuutil. The code is initially based from pr-helper write_pidfile(), with various improvements and suggestions from Daniel Berrangé: QEMU will leave the pidfile existing on disk when it exits which initially made me think it avoids the deletion race. The app managing QEMU, however, may well delete the pidfile after it has seen QEMU exit, and even if the app locks the pidfile before deleting it, there is still a race. eg consider the following sequence QEMU 1 libvirtd QEMU 2 1. lock(pidfile) 2. exit() 3. open(pidfile) 4. lock(pidfile) 5. open(pidfile) 6. unlink(pidfile) 7. close(pidfile) 8. lock(pidfile) IOW, at step 8 the new QEMU has successfully acquired the lock, but the pidfile no longer exists on disk because it was deleted after the original QEMU exited. While we could just say no external app should ever delete the pidfile, I don't think that is satisfactory as people don't read docs, and admins don't like stale pidfiles being left around on disk. To make this robust, I think we might want to copy libvirt's approach to pidfile acquisition which runs in a loop and checks that the file on disk /after/ acquiring the lock matches the file that was locked. Then we could in fact safely let QEMU delete its own pidfiles on clean exit.. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20180831145314.14736-2-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2018-02-19mem: add share parameter to memory-backend-ramMarcel Apfelbaum1-2/+2
Currently only file backed memory backend can be created with a "share" flag in order to allow sharing guest RAM with other processes in the host. Add the "share" flag also to RAM Memory Backend in order to allow remapping parts of the guest RAM to different host virtual addresses. This is needed by the RDMA devices in order to remap non-contiguous QEMU virtual addresses to a contiguous virtual address range. Moved the "share" flag to the Host Memory base class, modified phys_mem_alloc to include the new parameter and a new interface memory_region_init_ram_shared_nomigrate. There are no functional changes if the new flag is not used. Reviewed-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Marcel Apfelbaum <marcel@redhat.com>
2018-02-10oslib-posix: check for posix_memalign in configure scriptAndreas Gustafsson1-1/+1
Check for the presence of posix_memalign() in the configure script, not using "defined(_POSIX_C_SOURCE) && !defined(__sun__)". This lets qemu use posix_memalign() on NetBSD versions that have it, instead of falling back to valloc() which is wasteful when the required alignment is smaller than a page. Signed-off-by: Andreas Gustafsson <gson@gson.org> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Kamil Rytarowski <n54@gmx.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
2017-11-02oslib-posix: Use sysctl(2) call to resolve exec_dir on NetBSDKamil Rytarowski1-1/+10
NetBSD 8.0(beta) ships with KERN_PROC_PATHNAME in sysctl(2). Older NetBSD versions can use argv[0] parsing fallback. This code section is partly shared with FreeBSD. Signed-off-by: Kamil Rytarowski <n54@gmx.com> Message-id: 20171028194833.23858-1-n54@gmx.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2017-10-20oslib-posix: Fix compiler warning and some data typesStefan Weil1-7/+8
gcc warning: /qemu/util/oslib-posix.c:304:11: error: variable ‘addr’ might be clobbered by ‘longjmp’ or ‘vfork’ [-Werror=clobbered] Fix also some related data types: numpages, hpagesize are used as pointer offset. Always use size_t for them and also for the derived numpages_per_thread and size_per_thread. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Stefan Weil <sw@weilnetz.de> Message-id: 20171016202912.1117-1-sw@weilnetz.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-08-30oslib-posix: Print errors before aborting on qemu_alloc_stack()Eduardo Habkost1-0/+2
If QEMU is running on a system that's out of memory and mmap() fails, QEMU aborts with no error message at all, making it hard to debug the reason for the failure. Add perror() calls that will print error information before aborting. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Tested-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20170829212053.6003-1-ehabkost@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2017-07-21util/oslib-posix.c: Avoid warning on NetBSDPeter Maydell1-1/+1
On NetBSD the compiler warns: util/oslib-posix.c: In function 'sigaction_invoke': util/oslib-posix.c:589:5: warning: missing braces around initializer [-Wmissing-braces] siginfo_t si = { 0 }; ^ util/oslib-posix.c:589:5: warning: (near initialization for 'si.si_pad') [-Wmissing-braces] because on this platform siginfo_t is defined as typedef union siginfo { char si_pad[128]; /* Total size; for future expansion */ struct _ksiginfo _info; } siginfo_t; Avoid this warning by initializing the struct with {} instead; this is a GCC extension but we use it all over the codebase already. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1500568341-8389-1-git-send-email-peter.maydell@linaro.org
2017-07-11block: rip out all traces of password promptingDaniel P. Berrange1-66/+0
Now that qcow & qcow2 are wired up to get encryption keys via the QCryptoSecret object, nothing is relying on the interactive prompting for passwords. All the code related to password prompting can thus be ripped out. Reviewed-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Message-id: 20170623162419.26068-17-berrange@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>