aboutsummaryrefslogtreecommitdiff
path: root/util/oslib-posix.c
AgeCommit message (Collapse)AuthorFilesLines
2024-12-20include: Rename sysemu/ -> system/Philippe Mathieu-Daudé1-1/+1
Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
2024-08-05qemu/osdep: Add excluded fd parameter to qemu_close_all_open_fd()Clément Léger1-16/+82
In order for this function to be usable by tap.c code, add a list of file descriptors that should not be closed. Signed-off-by: Clément Léger <cleger@rivosinc.com> Message-ID: <20240802145423.3232974-5-cleger@rivosinc.com> [rth: Use max_fd in qemu_close_all_open_fd_close_range] Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-08-05qemu/osdep: Split qemu_close_all_open_fd() and add fallbackClément Léger1-13/+37
In order to make it cleaner, split qemu_close_all_open_fd() logic into multiple subfunctions (close with close_range(), with /proc/self/fd and fallback). Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240802145423.3232974-3-cleger@rivosinc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-08-05qemu/osdep: Move close_all_open_fds() to oslib-posixClément Léger1-0/+34
Move close_all_open_fds() in oslib-posix, rename it qemu_close_all_open_fds() and export it. Signed-off-by: Clément Léger <cleger@rivosinc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20240802145423.3232974-2-cleger@rivosinc.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-07-17util/oslib-posix: Fix superfluous trailing semicolonZhao Liu1-1/+1
Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2024-03-08oslib-posix: fix memory leak in touch_all_pagesPaolo Bonzini1-2/+4
touch_all_pages() can return early, before creating threads. In this case, however, it leaks the MemsetContext that it has allocated at the beginning of the function. Reported by Coverity as CID 1534922. Fixes: 04accf43df8 ("oslib-posix: initialize backend memory objects in parallel", 2024-02-06) Reviewed-by: Mark Kanda <mark.kanda@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2024-02-06oslib-posix: initialize backend memory objects in parallelMark Kanda1-31/+100
QEMU initializes preallocated backend memory as the objects are parsed from the command line. This is not optimal in some cases (e.g. memory spanning multiple NUMA nodes) because the memory objects are initialized in series. Allow the initialization to occur in parallel (asynchronously). In order to ensure optimal thread placement, asynchronous initialization requires prealloc context threads to be in use. Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Message-ID: <20240131165327.3154970-2-mark.kanda@oracle.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2024-01-05util/oslib: Have qemu_prealloc_mem() handler return a booleanPhilippe Mathieu-Daudé1-2/+5
Following the example documented since commit e3fe3988d7 ("error: Document Error API usage rules"), have qemu_prealloc_mem() return a boolean indicating whether an error is set or not. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Gavin Shan <gshan@redhat.com> Message-Id: <20231120213301.24349-19-philmd@linaro.org>
2023-09-15util: Delete checks for old host definitionsAkihiko Odaki1-12/+3
IA-64 and PA-RISC host support is already removed with commit b1cef6d02f ("Drop remaining bits of ia64 host support"). Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20230810225922.21600-1-akihiko.odaki@daynix.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-08-31util: spelling fixesMichael Tokarev1-1/+1
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230823065335.1919380-3-mjt@tls.msk.ru> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2023-03-13util: drop qemu_fork()Marc-André Lureau1-70/+0
Fortunately, qemu_fork() is no longer used since commit a95570e3e4d6 ("io/command: use glib GSpawn, instead of open-coding fork/exec"). (GSpawn uses posix_spawn() whenever possible instead) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20230221124802.4103554-2-marcandre.lureau@redhat.com>
2023-02-08Drop duplicate #includeMarkus Armbruster1-4/+0
Tracked down with the help of scripts/clean-includes. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
2023-02-08Don't include headers already included by qemu/osdep.hMarkus Armbruster1-2/+0
This commit was created with scripts/clean-includes. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20230202133830.2152150-19-armbru@redhat.com>
2022-10-27util: Make qemu_prealloc_mem() optionally consume a ThreadContextDavid Hildenbrand1-6/+14
... and implement it under POSIX. When a ThreadContext is provided, create new threads via the context such that these new threads obtain a properly configured CPU affinity. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-6-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27util: Introduce ThreadContext user-creatable objectDavid Hildenbrand1-0/+1
Setting the CPU affinity of QEMU threads is a bit problematic, because QEMU doesn't always have permissions to set the CPU affinity itself, for example, with seccomp after initialized by QEMU: -sandbox enable=on,resourcecontrol=deny General information about CPU affinities can be found in the man page of taskset: CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. While upper layers are already aware of how to handle CPU affinities for long-lived threads like iothreads or vcpu threads, especially short-lived threads, as used for memory-backend preallocation, are more involved to handle. These threads are created on demand and upper layers are not even able to identify and configure them. Introduce the concept of a ThreadContext, that is essentially a thread used for creating new threads. All threads created via that context thread inherit the configured CPU affinity. Consequently, it's sufficient to create a ThreadContext and configure it once, and have all threads created via that ThreadContext inherit the same CPU affinity. The CPU affinity of a ThreadContext can be configured two ways: (1) Obtaining the thread id via the "thread-id" property and setting the CPU affinity manually (e.g., via taskset). (2) Setting the "cpu-affinity" property and letting QEMU try set the CPU affinity itself. This will fail if QEMU doesn't have permissions to do so anymore after seccomp was initialized. A simple QEMU example to set the CPU affinity to host CPU 0,1,6,7 would be: qemu-system-x86_64 -S \ -object thread-context,id=tc1,cpu-affinity=0-1,cpu-affinity=6-7 And we can query it via HMP/QMP: (qemu) qom-get tc1 cpu-affinity [ 0, 1, 6, 7 ] But note that due to dynamic library loading this example will not work before we actually make use of thread_context_create_thread() in QEMU code, because the type will otherwise not get registered. We'll wire this up next to make it work. In general, the interface behaves like pthread_setaffinity_np(): host CPU numbers that are currently not available are ignored; only host CPU numbers that are impossible with the current kernel will fail. If the list of host CPU numbers does not include a single CPU that is available, setting the CPU affinity will fail. A ThreadContext can be reused, simply by reconfiguring the CPU affinity. Note that the CPU affinity of previously created threads will not get adjusted. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20221014134720.168738-4-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-10-27util: Cleanup and rename os_mem_prealloc()David Hildenbrand1-12/+12
Let's * give the function a "qemu_*" style name * make sure the parameters in the implementation match the prototype * rename smp_cpus to max_threads, which makes the semantics of that parameter clearer ... and add a function documentation. Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Message-Id: <20221014134720.168738-2-david@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2022-09-29oslib-posix: Introduce qemu_socketpair()Guoyi Tu1-0/+19
qemu_socketpair() will create a pair of connected sockets with FD_CLOEXEC set Signed-off-by: Guoyi Tu <tugy@chinatelecom.cn> Reviewed-by: Christian Schoenebeck <qemu_oss@crudebyte.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <17fa1eff729eeabd9a001f4639abccb127ceec81.1661240709.git.tugy@chinatelecom.cn>
2022-08-12cutils: Add missing dyld(3) include on macOSPhilippe Mathieu-Daudé1-4/+0
Commit 06680b15b4 moved qemu_*_exec_dir() to cutils but forgot to move the macOS dyld(3) include, resulting in the following error (when building with Homebrew GCC on macOS Monterey 12.4): [313/1197] Compiling C object libqemuutil.a.p/util_cutils.c.o FAILED: libqemuutil.a.p/util_cutils.c.o ../../util/cutils.c:1039:13: error: implicit declaration of function '_NSGetExecutablePath' [-Werror=implicit-function-declaration] 1039 | if (_NSGetExecutablePath(fpath, &len) == 0) { | ^~~~~~~~~~~~~~~~~~~~ ../../util/cutils.c:1039:13: error: nested extern declaration of '_NSGetExecutablePath' [-Werror=nested-externs] Fix by moving the include line to cutils. Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils") Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220809222046.30812-1-f4bug@amsat.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-07-18util: Fix broken build on HaikuThomas Huth1-4/+0
A recent commit moved some Haiku-specific code parts from oslib-posix.c to cutils.c, but failed to move the corresponding header #include statement, too, so "make vm-build-haiku.x86_64" is currently broken. Fix it by moving the header #include, too. Fixes: 06680b15b4 ("include: move qemu_*_exec_dir() to cutils") Message-Id: <20220718172026.139004-1-thuth@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2022-05-28include: move qemu_*_exec_dir() to cutilsMarc-André Lureau1-84/+2
The function is required by get_relocated_path() (already in cutils), and used by qemu-ga and may be generally useful. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20220525144140.591926-2-marcandre.lureau@redhat.com>
2022-05-03util: rename qemu_*block() socket functionsMarc-André Lureau1-4/+4
The qemu_*block() functions are meant to be be used with sockets (the win32 implementation expects SOCKET) Over time, those functions where used with Win32 SOCKET or file-descriptors interchangeably. But for portability, they must only be used with socket-like file-descriptors. FDs can use g_unix_set_fd_nonblocking() instead. Rename the functions with "socket" in the name to prevent bad usages. This is effectively reverting commit f9e8cacc5557e43 ("oslib-posix: rename socket_set_nonblock() to qemu_set_nonblock()"). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2022-05-03Replace fcntl(O_NONBLOCK) with g_unix_set_fd_nonblocking()Marc-André Lureau1-14/+2
Suggested-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-03Replace qemu_pipe() with g_unix_open_pipe()Marc-André Lureau1-22/+0
GLib g_unix_open_pipe() is essentially like qemu_pipe(), available since 2.30. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-05-03block: move fcntl_setfl()Marc-André Lureau1-15/+0
It is only used by block/file-posix.c, move it there. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
2022-04-21util: replace qemu_get_local_state_pathname()Marc-André Lureau1-5/+2
Simplify the function to only return the directory path. Callers are adjusted to use the GLib function to build paths, g_build_filename(). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-39-marcandre.lureau@redhat.com>
2022-04-21util: use qemu_create() in qemu_write_pidfile()Marc-André Lureau1-2/+1
qemu_open_old(O_CREATE) should be replaced with qemu_create() which handles Error reporting. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-38-marcandre.lureau@redhat.com>
2022-04-21util: use qemu_write_full() in qemu_write_pidfile()Marc-André Lureau1-1/+1
Mostly for correctness. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-37-marcandre.lureau@redhat.com>
2022-04-21qga: move qga_get_host_name()Marc-André Lureau1-35/+0
The function is specific to qemu-ga, no need to share it in QEMU. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com> Message-Id: <20220420132624.2439741-32-marcandre.lureau@redhat.com>
2022-04-21include: move qemu_msync() to osdepMarc-André Lureau1-0/+18
The implementation depends on the OS. (and longer-term goal is to move cutils to a common subproject) Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Message-Id: <20220420132624.2439741-21-marcandre.lureau@redhat.com>
2022-04-06Remove qemu-common.h include from most unitsMarc-André Lureau1-1/+0
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-33-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Move fcntl_setfl() to oslib-posixMarc-André Lureau1-0/+15
It is only implemented for POSIX anyway. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-30-marcandre.lureau@redhat.com> [Add braces around if statements. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau1-4/+4
Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-07util: Put qemu_vfree() in memalign.cPeter Maydell1-6/+0
qemu_vfree() is the companion free function to qemu_memalign(); put it in memalign.c so the allocation and free functions are together. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220226180723.1706285-9-peter.maydell@linaro.org Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-07util: Share qemu_try_memalign() implementation between POSIX and WindowsPeter Maydell1-29/+0
The qemu_try_memalign() functions for POSIX and Windows used to be significantly different, but these days they are identical except for the actual allocation function called, and the POSIX version already has to have ifdeffery for different allocation functions. Move to a single implementation in memalign.c, which uses the Windows _aligned_malloc if we detect that function in meson. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220226180723.1706285-7-peter.maydell@linaro.org Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-07util: Return valid allocation for qemu_try_memalign() with zero sizePeter Maydell1-0/+3
Currently qemu_try_memalign()'s behaviour if asked to allocate 0 bytes is rather variable: * on Windows, we will assert * on POSIX platforms, we get the underlying behaviour of the posix_memalign() or equivalent function, which may be either "return a valid non-NULL pointer" or "return NULL" Explictly check for 0 byte allocations, so we get consistent behaviour across platforms. We handle them by incrementing the size so that we return a valid non-NULL pointer that can later be passed to qemu_vfree(). This is permitted behaviour for the posix_memalign() API and is the most usual way that underlying malloc() etc implementations handle a zero-sized allocation request, because it won't trip up calling code that assumes NULL means an error. (This includes our own qemu_memalign(), which will abort on NULL.) This change is a preparation for sharing the qemu_try_memalign() code between Windows and POSIX. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-07util: Unify implementations of qemu_memalign()Peter Maydell1-14/+0
We implement qemu_memalign() in both oslib-posix.c and oslib-win32.c, but the two versions are essentially the same: they call qemu_try_memalign(), and abort() after printing an error message if it fails. The only difference is that the win32 version prints the GetLastError() value whereas the POSIX version prints strerror(errno). However, this is a bug in the win32 version: in commit dfbd0b873a85021 in 2020 we changed the implementation of qemu_try_memalign() from using VirtualAlloc() (which sets the GetLastError() value) to using _aligned_malloc() (which sets errno), but didn't update the error message to match. Replace the two separate functions with a single version in a new memalign.c file, which drops the unnecessary extra qemu_oom_check() function and instead prints a more useful message including the requested size and alignment as well as the errno string. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220226180723.1706285-4-peter.maydell@linaro.org Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
2022-03-07util: Make qemu_oom_check() a static functionPeter Maydell1-1/+1
The qemu_oom_check() function, which we define in both oslib-posix.c and oslib-win32.c, is now used only locally in that file; make it static. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20220226180723.1706285-3-peter.maydell@linaro.org
2022-02-21include: Move qemu_madvise() and related #defines to new qemu/madvise.hPeter Maydell1-0/+1
The function qemu_madvise() and the QEMU_MADV_* constants associated with it are used in only 10 files. Move them out of osdep.h to a new qemu/madvise.h header that is included where it is needed. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220208200856.3558249-2-peter.maydell@linaro.org
2022-02-06util/oslib-posix: Fix missing unlock in the error path of os_mem_prealloc()David Hildenbrand1-0/+1
We're missing an unlock in case installing the signal handler failed. Fortunately, we barely see this error in real life. Fixes: a960d6642d39 ("util/oslib-posix: Support concurrent os_mem_prealloc() invocation") Fixes: CID 1468941 Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pankaj Gupta <pankaj.gupta@ionos.com> Cc: Daniel P. Berrangé <berrange@redhat.com> Cc: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20220111120830.119912-1-david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Forward SIGBUS to MCE handler under LinuxDavid Hildenbrand1-3/+34
Temporarily modifying the SIGBUS handler is really nasty, as we might be unlucky and receive an MCE SIGBUS while having our handler registered. Unfortunately, there is no way around messing with SIGBUS when MADV_POPULATE_WRITE is not applicable or not around. Let's forward SIGBUS that don't belong to us to the already registered handler and document the situation. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-8-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Support concurrent os_mem_prealloc() invocationDavid Hildenbrand1-0/+9
Add a mutex to protect the SIGBUS case, as we cannot mess concurrently with the sigbus handler and we have to manage the global variable sigbus_memset_context. The MADV_POPULATE_WRITE path can run concurrently. Note that page_mutex and page_cond are shared between concurrent invocations, which shouldn't be a problem. This is a preparation for future virtio-mem prealloc code, which will call os_mem_prealloc() asynchronously from an iothread when handling guest requests. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-7-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Avoid creating a single thread with MADV_POPULATE_WRITEDavid Hildenbrand1-0/+8
Let's simplify the case when we only want a single thread and don't have to mess with signal handlers. Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-6-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Don't create too many threads with small memory or little ↵David Hildenbrand1-2/+10
pages Let's limit the number of threads to something sane, especially that - We don't have more threads than the number of pages we have - We don't have threads that initialize small (< 64 MiB) memory Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-5-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Introduce and use MemsetContext for touch_all_pages()David Hildenbrand1-26/+47
Let's minimize the number of global variables to prepare for os_mem_prealloc() getting called concurrently and make the code a bit easier to read. The only consumer that really needs a global variable is the sigbus handler, which will require protection via a mutex in the future either way as we cannot concurrently mess with the SIGBUS handler. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-4-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Support MADV_POPULATE_WRITE for os_mem_prealloc()David Hildenbrand1-21/+62
Let's sense support and use it for preallocation. MADV_POPULATE_WRITE does not require a SIGBUS handler, doesn't actually touch page content, and avoids context switches; it is, therefore, faster and easier to handle than our current approach. While MADV_POPULATE_WRITE is, in general, faster than manual prefaulting, and especially faster with 4k pages, there is still value in prefaulting using multiple threads to speed up preallocation. More details on MADV_POPULATE_WRITE can be found in the Linux commits 4ca9b3859dac ("mm/madvise: introduce MADV_POPULATE_(READ|WRITE) to prefault page tables") and eb2faa513c24 ("mm/madvise: report SIGBUS as -EFAULT for MADV_POPULATE_(READ|WRITE)"), and in the man page proposal [1]. This resolves the TODO in do_touch_pages(). In the future, we might want to look into using fallocate(), eventually combined with MADV_POPULATE_READ, when dealing with shared file/fd mappings and not caring about memory bindings. [1] https://lkml.kernel.org/r/20210816081922.5155-1-david@redhat.com Reviewed-by: Pankaj Gupta <pankaj.gupta@ionos.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-3-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-01-07util/oslib-posix: Let touch_all_pages() return an errorDavid Hildenbrand1-12/+16
Let's prepare touch_all_pages() for returning differing errors. Return an error from the thread and report the last processed error. Translate SIGBUS to -EFAULT, as a SIGBUS can mean all different kind of things (memory error, read error, out of memory). When allocating memory fails via the current SIGBUS-based mechanism, we'll get: os_mem_prealloc: preallocating memory failed: Bad address Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michal Privoznik <mprivozn@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20211217134611.31172-2-david@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-06-15memory: Introduce RAM_NORESERVE and wire it up in qemu_ram_mmap()David Hildenbrand1-2/+4
Let's introduce RAM_NORESERVE, allowing mmap'ing with MAP_NORESERVE. The new flag has the following semantics: " RAM is mmap-ed with MAP_NORESERVE. When set, reserving swap space (or huge pages if applicable) is skipped: will bail out if not supported. When not set, the OS will do the reservation, if supported for the memory type. " Allow passing it into: - memory_region_init_ram_nomigrate() - memory_region_init_resizeable_ram() - memory_region_init_ram_from_file() ... and teach qemu_ram_mmap() and qemu_anon_ram_alloc() about the flag. Bail out if the flag is not supported, which is the case right now for both, POSIX and win32. We will add Linux support next and allow specifying RAM_NORESERVE via memory backends. The target use case is virtio-mem, which dynamically exposes memory inside a large, sparse memory area to the VM. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-9-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-15util/mmap-alloc: Pass flags instead of separate bools to qemu_ram_mmap()David Hildenbrand1-1/+2
Let's pass flags instead of bools to prepare for passing other flags and update the documentation of qemu_ram_mmap(). Introduce new QEMU_MAP_ flags that abstract the mmap() PROT_ and MAP_ flag handling and simplify it. We expose only flags that are currently supported by qemu_ram_mmap(). Maybe, we'll see qemu_mmap() in the future as well that can implement these flags. Note: We don't use MAP_ flags as some flags (e.g., MAP_SYNC) are only defined for some systems and we want to always be able to identify these flags reliably inside qemu_ram_mmap() -- for example, to properly warn when some future flags are not available or effective on a system. Also, this way we can simplify PROT_ handling as well. Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Eduardo Habkost <ehabkost@redhat.com> for memory backend and machine core Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20210510114328.21835-8-david@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-06-04oslib-posix: Remove OpenBSD workaround for fcntl("/dev/null", F_SETFL, ↵Brad Smith1-11/+0
O_NONBLOCK) failure OpenBSD prior to 6.3 required a workaround to utilize fcntl(F_SETFL) on memory devices. Since modern verions of OpenBSD that are only officialy supported and buildable on do not have this issue I am garbage collecting this workaround. Signed-off-by: Brad Smith <brad@comstyle.com> Message-Id: <YGYECGXQhdamEJgC@humpty.home.comstyle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2021-02-09memory: alloc RAM from file at offsetJagannathan Raman1-1/+1
Allow RAM MemoryRegion to be created from an offset in a file, instead of allocating at offset of 0 by default. This is needed to synchronize RAM between QEMU & remote process. Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John G Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 609996697ad8617e3b01df38accc5c208c24d74e.1611938319.git.jag.raman@oracle.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>