aboutsummaryrefslogtreecommitdiff
path: root/linux-user/syscall.c
AgeCommit message (Collapse)AuthorFilesLines
2023-02-21linux-user: Always exit from exclusive state in fork_end()Ilya Leoshkevich1-0/+1
fork()ed processes currently start with current_cpu->in_exclusive_context set, which is, strictly speaking, not correct, but does not cause problems (even assertion failures). With one of the next patches, the code begins to rely on this value, so fix it by always calling end_exclusive() in fork_end(). Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-Id: <20230214140829.45392-2-iii@linux.ibm.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2023-02-03linux-user: Allow sendmsg() without IOVHelge Deller1-2/+7
Applications do call sendmsg() without any IOV, e.g.: sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=NULL, msg_iovlen=0, msg_control=[{cmsg_len=36, cmsg_level=SOL_ALG, cmsg_type=0x2}], msg_controllen=40, msg_flags=0}, MSG_MORE) = 0 sendmsg(4, {msg_name=NULL, msg_namelen=0, msg_iov=[{iov_base="The quick brown fox jumps over t"..., iov_len=183}], msg_iovlen=1, msg_control=[{cmsg_len=20, cmsg_level=SOL_ALG, cmsg_type=0x3}], msg_controllen=24, msg_flags=0}, 0) = 183 The function do_sendrecvmsg_locked() is used for sndmsg() and recvmsg() and calls lock_iovec() to lock the IOV into memory. For the first sendmsg() above it returns NULL and thus wrongly skips the call the host sendmsg() syscall, which will break the calling application. Fix this issue by: - allowing sendmsg() even with empty IOV - skip recvmsg() if IOV is NULL - skip both if the return code of do_sendrecvmsg_locked() != 0, which indicates some failure like EFAULT on the IOV Tested with the debian "ell" package with hppa guest on x86_64 host. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221212173416.90590-2-deller@gmx.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03linux-user: Implement SOL_ALG encryption supportHelge Deller1-0/+8
Add suport to handle SOL_ALG packets via sendmsg() and recvmsg(). This allows emulated userspace to use encryption functionality. Tested with the debian ell package with hppa guest on x86_64 host. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221212173416.90590-1-deller@gmx.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03linux-user: Fix /proc/cpuinfo output for hppaHelge Deller1-5/+11
The hppa architectures provides an own output for the emulated /proc/cpuinfo file. Some userspace applications count (even if that's not the recommended way) the number of lines which start with "processor:" and assume that this number then reflects the number of online CPUs. Since those 3 architectures don't provide any such line, applications may assume "0" CPUs. One such issue can be seen in debian bug report: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024653 Avoid such issues by adding a "processor:" line for each of the online CPUs. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <Y9QvyRSq1I1k5/JW@p100> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03linux-user: Fix SO_ERROR return code of getsockopt()Helge Deller1-1/+6
Add translation for the host error return code of: getsockopt(19, SOL_SOCKET, SO_ERROR, [ECONNREFUSED], [4]) = 0 This fixes the testsuite of the cockpit debian package with a hppa-linux guest on a x86-64 host. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <Y9QzNzXg0hrzHQeo@p100> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03Revert "linux-user: fix compat with glibc >= 2.36 sys/mount.h"Daniel P. Berrangé1-18/+0
This reverts commit 3cd3df2a9584e6f753bb62a0028bd67124ab5532. glibc has fixed (in 2.36.9000-40-g774058d729) the problem that caused a clash when both sys/mount.h annd linux/mount.h are included, and backported this to the 2.36 stable release too: https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E It is saner for QEMU to remove the workaround it applied for glibc 2.36 and expect distros to ship the 2.36 maint release with the fix. This avoids needing to add a further workaround to QEMU to deal with the fact that linux/brtfs.h now also pulls in linux/mount.h via linux/fs.h since Linux 6.1 Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20230110174901.2580297-3-berrange@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03Revert "linux-user: add more compat ioctl definitions"Daniel P. Berrangé1-25/+0
This reverts commit c5495f4ecb0cdaaf2e9dddeb48f1689cdb520ca0. glibc has fixed (in 2.36.9000-40-g774058d729) the problem that caused a clash when both sys/mount.h annd linux/mount.h are included, and backported this to the 2.36 stable release too: https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E It is saner for QEMU to remove the workaround it applied for glibc 2.36 and expect distros to ship the 2.36 maint release with the fix. This avoids needing to add a further workaround to QEMU to deal with the fact that linux/brtfs.h now also pulls in linux/mount.h via linux/fs.h since Linux 6.1 Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20230110174901.2580297-2-berrange@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-02-03linux-user: un-parent OBJECT(cpu) when closing threadRichard Henderson1-6/+7
This reinstates commit 52f0c1607671293afcdb2acc2f83e9bccbfa74bb: While forcing the CPU to unrealize by hand does trigger the clean-up code we never fully free resources because refcount never reaches zero. This is because QOM automatically added objects without an explicit parent to /unattached/, incrementing the refcount. Instead of manually triggering unrealization just unparent the object and let the device machinery deal with that for us. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866 Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org> The original patch tickled a problem in target/arm, and was reverted. But that problem is fixed as of commit 3b07a936d3bf. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20230124201019.3935934-1-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-01-25linux-user/syscall: Implement execveat()Drew DeVault1-6/+9
References: https://gitlab.com/qemu-project/qemu/-/issues/1007 Signed-off-by: Drew DeVault <sir@cmpwn.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221104081015.706009-1-sir@cmpwn.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20221104173632.1052-6-philmd@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2023-01-25linux-user/syscall: Extract do_execve() from do_syscall1()Drew DeVault1-97/+114
execve() is a particular case of execveat(). In order to add do_execveat(), first factor do_execve() out. Signed-off-by: Drew DeVault <sir@cmpwn.com> Message-Id: <20221104081015.706009-1-sir@cmpwn.com> [PMD: Split of bigger patch, filled description, fixed style] Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221104173632.1052-5-philmd@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-12-14Drop more useless casts from void * to pointerMarkus Armbruster1-1/+1
Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221123133811.1398562-1-armbru@redhat.com>
2022-11-02linux-user: always translate cmsg when recvmsgIcenowy Zheng1-1/+2
It's possible that a message contains both normal payload and ancillary data in the same message, and even if no ancillary data is available this information should be passed to the target, otherwise the target cmsghdr will be left uninitialized and the target is going to access uninitialized memory if it expects cmsg. Always call the function that translate cmsg when recvmsg, because that function should be empty-cmsg-safe (it creates an empty cmsg in the target). Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221028081220.1604244-1-uwu@icenowy.me> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-11-02linux-user: Add close_range() syscallHelge Deller1-0/+19
Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <Y1dLJoEDhJ2AAYDn@p100> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-25linux-user: Add guest memory layout to exception dumpHelge Deller1-0/+28
When the emulation stops with a hard exception it's very useful for debugging purposes to dump the current guest memory layout (for an example see /proc/self/maps) beside the CPU registers. The open_self_maps() function provides such a memory dump, but since it's located in the syscall.c file, various changes (add #includes, make this function externally visible, ...) are needed to be able to call it from the existing EXCP_DUMP() macro. This patch takes another approach by re-defining EXCP_DUMP() to call target_exception_dump(), which is in syscall.c, consolidates the log print functions and allows to add the call to dump the memory layout. Beside a reduced code footprint, this approach keeps the changes across the various callers minimal, and keeps EXCP_DUMP() highlighted as important macro/function. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <Y1bzAWbw07WBKPxw@p100> [lv: remove pc declaration and setting] Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21linux-user: Implement faccessat2WANG Xuerui1-0/+9
User space has been preferring this syscall for a while, due to its closer match with C semantics, and newer platforms such as LoongArch apparently have libc implementations that don't fallback to faccessat so normal access checks are failing without the emulation in place. Tested by successfully emerging several packages within a Gentoo loong stage3 chroot, emulated on amd64 with help of static qemu-loongarch64. Reported-by: Andreas K. Hüttel <dilfridge@gentoo.org> Signed-off-by: WANG Xuerui <xen0n@gentoo.org> Message-Id: <20221009060813.2289077-1-xen0n@gentoo.org> [lv: removing defined(__NR_faccessat2) in syscall.c, adding defined(TARGET_NR_faccessat2) on print_faccessat()] Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21linux-user: add more compat ioctl definitionsDaniel P. Berrangé1-0/+25
GLibc changes prevent us from including linux/fs.h anymore, and we previously adjusted to this in commit 3cd3df2a9584e6f753bb62a0028bd67124ab5532 Author: Daniel P. Berrangé <berrange@redhat.com> Date: Tue Aug 2 12:41:34 2022 -0400 linux-user: fix compat with glibc >= 2.36 sys/mount.h That change required adding compat ioctl definitions on the QEMU side for any ioctls that we would otherwise obtain from linux/fs.h. This commit adds more that were initially missed, due to their usage being conditionalized in QEMU. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20221004093206.652431-2-berrange@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21linux-user: don't use AT_EXECFD in do_openat()Laurent Vivier1-2/+1
AT_EXECFD gives access to the binary file even if it is not readable (only executable). Moreover it can be opened with flags and mode that are not the ones provided by do_openat() caller. And it is not available because loader_exec() has closed it. To avoid that, use only safe_openat() with the exec_path. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220927124357.688536-3-laurent@vivier.eu> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21linux-user: handle /proc/self/exe with execve() syscallLaurent Vivier1-1/+5
If path is /proc/self/exe, use the executable path provided by exec_path. Don't use execfd as it is closed by loader_exec() and otherwise will survive to the exec() syscall and be usable child process. Signed-off-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220927124357.688536-2-laurent@vivier.eu> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21linux-user: fix pidfd_send_signal()Laurent Vivier1-7/+12
According to pidfd_send_signal(2), info argument can be a NULL pointer. Fix strace to correctly manage ending comma in parameters. Fixes: cc054c6f13 ("linux-user: Add pidfd_open(), pidfd_send_signal() and pidfd_getfd() syscalls") cc: Helge Deller <deller@gmx.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Helge Deller <deller@gmx.de> Message-Id: <20221005163826.1455313-1-laurent@vivier.eu> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-10-21linux-user: Fix more MIPS n32 syscall ABI issuesWANG Xuerui1-5/+5
In commit 80f0fe3a85 ("linux-user: Fix syscall parameter handling for MIPS n32") the ABI problem regarding offset64 on MIPS n32 was fixed, but still some cases remain where the n32 is incorrectly treated as any other 32-bit ABI that passes 64-bit arguments in pairs of GPRs. Fix by excluding TARGET_ABI_MIPSN32 from various TARGET_ABI_BITS == 32 checks. Closes: https://gitlab.com/qemu-project/qemu/-/issues/1238 Signed-off-by: WANG Xuerui <xen0n@gentoo.org> Cc: Philippe Mathieu-Daudé <f4bug@amsat.org> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Andreas K. Hüttel <dilfridge@gentoo.org> Cc: Joshua Kinard <kumba@gentoo.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Jiaxun Yang <jiaxun.yang@flygoat.com> Tested-by: Andreas K. Huettel <dilfridge@gentoo.org> Message-Id: <20221006085500.290341-1-xen0n@gentoo.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Implement PI futexesRichard Henderson1-0/+10
Define the missing FUTEX_* constants in syscall_defs.h Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220829021006.67305-6-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Convert signal number for FUTEX_FDRichard Henderson1-0/+1
The val argument to FUTEX_FD is a signal number. Convert to match the host, as it will be converted back when the signal is delivered. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220829021006.67305-5-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Implement FUTEX_WAKE_BITSETRichard Henderson1-0/+1
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220829021006.67305-4-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Sink call to do_safe_futexRichard Henderson1-29/+31
Leave only the argument adjustments within the shift, and sink the actual syscall to the end. Sink the timespec conversion as well, as there will be more users. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220829021006.67305-3-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Combine do_futex and do_futex_time64Richard Henderson1-56/+11
Pass a boolean to select between time32 and time64. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220829021006.67305-2-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Don't assume 0 is not a valid host timer_t valuePeter Maydell1-8/+16
For handling guest POSIX timers, we currently use an array g_posix_timers[], whose entries are a host timer_t value, or 0 for "this slot is unused". When the guest calls the timer_create syscall we look through the array for a slot containing 0, and use that for the new timer. This scheme assumes that host timer_t values can never be zero. This is unfortunately not a valid assumption -- for some host libc versions, timer_t values are simply indexes starting at 0. When using this kind of host libc, the effect is that the first and second timers end up sharing a slot, and so when the guest tries to operate on the first timer it changes the second timer instead. Rework the timer allocation code, so that: * the 'slot in use' indication uses a separate array from the host timer_t array * we grab the free slot atomically, to avoid races when multiple threads call timer_create simultaneously * releasing an allocated slot is abstracted out into a new free_host_timer_slot() function called in the correct places This fixes: * problems on hosts where timer_t 0 is valid * the FIXME in next_free_host_timer() about locking * bugs in the error paths in timer_create where we forgot to release the slot we grabbed, or forgot to free the host timer Reported-by: Jon Alduan <jon.alduan@gmail.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20220725110035.1273441-1-peter.maydell@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: fix bug about missing signum convert of sigqueuefanwenjie1-2/+2
Fixes: 66fb9763af ("basic signal handling") Fixes: cf8b8bfc50 ("linux-user: add support for rt_tgsigqueueinfo() system call") Signed-off-by: fanwenjie <fanwj@mail.ustc.edu.cn> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user/hppa: Increase guest stack size to 80MB for hppa targetHelge Deller1-0/+4
The hppa target requires a much bigger stack than many other targets, and the Linux kernel allocates 80 MB by default for it. This patch increases the guest stack for hppa to 80MB, and prevents that this default stack size gets reduced by a lower stack limit on the host. Since the stack grows upwards on hppa, the stack_limit value marks the upper boundary of the stack. Fix the output of /proc/self/maps (in the guest) to show the [stack] marker on the correct memory area. Signed-off-by: Helge Deller <deller@gmx.de> Message-Id: <20220924114501.21767-6-deller@gmx.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-27linux-user: Add pidfd_open(), pidfd_send_signal() and pidfd_getfd() syscallsHelge Deller1-0/+34
I noticed those were missing when running the glib2.0 testsuite. Add the syscalls including the strace output. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220918194555.83535-4-deller@gmx.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-09-23linux-user: fix readlinkat handling with magic exe symlinkJameson Nash1-2/+13
Exactly the same as f17f4989fa193fa8279474c5462289a3cfe69aea before was for readlink. I suppose this was simply missed at the time. Signed-off-by: Jameson Nash <vtjnash@gmail.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220808190727.875155-1-vtjnash@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-08-18Revert "linux-user: un-parent OBJECT(cpu) when closing thread"Richard Henderson1-7/+6
This reverts commit 52f0c1607671293afcdb2acc2f83e9bccbfa74bb. This caused a regression in arm/aarch64. We are hard-coding ARMCPRegInfo pointers into TranslationBlocks, for calling into helper_{get,set}cp_reg{,64}. So we have a race condition between whichever cpu thread translates the code first (encoding the pointer), and that cpu thread exiting, so that the next execution of the TB references a freed data structure. Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2022-08-16linux-user: un-parent OBJECT(cpu) when closing threadAlex Bennée1-6/+7
While forcing the CPU to unrealize by hand does trigger the clean-up code we never fully free resources because refcount never reaches zero. This is because QOM automatically added objects without an explicit parent to /unattached/, incrementing the refcount. Instead of manually triggering unrealization just unparent the object and let the device machinery deal with that for us. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/866 Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220811151413.3350684-2-alex.bennee@linaro.org>
2022-08-10linux-user: fix compat with glibc >= 2.36 sys/mount.hDaniel P. Berrangé1-0/+18
The latest glibc 2.36 has extended sys/mount.h so that it defines the FSCONFIG_* enum constants. These are historically defined in linux/mount.h, and thus if you include both headers the compiler complains: In file included from /usr/include/linux/fs.h:19, from ../linux-user/syscall.c:98: /usr/include/linux/mount.h:95:6: error: redeclaration of 'enum fsconfig_command' 95 | enum fsconfig_command { | ^~~~~~~~~~~~~~~~ In file included from ../linux-user/syscall.c:31: /usr/include/sys/mount.h:189:6: note: originally defined here 189 | enum fsconfig_command | ^~~~~~~~~~~~~~~~ /usr/include/linux/mount.h:96:9: error: redeclaration of enumerator 'FSCONFIG_SET_FLAG' 96 | FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */ | ^~~~~~~~~~~~~~~~~ /usr/include/sys/mount.h:191:3: note: previous definition of 'FSCONFIG_SET_FLAG' with type 'enum fsconfig_command' 191 | FSCONFIG_SET_FLAG = 0, /* Set parameter, supplying no value */ | ^~~~~~~~~~~~~~~~~ ...snip... QEMU doesn't include linux/mount.h, but it does use linux/fs.h and thus gets linux/mount.h indirectly. glibc acknowledges this problem but does not appear to be intending to fix it in the forseeable future, simply documenting it as a known incompatibility with no workaround: https://sourceware.org/glibc/wiki/Release/2.36#Usage_of_.3Clinux.2Fmount.h.3E_and_.3Csys.2Fmount.h.3E https://sourceware.org/glibc/wiki/Synchronizing_Headers To address this requires either removing use of sys/mount.h or linux/fs.h, despite QEMU needing declarations from both. This patch removes linux/fs.h, meaning we have to define various FS_IOC constants that are now unavailable. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Message-Id: <20220802164134.1851910-1-berrange@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-08-02linux-user: Use memfd for open syscall emulationRainer Müller1-8/+14
For certain paths in /proc, the open syscall is intercepted and the returned file descriptor points to a temporary file with emulated contents. If TMPDIR is not accessible or writable for the current user (for example in a read-only mounted chroot or container) tools such as ps from procps may fail unexpectedly. Trying to read one of these paths such as /proc/self/stat would return an error such as ENOENT or EROFS. To relax the requirement on a writable TMPDIR, use memfd_create() instead to create an anonymous file and return its file descriptor. Signed-off-by: Rainer Müller <raimue@codingfarm.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220729154951.76268-1-raimue@codingfarm.de> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-07-25linux-user: Use target abi_int type for pipefd[1] in pipe()Helge Deller1-1/+1
When writing back the fd[1] pipe file handle to emulated userspace memory, use sizeof(abi_int) as offset insted of the hosts's int type. There is no functional change in this patch. Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <YtQ3Id6z8slpVr7r@p100> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-07-25linux-user: Unconditionally use pipe2() syscallHelge Deller1-10/+1
The pipe2() syscall is available on all Linux platforms since kernel 2.6.27, so use it unconditionally to emulate pipe() and pipe2(). Signed-off-by: Helge Deller <deller@gmx.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <YtbZ2ojisTnzxN9Y@p100> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-07-11linux-user/aarch64: Implement PR_SME_GET_VL, PR_SME_SET_VLRichard Henderson1-0/+16
These prctl set the Streaming SVE vector length, which may be completely different from the Normal SVE vector length. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220708151540.18136-43-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-07-11linux-user: Rename sve prctlsRichard Henderson1-6/+6
Add "sve" to the sve prctl functions, to distinguish them from the coming "sme" prctls with similar names. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20220708151540.18136-42-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2022-06-24linux-user: Adjust child_tidptr on set_tid_address() syscallHelge Deller1-5/+7
Keep track of the new child tidptr given by a set_tid_address() syscall. Do not call the host set_tid_address() syscall because we are emulating the behaviour of writing to child_tidptr in the exit() path. Signed-off-by: Helge Deller<deller@gmx.de> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <YpH+2sw1PCRqx/te@p100> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-06-24linux-user: Add partial support for MADV_DONTNEEDIlya Leoshkevich1-6/+2
Currently QEMU ignores madvise(MADV_DONTNEED), which break apps that rely on this for zeroing out memory [1]. Improve the situation by doing a passthrough when the range in question is a host-page-aligned anonymous mapping. This is based on the patches from Simon Hausmann [2] and Chris Fallin [3]. The structure is taken from Simon's patch. The PAGE_MAP_ANONYMOUS bits are superseded by commit 26bab757d41b ("linux-user: Introduce PAGE_ANON"). In the end the patch acts like the one from Chris: we either pass-through the entire syscall, or do nothing, since doing this only partially would not help the affected applications much. Finally, add some extra checks to match the behavior of the Linux kernel [4]. [1] https://gitlab.com/qemu-project/qemu/-/issues/326 [2] https://patchew.org/QEMU/20180827084037.25316-1-simon.hausmann@qt.io/ [3] https://github.com/bytecodealliance/wasmtime/blob/v0.37.0/ci/qemu-madvise.patch [4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/mm/madvise.c?h=v5.19-rc3#n1368 Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220621144205.158452-1-iii@linux.ibm.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-05-23linux-user: Remove pointless CPU{ARCH}State castsPhilippe Mathieu-Daudé1-26/+23
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220509205728.51912-4-philippe.mathieu.daude@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-05-23linux-user: Have do_syscall() use CPUArchState* instead of void*Philippe Mathieu-Daudé1-16/+16
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220509205728.51912-3-philippe.mathieu.daude@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-05-23linux-user/syscall.c: fix build without RLIMIT_RTTIMEFabrice Fontaine1-0/+2
RLIMIT_RTTIME is not provided by uclibc-ng or by musl prior to version 1.2.0 and https://github.com/bminor/musl/commit/2507e7f5312e79620f6337935d0a6c9045ccba09 resulting in the following build failure since https://git.qemu.org/?p=qemu.git;a=commit;h=244fd08323088db73590ff2317dfe86f810b51d7: ../linux-user/syscall.c: In function 'target_to_host_resource': ../linux-user/syscall.c:1057:16: error: 'RLIMIT_RTTIME' undeclared (first use in this function); did you mean 'RLIMIT_NOFILE'? 1057 | return RLIMIT_RTTIME; | ^~~~~~~~~~~~~ | RLIMIT_NOFILE Fixes: - http://autobuild.buildroot.org/results/22d3b584b704613d030e1ea9e6b709b713e4cc26 Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220523105239.1499162-1-fontaine.fabrice@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-04-06Replace TARGET_WORDS_BIGENDIANMarc-André Lureau1-3/+3
Convert the TARGET_WORDS_BIGENDIAN macro, similarly to what was done with HOST_BIG_ENDIAN. The new TARGET_BIG_ENDIAN macro is either 0 or 1, and thus should always be defined to prevent misuse. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Suggested-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-8-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-04-06Replace config-time define HOST_WORDS_BIGENDIANMarc-André Lureau1-3/+3
Replace a config-time define with a compile time condition define (compatible with clang and gcc) that must be declared prior to its usage. This avoids having a global configure time define, but also prevents from bad usage, if the config header wasn't included before. This can help to make some code independent from qemu too. gcc supports __BYTE_ORDER__ from about 4.6 and clang from 3.2. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> [ For the s390x parts I'm involved in ] Acked-by: Halil Pasic <pasic@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20220323155743.1585078-7-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-03-22linux-user: Properly handle sigset arg to ppollRichard Henderson1-17/+7
Unblocked signals are never delivered, because we didn't record the new mask for process_pending_signals. Handle this with the same mechanism as sigsuspend. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220315084308.433109-6-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-03-22linux-user: Properly handle sigset arg to epoll_pwaitRichard Henderson1-15/+7
Unblocked signals are never delivered, because we didn't record the new mask for process_pending_signals. Handle this with the same mechanism as sigsuspend. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220315084308.433109-5-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-03-22linux-user: Properly handle sigset arg to pselectRichard Henderson1-20/+10
Unblocked signals are never delivered, because we didn't record the new mask for process_pending_signals. Handle this with the same mechanism as sigsuspend. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/834 Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220315084308.433109-4-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-03-22linux-user: Split out helpers for sigsuspendRichard Henderson1-23/+17
Two new functions: process_sigsuspend_mask and finish_sigsuspend_mask. Move the size check and copy-from-user code. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220315084308.433109-3-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2022-03-22linux-user/alpha: Fix sigsuspend for big-endian hostsRichard Henderson1-1/+2
On alpha, the sigset argument for sigsuspend is in a register. When we drop that into memory that happens in host-endianness, but target_to_host_old_sigset will treat it as target-endianness. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20220315084308.433109-2-richard.henderson@linaro.org> Signed-off-by: Laurent Vivier <laurent@vivier.eu>