aboutsummaryrefslogtreecommitdiff
path: root/gdb/linux-nat.c
AgeCommit message (Collapse)AuthorFilesLines
2024-06-21[gdb/tdep] Fix gdb.base/watchpoint-running.exp on {arm,ppc64le}-linuxPedro Alves1-0/+6
When running test-case gdb.base/watchpoint-running on ppc64le-linux (and similar on arm-linux), we get: ... (gdb) watch global_var^M warning: Error when detecting the debug register interface. \ Debug registers will be unavailable.^M Watchpoint 2: global_var^M (gdb) FAIL: $exp: all-stop: hardware: watch global_var FAIL: $exp: all-stop: hardware: watchpoint hit (timeout) ... The problem is that ppc_linux_dreg_interface::detect fails to detect the hardware watchpoint interface, because the calls to ptrace return with errno set to ESRCH. This is a feature of ptrace: if a call is done while the tracee is not ptrace-stopped, it returns ESRCH. Indeed, in the test-case "watch global_var" is executed while the inferior is running, and that triggers the first call to ppc_linux_dreg_interface::detect. And because the detection failure is cached, subsequent attempts at setting hardware watchpoints will also fail, even if the tracee is ptrace-stopped. The way to fix this is to make sure that ppc_linux_dreg_interface::detect is called when we know that the thread is ptrace-stopped, which in the current setup is best addressed by using target-specific post_attach and post_startup_inferior overrides. However, as we can see in aarch64_linux_nat_target, that causes code duplication. Fix this by: - defining a new target hook low_init_process, called from linux_init_ptrace_procfs, which is called from both linux_nat_target::post_attach and linux_nat_target::post_startup_inferior, - adding implementations for ppc_linux_nat_target and arm_linux_nat_target that detect the hardware watchpoint interface, - replacing the aarch64_linux_nat_target implementations of post_attach and post_startup_inferior with a low_init_process implementation. Tested on ppc64le-linux, arm-linux, aarch64-linux and x86_64-linux. Co-Authored-By: Tom de Vries <tdevries@suse.de> Approved-By: Luis Machado <luis.machado@arm.com> PR tdep/31834 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31834 PR tdep/31705 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31705
2024-06-14gdb: move have_ptrace_getregset declaration into gdb/nat directoryAndrew Burgess1-1/+1
In a later commit I want to access have_ptrace_getregset from a .c file in the nat/ directory. To achieve this I need access to the declaration of have_ptrace_getregset. Currently have_ptrace_getregset is declared (and defined) twice, once in GDB and once in gdbserver. This commit moves the declaration into nat/linux-nat.h, but leaves the two definitions where they are. Now, in my later commit, I can pull in the declaration from nat/linux-nat.h. There should be no user visible changes after this commit. Approved-By: Felix Willgerodt <felix.willgerodt@intel.com>
2024-05-28Use bool in thread_eventsTom Tromey1-2/+2
This changes target_ops::thread_events and target_thread_events to use 'bool'. The callers were already doing this. Tested by rebuilding. Approved-By: Simon Marchi <simon.marchi@efficios.com>
2024-04-26gdb/linux-nat: Fix mem access ptrace fallback (PR threads/31579)Pedro Alves1-3/+58
Old RHEL systems have a kernel that does not support writing memory via /proc/pid/mem. On such systems, we fallback to accessing memory via ptrace. That has a few downsides described in the "Accessing inferior memory" section at the top of linux-nat.c. The target_xfer interface for memory access uses inferior_ptid as sideband argument to indicate which process to access. Memory access is process-wide, it is not thread-specific, so inferior_ptid is sometimes pointed at a process-wide ptid_t for the memory access (i.e., a ptid that looks like {pid, 0, 0}). That is the case for any code that uses scoped_restore_current_inferior_for_memory, for example. That is what causes the issue described in PR 31579, where thread_db calls into the debugger to read memory, which reaches our ps_xfer_memory function, which does: static ps_err_e ps_xfer_memory (const struct ps_prochandle *ph, psaddr_t addr, gdb_byte *buf, size_t len, int write) { scoped_restore_current_inferior_for_memory save_inferior (ph->thread->inf); ... ret = target_read_memory (core_addr, buf, len); ... } If linux_nat_target::xfer_partial falls back to inf_ptrace_target with a pid-ptid, then the ptrace code will do the ptrace call targeting pid, the leader LWP. That may fail with ESRCH if the leader is currently running, or zombie. That is the case in the scenario in question, because thread_db is consulted for an event of a non-leader thread, before we've stopped the whole process. Fix this by having the ptrace fallback code try to find a stopped LWP to use with ptrace. I chose to handle this in the linux-nat target instead of in common code because (global) memory is a process-wide property, and this avoids having to teach all the code paths that use scoped_restore_current_inferior_for_memory to find some stopped thread to access memory through, which is a ptrace quirk. That is effectively what we used to do before we started relying on writable /proc/pid/mem. I'd rather not go back there. To trigger this on modern kernels you have to hack linux-nat.c to force the ptrace fallback code, like so: --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -3921,7 +3921,7 @@ linux_nat_target::xfer_partial (enum target_object object, poke would incorrectly write memory to the post-exec address space, while the core was trying to write to the pre-exec address space. */ - if (proc_mem_file_is_writable ()) + if (0 && proc_mem_file_is_writable ()) With that hack, I was able to confirm that the fix fixes hundreds of testsuite failures. Compared to a test run with pristine master, the hack above + this commit's fix shows that some non-stop-related tests fail, but that is expected, because those are tests that need to access memory while the program is running. (I made no effort to temporarily pause an lwp if no ptrace-stopped lwp is found.) Change-Id: I24a4f558e248aff7bc7c514a88c698f379f23180 Tested-By: Hannes Domani <ssbssa@yahoo.de> Approved-By: Andrew Burgess <aburgess@redhat.com>
2024-04-25gdb: remove gdbcmd.hSimon Marchi1-1/+1
Most files including gdbcmd.h currently rely on it to access things actually declared in cli/cli-cmds.h (setlist, showlist, etc). To make things easy, replace all includes of gdbcmd.h with includes of cli/cli-cmds.h. This might lead to some unused includes of cli/cli-cmds.h, but it's harmless, and much faster than going through the 170 or so files by hand. Change-Id: I11f884d4d616c12c05f395c98bbc2892950fb00f Approved-By: Tom Tromey <tom@tromey.com>
2024-04-17gdb+gdbserver/Linux: Remove USE_SIGTRAP_SIGINFO fallbackPedro Alves1-51/+2
It's been over 9 years (since commit faf09f0119da) since Linux GDB and GDBserver started relying on SIGTRAP si_code to tell whether a breakpoint triggered, which is important for non-stop mode. When that then-new code was added, I had left the then-old code as fallback, in case some architectured still needed it. Given AFAIK there haven't been complaints since, this commit finally removes the fallback code, along with USE_SIGTRAP_SIGINFO. Change-Id: I140a5333a9fe70e90dbd186aca1f081549b2e63d
2024-04-16Remove excess whitespace from doc strings of some commandsEli Zaretskii1-2/+2
I've noticed that doc strings of some commands, like "set cwd" and "set inferior-tty", have some excess whitespace, which makes them display with unexpected indentation, at least in a Windows command prompt window. This patch fixes that. * gdb/linux-nat.c (_initialize_linux_nat): * gdb/riscv-tdep.c (riscv_insn): * gdb/top.c (quit_force): * gdb/infcmd.c (_initialize_infcmd): Remove excess whitespace.
2024-03-26gdb, gdbserver, gdbsupport: remove includes of early headersSimon Marchi1-1/+0
Now that defs.h, server.h and common-defs.h are included via the `-include` option, it is no longer necessary for source files to include them. Remove all the inclusions of these files I could find. Update the generation scripts where relevant. Change-Id: Ia026cff269c1b7ae7386dd3619bc9bb6a5332837 Approved-By: Pedro Alves <pedro@palves.net>
2024-02-26[gdb] Fix heap-use-after-free in select_event_lwpPedro Alves1-9/+25
PR gdb/31259 reveals one scenario where we run into a heap-use-after-free reported by thread sanitizer, while running gdb.base/vfork-follow-parent.exp. The heap-use-after-free happens during the following scenario: - linux_nat_wait_1 is about to return an event for T2. It stops all other threads, and while doing so, stop_wait_callback -> wait_lwp sees T1 exit, and decides to leave the exit event pending. It should have set the lp->stopped flag too, but does not -- this is the bug. - The event for T2 is reported, is processed by infrun, and we're back at linux_nat_wait_1. - linux_nat_wait_1 selects LWP T1 with the pending exit status to report. - it sets variable lp to point to the corresponding lwp_info. - it calls stop_callback and stop_wait_callback for all threads (because !target_is_non_stop_p ()). - it calls select_event_lwp to maybe pick another thread than T1, to prevent starvation. The problem is the following: - while calling stop_wait_callback for all threads, it also does this for T1. While doing so, the corresponding lwp_info is deleted (callstack stop_wait_callback -> wait_lwp -> exit_lwp -> delete_lwp), leaving variable lp as a dangling pointer. - variable lp is passed to select_event_lwp, which derefences it, which causes the heap-use-after-free. Note that the comment here mentions "all other LWP's": ... /* Now stop all other LWP's ... */ iterate_over_lwps (minus_one_ptid, stop_callback); /* ... and wait until all of them have reported back that they're no longer running. */ iterate_over_lwps (minus_one_ptid, stop_wait_callback); ... The reason the comments say "all other LWP's", and doesn't bother filtering out LP is that lp->stopped should be true at this point, and the callbacks (both stop_callback and stop_wait_callback) check that flag, and do nothing if set. I.e., they skip already-stopped threads, so they should skip LP. In this particular scenario, though, we missed setting the stopped flag right in the first step described above, so LP was iterated over incorrectly. The fix is to make wait_lwp set the lp->stopped flag when it decides to leave the exit event pending. However, going a bit further, gdbserver has a mark_lwp_dead function to centralize setting up various lwp flags such that the rest of the code doesn't mishandle them, and it seems like a good idea to do a similar thing in gdb as well. That is what this patch does. PR gdb/31259 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31259 Co-Authored-By: Tom de Vries <tdevries@suse.de> Change-Id: I4a6169976f89bf714c478cbb2b7d4c32365e62a9
2024-02-23gdb/linux-nat.c: Add "Accessing inferior memory" sectionPedro Alves1-1/+57
This commit adds a new "Accessing inferior memory" comment section to gdb/linux-nat.c that explains why we prefer /proc/pid/mem over alternatives. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30453 Change-Id: I575b21ed697a85f3ff4c0ec58c04812db5005b76
2024-01-12Update copyright year range in header of all files managed by GDBAndrew Burgess1-1/+1
This commit is the result of the following actions: - Running gdb/copyright.py to update all of the copyright headers to include 2024, - Manually updating a few files the copyright.py script told me to update, these files had copyright headers embedded within the file, - Regenerating gdbsupport/Makefile.in to refresh it's copyright date, - Using grep to find other files that still mentioned 2023. If these files were updated last year from 2022 to 2023 then I've updated them this year to 2024. I'm sure I've probably missed some dates. Feel free to fix them up as you spot them.
2023-12-02Fix detach bug when lwp has exited/terminatedKevin Buettner1-2/+16
When using GDB on native linux, it can happen that, while attempting to detach an inferior, the inferior may have been exited or have been killed, yet still be in the list of lwps. Should that happen, the assert in x86_linux_update_debug_registers in gdb/nat/x86-linux-dregs.c will trigger. The line in question looks like this: gdb_assert (lwp_is_stopped (lwp)); For this case, the lwp isn't stopped - it's dead. The bug which brought this problem to my attention is one in which the pwntools library uses GDB to to debug a process; as the script is shutting things down, it kills the process that GDB is debugging and also sends GDB a SIGTERM signal, which causes GDB to detach all inferiors prior to exiting. Here's a link to the bug: https://bugzilla.redhat.com/show_bug.cgi?id=2192169 The following shell command mimics part of what the pwntools reproducer script does (with regard to shutting things down), but reproduces the bug much less reliably. I have found it necessary to run the command a bunch of times before seeing the bug. (I usually see it within 5-10 repetitions.) If you choose to try this command, make sure that you have no running "cat" or "gdb" processes first! cat </dev/zero >/dev/null & \ (sleep 5; (kill -KILL `pgrep cat` & kill -TERM `pgrep gdb`)) & \ sleep 1 ; \ gdb -q -iex 'set debuginfod enabled off' -ex 'set height 0' \ -ex c /usr/bin/cat `pgrep cat` So, basically, the idea here is to kill both gdb and cat at roughly the same time. If we happen to attempt the detach before the process lwp has been deleted from GDB's (linux native) LWP data structures, then the assert will trigger. The relevant part of the backtrace looks like this: #8 0x00000000008a83ae in x86_linux_update_debug_registers (lwp=0x1873280) at gdb/nat/x86-linux-dregs.c:146 #9 0x00000000008a862f in x86_linux_prepare_to_resume (lwp=0x1873280) at gdb/nat/x86-linux.c:81 #10 0x000000000048ea42 in x86_linux_nat_target::low_prepare_to_resume ( this=0x121eee0 <the_amd64_linux_nat_target>, lwp=0x1873280) at gdb/x86-linux-nat.h:70 #11 0x000000000081a452 in detach_one_lwp (lp=0x1873280, signo_p=0x7fff8ca3441c) at gdb/linux-nat.c:1374 #12 0x000000000081a85f in linux_nat_target::detach ( this=0x121eee0 <the_amd64_linux_nat_target>, inf=0x16e8f70, from_tty=0) at gdb/linux-nat.c:1450 #13 0x000000000083a23b in thread_db_target::detach ( this=0x1206ae0 <the_thread_db_target>, inf=0x16e8f70, from_tty=0) at gdb/linux-thread-db.c:1385 #14 0x0000000000a66722 in target_detach (inf=0x16e8f70, from_tty=0) at gdb/target.c:2526 #15 0x0000000000a8f0ad in kill_or_detach (inf=0x16e8f70, from_tty=0) at gdb/top.c:1659 #16 0x0000000000a8f4fa in quit_force (exit_arg=0x0, from_tty=0) at gdb/top.c:1762 #17 0x000000000070829c in async_sigterm_handler (arg=0x0) at gdb/event-top.c:1141 My colleague, Andrew Burgess, has done some recent work on other problems with detach. Upon hearing of this problem, he came up a test case which reliably reproduces the problem and tests for a few other problems as well. In addition to testing detach when the inferior has terminated due to a signal, it also tests detach when the inferior has exited normally. Andrew observed that the linux-native-only "checkpoint" command would be affected too, so the test also tests those cases when there's an active checkpoint. For the LWP exit / termination case with no checkpoint, that's handled via newly added checks of the waitstatus in detach_one_lwp in linux-nat.c. For the checkpoint detach problem, I chose to pass the lwp_info to linux_fork_detach in linux-fork.c. With that in place, suitable tests were added before attempting a PTRACE_DETACH operation. I added a few asserts at the beginning of linux_fork_detach and modified the caller code so that the newly added asserts shouldn't trigger. (That's what the 'pid == inferior_ptid.pid' check is about in gdb/linux-nat.c.) Lastly, I'll note that the checkpoint code needs some work with regard to background execution. This patch doesn't attempt to fix that problem, but it doesn't make it any worse. It does slightly improve the situation with detach because, due to the check noted above, linux_fork_detach() won't be called for the wrong inferior when there are multiple inferiors. (There are at least two other problems with the checkpoint code when there are multiple inferiors. See: https://sourceware.org/bugzilla/show_bug.cgi?id=31065) This commit also adds a new test, gdb.base/process-dies-while-detaching.exp. Andrew Burgess is the primary author of this test case. Its design is similar to that of gdb.threads/main-thread-exit-during-detach.exp, which was also written by Andrew. This test checks that GDB correctly handles several cases that can occur when GDB attempts to detach an inferior process. The process can exit or be terminated (e.g. via SIGKILL) prior to GDB's event loop getting a chance to remove it from GDB's internal data structures. To complicate things even more, detach works differently when a checkpoint (created via GDB's "checkpoint" command) exists for the inferior. This test checks all four possibilities: process exit with no checkpoint, process termination with no checkpoint, process exit with a checkpoint, and process termination with a checkpoint. Co-Authored-By: Andrew Burgess <aburgess@redhat.com> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-12-01Bail out of "attach" if a thread cannot be tracedTom Tromey1-11/+38
On Linux, threads are treated much like separate processes by the kernel. In particular, it's possible to ptrace just a single thread. If gdb tries to attach to a multi-threaded inferior, where a non-main thread is already being traced (e.g., by strace), then gdb will get into an infinite loop attempting to attach. This patch fixes this problem by having the attach fail if ptrace fails to attach to any thread of the inferior.
2023-11-28[gdb] Fix segfault in for_each_block, part 1Tom de Vries1-1/+1
When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x), I run into: ... (gdb) PASS: gdb.base/vfork-follow-parent.exp: \ exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \ resolution_method=schedule-multiple: print unblock_parent = 1 continue^M Continuing.^M Reading symbols from vfork-follow-parent-exit...^M ^M ^M Fatal signal: Segmentation fault^M ----- Backtrace -----^M 0x1027d3e7 gdb_internal_backtrace_1^M src/gdb/bt-utils.c:122^M 0x1027d54f _Z22gdb_internal_backtracev^M src/gdb/bt-utils.c:168^M 0x1057643f handle_fatal_signal^M src/gdb/event-top.c:889^M 0x10576677 handle_sigsegv^M src/gdb/event-top.c:962^M 0x3fffa7610477 ???^M 0x103f2144 for_each_block^M src/gdb/dcache.c:199^M 0x103f235b _Z17dcache_invalidateP13dcache_struct^M src/gdb/dcache.c:251^M 0x10bde8c7 _Z24target_dcache_invalidatev^M src/gdb/target-dcache.c:50^M ... or similar. The root cause for the segmentation fault is that linux_is_uclinux gives an incorrect result: it should always return false, given that we're running on a regular linux system, but instead it returns first true, then false. In more detail, the segmentation fault happens as follows: - a program space with an address space is created - a second program space is about to be created. maybe_new_address_space is called, and because linux_is_uclinux returns true, maybe_new_address_space returns false, and no new address space is created - a second program space with the same address space is created - a program space is deleted. Because linux_is_uclinux now returns false, gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns false, and the address space is deleted - when gdb uses the address space of the remaining program space, we run into the segfault, because the address space is deleted. Hardcoding linux_is_uclinux to false makes the test-case pass. We leave addressing the root cause for the following commit in this series. For now, prevent the segmentation fault by making the address space a refcounted object. This was already suggested here [1]: ... A better solution might be to have the address spaces be reference counted ... Tested on top of trunk on x86_64-linux and ppc64le-linux. Tested on top of gdb-14-branch on ppc64-linux. Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca> PR gdb/30547 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547 [1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
2023-11-21gdb: Replace gdb::optional with std::optionalLancelot Six1-5/+5
Since GDB now requires C++17, we don't need the internally maintained gdb::optional implementation. This patch does the following replacing: - gdb::optional -> std::optional - gdb::in_place -> std::in_place - #include "gdbsupport/gdb_optional.h" -> #include <optional> This change has mostly been done automatically. One exception is gdbsupport/thread-pool.* which did not use the gdb:: prefix as it already lives in the gdb namespace. Change-Id: I19a92fa03e89637bab136c72e34fd351524f65e9 Approved-By: Tom Tromey <tom@tromey.com> Approved-By: Pedro Alves <pedro@palves.net>
2023-11-17gdb: remove regcache's address spaceSimon Marchi1-39/+17
While looking at the regcache code, I noticed that the address space (passed to regcache when constructing it, and available through regcache::aspace) wasn't relevant for the regcache itself. Callers of regcache::aspace use that method because it appears to be a convenient way of getting the address space for a thread, if you already have the regcache. But there is always another way to get the address space, as the callers pretty much always know which thread they are dealing with. The regcache code itself doesn't use the address space. This patch removes anything related to address_space from the regcache code, and updates callers to get it from the thread in context. This removes a bit of unnecessary complexity from the regcache code. The current get_thread_arch_regcache function gets an address_space for the given thread using the target_thread_address_space function (which calls the target_ops::thread_address_space method). This suggest that there might have been the intention of supporting per-thread address spaces. But digging through the history, I did not find any such case. Maybe this method was just added because we needed a way to get an address space from a ptid (because constructing a regcache required an address space), and this seemed like the right way to do it, I don't know. The only implementations of thread_address_space and process_stratum_target::thread_address_space and linux_nat_target::thread_address_space, which essentially just return the inferior's address space. And thread_address_space is only used in the current get_thread_arch_regcache, which gets removed. So, I think that the thread_address_space target method can be removed, and we can assume that it's fine to use the inferior's address space everywhere. Callers of regcache::aspace are updated to get the address space from the relevant inferior, either using some context they already know about, or in last resort using the current global context. So, to summarize: - remove everything in regcache related to address spaces - in particular, remove get_thread_arch_regcache, and rename get_thread_arch_aspace_regcache to get_thread_arch_regcache - remove target_ops::thread_address_space, and target_thread_address_space - adjust all users of regcache::aspace to get the address space another way Change-Id: I04fd41b22c83fe486522af7851c75bcfb31c88c7
2023-11-13Implement GDB_THREAD_OPTION_EXIT support for native LinuxPedro Alves1-13/+31
This implements support for the new GDB_THREAD_OPTION_EXIT thread option for native Linux. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ia69fc0b9b96f9af7de7cefc1ddb1fba9bbb0bb90 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
2023-11-13Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to corePedro Alves1-7/+14
Currently, infrun assumes that when TARGET_WAITKIND_THREAD_EXITED is reported, the corresponding GDB thread has already been removed from the GDB thread list. Later in the series, that will no longer work, as infrun will need to refer to the thread's thread_info when it processes TARGET_WAITKIND_THREAD_EXITED. As preparation, this patch makes deleting the GDB thread responsibility of infrun, instead of the target. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I013d87f61ffc9aaca49f0d6ce2a43e3ea69274de
2023-11-13Thread options & clone events (native Linux)Pedro Alves1-0/+7
This commit teaches the native Linux target about the GDB_THREAD_OPTION_CLONE thread option. It's actually simpler to just continue reporting all clone events unconditionally to the core. There's never any harm in reporting a clone event when the option is disabled. All we need to do is to report support for the option, otherwise GDB falls back to use target_thread_events(). Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: If90316e2dcd0c61d0fefa0d463c046011698acf9
2023-11-13Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONEDPedro Alves1-120/+132
(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-11-13gdb/linux: Delete all other LWPs immediately on ptrace exec eventPedro Alves1-0/+15
I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
2023-11-13Add "maint info linux-lwps" commandAndrew Burgess1-0/+46
This adds a maintenance command that lets you list all the LWPs under control of the linux-nat target. For example: (gdb) maint info linux-lwps LWP Ptid Thread ID 560948.561047.0 None 560948.560948.0 1.1 This shows that "560948.561047.0" LWP doesn't map to any thread_info object, which is bogus. We'll be using this in a testcase in a following patch. Co-Authored-By: Pedro Alves <pedro@palves.net> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
2023-10-26gdb: handle main thread exiting during detachAndrew Burgess1-7/+17
Overview ======== Consider the following situation, GDB is in non-stop mode, the main thread is running while a second thread is stopped. The user has the second thread selected as the current thread and asks GDB to detach. At the exact moment of detach the main thread exits. This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets. This commit addresses this situation for native and remote targets. There are a number of different fixes, but all are required in order to get this functionality working correct for native and remote targets. Native Linux Target =================== For the native Linux target, detaching is handled in the function linux_nat_target::detach. In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited. GDB then detaches from everything except the main thread by calling detach_callback. After this the first problem is this assert: /* Only the initial process should be left right now. */ gdb_assert (num_lwps (pid) == 1); The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached. I fix this by changing the assert to allow for 0 or 1 lwps at this point. As the 0 case can only happen in non-stop mode, the assert becomes: gdb_assert (num_lwps (pid) == 1 || (target_is_non_stop_p () && num_lwps (pid) == 0)); The next problem is that we do: main_lwp = find_lwp_pid (ptid_t (pid)); and then proceed assuming that main_lwp is not nullptr. In the case that the main thread has exited though, main_lwp will be nullptr. However, we only need main_lwp so that GDB can detach from the thread. If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching, GDB can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success. For Remote Targets ================== On remote targets there are two problems. First is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach. The 'set debug remote on' trace looks like this: [remote] Sending packet: $z0,7f1648fe0241,1#35 [remote] Notification received: Stop:W0;process:2a0ac8 # At this point an unpatched gdbserver segfaults, and the connection # is broken. A patched gdbserver continues as below... [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff00a8,1#68 [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff132f,1#6b [remote] Packet received: E01 [remote] Sending packet: $D;2a0ac8#3e [remote] Packet received: E01 I was originally running into Segmentation Faults, from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This function calls current_process() and then dereferences the result to find the breakpoint list. However, in our case, the current process has already exited, and so the current_process() call returns nullptr. At the point of failure, the gdbserver backtrace looks like this: #0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982 #1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093 #2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372 #3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498 ... The problem is that, as a result non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent. Clearly gdbserver can't actually remove this breakpoint -- the process has already exited -- so I think the best solution is for gdbserver just to report an error, which is what I've done. The second problem I ran into was on the gdb side, as the process has already exited, but GDB has not yet acknowledged the exit event, the detach -- the 'D' packet in the above trace -- fails. This was being reported to the user with a 'Can't detach process' error. As the test actually calls detach from Python code, this error was then becoming a Python exception. Though clearly the detach has returned an error, and so, maybe, having GDB throw an error would be fine, I think in this case, there's a good argument that the remote error can be ignored -- if GDB tries to detach and gets back an error, and if there's a pending exit event for the pid we tried to detach, then just ignore the error and pretend the detach worked fine. We could possibly check for a pending exit event before sending the detach packet, however, I believe that it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach. In this case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks -- we just send the detach request, and if it fails, check to see if the process has already exited. Testing ======= In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call. The window of opportunity for getting the exit to arrive is so small I've never managed to trigger this in real use -- I originally spotted this issue while working on another patch, which did manage to trigger this issue. However, if we trigger both the exit and the detach from a single Python function then we never return to GDB's event loop, as such GDB never processes the exit event, and so the first time GDB gets a chance to see the exit is during the detach call. And so that is the approach I've taken for testing this patch. Tested-By: Kevin Buettner <kevinb@redhat.com> Approved-By: Kevin Buettner <kevinb@redhat.com>
2023-10-10gdb: remove target_gdbarchSimon Marchi1-4/+6
This function is just a wrapper around the current inferior's gdbarch. I find that having that wrapper just obscures where the arch is coming from, and that it's often used as "I don't know which arch to use so I'll use this magical target_gdbarch function that gets me an arch" when the arch should in fact come from something in the context (a thread, objfile, symbol, etc). I think that removing it and inlining `current_inferior ()->arch ()` everywhere will make it a bit clearer where that arch comes from and will trigger people into reflecting whether this is the right place to get the arch or not. Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b Reviewed-By: John Baldwin <jhb@FreeBSD.org> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-09-20Remove explanatory comments from includesTom Tromey1-7/+7
I noticed a comment by an include and remembered that I think these don't really provide much value -- sometimes they are just editorial, and sometimes they are obsolete. I think it's better to just remove them. Tested by rebuilding. Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-08-23gdb: centralize "[Thread ...exited]" notificationsPedro Alves1-7/+1
Currently, each target backend is responsible for printing "[Thread ...exited]" before deleting a thread. This leads to unnecessary differences between targets, like e.g. with the remote target, we never print such messages, even though we do print "[New Thread ...]". E.g., debugging the gdb.threads/attach-many-short-lived-threads.exp with gdbserver, letting it run for a bit, and then pressing Ctrl-C, we currently see: (gdb) c Continuing. ^C[New Thread 3850398.3887449] [New Thread 3850398.3887500] [New Thread 3850398.3887551] [New Thread 3850398.3887602] [New Thread 3850398.3887653] ... Thread 1 "attach-many-sho" received signal SIGINT, Interrupt. 0x00007ffff7e6a23f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffda80, rem=rem@entry=0x7fffffffda80) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 in ../sysdeps/unix/sysv/linux/clock_nanosleep.c (gdb) Above, we only see "New Thread" notifications, even though threads were deleted. After this patch, we'll see: (gdb) c Continuing. ^C[Thread 3558643.3577053 exited] [Thread 3558643.3577104 exited] [Thread 3558643.3577155 exited] [Thread 3558643.3579603 exited] ... [New Thread 3558643.3597415] [New Thread 3558643.3600015] [New Thread 3558643.3599965] ... Thread 1 "attach-many-sho" received signal SIGINT, Interrupt. 0x00007ffff7e6a23f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffda80, rem=rem@entry=0x7fffffffda80) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 in ../sysdeps/unix/sysv/linux/clock_nanosleep.c (gdb) q This commit fixes this by moving the thread exit printing to common code instead, triggered from within delete_thread (or rather, set_thread_exited). There's one wrinkle, though. While most targest want to print: [Thread ... exited] the Windows target wants to print: [Thread ... exited with code <exit_code>] ... and sometimes wants to suppress the notification for the main thread. To address that, this commits adds a delete_thread_with_code function, only used by that target (so far). This fix was originally posted as part of a larger series: https://inbox.sourceware.org/gdb-patches/20221212203101.1034916-1-pedro@palves.net/ But didn't really need to be part of that series. In order to get this fix merged sooner, I (Andrew Burgess) have rebased this commit outside of the original series. Any bugs introduced while splitting this patch out and rebasing, are entirely my own. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30129 Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
2023-07-23gdb: two changes to linux_nat_debug_printf calls in linux-nat.cAndrew Burgess1-7/+7
This commit adjusts some of the debug output in linux-nat.c, but makes no other functional changes to GDB. In resume_lwp I've added the word "sibling" to one of the debug messages. All the other debug messages in this function talk about operating on the sibling thread, so I think it makes sense, for consistency, if the message I've updated also talks about the sibling thread. In resume_stopped_resumed_lwps I've reordered the condition checks so that the vfork-parent check now happens after the checks for whether the thread is already resumed or not. This makes no functional difference to GDB, but does, I think, mean we see more helpful debug messages first. Consider the situation where a vfork-parent thread is already resumed, and resume_stopped_resumed_lwps is called. Previously the message saying that the thread was not being resumed due to being a vfork-parent, was printed. This might give the impression that the thread is left in a not resumed state, which is misleading. After this change we now get a message saying that the thread is not being resumed due to it not being stopped (i.e. is already resumed). With this message the already resumed nature of the thread is much clearer. I found this change helpful when debugging some vfork related issues.
2023-07-17gdb: additional debug output in infrun.c and linux-nat.cAndrew Burgess1-4/+19
While working on some of the recent patches relating to vfork handling I found this debug output very helpful, I'd like to propose adding this into GDB. With debug turned off there should be no user visible changes after this commit.
2023-07-17gdb: don't restart vfork parent while waiting for child to finishAndrew Burgess1-1/+8
While working on a later patch, which changes gdb.base/foll-vfork.exp, I noticed that sometimes I would hit this assert: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. I eventually tracked it down to a combination of schedule-multiple mode being on, target-non-stop being off, follow-fork-mode being set to child, and some bad timing. The failing case is pretty simple, a single threaded application performs a vfork, the child process then execs some other application while the parent process (once the vfork child has completed its exec) just exits. As best I understand things, here's what happens when things go wrong: 1. The parent process performs a vfork, GDB sees the VFORKED event and creates an inferior and thread for the vfork child, 2. GDB resumes the vfork child process. As schedule-multiple is on and target-non-stop is off, this is translated into a request to start all processes (see user_visible_resume_ptid), 3. In the linux-nat layer we spot that one of the threads we are about to start is a vfork parent, and so don't start that thread (see resume_lwp), the vfork child thread is resumed, 4. GDB waits for the next event, eventually entering linux_nat_target::wait, which in turn calls linux_nat_wait_1, 5. In linux_nat_wait_1 we eventually call resume_stopped_resumed_lwps, this should restart threads that have stopped but don't actually have anything interesting to report. 6. Unfortunately, resume_stopped_resumed_lwps doesn't check for vfork parents like resume_lwp does, so at this point the vfork parent is resumed. This feels like the start of the bug, and this is where I'm proposing to fix things, but, resuming the vfork parent isn't the worst thing in the world because.... 7. As the vfork child is still alive the kernel holds the vfork parent stopped, 8. Eventually the child performs its exec and GDB is sent and EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB, 9. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE, 10. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process, 11. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit), 12. While this has been going on the vfork parent is executing, and might even exit, 13. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object, 14. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status, 15. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped, 16. An exited thread is not classified as stopped, and so the assert triggers! So there's two bugs I see here. The first, and most critical one here is in step #6. I think that resume_stopped_resumed_lwps should not resume a vfork parent, just like resume_lwp doesn't resume a vfork parent. With this change in place the vfork parent will remain stopped in step instead GDB will only see the EXECD/EXITED/SIGNALLED event. The problems in #9 and #10 are therefore skipped and we arrive at #11, handling the EXECD event. As the parent is still stopped #12 doesn't apply, and in #13 when we try to stop the process we will see that it is already stopped, there's no risk of the vfork parent exiting before we get to this point. And finally, in #15 we are safe to poke the process registers because it will not have exited by this point. However, I did mention two bugs. The second bug I've not yet managed to actually trigger, but I'm convinced it must exist: if we forget vforks for a moment, in step #13 above, when linux_nat_target::detach is called, we first try to stop all threads in the process GDB is detaching from. If we imagine a multi-threaded inferior with many threads, and GDB running in non-stop mode, then, if the user tries to detach there is a chance that thread could exit just as linux_nat_target::detach is entered, in which case we should be able to trigger the same assert. But, like I said, I've not (yet) managed to trigger this second bug, and even if I could, the fix would not belong in this commit, so I'm pointing this out just for completeness. There's no test included in this commit. In a couple of commits time I will expand gdb.base/foll-vfork.exp which is when this bug would be exposed. Unfortunately there are at least two other bugs (separate from the ones discussed above) that need fixing first, these will be fixed in the next commits before the gdb.base/foll-vfork.exp test is expanded. If you do want to reproduce this failure then you will for certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure after within 10mins.
2023-07-06Linux: Avoid pread64/pwrite64 for high memory addresses (PR gdb/30525)Pedro Alves1-10/+18
Since commit 05c06f318fd9 ("Linux: Access memory even if threads are running"), GDB prefers pread64/pwrite64 to access inferior memory instead of ptrace. That change broke reading shared libraries on SPARC64 Linux, as reported by PR gdb/30525 ("gdb cannot read shared libraries on SPARC64"). On SPARC64 Linux, surprisingly (to me), userspace shared libraries are mapped at high 64-bit addresses: (gdb) info sharedlibrary Cannot access memory at address 0xfff80001002011e0 Cannot access memory at address 0xfff80001002011d8 Cannot access memory at address 0xfff80001002011d8 From To Syms Read Shared Object Library 0xfff80001000010a0 0xfff8000100021f80 Yes (*) /lib64/ld-linux.so.2 (*): Shared library is missing debugging information. Those addresses are 64-bit addresses with the high bits set. When interpreted as signed, they're negative. The Linux kernel rejects pread64/pwrite64 if the offset argument of type off_t (a signed type) is negative, which happens if the memory address we're accessing has its high bit set. See linux/fs/read_write.c sys_pread64 and sys_pwrite64 in Linux. Thankfully, lseek does not fail in that situation. So the fix is to use the 'lseek + read|write' path if the offset would be negative. Fix this in both native GDB and GDBserver. Tested on a SPARC64 GNU/Linux and x86-64 GNU/Linux. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30525 Change-Id: I79c724f918037ea67b7396fadb521bc9d1b10dc5
2023-06-05[gdb] Fix more typosTom de Vries1-1/+1
Fix some more typos: - distinquish -> distinguish - actualy -> actually - singe -> single - frash -> frame - chid -> child - dissassembler -> disassembler - uninitalized -> uninitialized - precontidion -> precondition - regsiters -> registers - marge -> merge - sate -> state - garanteed -> guaranteed - explictly -> explicitly - prefices (nonstandard plural) -> prefixes - bondary -> boundary - formated -> formatted - ithe -> the - arrav -> array - coresponding -> corresponding - owend -> owned - fials -> fails - diasm -> disasm - ture -> true - tpye -> type There's one code change, the name of macro SIG_CODE_BONDARY_FAULT changed to SIG_CODE_BOUNDARY_FAULT. Tested on x86_64-linux.
2023-04-04gdb: make find_thread_ptid a process_stratum_target methodSimon Marchi1-7/+7
Make find_thread_ptid (the overload that takes a process_stratum_target) a method of process_stratum_target. Change-Id: Ib190a925a83c6b93e9c585dc7c6ab65efbdd8629 Reviewed-By: Tom Tromey <tom@tromey.com>
2023-03-27linux-nat: introduce pending_status_strPedro Alves1-3/+16
I noticed that some debug log output printing an lwp's pending status wasn't considering lp->waitstatus. This fixes it, by introducing a new pending_status_str function. Also fix the comment in gdb/linux-nat.h describing lwp_info::waitstatus and details the description of lwp_info::status while at it. Change-Id: I66e5c7a363d30a925b093b195d72925ce5b6b980 Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-03-09gdb, gdbserver, gdbsupport: fix whitespace issuesSimon Marchi1-1/+1
Replace spaces with tabs in a bunch of places. Change-Id: If0f87180f1d13028dc178e5a8af7882a067868b0
2023-02-24Remove struct bufferTom Tromey1-1/+0
I've long wanted to remove 'struct buffer', and thanks to Simon's earlier patch, I was finally able to do so. My feeling has been that gdb already has several decent structures available for growing strings: std::string of course, but also obstack and even objalloc from BFD and dyn-string from libiberty. The previous patches in this series removed all the uses of struct buffer, so this one can remove the code and the remaining #includes.
2023-02-14Do not cast away const in agent_run_commandTom Tromey1-5/+2
While investigating something else, I noticed some weird code in agent_run_command (use of memcpy rather than strcpy). Then I noticed that 'cmd' is used as both an in and out parameter, despite being const. Casting away const like this is bad. This patch removes the const and fixes the memcpy. I also added a static assert to assure myself that the code in gdbserver is correct -- gdbserver is passing its own buffer directly to agent_run_command. Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-01-01Update copyright year range in header of all files managed by GDBJoel Brobecker1-1/+1
This commit is the result of running the gdb/copyright.py script, which automated the update of the copyright year range for all source files managed by the GDB project to be updated to include year 2023.
2022-12-16Delay checking whether /proc/pid/mem is writable (PR gdb/29907)Pedro Alves1-3/+6
As of 1bcb0708f229 ("gdb/linux-nat: Check whether /proc/pid/mem is writable"), GDB checks if /proc/pid/mem is writable. This is done early at GDB startup, in order to get a consistent warning, instead of a warning that depends on whenever GDB writes to inferior memory. PR gdb/29907 points out that some build systems (like QEMU's, apparently) may call 'gdb --version' to check GDB's presence & its version on the system, and that Gentoo's build process has sandboxing which blocks the /proc/pid/mem access and thus GDB warns, which results in build fails. To help with that, this patch delays the /proc/pid/mem check until we start or attach to an inferior. Ends up potentially emiting a warning close where we already emit other ptrace- and /proc- related warnings, which just Feels Right. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29907 Change-Id: I5537653ecfbbe76a04ab035e40e59d09b4980763
2022-12-05gdb/linux-nat: add ptid parameter to linux_xfer_siginfoSimon Marchi1-4/+4
Make the inferior_ptid bubble up to linux_nat_target::xfer_partial. Change-Id: I62dbc5734c26648bb465f449c2003c73751cd812
2022-12-05gdb/linux-nat: use l linux_nat_get_siginfo in linux_xfer_siginfoSimon Marchi1-4/+2
I noticed we could reduce duplication a bit here. Change-Id: If24e54d1dac71b46f7c1f68a18a073d4c084b644
2022-12-05gdb/linux-nat: check ptrace return value in linux_nat_get_siginfoSimon Marchi1-5/+1
Not a big deal, but it seems strange to check errno instead of the ptrace return value to know whether it succeeded. Change-Id: If0a6d0280ab0e5ecb077e546af0d6fe489c5b9fd
2022-12-05gdb/linux-nat: don't memset siginfo on failure in linux_nat_get_siginfoSimon Marchi1-6/+2
No caller cares about the value of *SIGINFO on failure. It's also documented in the function doc that *SIGINFO is uninitialized (I understand "untouched") on failure. Change-Id: I5ef38a5f58e3635e109b919ddf6f827f38f1225a
2022-12-05gdb/linux-nat: bool-ify linux_nat_get_siginfoSimon Marchi1-3/+3
Change return type to bool. Change-Id: I1bf0360bfdd1b5994cd0f96c268d806f96fe51a4
2022-12-05gdb/linux-nat: use get_ptrace_pid in two spotsSimon Marchi1-10/+2
No behavior change expected. Change-Id: Ifaa64ecd619483646b024fd7c62e571e92a8eedb
2022-12-02gdb/linux-nat: add pid parameter to linux_proc_xfer_memory_partialSimon Marchi1-9/+9
Add a pid parameter to linux_proc_xfer_memory_partial, making the inferior_ptid reference bubble up close to the target_ops::xfer_partial boundary. No behavior change expected. Change-Id: I58171b00ee1bba1ea22efdbb5dcab8b1ab3aac4c
2022-11-07Don't explicitly set clone child ptrace optionsPedro Alves1-1/+0
linux_handle_extended_wait calls target_post_attach if we're handling a PTRACE_EVENT_CLONE, and libthread_db.so isn't active. target_post_attach just calls linux_init_ptrace_procfs to set the lwp's ptrace options. However, this is completely unnecessary, because, as man ptrace [1] says, options are inherited: "Flags are inherited by new tracees created and "auto-attached" via active PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, or PTRACE_O_TRACECLONE options." This removes the unnecessary call. [1] - https://man7.org/linux/man-pages/man2/ptrace.2.html Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I533eaa60b700f7e40760311fc0d344d0b3f19a78
2022-10-19internal_error: remove need to pass __FILE__/__LINE__Pedro Alves1-9/+5
Currently, every internal_error call must be passed __FILE__/__LINE__ explicitly, like: internal_error (__FILE__, __LINE__, "foo %d", var); The need to pass in explicit __FILE__/__LINE__ is there probably because the function predates widespread and portable variadic macros availability. We can use variadic macros nowadays, and in fact, we already use them in several places, including the related gdb_assert_not_reached. So this patch renames the internal_error function to something else, and then reimplements internal_error as a variadic macro that expands __FILE__/__LINE__ itself. The result is that we now should call internal_error like so: internal_error ("foo %d", var); Likewise for internal_warning. The patch adjusts all calls sites. 99% of the adjustments were done with a perl/sed script. The non-mechanical changes are in gdbsupport/errors.h, gdbsupport/gdb_assert.h, and gdb/gdbarch.py. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
2022-09-21gdbsupport: convert FILEIO_* macros to an enumSimon Marchi1-3/+3
Converting from free-form macros to an enum gives a bit of type-safety. This caught places where we would assign host error numbers to what should contain a target fileio error number, for instance in target_fileio_pread. I added the FILEIO_SUCCESS enumerator, because remote.c:remote_hostio_parse_result initializes the remote_errno output variable to 0. It seems better to have an explicit enumerator than to assign a value for which there is no enumerator. I considered initializing this variable to FILEIO_EUNKNOWN instead, such that if the remote side replies with an error and omits the errno value, we'll get an errno that represents an error instead of 0 (which reprensents no error). But it's not clear what the consequences of that change would be, so I prefer to err on the side of caution and just keep the existing behavior (there is no intended change in behavior with this patch). Note that remote_hostio_parse_resul still reads blindly what the remote side sends as a target errno into this variable, so we can still end up with a nonsensical value here. It's not good, but out of the scope of this patch. Convert host_to_fileio_error and fileio_errno_to_host to return / accept a fileio_error instead of an int, and cascade the change in the whole chain that uses that. Change-Id: I454b0e3fcf0732447bc872252fa8e57d138b0e03
2022-07-26gdb/linux_nat: Write memory using ptrace if /proc/pid/mem is not writableKeith Seitz1-2/+9
Commit 05c06f318fd9a112529dfc313e6512b399a645e4 enabled GDB to access memory while threads are running. It did this by accessing /proc/PID/task/LWP/mem. Unfortunately, this interface is not implemented for writing in older kernels (such as RHEL6). This means that GDB is unable to insert breakpoints on these hosts: $ ./gdb -q gdb -ex start Reading symbols from gdb... Temporary breakpoint 1 at 0x40fdd5: file ../../src/gdb/gdb.c, line 28. Starting program: /home/rhel6/fsf/linux/gdb/gdb Warning: Cannot insert breakpoint 1. Cannot access memory at address 0x40fdd5 (gdb) Before this patch, linux_proc_xfer_memory_partial (previously called linux_proc_xfer_partial) would return TARGET_XFER_EOF if the write to /proc/PID/mem failed. [More specifically, linux_proc_xfer_partial would not "bother for one word," but the effect is the essentially same.] This status was checked by linux_nat_target::xfer_partial, which would then fallback to using ptrace to perform the operation. This is the specific hunk that removed the fallback: - xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf, - offset, len, xfered_len); - if (xfer != TARGET_XFER_EOF) - return xfer; + return linux_proc_xfer_memory_partial (readbuf, writebuf, + offset, len, xfered_len); + } return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf, offset, len, xfered_len); This patch makes linux_nat_target::xfer_partial go straight to writing memory via ptrace if writing via /proc/pid/mem is not possible in the running kernel, enabling GDB to insert breakpoints on these older kernels. Note that a recent patch changed the return status from TARGET_XFER_EOF to TARGET_XFER_E_IO. Tested on {unix,native-gdbserver,native-extended-gdbserver}/-m{32,64} on x86_64, s390x, aarch64, and ppc64le. Change-Id: If1d884278e8c4ea71d8836bedd56e6a6c242a415
2022-07-26gdb/linux-nat: Check whether /proc/pid/mem is writablePedro Alves1-17/+88
Probe whether /proc/pid/mem is writable, by using it to write to a GDB variable. This will be used in the following patch to avoid falling back to writing to inferior memory with ptrace if /proc/pid/mem _is_ writable. Change-Id: If87eff0b46cbe5e32a583e2977a9e17d29d0ed3e