aboutsummaryrefslogtreecommitdiff
path: root/gdb/linux-nat.c
AgeCommit message (Collapse)AuthorFilesLines
2023-11-17gdb: remove regcache's address spaceSimon Marchi1-39/+17
While looking at the regcache code, I noticed that the address space (passed to regcache when constructing it, and available through regcache::aspace) wasn't relevant for the regcache itself. Callers of regcache::aspace use that method because it appears to be a convenient way of getting the address space for a thread, if you already have the regcache. But there is always another way to get the address space, as the callers pretty much always know which thread they are dealing with. The regcache code itself doesn't use the address space. This patch removes anything related to address_space from the regcache code, and updates callers to get it from the thread in context. This removes a bit of unnecessary complexity from the regcache code. The current get_thread_arch_regcache function gets an address_space for the given thread using the target_thread_address_space function (which calls the target_ops::thread_address_space method). This suggest that there might have been the intention of supporting per-thread address spaces. But digging through the history, I did not find any such case. Maybe this method was just added because we needed a way to get an address space from a ptid (because constructing a regcache required an address space), and this seemed like the right way to do it, I don't know. The only implementations of thread_address_space and process_stratum_target::thread_address_space and linux_nat_target::thread_address_space, which essentially just return the inferior's address space. And thread_address_space is only used in the current get_thread_arch_regcache, which gets removed. So, I think that the thread_address_space target method can be removed, and we can assume that it's fine to use the inferior's address space everywhere. Callers of regcache::aspace are updated to get the address space from the relevant inferior, either using some context they already know about, or in last resort using the current global context. So, to summarize: - remove everything in regcache related to address spaces - in particular, remove get_thread_arch_regcache, and rename get_thread_arch_aspace_regcache to get_thread_arch_regcache - remove target_ops::thread_address_space, and target_thread_address_space - adjust all users of regcache::aspace to get the address space another way Change-Id: I04fd41b22c83fe486522af7851c75bcfb31c88c7
2023-11-13Implement GDB_THREAD_OPTION_EXIT support for native LinuxPedro Alves1-13/+31
This implements support for the new GDB_THREAD_OPTION_EXIT thread option for native Linux. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ia69fc0b9b96f9af7de7cefc1ddb1fba9bbb0bb90 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
2023-11-13Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to corePedro Alves1-7/+14
Currently, infrun assumes that when TARGET_WAITKIND_THREAD_EXITED is reported, the corresponding GDB thread has already been removed from the GDB thread list. Later in the series, that will no longer work, as infrun will need to refer to the thread's thread_info when it processes TARGET_WAITKIND_THREAD_EXITED. As preparation, this patch makes deleting the GDB thread responsibility of infrun, instead of the target. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I013d87f61ffc9aaca49f0d6ce2a43e3ea69274de
2023-11-13Thread options & clone events (native Linux)Pedro Alves1-0/+7
This commit teaches the native Linux target about the GDB_THREAD_OPTION_CLONE thread option. It's actually simpler to just continue reporting all clone events unconditionally to the core. There's never any harm in reporting a clone event when the option is disabled. All we need to do is to report support for the option, otherwise GDB falls back to use target_thread_events(). Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: If90316e2dcd0c61d0fefa0d463c046011698acf9
2023-11-13Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONEDPedro Alves1-120/+132
(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-11-13gdb/linux: Delete all other LWPs immediately on ptrace exec eventPedro Alves1-0/+15
I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
2023-11-13Add "maint info linux-lwps" commandAndrew Burgess1-0/+46
This adds a maintenance command that lets you list all the LWPs under control of the linux-nat target. For example: (gdb) maint info linux-lwps LWP Ptid Thread ID 560948.561047.0 None 560948.560948.0 1.1 This shows that "560948.561047.0" LWP doesn't map to any thread_info object, which is bogus. We'll be using this in a testcase in a following patch. Co-Authored-By: Pedro Alves <pedro@palves.net> Change-Id: Ic4e9e123385976e5cd054391990124b7a20fb3f5
2023-10-26gdb: handle main thread exiting during detachAndrew Burgess1-7/+17
Overview ======== Consider the following situation, GDB is in non-stop mode, the main thread is running while a second thread is stopped. The user has the second thread selected as the current thread and asks GDB to detach. At the exact moment of detach the main thread exits. This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets. This commit addresses this situation for native and remote targets. There are a number of different fixes, but all are required in order to get this functionality working correct for native and remote targets. Native Linux Target =================== For the native Linux target, detaching is handled in the function linux_nat_target::detach. In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited. GDB then detaches from everything except the main thread by calling detach_callback. After this the first problem is this assert: /* Only the initial process should be left right now. */ gdb_assert (num_lwps (pid) == 1); The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached. I fix this by changing the assert to allow for 0 or 1 lwps at this point. As the 0 case can only happen in non-stop mode, the assert becomes: gdb_assert (num_lwps (pid) == 1 || (target_is_non_stop_p () && num_lwps (pid) == 0)); The next problem is that we do: main_lwp = find_lwp_pid (ptid_t (pid)); and then proceed assuming that main_lwp is not nullptr. In the case that the main thread has exited though, main_lwp will be nullptr. However, we only need main_lwp so that GDB can detach from the thread. If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching, GDB can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success. For Remote Targets ================== On remote targets there are two problems. First is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach. The 'set debug remote on' trace looks like this: [remote] Sending packet: $z0,7f1648fe0241,1#35 [remote] Notification received: Stop:W0;process:2a0ac8 # At this point an unpatched gdbserver segfaults, and the connection # is broken. A patched gdbserver continues as below... [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff00a8,1#68 [remote] Packet received: E01 [remote] Sending packet: $z0,7f1648ff132f,1#6b [remote] Packet received: E01 [remote] Sending packet: $D;2a0ac8#3e [remote] Packet received: E01 I was originally running into Segmentation Faults, from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This function calls current_process() and then dereferences the result to find the breakpoint list. However, in our case, the current process has already exited, and so the current_process() call returns nullptr. At the point of failure, the gdbserver backtrace looks like this: #0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982 #1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093 #2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372 #3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498 ... The problem is that, as a result non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent. Clearly gdbserver can't actually remove this breakpoint -- the process has already exited -- so I think the best solution is for gdbserver just to report an error, which is what I've done. The second problem I ran into was on the gdb side, as the process has already exited, but GDB has not yet acknowledged the exit event, the detach -- the 'D' packet in the above trace -- fails. This was being reported to the user with a 'Can't detach process' error. As the test actually calls detach from Python code, this error was then becoming a Python exception. Though clearly the detach has returned an error, and so, maybe, having GDB throw an error would be fine, I think in this case, there's a good argument that the remote error can be ignored -- if GDB tries to detach and gets back an error, and if there's a pending exit event for the pid we tried to detach, then just ignore the error and pretend the detach worked fine. We could possibly check for a pending exit event before sending the detach packet, however, I believe that it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach. In this case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks -- we just send the detach request, and if it fails, check to see if the process has already exited. Testing ======= In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call. The window of opportunity for getting the exit to arrive is so small I've never managed to trigger this in real use -- I originally spotted this issue while working on another patch, which did manage to trigger this issue. However, if we trigger both the exit and the detach from a single Python function then we never return to GDB's event loop, as such GDB never processes the exit event, and so the first time GDB gets a chance to see the exit is during the detach call. And so that is the approach I've taken for testing this patch. Tested-By: Kevin Buettner <kevinb@redhat.com> Approved-By: Kevin Buettner <kevinb@redhat.com>
2023-10-10gdb: remove target_gdbarchSimon Marchi1-4/+6
This function is just a wrapper around the current inferior's gdbarch. I find that having that wrapper just obscures where the arch is coming from, and that it's often used as "I don't know which arch to use so I'll use this magical target_gdbarch function that gets me an arch" when the arch should in fact come from something in the context (a thread, objfile, symbol, etc). I think that removing it and inlining `current_inferior ()->arch ()` everywhere will make it a bit clearer where that arch comes from and will trigger people into reflecting whether this is the right place to get the arch or not. Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b Reviewed-By: John Baldwin <jhb@FreeBSD.org> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-09-20Remove explanatory comments from includesTom Tromey1-7/+7
I noticed a comment by an include and remembered that I think these don't really provide much value -- sometimes they are just editorial, and sometimes they are obsolete. I think it's better to just remove them. Tested by rebuilding. Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-08-23gdb: centralize "[Thread ...exited]" notificationsPedro Alves1-7/+1
Currently, each target backend is responsible for printing "[Thread ...exited]" before deleting a thread. This leads to unnecessary differences between targets, like e.g. with the remote target, we never print such messages, even though we do print "[New Thread ...]". E.g., debugging the gdb.threads/attach-many-short-lived-threads.exp with gdbserver, letting it run for a bit, and then pressing Ctrl-C, we currently see: (gdb) c Continuing. ^C[New Thread 3850398.3887449] [New Thread 3850398.3887500] [New Thread 3850398.3887551] [New Thread 3850398.3887602] [New Thread 3850398.3887653] ... Thread 1 "attach-many-sho" received signal SIGINT, Interrupt. 0x00007ffff7e6a23f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffda80, rem=rem@entry=0x7fffffffda80) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 in ../sysdeps/unix/sysv/linux/clock_nanosleep.c (gdb) Above, we only see "New Thread" notifications, even though threads were deleted. After this patch, we'll see: (gdb) c Continuing. ^C[Thread 3558643.3577053 exited] [Thread 3558643.3577104 exited] [Thread 3558643.3577155 exited] [Thread 3558643.3579603 exited] ... [New Thread 3558643.3597415] [New Thread 3558643.3600015] [New Thread 3558643.3599965] ... Thread 1 "attach-many-sho" received signal SIGINT, Interrupt. 0x00007ffff7e6a23f in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=req@entry=0x7fffffffda80, rem=rem@entry=0x7fffffffda80) at ../sysdeps/unix/sysv/linux/clock_nanosleep.c:78 78 in ../sysdeps/unix/sysv/linux/clock_nanosleep.c (gdb) q This commit fixes this by moving the thread exit printing to common code instead, triggered from within delete_thread (or rather, set_thread_exited). There's one wrinkle, though. While most targest want to print: [Thread ... exited] the Windows target wants to print: [Thread ... exited with code <exit_code>] ... and sometimes wants to suppress the notification for the main thread. To address that, this commits adds a delete_thread_with_code function, only used by that target (so far). This fix was originally posted as part of a larger series: https://inbox.sourceware.org/gdb-patches/20221212203101.1034916-1-pedro@palves.net/ But didn't really need to be part of that series. In order to get this fix merged sooner, I (Andrew Burgess) have rebased this commit outside of the original series. Any bugs introduced while splitting this patch out and rebasing, are entirely my own. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30129 Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
2023-07-23gdb: two changes to linux_nat_debug_printf calls in linux-nat.cAndrew Burgess1-7/+7
This commit adjusts some of the debug output in linux-nat.c, but makes no other functional changes to GDB. In resume_lwp I've added the word "sibling" to one of the debug messages. All the other debug messages in this function talk about operating on the sibling thread, so I think it makes sense, for consistency, if the message I've updated also talks about the sibling thread. In resume_stopped_resumed_lwps I've reordered the condition checks so that the vfork-parent check now happens after the checks for whether the thread is already resumed or not. This makes no functional difference to GDB, but does, I think, mean we see more helpful debug messages first. Consider the situation where a vfork-parent thread is already resumed, and resume_stopped_resumed_lwps is called. Previously the message saying that the thread was not being resumed due to being a vfork-parent, was printed. This might give the impression that the thread is left in a not resumed state, which is misleading. After this change we now get a message saying that the thread is not being resumed due to it not being stopped (i.e. is already resumed). With this message the already resumed nature of the thread is much clearer. I found this change helpful when debugging some vfork related issues.
2023-07-17gdb: additional debug output in infrun.c and linux-nat.cAndrew Burgess1-4/+19
While working on some of the recent patches relating to vfork handling I found this debug output very helpful, I'd like to propose adding this into GDB. With debug turned off there should be no user visible changes after this commit.
2023-07-17gdb: don't restart vfork parent while waiting for child to finishAndrew Burgess1-1/+8
While working on a later patch, which changes gdb.base/foll-vfork.exp, I noticed that sometimes I would hit this assert: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. I eventually tracked it down to a combination of schedule-multiple mode being on, target-non-stop being off, follow-fork-mode being set to child, and some bad timing. The failing case is pretty simple, a single threaded application performs a vfork, the child process then execs some other application while the parent process (once the vfork child has completed its exec) just exits. As best I understand things, here's what happens when things go wrong: 1. The parent process performs a vfork, GDB sees the VFORKED event and creates an inferior and thread for the vfork child, 2. GDB resumes the vfork child process. As schedule-multiple is on and target-non-stop is off, this is translated into a request to start all processes (see user_visible_resume_ptid), 3. In the linux-nat layer we spot that one of the threads we are about to start is a vfork parent, and so don't start that thread (see resume_lwp), the vfork child thread is resumed, 4. GDB waits for the next event, eventually entering linux_nat_target::wait, which in turn calls linux_nat_wait_1, 5. In linux_nat_wait_1 we eventually call resume_stopped_resumed_lwps, this should restart threads that have stopped but don't actually have anything interesting to report. 6. Unfortunately, resume_stopped_resumed_lwps doesn't check for vfork parents like resume_lwp does, so at this point the vfork parent is resumed. This feels like the start of the bug, and this is where I'm proposing to fix things, but, resuming the vfork parent isn't the worst thing in the world because.... 7. As the vfork child is still alive the kernel holds the vfork parent stopped, 8. Eventually the child performs its exec and GDB is sent and EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB, 9. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE, 10. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process, 11. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit), 12. While this has been going on the vfork parent is executing, and might even exit, 13. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object, 14. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status, 15. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped, 16. An exited thread is not classified as stopped, and so the assert triggers! So there's two bugs I see here. The first, and most critical one here is in step #6. I think that resume_stopped_resumed_lwps should not resume a vfork parent, just like resume_lwp doesn't resume a vfork parent. With this change in place the vfork parent will remain stopped in step instead GDB will only see the EXECD/EXITED/SIGNALLED event. The problems in #9 and #10 are therefore skipped and we arrive at #11, handling the EXECD event. As the parent is still stopped #12 doesn't apply, and in #13 when we try to stop the process we will see that it is already stopped, there's no risk of the vfork parent exiting before we get to this point. And finally, in #15 we are safe to poke the process registers because it will not have exited by this point. However, I did mention two bugs. The second bug I've not yet managed to actually trigger, but I'm convinced it must exist: if we forget vforks for a moment, in step #13 above, when linux_nat_target::detach is called, we first try to stop all threads in the process GDB is detaching from. If we imagine a multi-threaded inferior with many threads, and GDB running in non-stop mode, then, if the user tries to detach there is a chance that thread could exit just as linux_nat_target::detach is entered, in which case we should be able to trigger the same assert. But, like I said, I've not (yet) managed to trigger this second bug, and even if I could, the fix would not belong in this commit, so I'm pointing this out just for completeness. There's no test included in this commit. In a couple of commits time I will expand gdb.base/foll-vfork.exp which is when this bug would be exposed. Unfortunately there are at least two other bugs (separate from the ones discussed above) that need fixing first, these will be fixed in the next commits before the gdb.base/foll-vfork.exp test is expanded. If you do want to reproduce this failure then you will for certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure after within 10mins.
2023-07-06Linux: Avoid pread64/pwrite64 for high memory addresses (PR gdb/30525)Pedro Alves1-10/+18
Since commit 05c06f318fd9 ("Linux: Access memory even if threads are running"), GDB prefers pread64/pwrite64 to access inferior memory instead of ptrace. That change broke reading shared libraries on SPARC64 Linux, as reported by PR gdb/30525 ("gdb cannot read shared libraries on SPARC64"). On SPARC64 Linux, surprisingly (to me), userspace shared libraries are mapped at high 64-bit addresses: (gdb) info sharedlibrary Cannot access memory at address 0xfff80001002011e0 Cannot access memory at address 0xfff80001002011d8 Cannot access memory at address 0xfff80001002011d8 From To Syms Read Shared Object Library 0xfff80001000010a0 0xfff8000100021f80 Yes (*) /lib64/ld-linux.so.2 (*): Shared library is missing debugging information. Those addresses are 64-bit addresses with the high bits set. When interpreted as signed, they're negative. The Linux kernel rejects pread64/pwrite64 if the offset argument of type off_t (a signed type) is negative, which happens if the memory address we're accessing has its high bit set. See linux/fs/read_write.c sys_pread64 and sys_pwrite64 in Linux. Thankfully, lseek does not fail in that situation. So the fix is to use the 'lseek + read|write' path if the offset would be negative. Fix this in both native GDB and GDBserver. Tested on a SPARC64 GNU/Linux and x86-64 GNU/Linux. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30525 Change-Id: I79c724f918037ea67b7396fadb521bc9d1b10dc5
2023-06-05[gdb] Fix more typosTom de Vries1-1/+1
Fix some more typos: - distinquish -> distinguish - actualy -> actually - singe -> single - frash -> frame - chid -> child - dissassembler -> disassembler - uninitalized -> uninitialized - precontidion -> precondition - regsiters -> registers - marge -> merge - sate -> state - garanteed -> guaranteed - explictly -> explicitly - prefices (nonstandard plural) -> prefixes - bondary -> boundary - formated -> formatted - ithe -> the - arrav -> array - coresponding -> corresponding - owend -> owned - fials -> fails - diasm -> disasm - ture -> true - tpye -> type There's one code change, the name of macro SIG_CODE_BONDARY_FAULT changed to SIG_CODE_BOUNDARY_FAULT. Tested on x86_64-linux.
2023-04-04gdb: make find_thread_ptid a process_stratum_target methodSimon Marchi1-7/+7
Make find_thread_ptid (the overload that takes a process_stratum_target) a method of process_stratum_target. Change-Id: Ib190a925a83c6b93e9c585dc7c6ab65efbdd8629 Reviewed-By: Tom Tromey <tom@tromey.com>
2023-03-27linux-nat: introduce pending_status_strPedro Alves1-3/+16
I noticed that some debug log output printing an lwp's pending status wasn't considering lp->waitstatus. This fixes it, by introducing a new pending_status_str function. Also fix the comment in gdb/linux-nat.h describing lwp_info::waitstatus and details the description of lwp_info::status while at it. Change-Id: I66e5c7a363d30a925b093b195d72925ce5b6b980 Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-03-09gdb, gdbserver, gdbsupport: fix whitespace issuesSimon Marchi1-1/+1
Replace spaces with tabs in a bunch of places. Change-Id: If0f87180f1d13028dc178e5a8af7882a067868b0
2023-02-24Remove struct bufferTom Tromey1-1/+0
I've long wanted to remove 'struct buffer', and thanks to Simon's earlier patch, I was finally able to do so. My feeling has been that gdb already has several decent structures available for growing strings: std::string of course, but also obstack and even objalloc from BFD and dyn-string from libiberty. The previous patches in this series removed all the uses of struct buffer, so this one can remove the code and the remaining #includes.
2023-02-14Do not cast away const in agent_run_commandTom Tromey1-5/+2
While investigating something else, I noticed some weird code in agent_run_command (use of memcpy rather than strcpy). Then I noticed that 'cmd' is used as both an in and out parameter, despite being const. Casting away const like this is bad. This patch removes the const and fixes the memcpy. I also added a static assert to assure myself that the code in gdbserver is correct -- gdbserver is passing its own buffer directly to agent_run_command. Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-01-01Update copyright year range in header of all files managed by GDBJoel Brobecker1-1/+1
This commit is the result of running the gdb/copyright.py script, which automated the update of the copyright year range for all source files managed by the GDB project to be updated to include year 2023.
2022-12-16Delay checking whether /proc/pid/mem is writable (PR gdb/29907)Pedro Alves1-3/+6
As of 1bcb0708f229 ("gdb/linux-nat: Check whether /proc/pid/mem is writable"), GDB checks if /proc/pid/mem is writable. This is done early at GDB startup, in order to get a consistent warning, instead of a warning that depends on whenever GDB writes to inferior memory. PR gdb/29907 points out that some build systems (like QEMU's, apparently) may call 'gdb --version' to check GDB's presence & its version on the system, and that Gentoo's build process has sandboxing which blocks the /proc/pid/mem access and thus GDB warns, which results in build fails. To help with that, this patch delays the /proc/pid/mem check until we start or attach to an inferior. Ends up potentially emiting a warning close where we already emit other ptrace- and /proc- related warnings, which just Feels Right. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29907 Change-Id: I5537653ecfbbe76a04ab035e40e59d09b4980763
2022-12-05gdb/linux-nat: add ptid parameter to linux_xfer_siginfoSimon Marchi1-4/+4
Make the inferior_ptid bubble up to linux_nat_target::xfer_partial. Change-Id: I62dbc5734c26648bb465f449c2003c73751cd812
2022-12-05gdb/linux-nat: use l linux_nat_get_siginfo in linux_xfer_siginfoSimon Marchi1-4/+2
I noticed we could reduce duplication a bit here. Change-Id: If24e54d1dac71b46f7c1f68a18a073d4c084b644
2022-12-05gdb/linux-nat: check ptrace return value in linux_nat_get_siginfoSimon Marchi1-5/+1
Not a big deal, but it seems strange to check errno instead of the ptrace return value to know whether it succeeded. Change-Id: If0a6d0280ab0e5ecb077e546af0d6fe489c5b9fd
2022-12-05gdb/linux-nat: don't memset siginfo on failure in linux_nat_get_siginfoSimon Marchi1-6/+2
No caller cares about the value of *SIGINFO on failure. It's also documented in the function doc that *SIGINFO is uninitialized (I understand "untouched") on failure. Change-Id: I5ef38a5f58e3635e109b919ddf6f827f38f1225a
2022-12-05gdb/linux-nat: bool-ify linux_nat_get_siginfoSimon Marchi1-3/+3
Change return type to bool. Change-Id: I1bf0360bfdd1b5994cd0f96c268d806f96fe51a4
2022-12-05gdb/linux-nat: use get_ptrace_pid in two spotsSimon Marchi1-10/+2
No behavior change expected. Change-Id: Ifaa64ecd619483646b024fd7c62e571e92a8eedb
2022-12-02gdb/linux-nat: add pid parameter to linux_proc_xfer_memory_partialSimon Marchi1-9/+9
Add a pid parameter to linux_proc_xfer_memory_partial, making the inferior_ptid reference bubble up close to the target_ops::xfer_partial boundary. No behavior change expected. Change-Id: I58171b00ee1bba1ea22efdbb5dcab8b1ab3aac4c
2022-11-07Don't explicitly set clone child ptrace optionsPedro Alves1-1/+0
linux_handle_extended_wait calls target_post_attach if we're handling a PTRACE_EVENT_CLONE, and libthread_db.so isn't active. target_post_attach just calls linux_init_ptrace_procfs to set the lwp's ptrace options. However, this is completely unnecessary, because, as man ptrace [1] says, options are inherited: "Flags are inherited by new tracees created and "auto-attached" via active PTRACE_O_TRACEFORK, PTRACE_O_TRACEVFORK, or PTRACE_O_TRACECLONE options." This removes the unnecessary call. [1] - https://man7.org/linux/man-pages/man2/ptrace.2.html Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I533eaa60b700f7e40760311fc0d344d0b3f19a78
2022-10-19internal_error: remove need to pass __FILE__/__LINE__Pedro Alves1-9/+5
Currently, every internal_error call must be passed __FILE__/__LINE__ explicitly, like: internal_error (__FILE__, __LINE__, "foo %d", var); The need to pass in explicit __FILE__/__LINE__ is there probably because the function predates widespread and portable variadic macros availability. We can use variadic macros nowadays, and in fact, we already use them in several places, including the related gdb_assert_not_reached. So this patch renames the internal_error function to something else, and then reimplements internal_error as a variadic macro that expands __FILE__/__LINE__ itself. The result is that we now should call internal_error like so: internal_error ("foo %d", var); Likewise for internal_warning. The patch adjusts all calls sites. 99% of the adjustments were done with a perl/sed script. The non-mechanical changes are in gdbsupport/errors.h, gdbsupport/gdb_assert.h, and gdb/gdbarch.py. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: Ia6f372c11550ca876829e8fd85048f4502bdcf06
2022-09-21gdbsupport: convert FILEIO_* macros to an enumSimon Marchi1-3/+3
Converting from free-form macros to an enum gives a bit of type-safety. This caught places where we would assign host error numbers to what should contain a target fileio error number, for instance in target_fileio_pread. I added the FILEIO_SUCCESS enumerator, because remote.c:remote_hostio_parse_result initializes the remote_errno output variable to 0. It seems better to have an explicit enumerator than to assign a value for which there is no enumerator. I considered initializing this variable to FILEIO_EUNKNOWN instead, such that if the remote side replies with an error and omits the errno value, we'll get an errno that represents an error instead of 0 (which reprensents no error). But it's not clear what the consequences of that change would be, so I prefer to err on the side of caution and just keep the existing behavior (there is no intended change in behavior with this patch). Note that remote_hostio_parse_resul still reads blindly what the remote side sends as a target errno into this variable, so we can still end up with a nonsensical value here. It's not good, but out of the scope of this patch. Convert host_to_fileio_error and fileio_errno_to_host to return / accept a fileio_error instead of an int, and cascade the change in the whole chain that uses that. Change-Id: I454b0e3fcf0732447bc872252fa8e57d138b0e03
2022-07-26gdb/linux_nat: Write memory using ptrace if /proc/pid/mem is not writableKeith Seitz1-2/+9
Commit 05c06f318fd9a112529dfc313e6512b399a645e4 enabled GDB to access memory while threads are running. It did this by accessing /proc/PID/task/LWP/mem. Unfortunately, this interface is not implemented for writing in older kernels (such as RHEL6). This means that GDB is unable to insert breakpoints on these hosts: $ ./gdb -q gdb -ex start Reading symbols from gdb... Temporary breakpoint 1 at 0x40fdd5: file ../../src/gdb/gdb.c, line 28. Starting program: /home/rhel6/fsf/linux/gdb/gdb Warning: Cannot insert breakpoint 1. Cannot access memory at address 0x40fdd5 (gdb) Before this patch, linux_proc_xfer_memory_partial (previously called linux_proc_xfer_partial) would return TARGET_XFER_EOF if the write to /proc/PID/mem failed. [More specifically, linux_proc_xfer_partial would not "bother for one word," but the effect is the essentially same.] This status was checked by linux_nat_target::xfer_partial, which would then fallback to using ptrace to perform the operation. This is the specific hunk that removed the fallback: - xfer = linux_proc_xfer_partial (object, annex, readbuf, writebuf, - offset, len, xfered_len); - if (xfer != TARGET_XFER_EOF) - return xfer; + return linux_proc_xfer_memory_partial (readbuf, writebuf, + offset, len, xfered_len); + } return inf_ptrace_target::xfer_partial (object, annex, readbuf, writebuf, offset, len, xfered_len); This patch makes linux_nat_target::xfer_partial go straight to writing memory via ptrace if writing via /proc/pid/mem is not possible in the running kernel, enabling GDB to insert breakpoints on these older kernels. Note that a recent patch changed the return status from TARGET_XFER_EOF to TARGET_XFER_E_IO. Tested on {unix,native-gdbserver,native-extended-gdbserver}/-m{32,64} on x86_64, s390x, aarch64, and ppc64le. Change-Id: If1d884278e8c4ea71d8836bedd56e6a6c242a415
2022-07-26gdb/linux-nat: Check whether /proc/pid/mem is writablePedro Alves1-17/+88
Probe whether /proc/pid/mem is writable, by using it to write to a GDB variable. This will be used in the following patch to avoid falling back to writing to inferior memory with ptrace if /proc/pid/mem _is_ writable. Change-Id: If87eff0b46cbe5e32a583e2977a9e17d29d0ed3e
2022-07-22Change target_ops::async to accept boolTom Tromey1-3/+3
This changes the parameter of target_ops::async from int to bool. Regression tested on x86-64 Fedora 34.
2022-06-28gdb+gdbserver/Linux: avoid reading registers while going through shellPedro Alves1-0/+4
For every stop, Linux GDB and GDBserver save the stopped thread's PC, in lwp->stop_pc. This is done in save_stop_reason, in both gdb/linux-nat.c and gdbserver/linux-low.cc. However, while we're going through the shell after "run", in startup_inferior, we shouldn't be reading registers, as we haven't yet determined the target's architecture -- the shell's architecture may not even be the same as the final inferior's. In gdb/linux-nat.c, lwp->stop_pc is only needed when the thread has stopped for a breakpoint, and since when going through the shell, no breakpoint is going to hit, we could simply teach save_stop_reason to only record the stop pc when the thread stopped for a breakpoint. However, in gdbserver/linux-low.cc, lwp->stop_pc is used in more cases than breakpoint hits (e.g., it's used in tracepoints & the "while-stepping" feature). So to avoid GDB vs GDBserver divergence, we apply the same approach to both implementations. We set a flag in the inferior (process in GDBserver) whenever it is being nursed through the shell, and when that flag is set, save_stop_reason bails out early. While going through the shell, we'll only ever get process exits (normal or signalled), random signals, and exec events, so nothing is lost. Change-Id: If0f01831514d3a74d17efd102875de7d2c6401ad
2022-05-26gdb/linux-nat: xfer_memory_partial return E_IO on errorLancelot SIX1-1/+1
When accessing /proc/PID/mem, if pread64/pwrite64/read/write encounters an error and return -1, linux_proc_xfer_memory_partial return TARGET_XFER_EOF. I think it should return TARGET_XFER_E_IO in this case. TARGET_XFER_EOF is returned when pread64/pwrite64/read/frite returns 0, which indicates that the address space is gone and the whole process has exited or execed. This patch makes this change. Regression tested on x86_64-linux-gnu. Change-Id: I6030412459663b8d7933483fdda22a6c2c5d7221
2022-05-13Constify target_pid_to_exec_fileTom Tromey1-1/+1
This changes target_pid_to_exec_file and target_ops::pid_to_exec_file to return a "const char *". I couldn't build many of these targets, but did examine the code by hand -- also, as this only affects the return type, it's normally pretty safe. This brings gdb and gdbserver a bit closer, and allows for the removal of a const_cast as well.
2022-04-29Slightly tweak and clarify target_resume's interfacePedro Alves1-19/+10
The current target_resume interface is a bit odd & non-intuitive. I've found myself explaining it a couple times the recent past, while reviewing patches that assumed STEP/SIGNAL always applied to the passed in PTID. It goes like this today: - if the passed in PTID is a thread, then the step/signal request is for that thread. - otherwise, if PTID is a wildcard (all threads or all threads of process), the step/signal request is for inferior_ptid, and PTID indicates which set of threads run free. Because GDB always switches the current thread to "leader" thread being resumed/stepped/signalled, we can simplify this a bit to: - step/signal are always for inferior_ptid. - PTID indicates the set of threads that run free. Still not ideal, but it's a minimal change and at least there are no special cases this way. That's what this patch does. It renames the PTID parameter to SCOPE_PTID, adds some assertions to target_resume, and tweaks target_resume's description. In addition, it also renames PTID to SCOPE_PTID in the remote and linux-nat targets, and simplifies their implementation a little bit. Other targets could do the same, but they don't have to. Change-Id: I02a2ec2ab3a3e9b191de1e9a84f55c17cab7daaf
2022-03-31gdb/linux-nat: remove check based on current_inferior in ↵Simon Marchi1-13/+4
linux_handle_extended_wait The check removed by this patch, using current_inferior, looks wrong. When debugging multiple inferiors with the Linux native target and linux_handle_extended_wait is called, there's no guarantee about which is the current inferior. The vfork-done event we receive could be for any inferior. If the vfork-done event is for a non-current inferior, we end up wrongfully ignoring it. As a result, the core never processes a TARGET_WAITKIND_VFORK_DONE event, program_space::breakpoints_not_allowed is never cleared, and breakpoints are never reinserted. However, because the Linux native target decided to ignore the event, it resumed the thread - while breakpoints out. And that's bad. The proposed fix is to remove this check. Always report vfork-done events and let infrun's logic decide if it should be ignored. We don't save much cycles by filtering the event here. Add a test that replicates the situation described above. See comments in the test for more details. Change-Id: Ibe33c1716c3602e847be6c2093120696f2286fbf
2022-03-29Unify gdb printf functionsTom Tromey1-4/+4
Now that filtered and unfiltered output can be treated identically, we can unify the printf family of functions. This is done under the name "gdb_printf". Most of this patch was written by script.
2022-03-29Remove some uses of printf_unfilteredTom Tromey1-2/+2
A number of spots call printf_unfiltered only because they are in code that should not be interrupted by the pager. However, I believe these cases are all handled by infrun's blanket ban on paging, and so can be converted to the default (_filtered) API. After this patch, I think all the remaining _unfiltered calls are ones that really ought to be. A few -- namely in complete_command -- could be replaced by a scoped assignment to pagination_enabled, but for the remainder, the code seems simple enough like this.
2022-03-10Re-add zombie leader on exit, gdb/linuxPedro Alves1-27/+80
The current zombie leader detection code in linux-nat.c has a race -- if a multi-threaded inferior exits just before check_zombie_leaders finds that the leader is now zombie via checking /proc/PID/status, check_zombie_leaders deletes the leader, assuming we won't get an event for that exit (which we won't in some scenarios, but not in this one). That might seem mostly harmless, but it has some downsides: - later when we continue pulling events out of the kernel, we will collect the exit event of the non-leader threads, and once we see the last lwp in our list exit, we return _that_ lwp's exit code as whole-process exit code to infrun, instead of the leader's exit code. - this can cause a hang in stop_all_threads in infrun.c. Say there are 2 threads in the process. stop_all_threads stops each of those threads, and then waits for two stop or exit events, one for each thread. If the whole process exits, and check_zombie_leaders hits the false-positive case, linux-nat.c will only return one event to GDB (the whole-process exit returned when we see the last thread, the non-leader thread, exit), making stop_all_threads hang forever waiting for a second event that will never come. However, in this false-positive scenario, where the whole process is exiting, as opposed to just the leader (with pthread_exit(), for example), we _will_ get an exit event shortly for the leader, after we collect the exit event of all the other non-leader threads. Or put another way, we _always_ get an event for the leader after we see it become zombie. I tried a number of approaches to fix this: #1 - My first thought to address the race was to make GDB always report the whole-process exit status for the leader thread, not for whatever is the last lwp in the list. We _always_ get a final exit (or exec) event for the leader, and when the race triggers, we're not collecting it. #2 - My second thought was to try to plug the race in the first place. I thought of making GDB call waitpid/WNOHANG for all non-leader threads immediately when the zombie leader is detected, assuming there would be an exit event pending for each of them waiting to be collected. Turns out that that doesn't work -- you can see the leader become zombie _before_ the kernel kills all other threads. Waitpid in that small time window returns 0, indicating no-event. Thankfully we hit that race window all the time, which avoided trading one race for another. Looking at the non-leader thread's status in /proc doesn't help either, the threads are still in running state for a bit, for the same reason. #3 - My next attempt, which seemed promising, was to synchronously stop and wait for the stop for each of the non-leader threads. For the scenario in question, this will collect all the exit statuses of the non-leader threads. Then, if we are left with only the zombie leader in the lwp list, it means we either have a normal while-process exit or an exec, in which case we should not delete the leader. If _only_ the leader exited, like in gdb.threads/leader-exit.exp, then after pausing threads, we will still have at least one live non-leader thread in the list, and so we delete the leader lwp. I got this working and polished, and it was only after staring at the kernel code to convince myself that this would really work (and it would, for the scenario I considered), that I realized I had failed to account for one scenario -- if any non-leader thread is _already_ stopped when some thread triggers a group exit, like e.g., if you have some threads stopped and then resume just one thread with scheduler-locking or non-stop, and that thread exits the process. I also played with PTRACE_EVENT_EXIT, see if it would help in any way to plug the race, and I couldn't find a way that it would result in any practical difference compared to looking at /proc/PID/status, with respect to having a race. So I concluded that there's no way to plug the race, we just have to deal with it. Which means, going back to approach #1. That is the approach taken by this patch. Change-Id: I6309fd4727da8c67951f9cea557724b77e8ee979
2022-03-10gdb: Reorganize linux_nat_filter_eventPedro Alves1-35/+40
Reorganize linux-nat.c:linux_nat_filter_event such that all the handling for events for LWPs not in the LWP list is together. This helps make a following patch clearer. The comments and debug messages have also been tweaked - the end goal is to have them synchronized with the gdbserver counterpart. Change-Id: I8586d8dcd76d8bd3795145e3056fc660e3b2cd22
2022-02-22inf-ptrace: Add an event_pipe to be used for async mode in subclasses.John Baldwin1-99/+24
Subclasses of inf_ptrace_target have to opt-in to using the event_pipe by implementing the can_async_p and async methods. For subclasses which do this, inf_ptrace_target provides is_async_p, async_wait_fd and closes the pipe in the close target method. inf_ptrace_target also provides wrapper routines around the event pipe (async_file_open, async_file_close, async_file_flush, and async_file_mark) for use in target methods such as async. inf_ptrace_target also exports a static async_file_mark_if_open function which can be used in SIGCHLD signal handlers.
2022-02-22Enable async mode in the target in attach_cmd.John Baldwin1-3/+0
If the attach target supports async mode, enable it after the attach target's ::attach method returns.
2022-02-22Don't enable async mode at the end of target ::resume methods.John Baldwin1-3/+0
Now that target_resume always enables async mode after target::resume returns, these calls are redundant. The other place that target resume methods are invoked outside of target_resume are as the beneath target in record_full_wait_1. In this case, async mode should already be enabled when supported by the target before the resume method is invoked due to the following: In general, targets which support async mode run as async until ::wait returns TARGET_WAITKIND_NO_RESUMED to indicate that there are no unwaited for children (either they have exited or are stopped). When that occurs, the loop in wait_one disables async mode. Later if a stopped child is resumed, async mode is re-enabled in do_target_resume before waiting for the next event. In the case of record_full_wait_1, this function is invoked from the ::wait target method when fetching an event. If the underlying target supports async mode, then an earlier call to do_target_resume to resume the child reporting an event in the loop in record_full_wait_1 would have already enabled async mode before ::wait was invoked. In addition, nothing in the code executed in the loop in record_full_wait_1 disables async mode. Async mode is only disabled higher in the call stack in wait_one after ::wait returns. It is also true that async mode can be disabled by an INF_EXEC_COMPLETE event passed to inferior_event_handle, but all of the places that invoke that are in the gdb core which is "above" a target ::wait method. Note that there is an earlier call to enable async mode in linux_nat_target::resume. That call also marks the async event pipe to report an existing event after enabling async mode, so it needs to stay.
2022-02-22gdb linux-nat: Convert linux_nat_event_pipe to the event_pipe class.John Baldwin1-43/+16
Use event_pipe from gdbsupport in place of the existing file descriptor array.
2022-02-18gdb: remove newlines from some linux_nat_debug_printf callsSimon Marchi1-3/+3
Change-Id: I80328fab7096221356864b5a4fb30858b48d2c10