gdb: handle main thread exiting during detach

Overview
========

Consider the following situation: GDB is in non-stop mode, the main thread is running while a second thread is stopped.  The user has the second thread selected as the current thread and asks GDB to detach.  At the exact moment of the detach, the main thread exits.

This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets.

This commit addresses this situation for native and remote targets.  There are a number of different fixes, but all are required in order to get this functionality working correctly for native and remote targets.

Native Linux Target
===================

For the native Linux target, detaching is handled in the function linux_nat_target::detach.  In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited.

GDB then detaches from everything except the main thread by calling detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now.  */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached.  I fix this by changing the assert to allow for 0 or 1 lwps at this point.  As the 0 case can only happen in non-stop mode, the assert becomes:

  gdb_assert (num_lwps (pid) == 1
              || (target_is_non_stop_p () && num_lwps (pid) == 0));

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr.  In the case that the main thread has exited, though, main_lwp will be nullptr.

However, we only need main_lwp so that GDB can detach from the thread.  If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching.  GDB can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success.
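A toy model may make the two native-target changes easier to follow.  The following is a minimal C++ sketch, assuming a process represented as a plain list of LWP records; sketch_lwp, sketch_process, sketch_num_lwps, sketch_find_main_lwp and sketch_detach are hypothetical stand-ins for GDB's lwp_info, num_lwps, find_lwp_pid and linux_nat_target::detach, not the real code.  It shows the relaxed assertion (zero LWPs is accepted only in non-stop mode) and the early return when the main LWP is already gone.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified model of the LWPs tracked for one process.
// These are illustrative stand-ins, not GDB's real lwp_info.
struct sketch_lwp
{
  long lwp;     // LWP id; the main thread's LWP id equals the pid.
  bool exited;  // True if the thread exited while GDB was stopping it.
};

struct sketch_process
{
  long pid;
  bool non_stop;                  // Models target_is_non_stop_p ().
  std::vector<sketch_lwp> lwps;
};

// Stand-in for num_lwps (pid).
int
sketch_num_lwps (const sketch_process &proc)
{
  return static_cast<int> (proc.lwps.size ());
}

// Stand-in for find_lwp_pid (ptid_t (pid)): nullptr if the main LWP is gone.
const sketch_lwp *
sketch_find_main_lwp (const sketch_process &proc)
{
  for (const sketch_lwp &lp : proc.lwps)
    if (lp.lwp == proc.pid)
      return &lp;
  return nullptr;
}

// Returns true if the main LWP was detached, false if it had already
// exited (in which case every other LWP has already been detached).
bool
sketch_detach (sketch_process &proc)
{
  std::vector<sketch_lwp> &lwps = proc.lwps;

  // Stop phase (cf. stop_wait_callback): forget any LWP that has exited.
  lwps.erase (std::remove_if (lwps.begin (), lwps.end (),
                              [] (const sketch_lwp &lp) { return lp.exited; }),
              lwps.end ());

  // Detach every LWP except the main one (cf. detach_callback).
  const long pid = proc.pid;
  lwps.erase (std::remove_if (lwps.begin (), lwps.end (),
                              [pid] (const sketch_lwp &lp)
                              { return lp.lwp != pid; }),
              lwps.end ());

  // The relaxed assertion: zero LWPs is only acceptable in non-stop mode.
  assert (sketch_num_lwps (proc) == 1
          || (proc.non_stop && sketch_num_lwps (proc) == 0));

  if (sketch_find_main_lwp (proc) == nullptr)
    return false;  // Main thread already exited: the detach is complete.

  // Detach the main LWP last.
  lwps.clear ();
  return true;
}
```

With a non-stop process whose main thread (LWP 100, pid 100) has exited, sketch_detach returns false and leaves no LWPs behind; with a live main thread it detaches the main LWP last and returns true.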
For Remote Targets
==================

On remote targets there are two problems.

The first is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach.  The 'set debug remote on' trace looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote] Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken.  A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into segmentation faults from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint.  This function calls current_process() and then dereferences the result to find the breakpoint list.  However, in our case, the current process has already exited, and so the current_process() call returns nullptr.  At the point of failure, the gdbserver backtrace looks like this:

  #0  0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
  #1  0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
  #2  0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
  #3  0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
  ...

The problem is that, as a result of non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent.  Clearly gdbserver can't actually remove this breakpoint, as the process has already exited, so I think the best solution is for gdbserver just to report an error, which is what I've done.
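The effect of the nullptr guard added to find_gdb_breakpoint can be illustrated in isolation.  This is a hedged sketch rather than gdbserver code: sketch_breakpoint, sketch_process_info, sketch_current, sketch_current_process, sketch_find_breakpoint and sketch_handle_remove_breakpoint are hypothetical names, and the "E01" reply simply mirrors the trace above.  When there is no current process, the lookup reports "not found" instead of dereferencing a null pointer, and the remove request is answered with an error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for gdbserver state; not its real types.
struct sketch_breakpoint
{
  unsigned long addr;
};

struct sketch_process_info
{
  std::vector<sketch_breakpoint> breakpoints;
};

// A null value models "the current process has already exited".
sketch_process_info *sketch_current = nullptr;

sketch_process_info *
sketch_current_process ()
{
  return sketch_current;
}

// Mirrors the guard added to find_gdb_breakpoint: bail out before
// PROC is dereferenced.
sketch_breakpoint *
sketch_find_breakpoint (unsigned long addr)
{
  sketch_process_info *proc = sketch_current_process ();
  if (proc == nullptr)
    return nullptr;

  for (sketch_breakpoint &bp : proc->breakpoints)
    if (bp.addr == addr)
      return &bp;
  return nullptr;
}

// Handle a request to remove the breakpoint at ADDR and return the reply.
// "Not found" covers both a missing breakpoint and a missing process.
std::string
sketch_handle_remove_breakpoint (unsigned long addr)
{
  sketch_breakpoint *bp = sketch_find_breakpoint (addr);
  if (bp == nullptr)
    return "E01";

  // BP was found, so the current process exists.
  std::vector<sketch_breakpoint> &bps = sketch_current_process ()->breakpoints;
  bps.erase (bps.begin () + (bp - bps.data ()));
  return "OK";
}
```

Returning "not found" keeps the change local to the lookup: the caller's existing error path produces the reply, and GDB ignores that error during detach.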
The second problem I ran into was on the GDB side.  As the process has already exited, but GDB has not yet acknowledged the exit event, the detach (the 'D' packet in the above trace) fails.  This was being reported to the user with a 'Can't detach process' error.  As the test actually calls detach from Python code, this error was then becoming a Python exception.

Though the detach has clearly returned an error, and so, maybe, having GDB throw an error would be fine, I think in this case there's a good argument that the remote error can be ignored: if GDB tries to detach and gets back an error, and if there's a pending exit event for the pid we tried to detach, then just ignore the error and pretend the detach worked fine.

We could possibly check for a pending exit event before sending the detach packet.  However, I believe that it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach.  In this case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks: we just send the detach request, and if it fails, check to see if the process has already exited.

Testing
=======

In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call.  The window of opportunity for getting the exit to arrive is so small that I've never managed to trigger this in real use.  I originally spotted this issue while working on another patch, which did manage to trigger it.

However, if we trigger both the exit and the detach from a single Python function, then we never return to GDB's event loop.  As such, GDB never processes the exit event, and so the first time GDB gets a chance to see the exit is during the detach call.  That is the approach I've taken for testing this patch.
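The GDB-side policy for a failed detach, described above under For Remote Targets, amounts to "send 'D' first, and consult the pending events only if it fails".  A minimal sketch of that decision follows, assuming hypothetical types (sketch_stop_event, sketch_remote_state) and functions (sketch_has_pending_exit, sketch_remote_detach) rather than anything in GDB's remote.c; the stub's reply is passed in as a string instead of being read from a connection.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a stop notification GDB has received but not
// yet processed; not GDB's real stop-reply type.
struct sketch_stop_event
{
  int pid;
  bool exited;  // True for an exit notification such as "W0;process:..."
};

struct sketch_remote_state
{
  std::vector<sketch_stop_event> pending;
};

// True if an unprocessed exit event for PID is already queued.
bool
sketch_has_pending_exit (const sketch_remote_state &rs, int pid)
{
  for (const sketch_stop_event &ev : rs.pending)
    if (ev.pid == pid && ev.exited)
      return true;
  return false;
}

// Interpret REPLY, the stub's answer to "D;<pid>".  The pending queue is
// only checked after a failure, because the exit notification may arrive
// after the 'D' packet has been sent.
void
sketch_remote_detach (const sketch_remote_state &rs, int pid,
                      const std::string &reply)
{
  if (reply == "OK")
    return;

  if (sketch_has_pending_exit (rs, pid))
    return;  // Process already exited: treat the detach as a success.

  throw std::runtime_error ("Can't detach process");
}
```

A single check after the failure covers both orderings: an exit notification that arrived before the 'D' packet was sent, and one that arrived while the stub was handling it.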
Tested-By: Kevin Buettner <kevinb@redhat.com>
Approved-By: Kevin Buettner <kevinb@redhat.com>
author: Andrew Burgess <aburgess@redhat.com> 2023-07-17 11:31:11 +0100
committer: Andrew Burgess <aburgess@redhat.com> 2023-10-26 18:11:54 +0100
commit: fd492bf1e20a0410189307c5063acc444c4bd5d3 (patch)
tree: f18848410756f7cc7ec1c0d96795587893fe3cc3 /gdbserver
parent: 743d3f0945c625ce5647130b506eeb6940dfc12e (diff)
download: gdb-fd492bf1e20a0410189307c5063acc444c4bd5d3.zip
gdb-fd492bf1e20a0410189307c5063acc444c4bd5d3.tar.gz
gdb-fd492bf1e20a0410189307c5063acc444c4bd5d3.tar.bz2
1 files changed, 11 insertions, 0 deletions
diff --git a/gdbserver/mem-break.cc b/gdbserver/mem-break.cc
index 897b9a2..3bee8bc8 100644
--- a/gdbserver/mem-break.cc
+++ b/gdbserver/mem-break.cc
@@ -976,6 +976,17 @@ static struct gdb_breakpoint *
 find_gdb_breakpoint (char z_type, CORE_ADDR addr, int kind)
 {
   struct process_info *proc = current_process ();
+
+  /* In some situations the current process exits, we inform GDB, but
+     before GDB can acknowledge that the process has exited GDB tries to
+     detach from the inferior.  As part of the detach process GDB will
+     remove all breakpoints, which means we can end up here when the
+     current process has already exited and so PROC is nullptr.  In this
+     case just claim we can't find (and so delete) the breakpoint, GDB
+     will ignore this error during detach.  */
+  if (proc == nullptr)
+    return nullptr;
+
   struct breakpoint *bp;
   enum bkpt_type type = Z_packet_to_bkpt_type (z_type);