diff options
author | Pedro Alves <pedro@palves.net> | 2022-07-13 17:16:38 +0100 |
---|---|---|
committer | Pedro Alves <pedro@palves.net> | 2023-11-13 14:16:09 +0000 |
commit | 6a534f85cbdb065a307aa96fbbb670948ee135c8 (patch) | |
tree | d6b4db2e9f0f0245791f8b8bb39224cf421b293e | |
parent | 0ae5b8faded5f79d1fa5bfb257262278676d55f9 (diff) | |
download | gdb-6a534f85cbdb065a307aa96fbbb670948ee135c8.zip gdb-6a534f85cbdb065a307aa96fbbb670948ee135c8.tar.gz gdb-6a534f85cbdb065a307aa96fbbb670948ee135c8.tar.bz2 |
gdb/linux: Delete all other LWPs immediately on ptrace exec event
I noticed that on an Ubuntu 20.04 system, after a following patch
("Step over clone syscall w/ breakpoint,
TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
was passing cleanly, but still, we'd end up with four new unexpected
GDB core dumps:
=== gdb Summary ===
# of unexpected core files 4
# of expected passes 48
That said patch is making the pre-existing
gdb.threads/step-over-exec.exp testcase (almost silently) expose a
latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
#1 - a non-leader thread execs
#2 - the post-exec program stops somewhere
#3 - you kill the inferior
Instead of #3 directly, the testcase just returns, which ends up in
gdb_exit, tearing down GDB, which kills the inferior, and is thus
equivalent to #3 above.
Vis (after said patch is applied):
$ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
...
(top-gdb) r
...
(gdb) b main
...
(gdb) r
...
Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
69 argv0 = argv[0];
(gdb) c
Continuing.
[New Thread 0x7ffff7d89700 (LWP 2506975)]
Other going in exec.
Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
28 foo ();
(gdb) k
...
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
393 return m_suspend.waitstatus_pending_p;
(top-gdb) bt
#0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
#1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
#2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
#3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
#4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
#5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
#6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
#7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
#8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
...
The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread,
becoming the new leader. There's no thread exit event for the execing
thread. It's as if the old pre-exec LWP vanishes without trace. The
ptrace man page says:
"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
Stop the tracee at the next execve(2). A waitpid(2) by the
tracer will return a status value such that
status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
If the execing thread is not a thread group leader, the thread
ID is reset to thread group leader's ID before this stop.
Since Linux 3.0, the former thread ID can be retrieved with
PTRACE_GETEVENTMSG."
When the core of GDB processes an exec events, it deletes all the
threads of the inferior. But, that is too late -- deleting the thread
does not delete the corresponding LWP, so we end leaving the pre-exec
non-leader LWP stale in the LWP list. That's what leads to the crash
above -- linux_nat_target::kill iterates over all LWPs, and after the
patch in question, that code will look for the corresponding
thread_info for each LWP. For the pre-exec non-leader LWP still
listed, won't find one.
This patch fixes it, by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.
GDBserver does not need an equivalent fix, because it is already doing
this, as side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:
else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
{
...
/* Delete the execing process and all its threads. */
mourn (proc);
switch_to_thread (nullptr);
The crash with gdb.threads/step-over-exec.exp is not observable on
newer systems, which postdate the glibc change to move "libpthread.so"
internals to "libc.so.6", because right after the exec, GDB traps a
load event for "libc.so.6", which leads to GDB trying to open
libthread_db for the post-exec inferior, and, on such systems that
succeeds. When we load libthread_db, we call
linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
lwps, and then waits to see their stops. While doing this, GDB
detects that the pre-exec stale LWP is gone, and deletes it.
If we use "catch exec" to stop right at the exec before the
"libc.so.6" load event ever happens, and issue "kill" right there,
then GDB crashes on newer systems as well. So instead of tweaking
gdb.threads/step-over-exec.exp to cover the fix, add a new
gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
The test also uses the new "maint info linux-lwps" command if testing
on Linux native, which also exposes the stale LWP problem with an
unfixed GDB.
Also tweak a comment in infrun.c:follow_exec referring to how
linux-nat.c used to behave, as it would become stale otherwise.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
-rw-r--r-- | gdb/infrun.c | 8 | ||||
-rw-r--r-- | gdb/linux-nat.c | 15 | ||||
-rw-r--r-- | gdb/testsuite/gdb.threads/threads-after-exec.c | 56 | ||||
-rw-r--r-- | gdb/testsuite/gdb.threads/threads-after-exec.exp | 57 |
4 files changed, 131 insertions, 5 deletions
diff --git a/gdb/infrun.c b/gdb/infrun.c index 4c7eb9b..c60cfc0 100644 --- a/gdb/infrun.c +++ b/gdb/infrun.c @@ -1245,13 +1245,11 @@ follow_exec (ptid_t ptid, const char *exec_file_target) some other thread does the exec, and even if the main thread was stopped or already gone. We may still have non-leader threads of the process on our list. E.g., on targets that don't have thread - exit events (like remote); or on native Linux in non-stop mode if - there were only two threads in the inferior and the non-leader - one is the one that execs (and nothing forces an update of the - thread list up to here). When debugging remotely, it's best to + exit events (like remote) and nothing forces an update of the + thread list up to here. When debugging remotely, it's best to avoid extra traffic, when possible, so avoid syncing the thread list with the target, and instead go ahead and delete all threads - of the process but one that reported the event. Note this must + of the process but the one that reported the event. Note this must be done before calling update_breakpoints_after_exec, as otherwise clearing the threads' resources would reference stale thread breakpoints -- it may have been one of these threads that diff --git a/gdb/linux-nat.c b/gdb/linux-nat.c index f73e52f..97d8005 100644 --- a/gdb/linux-nat.c +++ b/gdb/linux-nat.c @@ -2001,6 +2001,21 @@ linux_handle_extended_wait (struct lwp_info *lp, int status) thread execs, it changes its tid to the tgid, and the old tgid thread might have not been resumed. */ lp->resumed = 1; + + /* All other LWPs are gone now. We'll have received a thread + exit notification for all threads other the execing one. + That one, if it wasn't the leader, just silently changes its + tid to the tgid, and the previous leader vanishes. Since + Linux 3.0, the former thread ID can be retrieved with + PTRACE_GETEVENTMSG, but since we support older kernels, don't + bother with it, and just walk the LWP list. Even with + PTRACE_GETEVENTMSG, we'd still need to lookup the + corresponding LWP object, and it would be an extra ptrace + syscall, so this way may even be more efficient. */ + for (lwp_info *other_lp : all_lwps_safe ()) + if (other_lp != lp && other_lp->ptid.pid () == lp->ptid.pid ()) + exit_lwp (other_lp); + return 0; } diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.c b/gdb/testsuite/gdb.threads/threads-after-exec.c new file mode 100644 index 0000000..b3ed7ec --- /dev/null +++ b/gdb/testsuite/gdb.threads/threads-after-exec.c @@ -0,0 +1,56 @@ +/* This testcase is part of GDB, the GNU debugger. + + Copyright 2023 Free Software Foundation, Inc. + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 3 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with this program. If not, see <http://www.gnu.org/licenses/>. */ + +#include <pthread.h> +#include <unistd.h> +#include <stdlib.h> +#include <stdio.h> + +static char *program_name; + +void * +thread_execler (void *arg) +{ + /* Exec ourselves again, but with an extra argument, to avoid + infinite recursion. */ + if (execl (program_name, program_name, "1", NULL) == -1) + { + perror ("execl"); + abort (); + } + + return NULL; +} + +int +main (int argc, char **argv) +{ + pthread_t thread; + + if (argc > 1) + { + /* Getting here via execl. */ + return 0; + } + + program_name = argv[0]; + + pthread_create (&thread, NULL, thread_execler, NULL); + pthread_join (thread, NULL); + + return 0; +} diff --git a/gdb/testsuite/gdb.threads/threads-after-exec.exp b/gdb/testsuite/gdb.threads/threads-after-exec.exp new file mode 100644 index 0000000..cd8adf9 --- /dev/null +++ b/gdb/testsuite/gdb.threads/threads-after-exec.exp @@ -0,0 +1,57 @@ +# Copyright 2023 Free Software Foundation, Inc. + +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 3 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# Test that after an exec of a non-leader thread, we don't leave the +# non-leader thread listed in internal thread lists, causing problems. + +standard_testfile .c + +proc do_test { } { + if [prepare_for_testing "failed to prepare" $::testfile $::srcfile {debug pthreads}] { + return -1 + } + + if ![runto_main] { + return + } + + gdb_test "catch exec" "Catchpoint $::decimal \\(exec\\)" + + gdb_test "continue" "Catchpoint $::decimal .*" "continue until exec" + + # Confirm we only have one thread in the thread list. + gdb_test "info threads" "\\* 1\[ \t\]+\[^\r\n\]+.*" + + if {[istarget *-*-linux*] && [gdb_is_target_native]} { + # Confirm there's only one LWP in the list as well, and that + # it is bound to thread 1.1. + set inf_pid [get_inferior_pid] + gdb_test_multiple "maint info linux-lwps" "" { + -wrap -re "Thread ID *\r\n$inf_pid\.$inf_pid\.0\[ \t\]+1\.1 *" { + pass $gdb_test_name + } + } + } + + # Test that GDB is able to kill the inferior. This used to crash + # on native Linux as GDB did not dispose of the pre-exec LWP for + # the non-leader (and that LWP did not have a matching thread in + # the core thread list). + gdb_test "with confirm off -- kill" \ + "\\\[Inferior 1 (.*) killed\\\]" \ + "kill inferior" +} + +do_test |