diff options
author | Andrew Burgess <aburgess@redhat.com> | 2025-05-16 17:56:58 +0100 |
---|---|---|
committer | Andrew Burgess <aburgess@redhat.com> | 2025-09-23 18:43:47 +0100 |
commit | 8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc (patch) | |
tree | dd9d95ae2e0e49901926711fd02d1ee46760755f /gdb/python/python-internal.h | |
parent | 1ff92d0903f6d660131e93f4274e0db9d42a7f25 (diff) | |
download | binutils-8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc.zip binutils-8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc.tar.gz binutils-8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc.tar.bz2 |
gdb: crash if thread unexpectedly disappears from thread list
A bug was reported to Red Hat where GDB was crashing with an assertion
failure, the assertion message is:
../../gdb/regcache.c:432: internal-error: get_thread_regcache: Assertion `thread->state != THREAD_EXITED' failed.
The backtrace for the crash is:
#5 0x000055a21da8a880 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=problem@entry=0x55a21e289060 <internal_error_problem>, file=<optimized out>, line=<optimized out>, fmt=<optimized out>, ap=ap@entry=0x7ffec7576be0) at ../../gdb/utils.c:477
#6 0x000055a21da8aadf in internal_verror (file=<optimized out>, line=<optimized out>, fmt=<optimized out>, ap=ap@entry=0x7ffec7576be0) at ../../gdb/utils.c:503
#7 0x000055a21dcbd055 in internal_error_loc (file=file@entry=0x55a21dd33b71 "../../gdb/regcache.c", line=line@entry=432, fmt=<optimized out>) at ../../gdbsupport/errors.cc:57
#8 0x000055a21d8baaa9 in get_thread_regcache (thread=thread@entry=0x55a258de3a50) at ../../gdb/regcache.c:432
#9 0x000055a21d74fa18 in print_signal_received_reason (uiout=0x55a258b649b0, siggnal=GDB_SIGNAL_TRAP) at ../../gdb/infrun.c:9287
#10 0x000055a21d7daad9 in mi_interp::on_signal_received (this=0x55a258af5f60, siggnal=GDB_SIGNAL_TRAP) at ../../gdb/mi/mi-interp.c:372
#11 0x000055a21d76ef99 in interps_notify<void (interp::*)(gdb_signal), gdb_signal&> (method=&virtual table offset 88, this adjustment 974682) at ../../gdb/interps.c:369
#12 0x000055a21d76e58f in interps_notify_signal_received (sig=<optimized out>, sig@entry=GDB_SIGNAL_TRAP) at ../../gdb/interps.c:378
#13 0x000055a21d75074d in notify_signal_received (sig=GDB_SIGNAL_TRAP) at ../../gdb/infrun.c:6818
#14 0x000055a21d755af0 in normal_stop () at ../../gdb/gdbthread.h:432
#15 0x000055a21d768331 in fetch_inferior_event () at ../../gdb/infrun.c:4753
The user is using a build of GDB with 32-bit ARM support included, and
they gave the following description for what they were doing at the
time of the crash:
Suspended the execution of the firmware in Eclipse. The gdb was
connected to JLinkGDBServer with activated FreeRTOS awareness JLink
plugin.
So they are remote debugging with a non-gdbserver target.
Looking in normal_stop() we see this code:
/* As we're presenting a stop, and potentially removing breakpoints,
update the thread list so we can tell whether there are threads
running on the target. With target remote, for example, we can
only learn about new threads when we explicitly update the thread
list. Do this before notifying the interpreters about signal
stops, end of stepping ranges, etc., so that the "new thread"
output is emitted before e.g., "Program received signal FOO",
instead of after. */
update_thread_list ();
if (last.kind () == TARGET_WAITKIND_STOPPED && stopped_by_random_signal)
notify_signal_received (inferior_thread ()->stop_signal ());
Which accounts for the transition from frame #14 to frame #13. But it
is the update_thread_list() call which interests me. This call asks
the target (remote target in this case) for the current thread list,
and then marks threads exited based on the answer.
And so, if a (badly behaved) target (incorrectly) removes a thread
from the thread list, then the update_thread_list() call will mark the
impacted thread as exited, even if GDB is currently handling a signal
stop event for that target.
My guess for what's going on here then is this:
1. Thread receives a signal.
2. Remote target sends GDB a stop with signal packet.
3. Remote decides that the thread is going away soon, and marks the
thread as exited.
4. GDB asks for the thread list.
5. Remote sends back the thread list, which doesn't include the
event thread, as the remote things this thread has exited.
6. GDB marks the thread as exited, and then proceeds to try and
print the signal stop event for the event thread.
7. Printing the signal stop requires reading registers, which
requires a regache. We can only get a regcache for a non-exited
thread, and so GDB raises an assertion.
Using the gdbreplay test frame work I was able to reproduce this
failure using gdbserver. I create an inferior with two threads, the
main thread sends a signal to the second thread, GDB sees the signal
arrive and prints this information for the user.
Having captured the trace of this activity, I then find the thread
list reply in the log file, and modify it to remove the second thread.
Now, when I replay the modified log file I see the same assertion
complaining about an attempt to get a regcache for an exited thread.
I'm not entirely sure the best way to fix this. Clearly the problem
here is a bad remote target. But, replies from a remote target
should (in my opinion) not be considered trusted, as a consequence, we
should not be asserting based on data coming from a remote. Instead,
we should be giving warnings or errors and have GDB handle the bad
data as best it can.
This is the second attempt to fix this issue, my first patch can be
seen here:
https://inbox.sourceware.org/gdb-patches/062e438c8677e2ab28fac6183d2ea6d444cb9121.1747567717.git.aburgess@redhat.com
In the first patch I was to checking in normal_stop, immediately after
the call to update_thread_list, to see if the current thread was now
marked as exited. However CI testing showed an issue with this
approach; I was already checking for many different TARGET_WAITKIND_*
kinds where the "is the current thread exited" question didn't make
sense, and it turns out that the list of kinds in my first attempt was
already insufficient.
Rather than trying to just adding to the list, in this revised patch
I'm proposing to move the "is this thread exited" check inside the
block which handles signal stop events.
Right now, the only part of normal_stop which I know relies on the
current thread not being exited is the call to notify_signal_received,
so before calling notify_signal_received I check to see if the current
thread is now exited. If it is then I print a warning to indicate
that the thread has unexpectedly exited and that the current
command (continue/step/etc) has been cancelled, I then change the
current event type to TARGET_WAITKIND_SPURIOUS.
GDB's output now looks like this in all-stop mode:
(gdb) continue
Continuing.
[New Thread 3483690.3483693]
[Thread 3483690.3483693 exited]
warning: Thread 3483690.3483693 unexpectedly exited after non-exit event
[Switching to Thread 3483690.3483693]
(gdb)
The non-stop output is identical, except we don't switch thread (stop
events never trigger a thread switch in non-stop mode).
The include test makes use of the gdbreplay framework, and tests in
all-stop and non-stop modes. I would like to do more extensive
testing of GDB's state after the receiving the unexpected thread list,
but due to using gdbreplay for testing, this is quite hard. Many
commands, especially those looking at thread state, are likely to
trigger additional packets being sent to the remote, which causes
gdbreplay to bail out as the new packet doesn't match the original
recorded state. However, I really don't think it is a good idea to
change gdbserver in order to "fake" this error case, so for now, using
gdbreplay is the best idea I have.
Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2366461
Diffstat (limited to 'gdb/python/python-internal.h')
0 files changed, 0 insertions, 0 deletions