riscv-gnu-toolchain/binutils.git - Unnamed repository; edit this file 'description' to name the repository.

diff options

author	Andrew Burgess <aburgess@redhat.com>	2025-05-16 17:56:58 +0100
committer	Andrew Burgess <aburgess@redhat.com>	2025-09-23 18:43:47 +0100
commit	8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc (patch)
tree	dd9d95ae2e0e49901926711fd02d1ee46760755f /gdb/python/python-internal.h
parent	1ff92d0903f6d660131e93f4274e0db9d42a7f25 (diff)
download	binutils-8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc.zip binutils-8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc.tar.gz binutils-8bd08ee92c4a7bf2ad9e29c4da32a276ef2257fc.tar.bz2

gdb: crash if thread unexpectedly disappears from thread list

A bug was reported to Red Hat where GDB was crashing with an assertion failure, the assertion message is: ../../gdb/regcache.c:432: internal-error: get_thread_regcache: Assertion `thread->state != THREAD_EXITED' failed. The backtrace for the crash is: #5 0x000055a21da8a880 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_list_tag __va_list_tag *) (problem=problem@entry=0x55a21e289060 <internal_error_problem>, file=<optimized out>, line=<optimized out>, fmt=<optimized out>, ap=ap@entry=0x7ffec7576be0) at ../../gdb/utils.c:477 #6 0x000055a21da8aadf in internal_verror (file=<optimized out>, line=<optimized out>, fmt=<optimized out>, ap=ap@entry=0x7ffec7576be0) at ../../gdb/utils.c:503 #7 0x000055a21dcbd055 in internal_error_loc (file=file@entry=0x55a21dd33b71 "../../gdb/regcache.c", line=line@entry=432, fmt=<optimized out>) at ../../gdbsupport/errors.cc:57 #8 0x000055a21d8baaa9 in get_thread_regcache (thread=thread@entry=0x55a258de3a50) at ../../gdb/regcache.c:432 #9 0x000055a21d74fa18 in print_signal_received_reason (uiout=0x55a258b649b0, siggnal=GDB_SIGNAL_TRAP) at ../../gdb/infrun.c:9287 #10 0x000055a21d7daad9 in mi_interp::on_signal_received (this=0x55a258af5f60, siggnal=GDB_SIGNAL_TRAP) at ../../gdb/mi/mi-interp.c:372 #11 0x000055a21d76ef99 in interps_notify<void (interp::*)(gdb_signal), gdb_signal&> (method=&virtual table offset 88, this adjustment 974682) at ../../gdb/interps.c:369 #12 0x000055a21d76e58f in interps_notify_signal_received (sig=<optimized out>, sig@entry=GDB_SIGNAL_TRAP) at ../../gdb/interps.c:378 #13 0x000055a21d75074d in notify_signal_received (sig=GDB_SIGNAL_TRAP) at ../../gdb/infrun.c:6818 #14 0x000055a21d755af0 in normal_stop () at ../../gdb/gdbthread.h:432 #15 0x000055a21d768331 in fetch_inferior_event () at ../../gdb/infrun.c:4753 The user is using a build of GDB with 32-bit ARM support included, and they gave the following description for what they were doing at the time of the crash: Suspended the execution of the firmware in Eclipse. The gdb was connected to JLinkGDBServer with activated FreeRTOS awareness JLink plugin. So they are remote debugging with a non-gdbserver target. Looking in normal_stop() we see this code: /* As we're presenting a stop, and potentially removing breakpoints, update the thread list so we can tell whether there are threads running on the target. With target remote, for example, we can only learn about new threads when we explicitly update the thread list. Do this before notifying the interpreters about signal stops, end of stepping ranges, etc., so that the "new thread" output is emitted before e.g., "Program received signal FOO", instead of after. */ update_thread_list (); if (last.kind () == TARGET_WAITKIND_STOPPED && stopped_by_random_signal) notify_signal_received (inferior_thread ()->stop_signal ()); Which accounts for the transition from frame #14 to frame #13. But it is the update_thread_list() call which interests me. This call asks the target (remote target in this case) for the current thread list, and then marks threads exited based on the answer. And so, if a (badly behaved) target (incorrectly) removes a thread from the thread list, then the update_thread_list() call will mark the impacted thread as exited, even if GDB is currently handling a signal stop event for that target. My guess for what's going on here then is this: 1. Thread receives a signal. 2. Remote target sends GDB a stop with signal packet. 3. Remote decides that the thread is going away soon, and marks the thread as exited. 4. GDB asks for the thread list. 5. Remote sends back the thread list, which doesn't include the event thread, as the remote things this thread has exited. 6. GDB marks the thread as exited, and then proceeds to try and print the signal stop event for the event thread. 7. Printing the signal stop requires reading registers, which requires a regache. We can only get a regcache for a non-exited thread, and so GDB raises an assertion. Using the gdbreplay test frame work I was able to reproduce this failure using gdbserver. I create an inferior with two threads, the main thread sends a signal to the second thread, GDB sees the signal arrive and prints this information for the user. Having captured the trace of this activity, I then find the thread list reply in the log file, and modify it to remove the second thread. Now, when I replay the modified log file I see the same assertion complaining about an attempt to get a regcache for an exited thread. I'm not entirely sure the best way to fix this. Clearly the problem here is a bad remote target. But, replies from a remote target should (in my opinion) not be considered trusted, as a consequence, we should not be asserting based on data coming from a remote. Instead, we should be giving warnings or errors and have GDB handle the bad data as best it can. This is the second attempt to fix this issue, my first patch can be seen here: https://inbox.sourceware.org/gdb-patches/062e438c8677e2ab28fac6183d2ea6d444cb9121.1747567717.git.aburgess@redhat.com In the first patch I was to checking in normal_stop, immediately after the call to update_thread_list, to see if the current thread was now marked as exited. However CI testing showed an issue with this approach; I was already checking for many different TARGET_WAITKIND_* kinds where the "is the current thread exited" question didn't make sense, and it turns out that the list of kinds in my first attempt was already insufficient. Rather than trying to just adding to the list, in this revised patch I'm proposing to move the "is this thread exited" check inside the block which handles signal stop events. Right now, the only part of normal_stop which I know relies on the current thread not being exited is the call to notify_signal_received, so before calling notify_signal_received I check to see if the current thread is now exited. If it is then I print a warning to indicate that the thread has unexpectedly exited and that the current command (continue/step/etc) has been cancelled, I then change the current event type to TARGET_WAITKIND_SPURIOUS. GDB's output now looks like this in all-stop mode: (gdb) continue Continuing. [New Thread 3483690.3483693] [Thread 3483690.3483693 exited] warning: Thread 3483690.3483693 unexpectedly exited after non-exit event [Switching to Thread 3483690.3483693] (gdb) The non-stop output is identical, except we don't switch thread (stop events never trigger a thread switch in non-stop mode). The include test makes use of the gdbreplay framework, and tests in all-stop and non-stop modes. I would like to do more extensive testing of GDB's state after the receiving the unexpected thread list, but due to using gdbreplay for testing, this is quite hard. Many commands, especially those looking at thread state, are likely to trigger additional packets being sent to the remote, which causes gdbreplay to bail out as the new packet doesn't match the original recorded state. However, I really don't think it is a good idea to change gdbserver in order to "fake" this error case, so for now, using gdbreplay is the best idea I have. Bug: https://bugzilla.redhat.com/show_bug.cgi?id=2366461

Diffstat (limited to 'gdb/python/python-internal.h')

0 files changed, 0 insertions, 0 deletions


context:
space:
mode: