aboutsummaryrefslogtreecommitdiff
path: root/gdb/infrun.c
AgeCommit message (Collapse)AuthorFilesLines
2024-08-26Change message when reaching end of reverse history.Guinevere Larsen1-1/+8
In a record session, when we move backward, GDB switches from normal execution to simulation. Moving forward again, the emulation continues until the end of the reverse history. When the end is reached, the execution stops, and a warning message is shown. This message has been modified to indicate that the forward emulation has reached the end, but the execution can continue as normal, and the recording will also continue. Before this patch, the warning message shown in that case was the same as in the reverse case. This meant that when the end of history was reached in either backward or forward emulation, the same message was displayed: "No more reverse-execution history." This message has changed for these two cases. Backward emulation: "Reached end of recorded history; stopping. Backward execution from here not possible." Forward emulation: "Reached end of recorded history; stopping. Following forward execution will be added to history." The reason for this change is that the initial message was deceiving, for the forward case, making the user believe that forward debugging could not continue. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31224 Reviewed-By: Markus T. Metzger <markus.t.metzger@intel.com> (btrace) Approved-By: Guinevere Larsen <blarsen@redhat.com>
2024-08-19gdb: Fix printing frame when reversing out of a recursive call with clangGuinevere Larsen1-1/+2
Commit bf2813aff8f2988ad3d53e819a0415abf295c91f introduced some logic to not refresh the step frame id if it detects that the inferior is reverse stepping out of a recursive call, so that we would still print frame information once the inferior stops. However, that logic was overly specific, and wouldn't be hit for inferiors compiled with clang because clang adds line table entries that aren't statements, making process_event_stop_test go through a different branch on the relevant if statement. Fix this by not making the code that detects "reversing out of a recursion" an else clause to the previous if, but a standalone if block. Approved-by: Kevin Buettner <kevinb@redhat.com>
2024-08-05[gdb] Notice when stepping into different fileTom de Vries1-3/+15
Consider the following test-case: ... $ cat -n test.c 1 int var; 2 3 int 4 foo (void) 5 { 6 var = 1; 7 #include "test.h" 8 } 9 10 int 11 main () 12 { 13 return foo (); 14 } $ cat -n test.h 1 return 1; $ gcc test.c -g ... When stepping through the test-case, gdb doesn't make it explicit that line 1 is not in test.c: ... Temporary breakpoint 1, main () at test.c:13 13 return foo (); (gdb) step foo () at test.c:6 6 var = 1; (gdb) n 1 return 1; (gdb) 8 } (gdb) ... which makes it easy to misinterpret the output. This is with the default "print frame-info" == auto, with documented behaviour [1]: ... stepi will switch between source-line and source-and-location depending on the program counter. ... What is actually implemented is that source-line is used unless stepping into or out of a function. The problem can be worked around by using "set print frame-info source-and-location", but that's a bit verbose. Instead, change the behaviour of "print frame-info" == auto to also use source-and-location when stepping into another file, which gets us: ... (gdb) n foo () at test.h:1 1 return 1; ... Tested on x86_64-linux. Reviewed-By: Kevin Buettner <kevinb@redhat.com> Reviewed-By: Kévin Le Gouguec <legouguec@adacore.com> PR gdb/32011 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32011 [1] https://sourceware.org/gdb/current/onlinedocs/gdb.html/Print-Settings.html#index-set-print-frame_002dinfo
2024-07-15gdb: pass program space to no_shared_librariesSimon Marchi1-1/+1
Make the current program space reference bubble up one level. Pass `current_program_space` everywhere, except in some cases where we can get the pspace another way, and it's relatively obvious that it's the same as the current program space. Change-Id: Id86b79f1e44f92a398f49d137d57457174dfa96d Approved-By: Tom Tromey <tom@tromey.com> Reviewed-By: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
2024-07-15gdb: split no_shared_libraries, command vs implementationSimon Marchi1-1/+1
The `no_shared_libraries` function is currently used to implement the `nosharedlibrary` command, but it also used internally by other functions. This does not make a very good internal API. Add the `no_shared_libraries_command` function to implement the CLI command. Remove the unused parameters from `no_shared_libraries`. Remove the `from_tty` parameter of `target_pre_inferior`, since it's now unused. Change-Id: I4fcba5ee1e0f7d250aab1a7b62b9ea16265fe962 Approved-By: Tom Tromey <tom@tromey.com> Reviewed-By: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
2024-05-30gdb: remove unused includes in utils.hSimon Marchi1-0/+1
Remove some includes reported as unused by clangd. Add some includes in other files that were previously relying on the transitive include. Change-Id: Ibdd0a998b04d21362a20d0ca8e5267e21e2e133e
2024-04-26Remove unnecessary get_current_frame calls from infrun.cBernd Edlinger1-20/+4
Since the frame variable is now a frame_info_ptr, the issue with the dangling frame pointer is apparently no longer there. So remove the re-fetch code and the corresponding meanwhile misleading comments. Approved-By: Tom Tromey <tom@tromey.com>
2024-04-23gdb: remove unused include in infrun.cSimon Marchi1-1/+1
Remove the gdbcmd.h, which is reported as unused by clangd. Add cli/cli-cmds.h instead, to get access to `cmdlist` and friends. Change-Id: Ic0c60d2f6d3618f1bd9fd80b95ffd7c33c692a04
2024-03-26gdb, gdbserver, gdbsupport: remove includes of early headersSimon Marchi1-1/+0
Now that defs.h, server.h and common-defs.h are included via the `-include` option, it is no longer necessary for source files to include them. Remove all the inclusions of these files I could find. Update the generation scripts where relevant. Change-Id: Ia026cff269c1b7ae7386dd3619bc9bb6a5332837 Approved-By: Pedro Alves <pedro@palves.net>
2024-03-25gdb: fix b/p conditions with infcalls in multi-threaded inferiorsAndrew Burgess1-14/+50
This commit fixes bug PR 28942, that is, creating a conditional breakpoint in a multi-threaded inferior, where the breakpoint condition includes an inferior function call. Currently, when a user tries to create such a breakpoint, then GDB will fail with: (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ()) Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61. (gdb) continue Continuing. [New Thread 0x7ffff7c5d700 (LWP 2460150)] [New Thread 0x7ffff745c700 (LWP 2460151)] [New Thread 0x7ffff6c5b700 (LWP 2460152)] [New Thread 0x7ffff645a700 (LWP 2460153)] [New Thread 0x7ffff5c59700 (LWP 2460154)] Error in testing breakpoint condition: Couldn't get registers: No such process. An error occurred while in a function called from GDB. Evaluation of the expression containing the function (return_true) will be abandoned. When the function is done executing, GDB will silently stop. Selected thread is running. (gdb) Or, in some cases, like this: (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1)) Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56. (gdb) continue Continuing. [New Thread 0x7ffff7c5d700 (LWP 2461106)] [New Thread 0x7ffff745c700 (LWP 2461107)] ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. A problem internal to GDB has been detected, further debugging may prove unreliable. The precise error depends on the exact thread state; so there's race conditions depending on which threads have fully started, and which have not. But the underlying problem is always the same; when GDB tries to execute the inferior function call from within the breakpoint condition, GDB will, incorrectly, try to resume threads that are already running - GDB doesn't realise that some threads might already be running. The solution proposed in this patch requires an additional member variable thread_info::in_cond_eval. This flag is set to true (in breakpoint.c) when GDB is evaluating a breakpoint condition. In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is true, then GDB will only try to resume the current thread, that is, the thread for which the breakpoint condition is being evaluated. This solves the problem of GDB trying to resume threads that are already running. The next problem is that inferior function calls are assumed to be synchronous, that is, GDB doesn't expect to start an inferior function call in thread #1, then receive a stop from thread #2 for some other, unrelated reason. To prevent GDB responding to an event from another thread, we update fetch_inferior_event and do_target_wait in infrun.c, so that, when an inferior function call (on behalf of a breakpoint condition) is in progress, we only wait for events from the current thread (the one evaluating the condition). In do_target_wait I had to change the inferior_matches lambda function, which is used to select which inferior to wait on. Previously the logic was this: auto inferior_matches = [&wait_ptid] (inferior *inf) { return (inf->process_target () != nullptr && ptid_t (inf->pid).matches (wait_ptid)); }; This compares the pid of the inferior against the complete ptid we want to wait on. Before this commit wait_ptid was only ever minus_one_ptid (which is special, and means any process), and so every inferior would match. After this commit though wait_ptid might represent a specific thread in a specific inferior. If we compare the pid of the inferior to a specific ptid then these will not match. The fix is to compare against the pid extracted from the wait_ptid, not against the complete wait_ptid itself. In fetch_inferior_event, after receiving the event, we only want to stop all the other threads, and call inferior_event_handler with INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint. If we are, then all the other threads should be left doing whatever they were before. The inferior_event_handler call will be performed once the breakpoint condition has finished being evaluated, and GDB decides to stop or not. The final problem that needs solving relates to GDB's commit-resume mechanism, which allows GDB to collect resume requests into a single packet in order to reduce traffic to a remote target. The problem is that the commit-resume mechanism will not send any resume requests for an inferior if there are already events pending on the GDB side. Imagine an inferior with two threads. Both threads hit a breakpoint, maybe the same conditional breakpoint. At this point there are two pending events, one for each thread. GDB selects one of the events and spots that this is a conditional breakpoint, GDB evaluates the condition. The condition includes an inferior function call, so GDB sets up for the call and resumes the one thread, the resume request is added to the commit-resume queue. When the commit-resume queue is committed GDB sees that there is a pending event from another thread, and so doesn't send any resume requests to the actual target, GDB is assuming that when we wait we will select the event from the other thread. However, as this is an inferior function call for a condition evaluation, we will not select the event from the other thread, we only care about events from the thread that is evaluating the condition - and the resume for this thread was never sent to the target. And so, GDB hangs, waiting for an event from a thread that was never fully resumed. To fix this issue I have added the concept of "forcing" the commit-resume queue. When enabling commit resume, if the force flag is true, then any resumes will be committed to the target, even if there are other threads with pending events. A note on authorship: this patch was based on some work done by Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made some changes to their work in this version. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942 [1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com> Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Tested-By: Luis Machado <luis.machado@arm.com> Tested-By: Keith Seitz <keiths@redhat.com>
2024-03-25Revert "gdb: remove unnecessary parameter wait_ptid from do_target_wait"Andrew Burgess1-5/+7
This reverts commit ac0d67ed1dcf470bad6a3bc4800c2ddc9bedecca. There was nothing wrong with the commit which I'm reverting here, but it removed some functionality that will be needed for a later commit; that is, the ability for GDB to ask for events from a specific ptid_t via the do_target_wait function. In a follow up commit, this functionality will be used to implement inferior function calls in multi-threaded inferiors. This is not a straight revert of the above commit. Reverting the above commit replaces a 'nullptr' with 'NULL', I've gone in and changed that, preserving the 'nullptr'. Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com> Tested-By: Luis Machado <luis.machado@arm.com> Tested-By: Keith Seitz <keiths@redhat.com>
2024-02-20gdb: pass frames as `const frame_info_ptr &`Simon Marchi1-10/+12
We currently pass frames to function by value, as `frame_info_ptr`. This is somewhat expensive: - the size of `frame_info_ptr` is 64 bytes, which is a bit big to pass by value - the constructors and destructor link/unlink the object in the global `frame_info_ptr::frame_list` list. This is an `intrusive_list`, so it's not so bad: it's just assigning a few points, there's no memory allocation as if it was `std::list`, but still it's useless to do that over and over. As suggested by Tom Tromey, change many function signatures to accept `const frame_info_ptr &` instead of `frame_info_ptr`. Some functions reassign their `frame_info_ptr` parameter, like: void the_func (frame_info_ptr frame) { for (; frame != nullptr; frame = get_prev_frame (frame)) { ... } } I wondered what to do about them, do I leave them as-is or change them (and need to introduce a separate local variable that can be re-assigned). I opted for the later for consistency. It might not be clear why some functions take `const frame_info_ptr &` while others take `frame_info_ptr`. Also, if a function took a `frame_info_ptr` because it did re-assign its parameter, I doubt that we would think to change it to `const frame_info_ptr &` should the implementation change such that it doesn't need to take `frame_info_ptr` anymore. It seems better to have a simple rule and apply it everywhere. Change-Id: I59d10addef687d157f82ccf4d54f5dde9a963fd0 Approved-By: Andrew Burgess <aburgess@redhat.com>
2024-02-09gdb: add inferior parameter to breakpoint_init_inferiorSimon Marchi1-2/+2
By inspection, I believe that breakpoint_init_inferior doesn't call anything that relies on the current program space or inferior. So, add an inferior parameter, to make the current inferior / program space references bubble up one level. Change-Id: Ib07b7a6d360e324f6ae1aa502dd314b8cce421b7 Approved-By: Andrew Burgess <aburgess@redhat.com>
2024-02-09gdb: add program_space parameter to mark_breakpoints_outSimon Marchi1-1/+1
Make the current_program_space reference bubble up one level. Change-Id: Idc8ed78d23bf3bb2969f6963d8cc049f26901c29 Approved-By: Andrew Burgess <aburgess@redhat.com>
2024-01-31gdb: remove some unnecessary frame_info_ptr resetsSimon Marchi1-3/+0
This code was probably needed before we had reinflatable frame_info_ptrs, it's not necessary anymore. Change-Id: I5474c6081ee1e39624c9266b05dbe01351a130b5 Approved-By: Tom Tromey <tom@tromey.com>
2024-01-28Use domain_search_flags in lookup_symbol et alTom Tromey1-1/+1
This changes lookup_symbol and associated APIs to accept domain_search_flags rather than a domain_enum. Note that this introduces some new constants to Python and Guile. I chose to break out the documentation patch for this, because the internals here do not change until a later patch, and it seemed simpler to patch the docs just once, rather than twice.
2024-01-21gdb/infrun: lazily load curr_frame_id in process_event_stop_testLancelot SIX1-9/+41
A recent(ish) change in gdb/infrun.c made process_event_stop_test load debug information where it would not have done so previously. The change is: commit bf2813aff8f2988ad3d53e819a0415abf295c91f AuthorDate: Fri Sep 1 13:47:32 2023 +0200 CommitDate: Mon Nov 20 10:54:03 2023 +0100 gdb/record: print frame information when exiting a recursive call Currently, when GDB is reverse stepping out of a function into the same function due to a recursive call, it doesn't print frame information, as reported by PR record/29178. This happens because when the inferior leaves the current frame, GDB decides to refresh the step information, clobbering the original step_frame_id, making it impossible to figure out later on that the frame has been changed. This commit changes GDB so that, if we notice we're in this exact situation, we won't refresh the step information. Because of implementation details, this change can cause some debug information to be read when it normally wouldn't before, which showed up as a regression on gdb.dwarf2/dw2-out-of-range-end-of-seq. Since that isn't a problem, the test was changed to allow for the new output. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29178 Although there is nothing wrong with this change in principle, it happens to break most of the tests in gdb/testsuite/gdb.rocm/*.exp. This is because those tests do rely on GDB not loading debug information. This is necessary because the debug information produced for AMDGPU code is using DWARF extensions which are not supported by GDB at this point. In this patch, I propose to use a lazy loading mechanism so the frame_id for the current frame is only computed when required instead of when entering process_event_stop_test. The lazy_loader class is currently defined locally in infrun.c, but if it turns out to be useful elsewhere, it could go somewhere under gdbsupport. This patch should restore the behavior GDB had before bf2813aff8f2988ad3d53e819a0415abf295c91f when it comes to load debug info. Another approach could have been to revert fb84fbf8a51f5be2e78765508ebd9753af96b492 (gdb/infrun: simplify process_event_stop_test) and adjust the implementation of bf2813aff8f2988ad3d53e819a0415abf295c91f (gdb/record: print frame information when exiting a recursive call). However, I think that the lazy loading works well with the simplification done recently, so I went down that route. Regression tested on x86_64-linux (Ubuntu 22.04) with AMDGPU support. Change-Id: Ib63a162128130d1786a77c98623e9e3dcbc363b7 Approved-by: Kevin Buettner <kevinb@redhat.com>
2024-01-19gdb: Buffer output streams during events that might download debuginfoAaron Merey1-3/+13
Introduce new ui_file buffering_file to temporarily collect output written to gdb_std* output streams during print_thread, print_frame_info and print_stop_event. This ensures that output during these functions is not interrupted by debuginfod progress messages. With the addition of deferred debuginfo downloading it is possible for download progress messages to print during these events. Without any intervention we can end up with poorly formatted output: (gdb) backtrace [...] #8 0x00007fbe8af7d7cf in pygi_invoke_c_callable (Downloading separate debug info for /lib64/libpython3.11.so.1.0 function_cache=0x561221b224d0, state=<optimized out>... To fix this we buffer writes to gdb_std* output streams while allowing debuginfod progress messages to skip the buffers and print to the underlying output streams immediately. Buffered output is then written to the output streams. This ensures that progress messages print first, followed by uninterrupted frame/thread/stop info: (gdb) backtrace [...] Downloading separate debug info for /lib64/libpython3.11.so.1.0 #8 0x00007fbe8af7d7cf in pygi_invoke_c_callable (function_cache=0x561221b224d0, state=<optimized out>... Co-Authored-By: Andrew Burgess <aburgess@redhat.com> Approved-By: Andrew Burgess <aburgess@redhat.com>
2024-01-12Update copyright year range in header of all files managed by GDBAndrew Burgess1-1/+1
This commit is the result of the following actions: - Running gdb/copyright.py to update all of the copyright headers to include 2024, - Manually updating a few files the copyright.py script told me to update, these files had copyright headers embedded within the file, - Regenerating gdbsupport/Makefile.in to refresh it's copyright date, - Using grep to find other files that still mentioned 2023. If these files were updated last year from 2022 to 2023 then I've updated them this year to 2024. I'm sure I've probably missed some dates. Feel free to fix them up as you spot them.
2024-01-02PowerPC and aarch64: Fix reverse stepping failureCarl Love1-0/+57
When running GDB's testsuite on aarch64-linux/Ubuntu 20.04 (also spotted on the ppc backend), there are failures in gdb.reverse/solib-precsave.exp and gdb.reverse/solib-reverse.exp. The failure happens around the following code: 38 b[1] = shr2(17); /* middle part two */ 40 b[0] = 6; b[1] = 9; /* generic statement, end part two */ 42 shr1 ("message 1\n"); /* shr1 one */ Normal execution: - step from line 38 will land on line 40. - step from line 40 will land on line 42. Reverse execution: - step from line 42 will land on line 40. - step from line 40 will land on line 40. - step from line 40 will land on line 38. The problem here is that line 40 contains two contiguous but distinct PC ranges in the line table, like so: Line 40 - [0x7ec ~ 0x7f4] Line 40 - [0x7f4 ~ 0x7fc] The two distinct ranges are generated because GCC started outputting source column information, which GDB doesn't take into account at the moment. When stepping forward from line 40, we skip both of these ranges and land on line 42. When stepping backward from line 42, we stop at the start PC of the second (or first, going backwards) range of line 40. Since we've reached ecs->event_thread->control.step_range_start, we stop stepping backwards. The above issues were fixed by introducing a new function that looks for adjacent PC ranges for the same line, until we notice a line change. Then we take that as the start PC of the range. The new start PC for the range is used for the control.step_range_start when setting up a step range. The test case gdb.reverse/map-to-same-line.exp is added to test the fix for the above reverse step issues. Patch has been tested on PowerPC, X86 and AArch64 with no regressions.
2023-12-20Step over thread exit, always delete the thread non-silentlyPedro Alves1-4/+7
With AMD GPU debugging, I noticed that when stepping over a breakpoint placed on top of the s_endpgm instruction inline (displaced=off), GDB would behave differently -- it wouldn't print the wave exit. E.g: With displaced stepping, or no breakpoint at all: stepi [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited] Command aborted, thread exited. (gdb) With inline stepping: stepi Command aborted, thread exited. (gdb) In the cases we see the "exited" notification, handle_thread_exit is what first called delete_thread on the exiting thread, which is non-silent. With inline stepping, however, handle_thread_exit ends up in update_thread_list (via restart_threads) before any delete_thread call. Thus, amd_dbgapi_target::update_thread_list notices that the wave is gone and deletes it with delete_thread_silent. This commit fixes it, by making handle_thread_exited call set_thread_exited (with the default silent=false) early, which emits the user-visible notification. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I22ab3145e18d07c99dace45576307b9f9d5d966f
2023-12-20displaced_step_finish: Don't fetch the regcache of exited threadsPedro Alves1-7/+12
displaced_step_finish can be called with event_status.kind == TARGET_WAITKIND_THREAD_EXITED, and in that case it is not possible to get at the already-exited thread's registers. This patch moves the get_thread_regcache calls to branches that actually need it, where we know the thread is still alive. It also adds an assertion to get_thread_regcache, to help catching these broken cases sooner. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I63b5eacb3e02a538fc5087c270d8025adfda88c3
2023-12-20Ensure selected thread after thread exit stopPedro Alves1-0/+7
While making step over thread exit work properly on AMDGPU, I noticed that if there's a breakpoint on top of the exit syscall, and, displaced stepping is off, then when GDB reports "Command aborted, thread exited.", GDB also switches focus to a random thread, instead of leaving the exited thread as selected: (gdb) thread [Current thread is 6, lane 0 (AMDGPU Lane 1:4:1:1/0 (0,0,0)[0,0,0])] (gdb) si Command aborted, thread exited. (gdb) thread [Current thread is 5 (Thread 0x7ffff626f640 (LWP 3248392))] (gdb) The previous patch extended gdb.threads/step-over-thread-exit.exp to exercise this on GNU/Linux (on the CPU side), and there, after that "si", we always end up with the exiting thread as selected even without this fix, but that's just a concidence, there's a code path that happens to select the exiting thread for an unrelated reason. This commit add the explict switch, fixing the latent problem for GNU/Linux, and the actual problem on AMDGPU. I wrote a gdb.rocm/ testcase for this, but it can't be upstreamed yet, until more pieces of the DWARF machinery are upstream as well. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I6ff57a79514ac0142bba35c749fe83d53d9e4e51
2023-11-28[gdb] Fix segfault in for_each_block, part 2Tom de Vries1-5/+11
The previous commit describes PR gdb/30547, a segfault when running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x). The root cause for the segmentation fault is that linux_is_uclinux gives an incorrect result: it returns true instead of false. So, why does linux_is_uclinux: ... int linux_is_uclinux (void) { CORE_ADDR dummy; return (target_auxv_search (AT_NULL, &dummy) > 0 && target_auxv_search (AT_PAGESZ, &dummy) == 0); ... return true? This is because ppc_linux_target_wordsize returns 4 instead of 8, causing ppc_linux_nat_target::auxv_parse to misinterpret the auxv vector. So, why does ppc_linux_target_wordsize: ... int ppc_linux_target_wordsize (int tid) { int wordsize = 4; /* Check for 64-bit inferior process. This is the case when the host is 64-bit, and in addition the top bit of the MSR register is set. */ long msr; errno = 0; msr = (long) ptrace (PTRACE_PEEKUSER, tid, PT_MSR * 8, 0); if (errno == 0 && ppc64_64bit_inferior_p (msr)) wordsize = 8; return wordsize; } ... return 4? Specifically, we get this result because because tid == 0, so we get errno == ESRCH. The tid == 0 is caused by the switch_to_no_thread in handle_vfork_child_exec_or_exit: ... /* Switch to no-thread while running clone_program_space, so that clone_program_space doesn't want to read the selected frame of a dead process. */ scoped_restore_current_thread restore_thread; switch_to_no_thread (); inf->pspace = new program_space (maybe_new_address_space ()); ... but moving the maybe_new_address_space call to before that gives us the same result. The tid is no longer 0, but we still get ESRCH because the thread has exited. Fix this in handle_vfork_child_exec_or_exit by doing the maybe_new_address_space call in the context of the vfork parent. Tested on top of trunk on x86_64-linux and ppc64le-linux. Tested on top of gdb-14-branch on ppc64-linux. Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca> PR gdb/30547 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
2023-11-28[gdb] Fix segfault in for_each_block, part 1Tom de Vries1-24/+23
When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x), I run into: ... (gdb) PASS: gdb.base/vfork-follow-parent.exp: \ exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \ resolution_method=schedule-multiple: print unblock_parent = 1 continue^M Continuing.^M Reading symbols from vfork-follow-parent-exit...^M ^M ^M Fatal signal: Segmentation fault^M ----- Backtrace -----^M 0x1027d3e7 gdb_internal_backtrace_1^M src/gdb/bt-utils.c:122^M 0x1027d54f _Z22gdb_internal_backtracev^M src/gdb/bt-utils.c:168^M 0x1057643f handle_fatal_signal^M src/gdb/event-top.c:889^M 0x10576677 handle_sigsegv^M src/gdb/event-top.c:962^M 0x3fffa7610477 ???^M 0x103f2144 for_each_block^M src/gdb/dcache.c:199^M 0x103f235b _Z17dcache_invalidateP13dcache_struct^M src/gdb/dcache.c:251^M 0x10bde8c7 _Z24target_dcache_invalidatev^M src/gdb/target-dcache.c:50^M ... or similar. The root cause for the segmentation fault is that linux_is_uclinux gives an incorrect result: it should always return false, given that we're running on a regular linux system, but instead it returns first true, then false. In more detail, the segmentation fault happens as follows: - a program space with an address space is created - a second program space is about to be created. maybe_new_address_space is called, and because linux_is_uclinux returns true, maybe_new_address_space returns false, and no new address space is created - a second program space with the same address space is created - a program space is deleted. Because linux_is_uclinux now returns false, gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns false, and the address space is deleted - when gdb uses the address space of the remaining program space, we run into the segfault, because the address space is deleted. Hardcoding linux_is_uclinux to false makes the test-case pass. We leave addressing the root cause for the following commit in this series. For now, prevent the segmentation fault by making the address space a refcounted object. This was already suggested here [1]: ... A better solution might be to have the address spaces be reference counted ... Tested on top of trunk on x86_64-linux and ppc64le-linux. Tested on top of gdb-14-branch on ppc64-linux. Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca> PR gdb/30547 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547 [1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
2023-11-27gdb: make catch_syscall_enabled return boolSimon Marchi1-1/+1
Make it return a bool and adjust a few comparisons where it's used. Change-Id: Ic77d23b0dcfcfc9195dfe65e4c7ff9cf3229f6fb
2023-11-21gdb: Replace gdb::optional with std::optionalLancelot Six1-6/+6
Since GDB now requires C++17, we don't need the internally maintained gdb::optional implementation. This patch does the following replacing: - gdb::optional -> std::optional - gdb::in_place -> std::in_place - #include "gdbsupport/gdb_optional.h" -> #include <optional> This change has mostly been done automatically. One exception is gdbsupport/thread-pool.* which did not use the gdb:: prefix as it already lives in the gdb namespace. Change-Id: I19a92fa03e89637bab136c72e34fd351524f65e9 Approved-By: Tom Tromey <tom@tromey.com> Approved-By: Pedro Alves <pedro@palves.net>
2023-11-20gdb/infrun: simplify process_event_stop_testGuinevere Larsen1-12/+6
The previous commit introduced some local variables to make some if statements simpler. This commit uses them more liberally throughout the process_event_stop_test in order to simplify the function a little more. No functional changes are expected. Approved-By: Tom Tromey <tom@tromey.com>
2023-11-20gdb/record: print frame information when exiting a recursive callGuinevere Larsen1-0/+18
Currently, when GDB is reverse stepping out of a function into the same function due to a recursive call, it doesn't print frame information, as reported by PR record/29178. This happens because when the inferior leaves the current frame, GDB decides to refresh the step information, clobbering the original step_frame_id, making it impossible to figure out later on that the frame has been changed. This commit changes GDB so that, if we notice we're in this exact situation, we won't refresh the step information. Because of implementation details, this change can cause some debug information to be read when it normally wouldn't before, which showed up as a regression on gdb.dwarf2/dw2-out-of-range-end-of-seq. Since that isn't a problem, the test was changed to allow for the new output. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29178 Approved-By: Tom Tromey <tom@tromey.com>
2023-11-17gdb: pass address_space to target dcache functionsSimon Marchi1-3/+3
A simple refactor to make the reference to current_program_space bubble up one level. No behavior changes expected. Change-Id: I237cf2f45ae73c35bcb433ce40e3c03cef6b87e2
2023-11-17gdb: remove get_current_regcacheSimon Marchi1-13/+10
Remove get_current_regcache, inlining the call to get_thread_regcache in callers. When possible, pass the right thread_info object known from the local context. Otherwise, fall back to passing `inferior_thread ()`. This makes the reference to global context bubble up one level, a small step towards the long term goal of reducing the number of references to global context (or rather, moving those references as close as possible to the top of the call tree). No behavior change expected. Change-Id: Ifa6980c88825d803ea586546b6b4c633c33be8d6
2023-11-17gdb: remove regcache's address spaceSimon Marchi1-22/+21
While looking at the regcache code, I noticed that the address space (passed to regcache when constructing it, and available through regcache::aspace) wasn't relevant for the regcache itself. Callers of regcache::aspace use that method because it appears to be a convenient way of getting the address space for a thread, if you already have the regcache. But there is always another way to get the address space, as the callers pretty much always know which thread they are dealing with. The regcache code itself doesn't use the address space. This patch removes anything related to address_space from the regcache code, and updates callers to get it from the thread in context. This removes a bit of unnecessary complexity from the regcache code. The current get_thread_arch_regcache function gets an address_space for the given thread using the target_thread_address_space function (which calls the target_ops::thread_address_space method). This suggest that there might have been the intention of supporting per-thread address spaces. But digging through the history, I did not find any such case. Maybe this method was just added because we needed a way to get an address space from a ptid (because constructing a regcache required an address space), and this seemed like the right way to do it, I don't know. The only implementations of thread_address_space and process_stratum_target::thread_address_space and linux_nat_target::thread_address_space, which essentially just return the inferior's address space. And thread_address_space is only used in the current get_thread_arch_regcache, which gets removed. So, I think that the thread_address_space target method can be removed, and we can assume that it's fine to use the inferior's address space everywhere. Callers of regcache::aspace are updated to get the address space from the relevant inferior, either using some context they already know about, or in last resort using the current global context. So, to summarize: - remove everything in regcache related to address spaces - in particular, remove get_thread_arch_regcache, and rename get_thread_arch_aspace_regcache to get_thread_arch_regcache - remove target_ops::thread_address_space, and target_thread_address_space - adjust all users of regcache::aspace to get the address space another way Change-Id: I04fd41b22c83fe486522af7851c75bcfb31c88c7
2023-11-13Cancel execution command on thread exit, when stepping, nexting, etc.Pedro Alves1-10/+63
If your target has no support for TARGET_WAITKIND_NO_RESUMED events (and no way to support them, such as the yet-unsubmitted AMDGPU target), and you step over thread exit with scheduler-locking on, this is what you get: (gdb) n [Thread ... exited] *hang* Getting back the prompt by typing Ctrl-C may not even work, since no inferior thread is running to receive the SIGINT. Even if it works, it seems unnecessarily harsh. If you started an execution command for which there's a clear thread of interest (step, next, until, etc.), and that thread disappears, then I think it's more user friendly if GDB just detects the situation and aborts the command, giving back the prompt. That is what this commit implements. It does this by explicitly requesting the target to report thread exit events whenever the main resumed thread has a thread_fsm. Note that unlike stepping over a breakpoint, we don't need to enable clone events in this case. With this patch, we get: (gdb) n [Thread 0x7ffff7d89700 (LWP 3961883) exited] Command aborted, thread exited. (gdb) Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I901ab64c91d10830590b2dac217b5264635a2b95
2023-11-13Don't resume new threads if scheduler-locking is in effectPedro Alves1-8/+33
If scheduler-locking is in effect, e.g., with "set scheduler-locking on", and you step over a function that spawns a new thread, the new thread is allowed to run free, at least until some event is hit, at which point, whether the new thread is re-resumed depends on a number of seemingly random factors. E.g., if the target is all-stop, and the parent thread hits a breakpoint, and GDB decides the breakpoint isn't interesting to report to the user, then the parent thread is resumed, but the new thread is left stopped. I think that letting the new threads run with scheduler-locking enabled is a defect. This commit fixes that, making use of the new clone events on Linux, and of target_thread_events() on targets where new threads have no connection to the thread that spawned them. Testcase and documentation changes included. Approved-By: Eli Zaretskii <eliz@gnu.org> Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie12140138b37534b7fc1d904da34f0f174aa11ce
2023-11-13stop_all_threads: (re-)enable async before waiting for stopsPedro Alves1-0/+81
Running the gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase added later in the series against gdbserver, after the TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run into an infinite loop in stop_all_threads, leading to a timeout: FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout) The is really a latent bug, and it is about the fact that stop_all_threads stops listening to events from a target as soon as it sees a TARGET_WAITKIND_NO_RESUMED, ignoring that TARGET_WAITKIND_NO_RESUMED may be delayed. handle_no_resumed knows how to handle delayed no-resumed events, but stop_all_threads was never taught to. In more detail, here's what happens with that testcase: #1 - Multiple threads report breakpoint hits to gdb. #2 - gdb picks one events, and it's for thread 1. All other stops are left pending. thread 1 needs to move past a breakpoint, so gdb stops all threads to start an inline step over for thread 1. While stopping threads, some of the threads that were still running report events that are also left pending. #2 - gdb steps thread 1 #3 - Thread 1 exits while stepping (it steps over an exit syscall), gdbserver reports thread exit for thread 1 #4 - Thread 1 was the last resumed thread, so gdbserver also reports no-resumed: [remote] Notification received: Stop:w0;p3445d0.3445d3 [remote] Sending packet: $vStopped#55 [remote] Packet received: N [remote] Sending packet: $vStopped#55 [remote] Packet received: OK #5 - gdb processes the thread exit for thread 1, finishes the step over and restarts threads. #6 - gdb picks the next event to process out of one of the resumed threads with pending events: [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11 #7 - This is again a breakpoint hit and the breakpoint needs to be stepped over too, so gdb starts a step-over dance again. #8 - We reach stop_all_threads, which finds that some threads need to be stopped. #9 - wait_one finally consumes the no-resumed event queue by #4. Seeing this, wait_one disable target async, to stop listening for events out of the remote target. #10 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. #11 - Because the remote target is no longer async, and there are no other targets, wait_one return no-resumed immediately without polling the remote target. #12 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. goto #11, looping forever. Fix this by explicitly enabling/re-enabling target async on targets that can async, before waiting for stops. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f
2023-11-13gdb: clear step over information on thread exit (PR gdb/27338)Pedro Alves1-16/+155
GDB doesn't handle correctly the case where a thread steps over a breakpoint (using either in-line or displaced stepping), and the executed instruction causes the thread to exit. Using the test program included later in the series, this is what it looks like with displaced-stepping, on x86-64 Linux, where we have two displaced-step buffers: $ ./gdb -q -nx --data-directory=data-directory build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit -ex "b my_exit_syscall" -ex r Reading symbols from build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit... Breakpoint 1 at 0x123c: file src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 68. Starting program: build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1". [New Thread 0x7ffff7c5f640 (LWP 2915510)] [Switching to Thread 0x7ffff7c5f640 (LWP 2915510)] Thread 2 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. [New Thread 0x7ffff7c5f640 (LWP 2915524)] [Thread 0x7ffff7c5f640 (LWP 2915510) exited] [Switching to Thread 0x7ffff7c5f640 (LWP 2915524)] Thread 3 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. [New Thread 0x7ffff7c5f640 (LWP 2915616)] [Thread 0x7ffff7c5f640 (LWP 2915524) exited] [Switching to Thread 0x7ffff7c5f640 (LWP 2915616)] Thread 4 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. ... hangs ... The first two times we do "continue", we displaced-step the syscall instruction that causes the thread to exit. When the thread exits, the main thread, waiting on pthread_join, is unblocked. It spawns a new thread, which hits the breakpoint on the syscall again. However, infrun was never notified that the displaced-stepping threads are done using the displaced-step buffer, so now both buffers are marked as used. So when we do the third continue, there are no buffers available to displaced-step the syscall, so the thread waits forever for its turn. When trying the same but with in-line step over (displaced-stepping disabled): $ ./gdb -q -nx --data-directory=data-directory \ build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit \ -ex "b my_exit_syscall" -ex "set displaced-stepping off" -ex r Reading symbols from build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit... Breakpoint 1 at 0x123c: file src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 68. Starting program: build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1". [New Thread 0x7ffff7c5f640 (LWP 2928290)] [Switching to Thread 0x7ffff7c5f640 (LWP 2928290)] Thread 2 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. [Thread 0x7ffff7c5f640 (LWP 2928290) exited] No unwaited-for children left. (gdb) i th Id Target Id Frame 1 Thread 0x7ffff7c60740 (LWP 2928285) "step-over-threa" 0x00007ffff7f7c9b7 in __pthread_clockjoin_ex () from /usr/lib/libpthread.so.0 The current thread <Thread ID 2> has terminated. See `help thread'. (gdb) thread 1 [Switching to thread 1 (Thread 0x7ffff7c60740 (LWP 2928285))] #0 0x00007ffff7f7c9b7 in __pthread_clockjoin_ex () from /usr/lib/libpthread.so.0 (gdb) c Continuing. ^C^C ... hangs ... The "continue" causes an in-line step to occur, meaning the main thread is stopped while we step the syscall. The stepped thread exits when executing the syscall, the linux-nat target notices there are no more resumed threads to be waited for, so returns TARGET_WAITKIND_NO_RESUMED, which causes the prompt to return. But infrun never clears the in-line step over info. So if we try continuing the main thread, GDB doesn't resume it, because it thinks there's an in-line step in progress that we need to wait for to finish, and we are stuck there. To fix this, infrun needs to be informed when a thread doing a displaced or in-line step over exits. We can do that with the new target_set_thread_options mechanism which is optimal for only enabling exit events of the thread that needs it; or, if that is not supported, by using target_thread_events, which enables thread exit events for all threads. This is done by this commit. This patch then modifies handle_inferior_event in infrun.c to clean up any step-over the exiting thread might have been doing at the time of the exit. The cases to consider are: - the exiting thread was doing an in-line step-over with an all-stop target - the exiting thread was doing an in-line step-over with a non-stop target - the exiting thread was doing a displaced step-over with a non-stop target The displaced-stepping buffer implementation in displaced-stepping.c is modified to account for the fact that it's possible that we "finish" a displaced step after a thread exit event. The buffer that the exiting thread was using is marked as available again and the original instructions under the scratch pad are restored. However, it skips applying the fixup, which wouldn't make sense since the thread does not exist anymore. Another case that needs handling is if a displaced-stepping thread exits, and the event is reported while we are in stop_all_threads. We should call displaced_step_finish in the handle_one function, in that case. It was already called in other code paths, just not the "thread exited" path. This commit doesn't make infrun ask the target to report the TARGET_WAITKIND_THREAD_EXITED events yet, that'll be done later in the series. Note that "stop_print_frame = false;" line is moved to normal_stop, because TARGET_WAITKIND_THREAD_EXITED can also end up with the event transmorphed into TARGET_WAITKIND_NO_RESUMED. Moving it to normal_stop keeps it centralized. Co-authored-by: Simon Marchi <simon.marchi@efficios.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I745c6955d7ef90beb83bcf0ff1d1ac8b9b6285a5
2023-11-13Introduce GDB_THREAD_OPTION_EXIT thread option, fix step-over-thread-exitPedro Alves1-5/+10
When stepping over a breakpoint with displaced stepping, GDB needs to be informed if the stepped thread exits, otherwise the displaced stepping buffer that was allocated to that thread leaks, and this can result in deadlock, with other threads waiting for their turn to displaced step, but their turn never comes. Similarly, when stepping over a breakpoint in line, GDB also needs to be informed if the stepped thread exits, so that is can clear the step over state and re-resume threads. This commit makes it possible for GDB to ask the target to report thread exit events for a given thread, using the new "thread options" mechanism introduced by a previous patch. This only adds the core bits. Following patches in the series will teach the Linux backends (native & gdbserver) to handle the GDB_THREAD_OPTION_EXIT option, and then a later patch will make use of these thread exit events to clean up displaced stepping and inline stepping state properly. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I96b719fdf7fee94709e98bb3a90751d8134f3a38 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
2023-11-13Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to corePedro Alves1-5/+27
Currently, infrun assumes that when TARGET_WAITKIND_THREAD_EXITED is reported, the corresponding GDB thread has already been removed from the GDB thread list. Later in the series, that will no longer work, as infrun will need to refer to the thread's thread_info when it processes TARGET_WAITKIND_THREAD_EXITED. As preparation, this patch makes deleting the GDB thread responsibility of infrun, instead of the target. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I013d87f61ffc9aaca49f0d6ce2a43e3ea69274de
2023-11-13Thread options & clone events (core + remote)Pedro Alves1-1/+62
A previous patch taught GDB about a new TARGET_WAITKIND_THREAD_CLONED event kind, and made the Linux target report clone events. A following patch will teach Linux GDBserver to do the same thing. However, for remote debugging, it wouldn't be ideal for GDBserver to report every clone event to GDB, when GDB only cares about such events in some specific situations. Reporting clone events all the time would be potentially chatty. We don't enable thread create/exit events all the time for the same reason. Instead we have the QThreadEvents packet. QThreadEvents is target-wide, though. This patch makes GDB instead explicitly request that the target reports clone events or not, on a per-thread basis. In order to be able to do that with GDBserver, we need a new remote protocol feature. Since a following patch will want to enable thread exit events on per-thread basis too, the packet introduced here is more generic than just for clone events. It lets you enable/disable a set of options at once, modelled on Linux ptrace's PTRACE_SETOPTIONS. IOW, this commit introduces a new QThreadOptions packet, that lets you specify a set of per-thread event options you want to enable. The packet accepts a list of options/thread-id pairs, similarly to vCont, processed left to right, with the options field being a number interpreted as a bit mask of options. The only option defined in this commit is GDB_THREAD_OPTION_CLONE (0x1), which ask the remote target to report clone events. Another patch later in the series will introduce another option. For example, this packet sets option "1" (clone events) on thread p1000.2345: QThreadOptions;1:p1000.2345 and this clears options for all threads of process 1000, and then sets option "1" (clone events) on thread p1000.2345: QThreadOptions;0:p1000.-1;1:p1000.2345 This clears options of all threads of all processes: QThreadOptions;0 The target reports the set of supported options by including "QThreadOptions=<supported options>" in its qSupported response. infrun is then tweaked to enable GDB_THREAD_OPTION_CLONE when stepping over a breakpoint. Unlike PTRACE_SETOPTIONS, fork/vfork/clone children do NOT inherit their parent's thread options. This is so that GDB can send e.g., "QThreadOptions;0;1:TID" without worrying about threads it doesn't know about yet. Documentation for this new remote protocol feature is included in a documentation patch later in the series. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie41e5093b2573f14cf6ac41b0b5804eba75be37e
2023-11-13Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONEDPedro Alves1-70/+88
(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-11-13gdb/linux: Delete all other LWPs immediately on ptrace exec eventPedro Alves1-5/+3
I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
2023-11-08rs6000, Fix Linux DWARF register mappingCarl Love1-0/+13
Overview of issues fixed by the patch. The primary issue this patch fixes is the DWARF register mapping for Linux. The changes in ppc-linux-tdep.c fix the DWARF register mapping issues. The register mapping issue is responsible for two of the five regression bugs seen in gdb.base/store.exp. Once the register mapping was fixed, an underlying issue with the unwinding of the signal trampoline in common-code in ifrun.c was found. This underlying bug is best described by Ulrich in the following description. The unwinder bug shows up on platforms where the kernel uses a trampoline to dispatch "calls to" the signal handler (not just *returns from* the signal handler). Many platforms use a trampoline for signal return, and that is working fine, but the only platform I'm (Ulrich) aware of that uses a trampoline for signal handler calls is (recent kernels for) PowerPC. I believe the rationale for using a trampoline here is to improve performance by avoiding unbalancing of the branch predictor's call/return stack. However, on PowerPC the bug is dormant as well as it is hidden by *another* bug that prevents correct unwinding out of the signal trampoline. This is because the custom CFI for the trampoline uses a register number (VSCR) that is not ever used by compiler-generated CFI, and that particular register is mapped to an invalid number by the current PowerPC DWARF mapper. The underlying unwinder bug is exposed by the "new" regression failures in gdb.base/sigstep.exp. These failures were previously masked by the fact that GDB was not seeing a valid frame when it tried to unwind the frames. The sigstep.exp test is specifically testing stepping into a signal handler. With the correct DWARF register mapping in place, specifically the VSCR mapping, the signal trampoline code now unwinds to a valid frame exposing the pre-existing bug in how the signal handler on PowerPC works. The one line change infrun.c fixes the exiting bug in the common-code for platforms that use a trampoline to dispatch calls to the signal handler by not stopping in the SIGTRAMP_FRAME. Detailed description of the DWARF register mapping fix. The PowerPC DWARF register mapping is the same for the .eh_frame and .debug_frame on Linux. PowerPC uses different mapping for .eh_frame and .debug_frame on other operating systems. The current GDB support for mapping the DWARF registers in rs6000_linux_dwarf2_reg_to_regnum and rs6000_adjust_frame_regnum file gdb/rs6000-tdep.c is not correct for Linux. The files have some legacy mappings for spe_acc, spefscr, EV which was removed from GCC in 2017. This patch adds a two new functions rs6000_linux_dwarf2_reg_to_regnum, and rs6000_linux_adjust_frame_regnum in file gdb/ppc-linux-tdep.c to handle the DWARF register mappings on Linux. Function rs6000_linux_dwarf2_reg_to_regnum is installed for both gdb_dwarf_to_regnum and gdbarch_stab_reg_to_regnum since the mappings are the same. The ppc_linux_init_abi function in gdb/ppc-linux-tdep.c is updated to call set_gdbarch_dwarf2_reg_to_regnum map the new function rs6000_linux_dwarf2_reg_to_regnum for the architecture. Similarly, dwarf2_frame_set_adjust_regnum is called to map rs6000_linux_adjust_frame_regnum into the architecture. Additional detail on the signal handling fix. The specific sequence of events for handling a signal on most architectures is as follows: 1) Some code is running when a signal arrives. 2) The kernel handles the signal and dispatches to the handler. ... However on PowerPC the sequence of events is: 1) Some code is running when a signal arrives. 2) The kernel handles the signal and dispatches to the trampoline. 3) The trampoline performs a normal function call to the handler. ... We want the "nexti" to step into, not over, signal handlers invoked by the kernel. This is the case for most platforms as the kernel puts a signal trampoline frame onto the stack to handle proper return after the handler. However, on some platforms such as PowerPC, the kernel actually uses a trampoline to handle *invocation* of the handler. We do not want GDB to stop in the SIGTRAMP_FRAME. The issue is fixed in function process_event_stop_test by adding a check that the frame is not a SIGTRAMP_FRAME to the if statement to stop in a subroutine call. This prevents GDB from erroneously detecting the trampoline invocation as a subroutine call. This patch fixes two regression test failures in gdb.base/store.exp. The patch then fixes an exposed, dormant, signal handling issue that is exposed in the signal handling test gdb.base/sigstep.exp. The patch has been tested on Power 8 LE/BE, Power 9 LE/BE, Power 10 with no new regressions. Note, only two of the five failures in store.exp are fixed. The remaining three failures are fixed in a following patch.
2023-10-10gdb: remove target_gdbarchSimon Marchi1-4/+7
This function is just a wrapper around the current inferior's gdbarch. I find that having that wrapper just obscures where the arch is coming from, and that it's often used as "I don't know which arch to use so I'll use this magical target_gdbarch function that gets me an arch" when the arch should in fact come from something in the context (a thread, objfile, symbol, etc). I think that removing it and inlining `current_inferior ()->arch ()` everywhere will make it a bit clearer where that arch comes from and will trigger people into reflecting whether this is the right place to get the arch or not. Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b Reviewed-By: John Baldwin <jhb@FreeBSD.org> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-10-10gdb: add inferior::{arch, set_arch}Simon Marchi1-4/+4
Make the inferior's gdbarch field private, and add getters and setters. This helped me by allowing putting breakpoints on set_arch to know when the inferior's arch was set. A subsequent patch in this series also adds more things in set_arch. Change-Id: I0005bd1ef4cd6b612af501201cec44e457998eec Reviewed-By: John Baldwin <jhb@FreeBSD.org> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-08-23gdb: remove the silent parameter from exit_inferior_1 and cleanupAndrew Burgess1-1/+1
After the previous commit, exit_inferior_1 no longer makes use of the silent parameter. This commit removes this parameter and cleans up the callers. After doing this exit_inferior_1, exit_inferior, and exit_inferior_silent are all equivalent, so rename exit_inferior_1 to exit_inferior and delete exit_inferior_silent, update all the callers. Also I spotted the declaration exit_inferior_num_silent in inferior.h, but this function is not defined anywhere, so I deleted the declaration. There should be no user visible changes after this commit.
2023-08-16gdb: fix vfork regressions when target-non-stop is offAndrew Burgess1-3/+5
It was pointed out on the mailing list[1] that after this commit: commit b1e0126ec56e099d753c20e91a9f8623aabd6b46 Date: Wed Jun 21 14:18:54 2023 +0100 gdb: don't resume vfork parent while child is still running the test gdb.base/vfork-follow-parent.exp now has some failures when run with the native-gdbserver or native-extended-gdbserver boards: FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout) The reason that these failures don't show up when run on the standard unix board is that the test is only run in the default operating mode, so for Linux this will be all-stop on top of non-stop. If we adjust the test script so that it runs in the default mode and with target-non-stop turned off, then we see the same failures on the unix board. This commit includes this change. The way that the test is written means that it is not (currently) possible to turn on non-stop mode and have the test still work, so this commit does not do that. I have also updated the test script so that the vfork child performs an exec as well as the current exit. Exec and exit are the two ways in which a vfork child can release the vfork parent, so testing both of these cases is useful I think. In this test the inferior performs a vfork and the vfork-child immediately exits. The vfork-parent will wait for the vfork-child and then blocks waiting for gdb. Once gdb has released the vfork-parent, the vfork-parent also exits. In the test that fails, GDB sets 'detach-on-fork off' and then runs to the vfork. At this point the test tries to just "continue", but this fails as the vfork-parent is still selected, and the parent can't continue until the vfork-child completes. As the vfork-child is stopped by GDB the parent will never stop once resumed, so GDB refuses to resume it. The test script then sets 'schedule-multiple on' and once again continues. This time GDB, in theory, resumes both the parent and the child, the parent will be held blocked by the kernel, but the child will run until it exits, and which point GDB stops again, this time with inferior 2, the newly exited vfork-child, selected. What happens after this in the test script is irrelevant as far as this failure is concerned. To understand why the test started failing we should consider the behaviour of four different cases: 1. All-stop-on-non-stop before commit b1e0126ec56e, 2. All-stop-on-non-stop after commit b1e0126ec56e, 3. All-stop-on-all-stop before commit b1e0126ec56e, and 4. All-stop-on-all-stop after commit b1e0126ec56e. Only case #4 is failing after commit b1e0126ec56e, but I think the other cases are interesting because, (a) they inform how we might fix the regression, and (b) it turns out the behaviour of #2 changed too with the commit, but the change was harmless. For #1 All-stop-on-non-stop before commit b1e0126ec56e, what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-non-stop, every thread is resumed individually, so GDB tries to resume both the vfork-parent and the vfork-child, both of which succeed, 3. The vfork-parent is held stopped by the kernel, 4. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 5. At this point we might take two paths depending on which event GDB handles first, if GDB handles the VFORK_DONE first then: (a) As GDB is controlling both parent and child the VFORK_DONE is ignored (see handle_vfork_done), the vfork-parent will be resumed, (b) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. Alternatively, if GDB selects the EXITED event first then: (c) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. (d) At some future time the user resumes the vfork-parent, at which point the VFORK_DONE is reported to GDB, however, GDB is ignoring the VFORK_DONE (see handle_vfork_done), so the parent is resumed. For case #2, all-stop-on-non-stop after commit b1e0126ec56e, the important difference is in step (2) above, now, instead of resuming both the vfork-parent and the vfork-child, only the vfork-child is resumed. As such, when we get to step (5), only a single event, the EXITED event is reported. GDB handles the EXITED just as in (5)(c), then, later, when the user resumes the vfork-parent, the VFORKED_DONE is immediately delivered from the kernel, but this is ignored just as in (5)(d), and so, though the pattern of when the vfork-parent is resumed changes, the overall pattern of which events are reported and when, doesn't actually change. In fact, by not resuming the vfork-parent, the order of events (in this test) is now deterministic, which (maybe?) is a good thing. If we now consider case #3, all-stop-on-all-stop before commit b1e0126ec56e, then what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-all-stop, the resume is passed down to the linux-nat target, the vfork-parent is the event thread, while the vfork-child is a sibling of the event thread, 3. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this causes the vfork-child to be resumed. Then in linux_nat_target::resume, the event thread, the vfork-parent, is also resumed. 4. The vfork-parent is held stopped by the kernel, 5. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 6. We are now in a situation identical to step (5) as for all-stop-on-non-stop above, GDB selects one of the events to handle, and whichever we select the user sees the correct behaviour. And so, finally, we can consider #4, all-stop-on-all-stop after commit b1e0126ec56e, this is the case that started failing. We start out just like above, in proceed, the resume_ptid is -1 (resume everything), due to schedule multiple being on. And just like above, due to the target being all-stop, we call proceed_resume_thread_checked just once, for the current thread, which, remember, is the vfork-parent thread. The change in commit b1e0126ec56e was to avoid resuming a vfork-parent thread, read the commit message for the justification for this change. However, this means that GDB now rejects resuming the vfork-parent in this case, which means that nothing gets resumed! Obviously, if nothing resumes, then nothing will ever stop, and so GDB appears to hang. I considered a couple of solutions which, in the end, I didn't go with, these were: 1. Move the vfork-parent check out of proceed_resume_thread_checked, and place it in proceed, but only on the all-stop-on-non-stop path, this should still address the issue seen in b1e0126ec56e, but would avoid the issue seen here. I rejected this just because it didn't feel great to split the checks that exist in proceed_resume_thread_checked like this, 2. Extend the condition in proceed_resume_thread_checked by adding a target_is_non_stop_p check. This would have the same effect as idea 1, but leaves all the checks in the same place, which I think would be better, but this still just didn't feel right to me, and so, What I noticed was that for the all-stop-on-non-stop, after commit b1e0126ec56e, we only resumed the vfork-child, and this seems fine. The vfork-parent isn't going to run anyway (the kernel will hold it back), so if feels like we there's no harm in just waiting for the child to complete, and then resuming the parent. So then I started looking at follow_fork, which is called from the top of proceed. This function already has the task of switching between the parent and child based on which the user wishes to follow. So, I wondered, could we use this to switch to the vfork-child in the case that we are attached to both? Turns out this is pretty simple to do. Having done that, now the process is for all-stop-on-all-stop after commit b1e0126ec56e, and with this new fix is: 1. GDB calls proceed with the vfork-parent selected, but, 2. In follow_fork, and follow_fork_inferior, GDB switches the selected thread to be that of the vfork-child, 3. Back in proceed user_visible_resume_ptid returns -1 (everything) as the resume_ptid still, but now, 4. When GDB calls proceed_resume_thread_checked, the vfork-child is the current selected thread, this is not a vfork-parent, and so GDB allows the proceed to continue to the linux-nat target, 5. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this does not resume the vfork-parent (because it is a vfork-parent), and then the vfork-child is resumed as this is the event thread, At this point we are back in the same situation as for all-stop-on-non-stop after commit b1e0126ec56e, that is, the vfork-child is resumed, while the vfork-parent is held stopped by GDB. Eventually the vfork-child will exit or exec, at which point the vfork-parent will be resumed. [1] https://inbox.sourceware.org/gdb-patches/3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca/
2023-08-03gdb: avoid double stop after failed breakpoint condition checkAndrew Burgess1-1/+15
This commit replaces this earlier commit: commit 2e411b8c68eb2b035b31d5b00d940d4be1a0928b Date: Fri Oct 14 14:53:15 2022 +0100 gdb: don't always print breakpoint location after failed condition check and is a result of feedback received here[1]. The original commit addressed a problem where, if a breakpoint condition included an inferior function call, and if the inferior function call failed, then GDB would announce the stop twice. Here's an example of GDB's output before the above commit that shows the problem being addressed: (gdb) break foo if (some_func ()) Breakpoint 1 at 0x40111e: file bpcond.c, line 11. (gdb) r Starting program: /tmp/bpcond Program received signal SIGSEGV, Segmentation fault. 0x0000000000401116 in some_func () at bpcond.c:5 5 return *p; Error in testing condition for breakpoint 1: The program being debugged stopped while in a function called from GDB. Evaluation of the expression containing the function (some_func) will be abandoned. When the function is done executing, GDB will silently stop. Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5 5 return *p; (gdb) The original commit addressed this issue in breakpoint.c, by spotting that the $pc had changed while evaluating the breakpoint condition, and inferring from this that GDB must have stopped elsewhere. However, the way in which the original commit suppressed the second stop announcement was to set bpstat::print to true -- this tells GDB not to print the frame during the stop announcement, and for the CLI this is fine, however, it was pointed out that for the MI this still isn't really enough. Below is an example from an MI session after the above commit was applied, this shows the problem with the above commit: -break-insert -c "cond_fail()" foo ^done,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040111e",func="foo",file="/tmp/mi-condbreak-fail.c",line="30",thread-groups=["i1"],cond="cond_fail()",times="0",original-location="foo"} (gdb) -exec-run =thread-group-started,id="i1",pid="2636270" =thread-created,id="1",group-id="i1" =library-loaded,id="/lib64/ld-linux-x86-64.so.2",target-name="/lib64/ld-linux-x86-64.so.2",host-name="/lib64/ld-linux-x86-64.so.2",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7fd3110",to="0x00007ffff7ff2bb4"}] ^running *running,thread-id="all" (gdb) =library-loaded,id="/lib64/libm.so.6",target-name="/lib64/libm.so.6",host-name="/lib64/libm.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7e59390",to="0x00007ffff7ef4f98"}] =library-loaded,id="/lib64/libc.so.6",target-name="/lib64/libc.so.6",host-name="/lib64/libc.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7ca66b0",to="0x00007ffff7df3c5f"}] ~"\nProgram" ~" received signal SIGSEGV, Segmentation fault.\n" ~"0x0000000000401116 in cond_fail () at /tmp/mi-condbreak-fail.c:24\n" ~"24\t return *p;\t\t\t/* Crash here. */\n" *stopped,reason="signal-received",signal-name="SIGSEGV",signal-meaning="Segmentation fault",frame={addr="0x0000000000401116",func="cond_fail",args=[],file="/tmp/mi-condbreak-fail.c",fullname="/tmp/mi-condbreak-fail.c",line="24",arch="i386:x86-64"},thread-id="1",stopped-threads="all",core="9" &"Error in testing condition for breakpoint 1:\n" &"The program being debugged was signaled while in a function called from GDB.\n" &"GDB remains in the frame where the signal was received.\n" &"To change this behavior use \"set unwindonsignal on\".\n" &"Evaluation of the expression containing the function\n" &"(cond_fail) will be abandoned.\n" &"When the function is done executing, GDB will silently stop.\n" =breakpoint-modified,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040111e",func="foo",file="/tmp/mi-condbreak-fail.c",fullname="/tmp/mi-condbreak-fail.c",line="30",thread-groups=["i1"],cond="cond_fail()",times="1",original-location="foo"} *stopped (gdb) Notice that we still see two '*stopped' lines, the first includes the full frame information, while the second has no frame information, this is a result of bpstat::print having been set. Ideally, the second '*stopped' line should not be present. By setting bpstat::print I was addressing the problem too late, this flag really only changes how interp::on_normal_stop prints the stop event, and interp::on_normal_stop is called (indirectly) from the normal_stop function in infrun.c. A better solution is to avoid calling normal_stop at all for the stops which should not be reported to the user, and this is what I do in this commit. This commit has 3 parts: 1. In breakpoint.c, revert the above commit, 2. In fetch_inferior_event (infrun.c), capture the stop-id before calling handle_inferior_event. If, after calling handle_inferior_event, the stop-id has changed, then this indicates that somewhere within handle_inferior_event, a stop was announced to the user. If this is the case then GDB should not call normal_stop, and we should rely on whoever announced the stop to ensure that we are in a PROMPT_NEEDED state, which means the prompt will be displayed once fetch_inferior_event returns. And, 3. In infcall.c, do two things: (a) In run_inferior_call, after making the inferior call, ensure that either async_disable_stdin or async_enable_stdin is called to put the prompt state, and stdin handling into the correct state based on whether the inferior call completed successfully or not, and (b) In call_thread_fsm::should_stop, call async_enable_stdin rather than changing the prompt state directly. This isn't strictly necessary, but helped me understand this code more. This async_enable_stdin call is only reached if normal_stop is not going to be called, and replaces the async_enable_stdin call that exists in normal_stop. Though we could just adjust the prompt state if felt (to me) much easier to understand when I could see this call and the corresponding call in normal_stop. With these changes in place now, when the inferior call (from the breakpoint condition) fails, infcall.c leaves the prompt state as PROMPT_NEEDED, and leaves stdin registered with the event loop. Back in fetch_inferior_event GDB notices that the stop-id has changed and so avoids calling normal_stop. And on return from fetch_inferior_event GDB will display the prompt and handle input from stdin. As normal_stop is not called the MI problem is solved, and the test added in the earlier mentioned commit still passes just fine, so the CLI has not regressed. [1] https://inbox.sourceware.org/gdb-patches/6fd4aa13-6003-2563-5841-e80d5a55d59e@palves.net/
2023-07-17gdb: additional debug output in infrun.c and linux-nat.cAndrew Burgess1-4/+24
While working on some of the recent patches relating to vfork handling I found this debug output very helpful, I'd like to propose adding this into GDB. With debug turned off there should be no user visible changes after this commit.
2023-07-17gdb: don't resume vfork parent while child is still runningAndrew Burgess1-4/+20
Like the last few commit, this fixes yet another vfork related issue. Like the commit titled: gdb: don't restart vfork parent while waiting for child to finish which addressed a case in linux-nat where we would try to resume a vfork parent, this commit addresses a very similar case, but this time occurring in infrun.c. Just like with that previous commit, this bug results in the assert: x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. In this case the issue occurs when target-non-stop is on, but non-stop is off, and again, schedule-multiple is on. As with the previous commit, GDB is in follow-fork-mode child. If you have not done so, it is worth reading the earlier commit as many of the problems leading to the failure are the same, they just appear in a different part of GDB. Here are the steps leading to the assertion failure: 1. The user performs a 'next' over a vfork, GDB stop in the vfork child, 2. As we are planning to follow the child GDB sets the vfork_parent and vfork_child member variables in the two inferiors, the thread_waiting_for_vfork_done member is left as nullptr, that member is only used when GDB is planning to follow the parent inferior, 3. The user does 'continue', our expectation is that the vfork child should resume, and once that process has exited or execd, GDB should detach from the vfork parent. As a result of the 'continue' GDB eventually enters the proceed function, 4. In proceed we selected a ptid_t to resume, because schedule-multiple is on we select minus_one_ptid (see user_visible_resume_ptid), 5. As GDB is running in all-stop on top of non-stop mode, in the proceed function we iterate over all threads that match the resume ptid, which turns out to be all threads, and call proceed_resume_thread_checked. One of the threads we iterate over is the vfork parent thread, 6. As the thread passed to proceed_resume_thread_checked doesn't match any of the early return conditions, GDB will set the thread resumed, 7. As we are resuming one thread at a time, this thread is seen by the lower layers (e.g. linux-nat) as the "event thread", which means we don't apply any of the checks, e.g. is this thread a vfork parent, instead we assume that GDB core knows what it's doing, and linux-nat will resume the thread, we have now incorrectly set running the vfork parent thread when this thread should be waiting for the vfork child to complete, 8. Back in the proceed function GDB continues to iterate over all threads, and now (correctly) resumes the vfork child thread, 8. As the vfork child is still alive the kernel holds the vfork parent stopped, 9. Eventually the child performs its exec and GDB is sent and EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB, 10. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE, 11. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process, 12. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit), 13. While this has been going on the vfork parent is executing, and might even exit, 14. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object, 15. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status, 16. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped, 17. An exited thread is not classified as stopped, and so the assert triggers! Just like with the earlier commit, the fix is to spot the vfork parent status of the thread, and not resume such threads. Where the earlier commit fixed this in linux-nat, in this case I think the fix should live in infrun.c, in proceed_resume_thread_checked. This function already has a similar check to not resume the vfork parent in the case where we are planning to follow the vfork parent, I propose adding a similar case that checks for the vfork parent when we plan to follow the vfork child. This new check will mean that at step #6 above GDB doesn't try to resume the vfork parent thread, which prevents the VFORK_DONE from ever arriving. Once GDB sees the EXECD/EXITED/SIGNALLED event from the vfork child GDB will detach from the parent. There's no test included in this commit. In a subsequent commit I will expand gdb.base/foll-vfork.exp which is when this bug would be exposed. If you do want to reproduce this failure then you will for certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure after within 10mins.
2023-07-17gdb, infrun: refactor part of `proceed` into separate functionMihails Strasuns1-69/+86
Split the thread resuming code from proceed into new function proceed_resume_thread_checked. Co-Authored-By: Christina Schimpe <christina.schimpe@intel.com>