aboutsummaryrefslogtreecommitdiff
path: root/gdb/infrun.c
AgeCommit message (Collapse)AuthorFilesLines
2024-01-12Update copyright year range in header of all files managed by GDBAndrew Burgess1-1/+1
This commit is the result of the following actions: - Running gdb/copyright.py to update all of the copyright headers to include 2024, - Manually updating a few files the copyright.py script told me to update, these files had copyright headers embedded within the file, - Regenerating gdbsupport/Makefile.in to refresh it's copyright date, - Using grep to find other files that still mentioned 2023. If these files were updated last year from 2022 to 2023 then I've updated them this year to 2024. I'm sure I've probably missed some dates. Feel free to fix them up as you spot them.
2024-01-02PowerPC and aarch64: Fix reverse stepping failureCarl Love1-0/+57
When running GDB's testsuite on aarch64-linux/Ubuntu 20.04 (also spotted on the ppc backend), there are failures in gdb.reverse/solib-precsave.exp and gdb.reverse/solib-reverse.exp. The failure happens around the following code: 38 b[1] = shr2(17); /* middle part two */ 40 b[0] = 6; b[1] = 9; /* generic statement, end part two */ 42 shr1 ("message 1\n"); /* shr1 one */ Normal execution: - step from line 38 will land on line 40. - step from line 40 will land on line 42. Reverse execution: - step from line 42 will land on line 40. - step from line 40 will land on line 40. - step from line 40 will land on line 38. The problem here is that line 40 contains two contiguous but distinct PC ranges in the line table, like so: Line 40 - [0x7ec ~ 0x7f4] Line 40 - [0x7f4 ~ 0x7fc] The two distinct ranges are generated because GCC started outputting source column information, which GDB doesn't take into account at the moment. When stepping forward from line 40, we skip both of these ranges and land on line 42. When stepping backward from line 42, we stop at the start PC of the second (or first, going backwards) range of line 40. Since we've reached ecs->event_thread->control.step_range_start, we stop stepping backwards. The above issues were fixed by introducing a new function that looks for adjacent PC ranges for the same line, until we notice a line change. Then we take that as the start PC of the range. The new start PC for the range is used for the control.step_range_start when setting up a step range. The test case gdb.reverse/map-to-same-line.exp is added to test the fix for the above reverse step issues. Patch has been tested on PowerPC, X86 and AArch64 with no regressions.
2023-12-20Step over thread exit, always delete the thread non-silentlyPedro Alves1-4/+7
With AMD GPU debugging, I noticed that when stepping over a breakpoint placed on top of the s_endpgm instruction inline (displaced=off), GDB would behave differently -- it wouldn't print the wave exit. E.g: With displaced stepping, or no breakpoint at all: stepi [AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited] Command aborted, thread exited. (gdb) With inline stepping: stepi Command aborted, thread exited. (gdb) In the cases we see the "exited" notification, handle_thread_exit is what first called delete_thread on the exiting thread, which is non-silent. With inline stepping, however, handle_thread_exit ends up in update_thread_list (via restart_threads) before any delete_thread call. Thus, amd_dbgapi_target::update_thread_list notices that the wave is gone and deletes it with delete_thread_silent. This commit fixes it, by making handle_thread_exited call set_thread_exited (with the default silent=false) early, which emits the user-visible notification. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I22ab3145e18d07c99dace45576307b9f9d5d966f
2023-12-20displaced_step_finish: Don't fetch the regcache of exited threadsPedro Alves1-7/+12
displaced_step_finish can be called with event_status.kind == TARGET_WAITKIND_THREAD_EXITED, and in that case it is not possible to get at the already-exited thread's registers. This patch moves the get_thread_regcache calls to branches that actually need it, where we know the thread is still alive. It also adds an assertion to get_thread_regcache, to help catching these broken cases sooner. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I63b5eacb3e02a538fc5087c270d8025adfda88c3
2023-12-20Ensure selected thread after thread exit stopPedro Alves1-0/+7
While making step over thread exit work properly on AMDGPU, I noticed that if there's a breakpoint on top of the exit syscall, and, displaced stepping is off, then when GDB reports "Command aborted, thread exited.", GDB also switches focus to a random thread, instead of leaving the exited thread as selected: (gdb) thread [Current thread is 6, lane 0 (AMDGPU Lane 1:4:1:1/0 (0,0,0)[0,0,0])] (gdb) si Command aborted, thread exited. (gdb) thread [Current thread is 5 (Thread 0x7ffff626f640 (LWP 3248392))] (gdb) The previous patch extended gdb.threads/step-over-thread-exit.exp to exercise this on GNU/Linux (on the CPU side), and there, after that "si", we always end up with the exiting thread as selected even without this fix, but that's just a concidence, there's a code path that happens to select the exiting thread for an unrelated reason. This commit add the explict switch, fixing the latent problem for GNU/Linux, and the actual problem on AMDGPU. I wrote a gdb.rocm/ testcase for this, but it can't be upstreamed yet, until more pieces of the DWARF machinery are upstream as well. Approved-By: Simon Marchi <simon.marchi@efficios.com> Change-Id: I6ff57a79514ac0142bba35c749fe83d53d9e4e51
2023-11-28[gdb] Fix segfault in for_each_block, part 2Tom de Vries1-5/+11
The previous commit describes PR gdb/30547, a segfault when running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x). The root cause for the segmentation fault is that linux_is_uclinux gives an incorrect result: it returns true instead of false. So, why does linux_is_uclinux: ... int linux_is_uclinux (void) { CORE_ADDR dummy; return (target_auxv_search (AT_NULL, &dummy) > 0 && target_auxv_search (AT_PAGESZ, &dummy) == 0); ... return true? This is because ppc_linux_target_wordsize returns 4 instead of 8, causing ppc_linux_nat_target::auxv_parse to misinterpret the auxv vector. So, why does ppc_linux_target_wordsize: ... int ppc_linux_target_wordsize (int tid) { int wordsize = 4; /* Check for 64-bit inferior process. This is the case when the host is 64-bit, and in addition the top bit of the MSR register is set. */ long msr; errno = 0; msr = (long) ptrace (PTRACE_PEEKUSER, tid, PT_MSR * 8, 0); if (errno == 0 && ppc64_64bit_inferior_p (msr)) wordsize = 8; return wordsize; } ... return 4? Specifically, we get this result because because tid == 0, so we get errno == ESRCH. The tid == 0 is caused by the switch_to_no_thread in handle_vfork_child_exec_or_exit: ... /* Switch to no-thread while running clone_program_space, so that clone_program_space doesn't want to read the selected frame of a dead process. */ scoped_restore_current_thread restore_thread; switch_to_no_thread (); inf->pspace = new program_space (maybe_new_address_space ()); ... but moving the maybe_new_address_space call to before that gives us the same result. The tid is no longer 0, but we still get ESRCH because the thread has exited. Fix this in handle_vfork_child_exec_or_exit by doing the maybe_new_address_space call in the context of the vfork parent. Tested on top of trunk on x86_64-linux and ppc64le-linux. Tested on top of gdb-14-branch on ppc64-linux. Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca> PR gdb/30547 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
2023-11-28[gdb] Fix segfault in for_each_block, part 1Tom de Vries1-24/+23
When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x), I run into: ... (gdb) PASS: gdb.base/vfork-follow-parent.exp: \ exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \ resolution_method=schedule-multiple: print unblock_parent = 1 continue^M Continuing.^M Reading symbols from vfork-follow-parent-exit...^M ^M ^M Fatal signal: Segmentation fault^M ----- Backtrace -----^M 0x1027d3e7 gdb_internal_backtrace_1^M src/gdb/bt-utils.c:122^M 0x1027d54f _Z22gdb_internal_backtracev^M src/gdb/bt-utils.c:168^M 0x1057643f handle_fatal_signal^M src/gdb/event-top.c:889^M 0x10576677 handle_sigsegv^M src/gdb/event-top.c:962^M 0x3fffa7610477 ???^M 0x103f2144 for_each_block^M src/gdb/dcache.c:199^M 0x103f235b _Z17dcache_invalidateP13dcache_struct^M src/gdb/dcache.c:251^M 0x10bde8c7 _Z24target_dcache_invalidatev^M src/gdb/target-dcache.c:50^M ... or similar. The root cause for the segmentation fault is that linux_is_uclinux gives an incorrect result: it should always return false, given that we're running on a regular linux system, but instead it returns first true, then false. In more detail, the segmentation fault happens as follows: - a program space with an address space is created - a second program space is about to be created. maybe_new_address_space is called, and because linux_is_uclinux returns true, maybe_new_address_space returns false, and no new address space is created - a second program space with the same address space is created - a program space is deleted. Because linux_is_uclinux now returns false, gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns false, and the address space is deleted - when gdb uses the address space of the remaining program space, we run into the segfault, because the address space is deleted. Hardcoding linux_is_uclinux to false makes the test-case pass. We leave addressing the root cause for the following commit in this series. For now, prevent the segmentation fault by making the address space a refcounted object. This was already suggested here [1]: ... A better solution might be to have the address spaces be reference counted ... Tested on top of trunk on x86_64-linux and ppc64le-linux. Tested on top of gdb-14-branch on ppc64-linux. Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca> PR gdb/30547 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547 [1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
2023-11-27gdb: make catch_syscall_enabled return boolSimon Marchi1-1/+1
Make it return a bool and adjust a few comparisons where it's used. Change-Id: Ic77d23b0dcfcfc9195dfe65e4c7ff9cf3229f6fb
2023-11-21gdb: Replace gdb::optional with std::optionalLancelot Six1-6/+6
Since GDB now requires C++17, we don't need the internally maintained gdb::optional implementation. This patch does the following replacing: - gdb::optional -> std::optional - gdb::in_place -> std::in_place - #include "gdbsupport/gdb_optional.h" -> #include <optional> This change has mostly been done automatically. One exception is gdbsupport/thread-pool.* which did not use the gdb:: prefix as it already lives in the gdb namespace. Change-Id: I19a92fa03e89637bab136c72e34fd351524f65e9 Approved-By: Tom Tromey <tom@tromey.com> Approved-By: Pedro Alves <pedro@palves.net>
2023-11-20gdb/infrun: simplify process_event_stop_testGuinevere Larsen1-12/+6
The previous commit introduced some local variables to make some if statements simpler. This commit uses them more liberally throughout the process_event_stop_test in order to simplify the function a little more. No functional changes are expected. Approved-By: Tom Tromey <tom@tromey.com>
2023-11-20gdb/record: print frame information when exiting a recursive callGuinevere Larsen1-0/+18
Currently, when GDB is reverse stepping out of a function into the same function due to a recursive call, it doesn't print frame information, as reported by PR record/29178. This happens because when the inferior leaves the current frame, GDB decides to refresh the step information, clobbering the original step_frame_id, making it impossible to figure out later on that the frame has been changed. This commit changes GDB so that, if we notice we're in this exact situation, we won't refresh the step information. Because of implementation details, this change can cause some debug information to be read when it normally wouldn't before, which showed up as a regression on gdb.dwarf2/dw2-out-of-range-end-of-seq. Since that isn't a problem, the test was changed to allow for the new output. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29178 Approved-By: Tom Tromey <tom@tromey.com>
2023-11-17gdb: pass address_space to target dcache functionsSimon Marchi1-3/+3
A simple refactor to make the reference to current_program_space bubble up one level. No behavior changes expected. Change-Id: I237cf2f45ae73c35bcb433ce40e3c03cef6b87e2
2023-11-17gdb: remove get_current_regcacheSimon Marchi1-13/+10
Remove get_current_regcache, inlining the call to get_thread_regcache in callers. When possible, pass the right thread_info object known from the local context. Otherwise, fall back to passing `inferior_thread ()`. This makes the reference to global context bubble up one level, a small step towards the long term goal of reducing the number of references to global context (or rather, moving those references as close as possible to the top of the call tree). No behavior change expected. Change-Id: Ifa6980c88825d803ea586546b6b4c633c33be8d6
2023-11-17gdb: remove regcache's address spaceSimon Marchi1-22/+21
While looking at the regcache code, I noticed that the address space (passed to regcache when constructing it, and available through regcache::aspace) wasn't relevant for the regcache itself. Callers of regcache::aspace use that method because it appears to be a convenient way of getting the address space for a thread, if you already have the regcache. But there is always another way to get the address space, as the callers pretty much always know which thread they are dealing with. The regcache code itself doesn't use the address space. This patch removes anything related to address_space from the regcache code, and updates callers to get it from the thread in context. This removes a bit of unnecessary complexity from the regcache code. The current get_thread_arch_regcache function gets an address_space for the given thread using the target_thread_address_space function (which calls the target_ops::thread_address_space method). This suggest that there might have been the intention of supporting per-thread address spaces. But digging through the history, I did not find any such case. Maybe this method was just added because we needed a way to get an address space from a ptid (because constructing a regcache required an address space), and this seemed like the right way to do it, I don't know. The only implementations of thread_address_space and process_stratum_target::thread_address_space and linux_nat_target::thread_address_space, which essentially just return the inferior's address space. And thread_address_space is only used in the current get_thread_arch_regcache, which gets removed. So, I think that the thread_address_space target method can be removed, and we can assume that it's fine to use the inferior's address space everywhere. Callers of regcache::aspace are updated to get the address space from the relevant inferior, either using some context they already know about, or in last resort using the current global context. So, to summarize: - remove everything in regcache related to address spaces - in particular, remove get_thread_arch_regcache, and rename get_thread_arch_aspace_regcache to get_thread_arch_regcache - remove target_ops::thread_address_space, and target_thread_address_space - adjust all users of regcache::aspace to get the address space another way Change-Id: I04fd41b22c83fe486522af7851c75bcfb31c88c7
2023-11-13Cancel execution command on thread exit, when stepping, nexting, etc.Pedro Alves1-10/+63
If your target has no support for TARGET_WAITKIND_NO_RESUMED events (and no way to support them, such as the yet-unsubmitted AMDGPU target), and you step over thread exit with scheduler-locking on, this is what you get: (gdb) n [Thread ... exited] *hang* Getting back the prompt by typing Ctrl-C may not even work, since no inferior thread is running to receive the SIGINT. Even if it works, it seems unnecessarily harsh. If you started an execution command for which there's a clear thread of interest (step, next, until, etc.), and that thread disappears, then I think it's more user friendly if GDB just detects the situation and aborts the command, giving back the prompt. That is what this commit implements. It does this by explicitly requesting the target to report thread exit events whenever the main resumed thread has a thread_fsm. Note that unlike stepping over a breakpoint, we don't need to enable clone events in this case. With this patch, we get: (gdb) n [Thread 0x7ffff7d89700 (LWP 3961883) exited] Command aborted, thread exited. (gdb) Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I901ab64c91d10830590b2dac217b5264635a2b95
2023-11-13Don't resume new threads if scheduler-locking is in effectPedro Alves1-8/+33
If scheduler-locking is in effect, e.g., with "set scheduler-locking on", and you step over a function that spawns a new thread, the new thread is allowed to run free, at least until some event is hit, at which point, whether the new thread is re-resumed depends on a number of seemingly random factors. E.g., if the target is all-stop, and the parent thread hits a breakpoint, and GDB decides the breakpoint isn't interesting to report to the user, then the parent thread is resumed, but the new thread is left stopped. I think that letting the new threads run with scheduler-locking enabled is a defect. This commit fixes that, making use of the new clone events on Linux, and of target_thread_events() on targets where new threads have no connection to the thread that spawned them. Testcase and documentation changes included. Approved-By: Eli Zaretskii <eliz@gnu.org> Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie12140138b37534b7fc1d904da34f0f174aa11ce
2023-11-13stop_all_threads: (re-)enable async before waiting for stopsPedro Alves1-0/+81
Running the gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase added later in the series against gdbserver, after the TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run into an infinite loop in stop_all_threads, leading to a timeout: FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout) The is really a latent bug, and it is about the fact that stop_all_threads stops listening to events from a target as soon as it sees a TARGET_WAITKIND_NO_RESUMED, ignoring that TARGET_WAITKIND_NO_RESUMED may be delayed. handle_no_resumed knows how to handle delayed no-resumed events, but stop_all_threads was never taught to. In more detail, here's what happens with that testcase: #1 - Multiple threads report breakpoint hits to gdb. #2 - gdb picks one events, and it's for thread 1. All other stops are left pending. thread 1 needs to move past a breakpoint, so gdb stops all threads to start an inline step over for thread 1. While stopping threads, some of the threads that were still running report events that are also left pending. #2 - gdb steps thread 1 #3 - Thread 1 exits while stepping (it steps over an exit syscall), gdbserver reports thread exit for thread 1 #4 - Thread 1 was the last resumed thread, so gdbserver also reports no-resumed: [remote] Notification received: Stop:w0;p3445d0.3445d3 [remote] Sending packet: $vStopped#55 [remote] Packet received: N [remote] Sending packet: $vStopped#55 [remote] Packet received: OK #5 - gdb processes the thread exit for thread 1, finishes the step over and restarts threads. #6 - gdb picks the next event to process out of one of the resumed threads with pending events: [infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11 #7 - This is again a breakpoint hit and the breakpoint needs to be stepped over too, so gdb starts a step-over dance again. #8 - We reach stop_all_threads, which finds that some threads need to be stopped. #9 - wait_one finally consumes the no-resumed event queue by #4. Seeing this, wait_one disable target async, to stop listening for events out of the remote target. #10 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. #11 - Because the remote target is no longer async, and there are no other targets, wait_one return no-resumed immediately without polling the remote target. #12 - We still haven't seen all the stops expected, so stop_all_threads tries another iteration. goto #11, looping forever. Fix this by explicitly enabling/re-enabling target async on targets that can async, before waiting for stops. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f
2023-11-13gdb: clear step over information on thread exit (PR gdb/27338)Pedro Alves1-16/+155
GDB doesn't handle correctly the case where a thread steps over a breakpoint (using either in-line or displaced stepping), and the executed instruction causes the thread to exit. Using the test program included later in the series, this is what it looks like with displaced-stepping, on x86-64 Linux, where we have two displaced-step buffers: $ ./gdb -q -nx --data-directory=data-directory build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit -ex "b my_exit_syscall" -ex r Reading symbols from build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit... Breakpoint 1 at 0x123c: file src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 68. Starting program: build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1". [New Thread 0x7ffff7c5f640 (LWP 2915510)] [Switching to Thread 0x7ffff7c5f640 (LWP 2915510)] Thread 2 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. [New Thread 0x7ffff7c5f640 (LWP 2915524)] [Thread 0x7ffff7c5f640 (LWP 2915510) exited] [Switching to Thread 0x7ffff7c5f640 (LWP 2915524)] Thread 3 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. [New Thread 0x7ffff7c5f640 (LWP 2915616)] [Thread 0x7ffff7c5f640 (LWP 2915524) exited] [Switching to Thread 0x7ffff7c5f640 (LWP 2915616)] Thread 4 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. ... hangs ... The first two times we do "continue", we displaced-step the syscall instruction that causes the thread to exit. When the thread exits, the main thread, waiting on pthread_join, is unblocked. It spawns a new thread, which hits the breakpoint on the syscall again. However, infrun was never notified that the displaced-stepping threads are done using the displaced-step buffer, so now both buffers are marked as used. So when we do the third continue, there are no buffers available to displaced-step the syscall, so the thread waits forever for its turn. When trying the same but with in-line step over (displaced-stepping disabled): $ ./gdb -q -nx --data-directory=data-directory \ build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit \ -ex "b my_exit_syscall" -ex "set displaced-stepping off" -ex r Reading symbols from build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit... Breakpoint 1 at 0x123c: file src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 68. Starting program: build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit [Thread debugging using libthread_db enabled] Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1". [New Thread 0x7ffff7c5f640 (LWP 2928290)] [Switching to Thread 0x7ffff7c5f640 (LWP 2928290)] Thread 2 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68 68 syscall (gdb) c Continuing. [Thread 0x7ffff7c5f640 (LWP 2928290) exited] No unwaited-for children left. (gdb) i th Id Target Id Frame 1 Thread 0x7ffff7c60740 (LWP 2928285) "step-over-threa" 0x00007ffff7f7c9b7 in __pthread_clockjoin_ex () from /usr/lib/libpthread.so.0 The current thread <Thread ID 2> has terminated. See `help thread'. (gdb) thread 1 [Switching to thread 1 (Thread 0x7ffff7c60740 (LWP 2928285))] #0 0x00007ffff7f7c9b7 in __pthread_clockjoin_ex () from /usr/lib/libpthread.so.0 (gdb) c Continuing. ^C^C ... hangs ... The "continue" causes an in-line step to occur, meaning the main thread is stopped while we step the syscall. The stepped thread exits when executing the syscall, the linux-nat target notices there are no more resumed threads to be waited for, so returns TARGET_WAITKIND_NO_RESUMED, which causes the prompt to return. But infrun never clears the in-line step over info. So if we try continuing the main thread, GDB doesn't resume it, because it thinks there's an in-line step in progress that we need to wait for to finish, and we are stuck there. To fix this, infrun needs to be informed when a thread doing a displaced or in-line step over exits. We can do that with the new target_set_thread_options mechanism which is optimal for only enabling exit events of the thread that needs it; or, if that is not supported, by using target_thread_events, which enables thread exit events for all threads. This is done by this commit. This patch then modifies handle_inferior_event in infrun.c to clean up any step-over the exiting thread might have been doing at the time of the exit. The cases to consider are: - the exiting thread was doing an in-line step-over with an all-stop target - the exiting thread was doing an in-line step-over with a non-stop target - the exiting thread was doing a displaced step-over with a non-stop target The displaced-stepping buffer implementation in displaced-stepping.c is modified to account for the fact that it's possible that we "finish" a displaced step after a thread exit event. The buffer that the exiting thread was using is marked as available again and the original instructions under the scratch pad are restored. However, it skips applying the fixup, which wouldn't make sense since the thread does not exist anymore. Another case that needs handling is if a displaced-stepping thread exits, and the event is reported while we are in stop_all_threads. We should call displaced_step_finish in the handle_one function, in that case. It was already called in other code paths, just not the "thread exited" path. This commit doesn't make infrun ask the target to report the TARGET_WAITKIND_THREAD_EXITED events yet, that'll be done later in the series. Note that "stop_print_frame = false;" line is moved to normal_stop, because TARGET_WAITKIND_THREAD_EXITED can also end up with the event transmorphed into TARGET_WAITKIND_NO_RESUMED. Moving it to normal_stop keeps it centralized. Co-authored-by: Simon Marchi <simon.marchi@efficios.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I745c6955d7ef90beb83bcf0ff1d1ac8b9b6285a5
2023-11-13Introduce GDB_THREAD_OPTION_EXIT thread option, fix step-over-thread-exitPedro Alves1-5/+10
When stepping over a breakpoint with displaced stepping, GDB needs to be informed if the stepped thread exits, otherwise the displaced stepping buffer that was allocated to that thread leaks, and this can result in deadlock, with other threads waiting for their turn to displaced step, but their turn never comes. Similarly, when stepping over a breakpoint in line, GDB also needs to be informed if the stepped thread exits, so that is can clear the step over state and re-resume threads. This commit makes it possible for GDB to ask the target to report thread exit events for a given thread, using the new "thread options" mechanism introduced by a previous patch. This only adds the core bits. Following patches in the series will teach the Linux backends (native & gdbserver) to handle the GDB_THREAD_OPTION_EXIT option, and then a later patch will make use of these thread exit events to clean up displaced stepping and inline stepping state properly. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I96b719fdf7fee94709e98bb3a90751d8134f3a38 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
2023-11-13Move deleting thread on TARGET_WAITKIND_THREAD_EXITED to corePedro Alves1-5/+27
Currently, infrun assumes that when TARGET_WAITKIND_THREAD_EXITED is reported, the corresponding GDB thread has already been removed from the GDB thread list. Later in the series, that will no longer work, as infrun will need to refer to the thread's thread_info when it processes TARGET_WAITKIND_THREAD_EXITED. As preparation, this patch makes deleting the GDB thread responsibility of infrun, instead of the target. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I013d87f61ffc9aaca49f0d6ce2a43e3ea69274de
2023-11-13Thread options & clone events (core + remote)Pedro Alves1-1/+62
A previous patch taught GDB about a new TARGET_WAITKIND_THREAD_CLONED event kind, and made the Linux target report clone events. A following patch will teach Linux GDBserver to do the same thing. However, for remote debugging, it wouldn't be ideal for GDBserver to report every clone event to GDB, when GDB only cares about such events in some specific situations. Reporting clone events all the time would be potentially chatty. We don't enable thread create/exit events all the time for the same reason. Instead we have the QThreadEvents packet. QThreadEvents is target-wide, though. This patch makes GDB instead explicitly request that the target reports clone events or not, on a per-thread basis. In order to be able to do that with GDBserver, we need a new remote protocol feature. Since a following patch will want to enable thread exit events on per-thread basis too, the packet introduced here is more generic than just for clone events. It lets you enable/disable a set of options at once, modelled on Linux ptrace's PTRACE_SETOPTIONS. IOW, this commit introduces a new QThreadOptions packet, that lets you specify a set of per-thread event options you want to enable. The packet accepts a list of options/thread-id pairs, similarly to vCont, processed left to right, with the options field being a number interpreted as a bit mask of options. The only option defined in this commit is GDB_THREAD_OPTION_CLONE (0x1), which ask the remote target to report clone events. Another patch later in the series will introduce another option. For example, this packet sets option "1" (clone events) on thread p1000.2345: QThreadOptions;1:p1000.2345 and this clears options for all threads of process 1000, and then sets option "1" (clone events) on thread p1000.2345: QThreadOptions;0:p1000.-1;1:p1000.2345 This clears options of all threads of all processes: QThreadOptions;0 The target reports the set of supported options by including "QThreadOptions=<supported options>" in its qSupported response. infrun is then tweaked to enable GDB_THREAD_OPTION_CLONE when stepping over a breakpoint. Unlike PTRACE_SETOPTIONS, fork/vfork/clone children do NOT inherit their parent's thread options. This is so that GDB can send e.g., "QThreadOptions;0;1:TID" without worrying about threads it doesn't know about yet. Documentation for this new remote protocol feature is included in a documentation patch later in the series. Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: Ie41e5093b2573f14cf6ac41b0b5804eba75be37e
2023-11-13Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONEDPedro Alves1-70/+88
(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.) This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction, one is about displaced stepping, and the other about in-line stepping. Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock): - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running. - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state. Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine: - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step. - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it, this might not be what the user expects. However, this is not something this commit tries to change. The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address. - For both all-stop and non-stop modes, with in-line stepping: + user issues 'stepi', + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped. + GDB removes s/w breakpoint at syscall address, + GDB single steps just the thread of interest, all other threads are left stopped, + New thread is created running, + Initial thread completes its step, + [non-stop mode only] GDB resumes all threads that it previously stopped. There are two problems in the in-line stepping scenario above: 1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone as this was removed so the first thread can single step, 2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion as GDB assumes that only the thread being stepped should stop. The assert looks like this: infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed. - For both all-stop and non-stop modes, with displaced stepping: + user issues 'stepi', + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers, + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped. + New thread is created running, + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad, + [non-stop mode only] GDB resumes the thread, now past its breakpoint. + [all-stop mode only] GDB resumes all threads. There is one problem with the displaced stepping scenario above: 3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, thus that new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted. What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what is the child's ptid, so that GDB can fixup both the parent and the child. When looking for solutions to this problem I considered how GDB handles fork/vfork as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running. Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not for displaced-stepping (issue #3). To handle displaced-stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, like e.g., Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning". The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2). The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONE, with some minimal conditions added, addressing the displaced stepping case (issue #3). The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core. Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB. Subsequent patches will add clone events support to the remote protocol and gdbserver. displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and readding it back, this patch marks it with attribute unused, and the latter patch removes the attribute again. We need to do this because the function is static, and with no callers, the compiler would warn, (error with -Werror), breaking the build. This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series. Co-authored-by: Andrew Burgess <aburgess@redhat.com> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675 Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830 Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-11-13gdb/linux: Delete all other LWPs immediately on ptrace exec eventPedro Alves1-5/+3
I noticed that on an Ubuntu 20.04 system, after a following patch ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp was passing cleanly, but still, we'd end up with four new unexpected GDB core dumps: === gdb Summary === # of unexpected core files 4 # of expected passes 48 That said patch is making the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when: #1 - a non-leader thread execs #2 - the post-exec program stops somewhere #3 - you kill the inferior Instead of #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above. Vis (after said patch is applied): $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true ... (top-gdb) r ... (gdb) b main ... (gdb) r ... Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69 69 argv0 = argv[0]; (gdb) c Continuing. [New Thread 0x7ffff7d89700 (LWP 2506975)] Other going in exec. Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28 28 foo (); (gdb) k ... Thread 1 "gdb" received signal SIGSEGV, Segmentation fault. 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 393 return m_suspend.waitstatus_pending_p; (top-gdb) bt #0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393 #1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345 #2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564 #3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284 #4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278 #5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247 #6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864 #7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590 #8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911 ... The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says: "PTRACE_O_TRACEEXEC (since Linux 2.5.46) Stop the tracee at the next execve(2). A waitpid(2) by the tracer will return a status value such that status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8)) If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG." When the core of GDB processes an exec events, it deletes all the threads of the inferior. But, that is too late -- deleting the thread does not delete the corresponding LWP, so we end leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, won't find one. This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace. GDBserver does not need an equivalent fix, because it is already doing this, as side effect of mourning the pre-exec process, in gdbserver/linux-low.cc: else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events) { ... /* Delete the execing process and all its threads. */ mourn (proc); switch_to_thread (nullptr); The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6", because right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and, on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it. If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec". The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB. Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise. Reviewed-By: Andrew Burgess <aburgess@redhat.com> Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
2023-11-08rs6000, Fix Linux DWARF register mappingCarl Love1-0/+13
Overview of issues fixed by the patch. The primary issue this patch fixes is the DWARF register mapping for Linux. The changes in ppc-linux-tdep.c fix the DWARF register mapping issues. The register mapping issue is responsible for two of the five regression bugs seen in gdb.base/store.exp. Once the register mapping was fixed, an underlying issue with the unwinding of the signal trampoline in common-code in ifrun.c was found. This underlying bug is best described by Ulrich in the following description. The unwinder bug shows up on platforms where the kernel uses a trampoline to dispatch "calls to" the signal handler (not just *returns from* the signal handler). Many platforms use a trampoline for signal return, and that is working fine, but the only platform I'm (Ulrich) aware of that uses a trampoline for signal handler calls is (recent kernels for) PowerPC. I believe the rationale for using a trampoline here is to improve performance by avoiding unbalancing of the branch predictor's call/return stack. However, on PowerPC the bug is dormant as well as it is hidden by *another* bug that prevents correct unwinding out of the signal trampoline. This is because the custom CFI for the trampoline uses a register number (VSCR) that is not ever used by compiler-generated CFI, and that particular register is mapped to an invalid number by the current PowerPC DWARF mapper. The underlying unwinder bug is exposed by the "new" regression failures in gdb.base/sigstep.exp. These failures were previously masked by the fact that GDB was not seeing a valid frame when it tried to unwind the frames. The sigstep.exp test is specifically testing stepping into a signal handler. With the correct DWARF register mapping in place, specifically the VSCR mapping, the signal trampoline code now unwinds to a valid frame exposing the pre-existing bug in how the signal handler on PowerPC works. The one line change infrun.c fixes the exiting bug in the common-code for platforms that use a trampoline to dispatch calls to the signal handler by not stopping in the SIGTRAMP_FRAME. Detailed description of the DWARF register mapping fix. The PowerPC DWARF register mapping is the same for the .eh_frame and .debug_frame on Linux. PowerPC uses different mapping for .eh_frame and .debug_frame on other operating systems. The current GDB support for mapping the DWARF registers in rs6000_linux_dwarf2_reg_to_regnum and rs6000_adjust_frame_regnum file gdb/rs6000-tdep.c is not correct for Linux. The files have some legacy mappings for spe_acc, spefscr, EV which was removed from GCC in 2017. This patch adds a two new functions rs6000_linux_dwarf2_reg_to_regnum, and rs6000_linux_adjust_frame_regnum in file gdb/ppc-linux-tdep.c to handle the DWARF register mappings on Linux. Function rs6000_linux_dwarf2_reg_to_regnum is installed for both gdb_dwarf_to_regnum and gdbarch_stab_reg_to_regnum since the mappings are the same. The ppc_linux_init_abi function in gdb/ppc-linux-tdep.c is updated to call set_gdbarch_dwarf2_reg_to_regnum map the new function rs6000_linux_dwarf2_reg_to_regnum for the architecture. Similarly, dwarf2_frame_set_adjust_regnum is called to map rs6000_linux_adjust_frame_regnum into the architecture. Additional detail on the signal handling fix. The specific sequence of events for handling a signal on most architectures is as follows: 1) Some code is running when a signal arrives. 2) The kernel handles the signal and dispatches to the handler. ... However on PowerPC the sequence of events is: 1) Some code is running when a signal arrives. 2) The kernel handles the signal and dispatches to the trampoline. 3) The trampoline performs a normal function call to the handler. ... We want the "nexti" to step into, not over, signal handlers invoked by the kernel. This is the case for most platforms as the kernel puts a signal trampoline frame onto the stack to handle proper return after the handler. However, on some platforms such as PowerPC, the kernel actually uses a trampoline to handle *invocation* of the handler. We do not want GDB to stop in the SIGTRAMP_FRAME. The issue is fixed in function process_event_stop_test by adding a check that the frame is not a SIGTRAMP_FRAME to the if statement to stop in a subroutine call. This prevents GDB from erroneously detecting the trampoline invocation as a subroutine call. This patch fixes two regression test failures in gdb.base/store.exp. The patch then fixes an exposed, dormant, signal handling issue that is exposed in the signal handling test gdb.base/sigstep.exp. The patch has been tested on Power 8 LE/BE, Power 9 LE/BE, Power 10 with no new regressions. Note, only two of the five failures in store.exp are fixed. The remaining three failures are fixed in a following patch.
2023-10-10gdb: remove target_gdbarchSimon Marchi1-4/+7
This function is just a wrapper around the current inferior's gdbarch. I find that having that wrapper just obscures where the arch is coming from, and that it's often used as "I don't know which arch to use so I'll use this magical target_gdbarch function that gets me an arch" when the arch should in fact come from something in the context (a thread, objfile, symbol, etc). I think that removing it and inlining `current_inferior ()->arch ()` everywhere will make it a bit clearer where that arch comes from and will trigger people into reflecting whether this is the right place to get the arch or not. Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b Reviewed-By: John Baldwin <jhb@FreeBSD.org> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-10-10gdb: add inferior::{arch, set_arch}Simon Marchi1-4/+4
Make the inferior's gdbarch field private, and add getters and setters. This helped me by allowing putting breakpoints on set_arch to know when the inferior's arch was set. A subsequent patch in this series also adds more things in set_arch. Change-Id: I0005bd1ef4cd6b612af501201cec44e457998eec Reviewed-By: John Baldwin <jhb@FreeBSD.org> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-08-23gdb: remove the silent parameter from exit_inferior_1 and cleanupAndrew Burgess1-1/+1
After the previous commit, exit_inferior_1 no longer makes use of the silent parameter. This commit removes this parameter and cleans up the callers. After doing this exit_inferior_1, exit_inferior, and exit_inferior_silent are all equivalent, so rename exit_inferior_1 to exit_inferior and delete exit_inferior_silent, update all the callers. Also I spotted the declaration exit_inferior_num_silent in inferior.h, but this function is not defined anywhere, so I deleted the declaration. There should be no user visible changes after this commit.
2023-08-16gdb: fix vfork regressions when target-non-stop is offAndrew Burgess1-3/+5
It was pointed out on the mailing list[1] that after this commit: commit b1e0126ec56e099d753c20e91a9f8623aabd6b46 Date: Wed Jun 21 14:18:54 2023 +0100 gdb: don't resume vfork parent while child is still running the test gdb.base/vfork-follow-parent.exp now has some failures when run with the native-gdbserver or native-extended-gdbserver boards: FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout) FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout) The reason that these failures don't show up when run on the standard unix board is that the test is only run in the default operating mode, so for Linux this will be all-stop on top of non-stop. If we adjust the test script so that it runs in the default mode and with target-non-stop turned off, then we see the same failures on the unix board. This commit includes this change. The way that the test is written means that it is not (currently) possible to turn on non-stop mode and have the test still work, so this commit does not do that. I have also updated the test script so that the vfork child performs an exec as well as the current exit. Exec and exit are the two ways in which a vfork child can release the vfork parent, so testing both of these cases is useful I think. In this test the inferior performs a vfork and the vfork-child immediately exits. The vfork-parent will wait for the vfork-child and then blocks waiting for gdb. Once gdb has released the vfork-parent, the vfork-parent also exits. In the test that fails, GDB sets 'detach-on-fork off' and then runs to the vfork. At this point the test tries to just "continue", but this fails as the vfork-parent is still selected, and the parent can't continue until the vfork-child completes. As the vfork-child is stopped by GDB the parent will never stop once resumed, so GDB refuses to resume it. The test script then sets 'schedule-multiple on' and once again continues. This time GDB, in theory, resumes both the parent and the child, the parent will be held blocked by the kernel, but the child will run until it exits, and which point GDB stops again, this time with inferior 2, the newly exited vfork-child, selected. What happens after this in the test script is irrelevant as far as this failure is concerned. To understand why the test started failing we should consider the behaviour of four different cases: 1. All-stop-on-non-stop before commit b1e0126ec56e, 2. All-stop-on-non-stop after commit b1e0126ec56e, 3. All-stop-on-all-stop before commit b1e0126ec56e, and 4. All-stop-on-all-stop after commit b1e0126ec56e. Only case #4 is failing after commit b1e0126ec56e, but I think the other cases are interesting because, (a) they inform how we might fix the regression, and (b) it turns out the behaviour of #2 changed too with the commit, but the change was harmless. For #1 All-stop-on-non-stop before commit b1e0126ec56e, what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-non-stop, every thread is resumed individually, so GDB tries to resume both the vfork-parent and the vfork-child, both of which succeed, 3. The vfork-parent is held stopped by the kernel, 4. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 5. At this point we might take two paths depending on which event GDB handles first, if GDB handles the VFORK_DONE first then: (a) As GDB is controlling both parent and child the VFORK_DONE is ignored (see handle_vfork_done), the vfork-parent will be resumed, (b) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. Alternatively, if GDB selects the EXITED event first then: (c) GDB processes the EXITED event, selects the (now defunct) vfork-child, and stops, returning control to the user. (d) At some future time the user resumes the vfork-parent, at which point the VFORK_DONE is reported to GDB, however, GDB is ignoring the VFORK_DONE (see handle_vfork_done), so the parent is resumed. For case #2, all-stop-on-non-stop after commit b1e0126ec56e, the important difference is in step (2) above, now, instead of resuming both the vfork-parent and the vfork-child, only the vfork-child is resumed. As such, when we get to step (5), only a single event, the EXITED event is reported. GDB handles the EXITED just as in (5)(c), then, later, when the user resumes the vfork-parent, the VFORKED_DONE is immediately delivered from the kernel, but this is ignored just as in (5)(d), and so, though the pattern of when the vfork-parent is resumed changes, the overall pattern of which events are reported and when, doesn't actually change. In fact, by not resuming the vfork-parent, the order of events (in this test) is now deterministic, which (maybe?) is a good thing. If we now consider case #3, all-stop-on-all-stop before commit b1e0126ec56e, then what happens is: 1. GDB calls proceed with the vfork-parent selected, as schedule multiple is on user_visible_resume_ptid returns -1 (everything) as the resume_ptid (see proceed function), 2. As this is all-stop-on-all-stop, the resume is passed down to the linux-nat target, the vfork-parent is the event thread, while the vfork-child is a sibling of the event thread, 3. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this causes the vfork-child to be resumed. Then in linux_nat_target::resume, the event thread, the vfork-parent, is also resumed. 4. The vfork-parent is held stopped by the kernel, 5. The vfork-child completes (exits) at which point the GDB sees the EXITED event for the vfork-child and the VFORK_DONE event for the vfork-parent, 6. We are now in a situation identical to step (5) as for all-stop-on-non-stop above, GDB selects one of the events to handle, and whichever we select the user sees the correct behaviour. And so, finally, we can consider #4, all-stop-on-all-stop after commit b1e0126ec56e, this is the case that started failing. We start out just like above, in proceed, the resume_ptid is -1 (resume everything), due to schedule multiple being on. And just like above, due to the target being all-stop, we call proceed_resume_thread_checked just once, for the current thread, which, remember, is the vfork-parent thread. The change in commit b1e0126ec56e was to avoid resuming a vfork-parent thread, read the commit message for the justification for this change. However, this means that GDB now rejects resuming the vfork-parent in this case, which means that nothing gets resumed! Obviously, if nothing resumes, then nothing will ever stop, and so GDB appears to hang. I considered a couple of solutions which, in the end, I didn't go with, these were: 1. Move the vfork-parent check out of proceed_resume_thread_checked, and place it in proceed, but only on the all-stop-on-non-stop path, this should still address the issue seen in b1e0126ec56e, but would avoid the issue seen here. I rejected this just because it didn't feel great to split the checks that exist in proceed_resume_thread_checked like this, 2. Extend the condition in proceed_resume_thread_checked by adding a target_is_non_stop_p check. This would have the same effect as idea 1, but leaves all the checks in the same place, which I think would be better, but this still just didn't feel right to me, and so, What I noticed was that for the all-stop-on-non-stop, after commit b1e0126ec56e, we only resumed the vfork-child, and this seems fine. The vfork-parent isn't going to run anyway (the kernel will hold it back), so if feels like we there's no harm in just waiting for the child to complete, and then resuming the parent. So then I started looking at follow_fork, which is called from the top of proceed. This function already has the task of switching between the parent and child based on which the user wishes to follow. So, I wondered, could we use this to switch to the vfork-child in the case that we are attached to both? Turns out this is pretty simple to do. Having done that, now the process is for all-stop-on-all-stop after commit b1e0126ec56e, and with this new fix is: 1. GDB calls proceed with the vfork-parent selected, but, 2. In follow_fork, and follow_fork_inferior, GDB switches the selected thread to be that of the vfork-child, 3. Back in proceed user_visible_resume_ptid returns -1 (everything) as the resume_ptid still, but now, 4. When GDB calls proceed_resume_thread_checked, the vfork-child is the current selected thread, this is not a vfork-parent, and so GDB allows the proceed to continue to the linux-nat target, 5. In linux_nat_target::resume, GDB calls linux_nat_resume_callback for all threads, this does not resume the vfork-parent (because it is a vfork-parent), and then the vfork-child is resumed as this is the event thread, At this point we are back in the same situation as for all-stop-on-non-stop after commit b1e0126ec56e, that is, the vfork-child is resumed, while the vfork-parent is held stopped by GDB. Eventually the vfork-child will exit or exec, at which point the vfork-parent will be resumed. [1] https://inbox.sourceware.org/gdb-patches/3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca/
2023-08-03gdb: avoid double stop after failed breakpoint condition checkAndrew Burgess1-1/+15
This commit replaces this earlier commit: commit 2e411b8c68eb2b035b31d5b00d940d4be1a0928b Date: Fri Oct 14 14:53:15 2022 +0100 gdb: don't always print breakpoint location after failed condition check and is a result of feedback received here[1]. The original commit addressed a problem where, if a breakpoint condition included an inferior function call, and if the inferior function call failed, then GDB would announce the stop twice. Here's an example of GDB's output before the above commit that shows the problem being addressed: (gdb) break foo if (some_func ()) Breakpoint 1 at 0x40111e: file bpcond.c, line 11. (gdb) r Starting program: /tmp/bpcond Program received signal SIGSEGV, Segmentation fault. 0x0000000000401116 in some_func () at bpcond.c:5 5 return *p; Error in testing condition for breakpoint 1: The program being debugged stopped while in a function called from GDB. Evaluation of the expression containing the function (some_func) will be abandoned. When the function is done executing, GDB will silently stop. Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5 5 return *p; (gdb) The original commit addressed this issue in breakpoint.c, by spotting that the $pc had changed while evaluating the breakpoint condition, and inferring from this that GDB must have stopped elsewhere. However, the way in which the original commit suppressed the second stop announcement was to set bpstat::print to true -- this tells GDB not to print the frame during the stop announcement, and for the CLI this is fine, however, it was pointed out that for the MI this still isn't really enough. Below is an example from an MI session after the above commit was applied, this shows the problem with the above commit: -break-insert -c "cond_fail()" foo ^done,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040111e",func="foo",file="/tmp/mi-condbreak-fail.c",line="30",thread-groups=["i1"],cond="cond_fail()",times="0",original-location="foo"} (gdb) -exec-run =thread-group-started,id="i1",pid="2636270" =thread-created,id="1",group-id="i1" =library-loaded,id="/lib64/ld-linux-x86-64.so.2",target-name="/lib64/ld-linux-x86-64.so.2",host-name="/lib64/ld-linux-x86-64.so.2",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7fd3110",to="0x00007ffff7ff2bb4"}] ^running *running,thread-id="all" (gdb) =library-loaded,id="/lib64/libm.so.6",target-name="/lib64/libm.so.6",host-name="/lib64/libm.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7e59390",to="0x00007ffff7ef4f98"}] =library-loaded,id="/lib64/libc.so.6",target-name="/lib64/libc.so.6",host-name="/lib64/libc.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7ca66b0",to="0x00007ffff7df3c5f"}] ~"\nProgram" ~" received signal SIGSEGV, Segmentation fault.\n" ~"0x0000000000401116 in cond_fail () at /tmp/mi-condbreak-fail.c:24\n" ~"24\t return *p;\t\t\t/* Crash here. */\n" *stopped,reason="signal-received",signal-name="SIGSEGV",signal-meaning="Segmentation fault",frame={addr="0x0000000000401116",func="cond_fail",args=[],file="/tmp/mi-condbreak-fail.c",fullname="/tmp/mi-condbreak-fail.c",line="24",arch="i386:x86-64"},thread-id="1",stopped-threads="all",core="9" &"Error in testing condition for breakpoint 1:\n" &"The program being debugged was signaled while in a function called from GDB.\n" &"GDB remains in the frame where the signal was received.\n" &"To change this behavior use \"set unwindonsignal on\".\n" &"Evaluation of the expression containing the function\n" &"(cond_fail) will be abandoned.\n" &"When the function is done executing, GDB will silently stop.\n" =breakpoint-modified,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040111e",func="foo",file="/tmp/mi-condbreak-fail.c",fullname="/tmp/mi-condbreak-fail.c",line="30",thread-groups=["i1"],cond="cond_fail()",times="1",original-location="foo"} *stopped (gdb) Notice that we still see two '*stopped' lines, the first includes the full frame information, while the second has no frame information, this is a result of bpstat::print having been set. Ideally, the second '*stopped' line should not be present. By setting bpstat::print I was addressing the problem too late, this flag really only changes how interp::on_normal_stop prints the stop event, and interp::on_normal_stop is called (indirectly) from the normal_stop function in infrun.c. A better solution is to avoid calling normal_stop at all for the stops which should not be reported to the user, and this is what I do in this commit. This commit has 3 parts: 1. In breakpoint.c, revert the above commit, 2. In fetch_inferior_event (infrun.c), capture the stop-id before calling handle_inferior_event. If, after calling handle_inferior_event, the stop-id has changed, then this indicates that somewhere within handle_inferior_event, a stop was announced to the user. If this is the case then GDB should not call normal_stop, and we should rely on whoever announced the stop to ensure that we are in a PROMPT_NEEDED state, which means the prompt will be displayed once fetch_inferior_event returns. And, 3. In infcall.c, do two things: (a) In run_inferior_call, after making the inferior call, ensure that either async_disable_stdin or async_enable_stdin is called to put the prompt state, and stdin handling into the correct state based on whether the inferior call completed successfully or not, and (b) In call_thread_fsm::should_stop, call async_enable_stdin rather than changing the prompt state directly. This isn't strictly necessary, but helped me understand this code more. This async_enable_stdin call is only reached if normal_stop is not going to be called, and replaces the async_enable_stdin call that exists in normal_stop. Though we could just adjust the prompt state if felt (to me) much easier to understand when I could see this call and the corresponding call in normal_stop. With these changes in place now, when the inferior call (from the breakpoint condition) fails, infcall.c leaves the prompt state as PROMPT_NEEDED, and leaves stdin registered with the event loop. Back in fetch_inferior_event GDB notices that the stop-id has changed and so avoids calling normal_stop. And on return from fetch_inferior_event GDB will display the prompt and handle input from stdin. As normal_stop is not called the MI problem is solved, and the test added in the earlier mentioned commit still passes just fine, so the CLI has not regressed. [1] https://inbox.sourceware.org/gdb-patches/6fd4aa13-6003-2563-5841-e80d5a55d59e@palves.net/
2023-07-17gdb: additional debug output in infrun.c and linux-nat.cAndrew Burgess1-4/+24
While working on some of the recent patches relating to vfork handling I found this debug output very helpful, I'd like to propose adding this into GDB. With debug turned off there should be no user visible changes after this commit.
2023-07-17gdb: don't resume vfork parent while child is still runningAndrew Burgess1-4/+20
Like the last few commit, this fixes yet another vfork related issue. Like the commit titled: gdb: don't restart vfork parent while waiting for child to finish which addressed a case in linux-nat where we would try to resume a vfork parent, this commit addresses a very similar case, but this time occurring in infrun.c. Just like with that previous commit, this bug results in the assert: x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed. In this case the issue occurs when target-non-stop is on, but non-stop is off, and again, schedule-multiple is on. As with the previous commit, GDB is in follow-fork-mode child. If you have not done so, it is worth reading the earlier commit as many of the problems leading to the failure are the same, they just appear in a different part of GDB. Here are the steps leading to the assertion failure: 1. The user performs a 'next' over a vfork, GDB stop in the vfork child, 2. As we are planning to follow the child GDB sets the vfork_parent and vfork_child member variables in the two inferiors, the thread_waiting_for_vfork_done member is left as nullptr, that member is only used when GDB is planning to follow the parent inferior, 3. The user does 'continue', our expectation is that the vfork child should resume, and once that process has exited or execd, GDB should detach from the vfork parent. As a result of the 'continue' GDB eventually enters the proceed function, 4. In proceed we selected a ptid_t to resume, because schedule-multiple is on we select minus_one_ptid (see user_visible_resume_ptid), 5. As GDB is running in all-stop on top of non-stop mode, in the proceed function we iterate over all threads that match the resume ptid, which turns out to be all threads, and call proceed_resume_thread_checked. One of the threads we iterate over is the vfork parent thread, 6. As the thread passed to proceed_resume_thread_checked doesn't match any of the early return conditions, GDB will set the thread resumed, 7. As we are resuming one thread at a time, this thread is seen by the lower layers (e.g. linux-nat) as the "event thread", which means we don't apply any of the checks, e.g. is this thread a vfork parent, instead we assume that GDB core knows what it's doing, and linux-nat will resume the thread, we have now incorrectly set running the vfork parent thread when this thread should be waiting for the vfork child to complete, 8. Back in the proceed function GDB continues to iterate over all threads, and now (correctly) resumes the vfork child thread, 8. As the vfork child is still alive the kernel holds the vfork parent stopped, 9. Eventually the child performs its exec and GDB is sent and EXECD event. However, because the parent is resumed, as soon as the child performs its exec the vfork parent also sends a VFORK_DONE event to GDB, 10. Depending on timing both of these events might seem to arrive in GDB at the same time. Normally GDB expects to see the EXECD or EXITED/SIGNALED event from the vfork child before getting the VFORK_DONE in the parent. We know this because it is as a result of the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see handle_vfork_child_exec_or_exit for details). Further the comment in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that when we remain attached to the child (not the parent) we should not expect to see a VFORK_DONE, 11. If both events arrive at the same time then GDB will randomly choose one event to handle first, in some cases this will be the VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB expects that (a) the vfork child has finished, however, in this case this is not completely true, the child has finished, but GDB has not processed the event associated with the completion yet, and (b) upon seeing a VFORK_DONE GDB assumes we are remaining attached to the parent, and so resumes the parent process, 12. GDB now handles the EXECD event. In our case we are detaching from the parent, so GDB calls target_detach (see handle_vfork_child_exec_or_exit), 13. While this has been going on the vfork parent is executing, and might even exit, 14. In linux_nat_target::detach the first thing we do is stop all threads in the process we're detaching from, the result of the stop request will be cached on the lwp_info object, 15. In our case the vfork parent has exited though, so when GDB waits for the thread, instead of a stop due to signal, we instead get a thread exited status, 16. Later in the detach process we try to resume the threads just prior to making the ptrace call to actually detach (see detach_one_lwp), as part of the process to resume a thread we try to touch some registers within the thread, and before doing this GDB asserts that the thread is stopped, 17. An exited thread is not classified as stopped, and so the assert triggers! Just like with the earlier commit, the fix is to spot the vfork parent status of the thread, and not resume such threads. Where the earlier commit fixed this in linux-nat, in this case I think the fix should live in infrun.c, in proceed_resume_thread_checked. This function already has a similar check to not resume the vfork parent in the case where we are planning to follow the vfork parent, I propose adding a similar case that checks for the vfork parent when we plan to follow the vfork child. This new check will mean that at step #6 above GDB doesn't try to resume the vfork parent thread, which prevents the VFORK_DONE from ever arriving. Once GDB sees the EXECD/EXITED/SIGNALLED event from the vfork child GDB will detach from the parent. There's no test included in this commit. In a subsequent commit I will expand gdb.base/foll-vfork.exp which is when this bug would be exposed. If you do want to reproduce this failure then you will for certainly need to run the gdb.base/foll-vfork.exp test in a loop as the failures are all very timing sensitive. I've found that running multiple copies in parallel makes the failure more likely to appear, I usually run ~6 copies in parallel and expect to see a failure after within 10mins.
2023-07-17gdb, infrun: refactor part of `proceed` into separate functionMihails Strasuns1-69/+86
Split the thread resuming code from proceed into new function proceed_resume_thread_checked. Co-Authored-By: Christina Schimpe <christina.schimpe@intel.com>
2023-07-17gdb: fix an issue with vfork in non-stop modeAndrew Burgess1-12/+22
This commit fixes a bug introduced by this commit: commit d8bbae6ea080249c05ca90b1f8640fde48a18301 Date: Fri Jan 14 15:40:59 2022 -0500 gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on) The problem can be seen in this GDB session: $ gdb -q (gdb) set non-stop on (gdb) file ./gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork Reading symbols from ./gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork... (gdb) tcatch vfork Catchpoint 1 (vfork) (gdb) run Starting program: /tmp/gdb/testsuite/outputs/gdb.base/foll-vfork/foll-vfork Temporary catchpoint 1 (vforked process 1375914), 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6 #1 0x00000000004011af in main (argc=1, argv=0x7fffffffad88) at .../gdb/testsuite/gdb.base/foll-vfork.c:32 (gdb) finish Run till exit from #0 0x00007ffff7d5043c in vfork () from /lib64/libc.so.6 [Detaching after vfork from child process 1375914] No unwaited-for children left. (gdb) Notice the "No unwaited-for children left." error. This is incorrect, given where we are stopped there's no reason why we shouldn't be able to use "finish" to return to the main frame. When the inferior is stopped as a result of the 'tcatch vfork', the inferior is in the process of performing the vfork, that is, GDB has seen the VFORKED event, but has not yet attached to the new child process, nor has the child process been resumed. However, GDB has seen the VFORKED, and, as we are going to follow the parent process, the inferior for the vfork parent will have its thread_waiting_for_vfork_done member variable set, this will point to the one and only thread that makes up the vfork parent process. When the "finish" command is used GDB eventually ends up in the proceed function (in infrun.c), in here we pass through all the function until we eventually encounter this 'else if' condition: else if (!cur_thr->resumed () && !thread_is_in_step_over_chain (cur_thr) /* In non-stop, forbid resuming a thread if some other thread of that inferior is waiting for a vfork-done event (this means breakpoints are out for this inferior). */ && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done != nullptr)) { The first two of these conditions will both be true, the thread is not already resumed, and is not in the step-over chain, however, the third condition, this one: && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done != nullptr)) is false, and this prevents the thread we are trying to finish from being resumed. This condition is false because (a) non_stop is true, and (b) cur_thr->inf->thread_waiting_for_vfork_done is not nullptr (see above for why). Now, if we check the comment embedded within the condition it says: /* In non-stop, forbid resuming a thread if some other thread of that inferior is waiting for a vfork-done event (this means breakpoints are out for this inferior). */ And this makes sense, if we have a vfork parent with two thread, and one thread has performed a vfork, then we shouldn't try to resume the second thread. However, if we are trying to resume the thread that actually performed a vfork, then this is fine. If we never resume the vfork parent then we'll never get a VFORK_DONE event, and so the vfork will never complete. Thus, the condition should actually be: && !(non_stop && cur_thr->inf->thread_waiting_for_vfork_done != nullptr && cur_thr->inf->thread_waiting_for_vfork_done != cur_thr)) This extra check will allow the vfork parent thread to resume, but prevent any other thread in the vfork parent process from resuming. This is the same condition that already exists in the all-stop on a non-stop-target block earlier in the proceed function. My actual fix is slightly different to the above, first, I've chosen to use a nested 'if' check instead of extending the original 'else if' check, this makes it easier to write a longer comment explaining what's going on, and second, instead of checking 'non_stop' I've switched to checking 'target_is_non_stop_p'. In this context this is effectively the same thing, a previous 'else if' block in proceed already handles '!non_stop && target_is_non_stop_p ()', so by the time we get here, if 'target_is_non_stop_p ()' then we must be running in non_stop mode. Both of these tweaks will make the next patch easier, which is a refactor to merge two parts of the proceed function, so this nested 'if' block is not going to exist for long. For testing, there is no test included with this commit. The test was exposed when using a modified version of the gdb.base/foll-vfork.exp test script, however, there are other bugs that are exposed when using the modified test script. These bugs will be addressed in subsequent commits, and then I'll add the updated gdb.base/foll-vfork.exp. If you wish to reproduce this failure then grab the updates to gdb.base/foll-vfork.exp from the later commit and run this test, the failure is always reproducible.
2023-06-05[gdb] Fix more typosTom de Vries1-1/+1
Fix some more typos: - distinquish -> distinguish - actualy -> actually - singe -> single - frash -> frame - chid -> child - dissassembler -> disassembler - uninitalized -> uninitialized - precontidion -> precondition - regsiters -> registers - marge -> merge - sate -> state - garanteed -> guaranteed - explictly -> explicitly - prefices (nonstandard plural) -> prefixes - bondary -> boundary - formated -> formatted - ithe -> the - arrav -> array - coresponding -> corresponding - owend -> owned - fials -> fails - diasm -> disasm - ture -> true - tpye -> type There's one code change, the name of macro SIG_CODE_BONDARY_FAULT changed to SIG_CODE_BOUNDARY_FAULT. Tested on x86_64-linux.
2023-05-30gdb: add interp::on_about_to_proceed methodSimon Marchi1-1/+11
Same idea as previous patches, but for about_to_proceed. We only need (and want, as far as the mi_interp implementation is concerned) to notify the interpreter that caused the proceed. Change-Id: Id259bca10dbc3d43d46607ff7b95243a9cbe2f89
2023-05-30gdb: add interp::on_user_selected_context_changed methodSimon Marchi1-0/+8
Same as previous patches, but for user_selected_context_changed. Change-Id: I40de15be897671227d4bcf3e747f0fd595f0d5be
2023-05-30gdb: add interp::on_sync_execution_done methodSimon Marchi1-1/+1
Same as previous patches, but for sync_execution_done. Except that here, we only want to notify the interpreter that is executing the command, not all interpreters. Change-Id: I729c719447b5c5f29af65dbf6fed9132e2cd308b
2023-05-30gdb: add interp::on_no_history methodSimon Marchi1-1/+1
Same as previous patches, but for no_history. Change-Id: I06930fe7cb4082138c6c5496c5118fe4951c10da
2023-05-30gdb: add interp::on_exited methodSimon Marchi1-1/+1
Same as previous patch, but for exited. Remove the exited observable, since nothing uses it anymore, and we don't have anything coming that will use it. Change-Id: I358cbea0159af56752dfee7510d6a86191e722bb
2023-05-30gdb: add interp::on_signal_exited methodSimon Marchi1-1/+1
Same as previous patch, but for signal_exited. Remove the signal_exited observable, since nothing uses it anymore, and we don't have anything coming that will use it. Change-Id: I0dca1eab76338bf27be755786e3dad3241698b10
2023-05-30gdb: add interp::on_normal_stop methodSimon Marchi1-6/+13
Same idea as the previous patch, but for the normal_stop event. Change-Id: I4fc8ca8a51c63829dea390a2b6ce30b77f9fb863
2023-05-30gdb: add interp::on_signal_received methodSimon Marchi1-2/+12
Instead of having the interpreter code registering observers for the signal_received observable, add a "signal_received" virtual method to struct interp. Add a interps_notify_signal_received function that loops over all UIs and calls the signal_received method on the interpreter. Finally, add a notify_signal_received function that calls interps_notify_signal_received and then notifies the observers. Replace all existing notifications to the signal_received observers with calls to notify_signal_received. Before this patch, the CLI and MI code both register a signal_received observer. These observer go over all UIs, and, for those that have a interpreter of the right kind, print the stop notifiation. After this patch, we have just one "loop over all UIs", inside interps_notify_signal_received. Since the interp::on_signal_received method gets called once for each interpreter, the implementations only need to deal with the current interpreter (the "this" pointer). The motivation for this patch comes from a future patch, that makes the amdgpu code register an observer to print a warning after the CLI's signal stop message. Since the amdgpu and the CLI code both use observers, the order of the two messages is not stable, unless we define the priority using the observer dependency system. However, the approach of using virtual methods on the interpreters seems like a good change anyway, I think it's more straightforward and simple to understand than the current solution that uses observers. We are sure that the amdgpu message gets printed after the CLI message, since observers are notified after interpreters. Keep the signal_received, even if nothing uses if, because we will be using it in the upcoming amdgpu patch implementing the warning described above. Change-Id: I4d8614bb8f6e0717f4bfc2a59abded3702f23ac4
2023-05-25gdb: add breakpoint::first_loc methodsSimon Marchi1-5/+6
Add convenience first_loc methods to struct breakpoint (const and non-const overloads). A subsequent patch changes the list of locations to be an intrusive_list and makes the actual list private, so these spots would need to change from: b->loc to something ugly like: *b->locations ().begin () That would make the code much heavier and not readable. There is a surprisingly big number of places that access the first location of breakpoints. Whether this is correct, or these spots fail to consider the possibility of multi-location breakpoints, I don't know. But anyhow, I think that using this instead: b->first_loc () conveys the intention better than the other two forms. Change-Id: Ibbefe3e4ca6cdfe570351fe7e2725f2ce11d1e95 Reviewed-By: Andrew Burgess <aburgess@redhat.com>
2023-05-01gdb: move struct ui and related things to ui.{c,h}Simon Marchi1-0/+1
I'd like to move some things so they become methods on struct ui. But first, I think that struct ui and the related things are big enough to deserve their own file, instead of being scattered through top.{c,h} and event-top.c. Change-Id: I15594269ace61fd76ef80a7b58f51ff3ab6979bc
2023-04-24gdb: remove end_stepping_range observableSimon Marchi1-12/+0
I noticed that this observable was never notified, which means we can probably safely remove it. The notification was removed in: commit 243a925328f8e3184b2356bee497181049c0174f Author: Pedro Alves <palves@redhat.com> Date: Wed Sep 9 18:23:24 2015 +0100 Replace "struct continuation" mechanism by something more extensible print_end_stepping_range_reason in turn becomes unused, so remote it as well. Change-Id: If5da5149276c282d2540097c8c4327ce0f70431a
2023-04-17gdb: switch to right inferior in fetch_inferior_eventSimon Marchi1-3/+7
The problem explained and fixed in the previous patch could have also been fixed by this patch. But I think it's good change anyhow, that could prevent future bugs, so here it is. fetch_inferior_event switches to an arbitrary (in practice, the first) inferior of the process target of the inferior used to fetch the event. The idea is that the event handling code will need to do some target calls, so we want to switch to an inferior that has target target. However, you can have two inferiors that share a process target, but with one inferior having an additional target on top: inf 1 inf 2 ----- ----- another target process target process target exec exec Let's say inferior 2 is selected by do_target_wait and returns an event that is really synthetized by "another target". This "another target" could be a thread or record stratum target (in the case explained by the previous patch, it was the arch stratum target, but it's because the amd-dbgapi abuses the arch layer). fetch_inferior_event will then switch to the first inferior with "process target", so inferior 1. handle_signal_stop then tries to fetch the thread's registers: ecs->event_thread->set_stop_pc (regcache_read_pc (get_thread_regcache (ecs->event_thread))); This will try to get the thread's register by calling into the current target stack, the stack of inferior 1. This is problematic because "another target" might have a special fetch_registers implementation. I think it would be a good idea to switch to the inferior for which the even was reported, not just some inferior of the same process target. This will ensure that any target call done before we eventually call context_switch will be done on the full target stack that reported the event. Not all events are associated to an inferior though. For instance, TARGET_WAITKIND_NO_RESUMED. In those cases, some targets return null_ptid, some return minus_one_ptid (ideally the expected return value should be clearly defined / documented). So, if the ptid returned is either of these, switch to an arbitrary inferior with that process target, as before. Change-Id: I1ffc8c1095125ab591d0dc79ea40025b1d7454af Reviewed-By: Pedro Alves <pedro@palves.net>
2023-04-17gdb: make regcache::raw_update switch to right inferiorSimon Marchi1-1/+1
With the following patch, which teaches the amd-dbgapi target to handle inferiors that fork, we end up with target stacks in the following state, when an inferior that does not use the GPU forks an inferior that eventually uses the GPU. inf 1 inf 2 ----- ----- amd-dbgapi linux-nat linux-nat exec exec When a GPU thread from inferior 2 hits a breakpoint, the following sequence of events would happen, if it was not for the current patch. - we start with inferior 1 as current - do_target_wait_1 makes inferior 2 current, does a target_wait, which returns a stop event for an amd-dbgapi wave (thread). - do_target_wait's scoped_restore_current_thread restores inferior 1 as current - fetch_inferior_event calls switch_to_target_no_thread with linux-nat as the process target, since linux-nat is officially the process target of inferior 2. This makes inferior 1 the current inferior, as it's the first inferior with that target. - In handle_signal_stop, we have: ecs->event_thread->suspend.stop_pc = regcache_read_pc (get_thread_regcache (ecs->event_thread)); context_switch (ecs); regcache_read_pc executes while inferior 1 is still the current one (because it's before the `context_switch`). This is a problem, because the regcache is for a ptid managed by the amd-dbgapi target (e.g. (12345, 1, 1)), a ptid that does not make sense for the linux-nat target. The fetch_registers target call goes directly to the linux-nat target, which gets confused. - We would then get an error like: Couldn't get extended state status: No such process. ... since linux-nat tries to do a ptrace call on tid 1. GDB should switch to the inferior the ptid belongs to before doing the target call to fetch registers, to make sure the call hits the right target stack (it should be handled by the amd-dbgapi target in this case). In fact the following patch does this change, and it would be enough to fix this specific problem. However, I propose to change regcache to make it switch to the right inferior, if needed, before doing target calls. That makes the interface as a whole more independent of the global context. My first attempt at doing this was to find an inferior using the process stratum target and the ptid that regcache already knows about: gdb::optional<scoped_restore_current_thread> restore_thread; inferior *inf = find_inferior_ptid (this->target (), this->ptid ()); if (inf != current_inferior ()) { restore_thread.emplace (); switch_to_inferior_no_thread (inf); } However, this caused some failures in fork-related tests and gdbserver boards. When we detach a fork child, we may create a regcache for the child, but there is no corresponding inferior. For instance, to restore the PC after a displaced step over the fork syscall. So find_inferior_ptid would return nullptr, and switch_to_inferior_no_thread would hit a failed assertion. So, this patch adds to regcache the information "the inferior to switch to to makes target calls". In typical cases, it will be the inferior that matches the regcache's ptid. But in some cases, like the detached fork child one, it will be another inferior (in this example, it will be the fork parent inferior). The problem that we witnessed was in regcache::raw_update specifically, but I looked for other regcache methods doing target calls, and added the same inferior switching code to raw_write too. In the regcache constructor and in get_thread_arch_aspace_regcache, "inf_for_target_calls" replaces the process_stratum_target parameter. We suppose that the process stratum target that would be passed otherwise is the same that is in inf_for_target_calls's target stack, so we don't need to pass both in parallel. The process stratum target is still used as a key in the `target_pid_ptid_regcache_map` map, but that's it. There is one spot that needs to be updated outside of the regcache code, which is the path that handles the "restore PC after a displaced step in a fork child we're about to detach" case mentioned above. regcache_test_data needs to be changed to include full-fledged mock contexts (because there now needs to be inferiors, not just targets). Change-Id: Id088569ce106e1f194d9ae7240ff436f11c5e123 Reviewed-By: Pedro Alves <pedro@palves.net>
2023-04-17gdb: add inferior_forked observableSimon Marchi1-0/+2
In the upcoming patch to support fork in the amd-dbgapi target, the amd-dbgapi target will need to be notified of fork events through an observer, to attach itself (attach in the amd-dbgapi sense, not ptrace sense) to the new inferior / process. The reason that this can't be done through target_ops::follow_fork is that the amd-dbgapi target isn't pushed on the inferior's target stack right away. It attaches itself to the process and only pushes itself on its target stack if and when the inferior initializes the ROCm runtime. If an inferior that is not using the ROCm runtime forks, we want to be notified of it, so we can attach to the child, and catch if the child starts using the ROCm runtime. So, add a new observable and notify it in follow_fork_inferior. It will be used later in this series. Change-Id: I67fced5a9cba6d5da72b9c7ea1c8397644ca1d54 Reviewed-By: Pedro Alves <pedro@palves.net>
2023-04-17gdb: pass execing and following inferior to inferior_execd observersSimon Marchi1-18/+21
The upcoming patch to support exec in the amd-dbgapi target needs to detach amd-dbgapi from the inferior doing the exec and attach amd-dbgapi to the inferior continuing the execution. They may or may not be the same, depending on the `set follow-exec-mode` setting. But even if they are the same, we need to do the detach / attach dance. With the current observable signature, the observers only receive the inferior in which execution continues (the "following" inferior). Change the signature to pass both inferiors, and update all existing observers. Change-Id: I259d1ea09f70f43be739378d6023796f2fce2659 Reviewed-By: Pedro Alves <pedro@palves.net>
2023-04-04gdb: make find_thread_ptid a process_stratum_target methodSimon Marchi1-6/+6
Make find_thread_ptid (the overload that takes a process_stratum_target) a method of process_stratum_target. Change-Id: Ib190a925a83c6b93e9c585dc7c6ab65efbdd8629 Reviewed-By: Tom Tromey <tom@tromey.com>