path: root/gdb/testsuite/gdb.threads
Age | Commit message | Author | Files | Lines
9 days | [gdb/testsuite] Fix trailing-text-in-parentheses duplicates | Tom de Vries | 2 | -6/+6

Fix all trailing-text-in-parentheses duplicates exposed by the previous patch.

Tested on x86_64-linux and aarch64-linux.
11 days | [gdb/testsuite] Remove PR31554 kfail in gdb.threads/leader-exit-attach.exp | Tom de Vries | 1 | -8/+0

When running test-case gdb.threads/leader-exit-attach.exp with target board native-extended-gdbserver I run into:
...
(gdb) KFAIL: $exp: attach (PRMS: gdb/31555)
print $_inferior_thread_count^M
$1 = 0^M
(gdb) KPASS: $exp: get valueof "$_inferior_thread_count" (PRMS server/31554)
...

The PR mentioned in the KPASS, PR31554, was fixed by commit f1fc8dc2dcc ("Fix "attach" failure handling with GDBserver"), and consequently the PR is closed.

Fix this by removing the corresponding kfail.

Tested on x86_64-linux.
11 days | [gdb/testsuite] Fix gdb.threads/leader-exit-attach.exp with check-read1 | Tom de Vries | 1 | -3/+3

With test-case gdb.threads/leader-exit-attach.exp and check-read1, I run into:
...
(gdb) attach 18591^M
Attaching to program: leader-exit-attach, process 18591^M
warning: process 18591 is a zombie - the process has already terminatedKFAIL: $exp: attach (PRMS: gdb/31555)
^M
ptrace: Operation not permitted.^M
(gdb) FAIL: $exp: get valueof "$_inferior_thread_count"
...

The problem is that the gdb_test_multiple in the test-case doesn't consume the prompt in all clauses:
...
gdb_test_multiple "attach $testpid" "attach" {
    -re "Attaching to process $testpid failed.*" {
        # GNU/Linux gdbserver.  Linux ptrace does not let you attach
        # to zombie threads.
        setup_kfail "gdb/31555" *-*-linux*
        fail $gdb_test_name
    }
    -re "warning: process $testpid is a zombie - the process has already terminated.*" {
        # Native GNU/Linux.  Linux ptrace does not let you attach to
        # zombie threads.
        setup_kfail "gdb/31555" *-*-linux*
        fail $gdb_test_name
    }
    -re "Attaching to program: $escapedbinfile, process $testpid.*$gdb_prompt $" {
        pass $gdb_test_name
        set attached 1
    }
}
...

Fix this by using -wrap in the first two clauses.

While we're at it, also use -wrap in the third clause.

Tested on x86_64-linux.
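A small Python model of expect's matching loop shows why check-read1 exposes the missing prompt. The model makes one simplifying assumption: expect retries the pattern after every read and discards the buffer up to the end of the first match. `expect_once` and the sample strings are hypothetical. `re.DOTALL` stands in for Tcl regexps, where `.` also matches newlines. When the whole output arrives at once, the trailing `.*` swallows the prompt. When it arrives one character per read, the clause fires as soon as "terminated" arrives, and the prompt is left over for the next test.

```python
import re

def expect_once(output, pattern, chunk_size):
    """Model of a single expect call: read `output` `chunk_size` characters
    at a time, retry `pattern` after each read, and consume the buffer up to
    the end of the first match.  Returns (matched_text, unconsumed_text)."""
    # re.DOTALL mimics Tcl regexps, where "." also matches newlines.
    regex = re.compile(pattern, re.DOTALL)
    buffer = ""
    pos = 0
    while pos < len(output):
        buffer += output[pos:pos + chunk_size]
        pos += chunk_size
        m = regex.search(buffer)
        if m:
            # Whatever follows the match, buffered or not yet read, is left
            # for the next gdb_test / gdb_test_multiple to see.
            return m.group(0), buffer[m.end():] + output[pos:]
    return None, buffer

OUTPUT = ("warning: process 18591 is a zombie - the process has already"
          " terminated\r\nptrace: Operation not permitted.\r\n(gdb) ")
UNWRAPPED = ("warning: process 18591 is a zombie - the process has already"
             " terminated.*")
# Roughly what -wrap adds: the match must extend through the prompt.
WRAPPED = UNWRAPPED + r"\r\n\(gdb\) $"
```

For example, `expect_once(OUTPUT, UNWRAPPED, 1)` returns the prompt as unconsumed text, while `expect_once(OUTPUT, WRAPPED, 1)` consumes everything.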
2024-06-21 | [gdb/testsuite] Fix regexp in gdb.threads/stepi-over-clone.exp | Tom de Vries | 1 | -1/+1

On Fedora Rawhide, I ran into:
...
(gdb) continue^M
Continuing.^M
^M
Catchpoint 2 (call to syscall clone3), 0x000000000042097d in __clone3 ()^M
(gdb) FAIL: gdb.threads/stepi-over-clone.exp: continue
...

Fix this by updating a regexp to also recognize __clone3.

Tested on x86_64-linux.

Tested-By: Guinevere Larsen <blarsen@redhat.com>
2024-05-24 | [gdb/testsuite] Add PR26286 kfail in gdb.threads/attach-many-short-lived-threads.exp | Tom de Vries | 1 | -1/+24

When running test-case gdb.threads/attach-many-short-lived-threads.exp, I regularly run into PR26286:
...
(gdb) continue^M
Continuing.^M
[LWP ... exited]^M
...
[LWP ... exited]^M
^M
Program terminated with signal SIGTRAP, Trace/breakpoint trap.^M
The program no longer exists.^M
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: \
  break at break_fn: 1
...

Add a kfail for this, such that we have:
...
(gdb) KFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: \
  break at break_fn: 1 (PRMS: threads/26286)
...

Reviewed-By: Thiago Jung Bauermann <thiago.bauermann@linaro.org>

Tested on x86_64-linux.
2024-05-14 | gdb/testsuite: remove unnecessary -Wl,-soname,NAME build flags | Andrew Burgess | 1 | -6/+2

While working on another patch I needed to pass -Wl,-soname,NAME as a compiler flag.

I initially looked for other tests that did this, and found a few examples, so I copied what they did.

But when I checked the gdb.log file I noticed that we were actually getting -Wl,-soname passed twice.

I tracked the repeated option to 'proc gdb_compile_shlib_1' in lib/gdb.exp. It turns out that we always add -Wl,-soname when compiling a shared library.

Here's an example of a build command from gdb.base/prelink.exp:

  builtin_spawn -ignore SIGHUP gcc -fno-stack-protector \
    /tmp/build/gdb/testsuite/outputs/gdb.base/prelink/prelink-lib.c.o \
    -fdiagnostics-color=never -shared -g \
    -Wl,-soname,prelink.so -Wl,-soname,prelink.so -lm \
    -o /tmp/build/gdb/testsuite/outputs/gdb.base/prelink/prelink.so

Notice that '-Wl,-soname,prelink.so' is repeated.

I believe that all of the places where tests add '-Wl,-soname,NAME' as a build option are unnecessary. In this commit I propose we remove them all.

As part of this change I've switched from calling gdb_compile_shlib directly to calling build_executable and adding the 'shlib' flag.

I've tested with gcc and clang and see no changes in the test results after this commit. All the compile commands still have -Wl,-soname added, but now it's only added once, from within lib/gdb.exp.

There should be no change in what is tested after this commit.

Approved-By: Tom Tromey <tom@tromey.com>
2024-05-08 | Fix AIX thread exit events not being reported and UI to show kernel thread ID. | Aditya Vidyadhar Kamath | 1 | -4/+2

In AIX, when a thread exited, we were not showing that a thread exit event had happened, and GDB continued to keep the terminated threads.

If we have terminated threads, then the output of the info threads command looks like this:

(gdb) info threads
  Id   Target Id                             Frame
* 1    Thread 1 (tid 26607979, running)      0xd0611d70 in _p_nsleep () from /usr/lib/libpthreads.a(_shr_xpg5.o)
  2    Thread 258 (tid 30998799, finished)   aix-thread: ptrace (52, 30998799) returned -1 (errno = 3 The process does not exist.)

As we can see, the frame is not displayed correctly.

The reason is that in AIX we were not managing thread states. In particular, we do not know when a thread terminates, because in sync_threadlists () the pbuf and gbuf lists remain the same even though certain threads exit.

This patch fixes that.

Some of the UI also changes. When a thread is born or exits, the output on AIX will now be similar to Linux, with both user and kernel thread information.

[New Thread 258 (tid 32178533)]
[New Thread 515 (tid 30343651)]
[New Thread 772 (tid 33554909)]
[New Thread 1029 (tid 24969489)]
[New Thread 1286 (tid 18153945)]
[New Thread 1543 (tid 30736739)]
[Thread 258 (tid 32178533) exited]
[Thread 515 (tid 30343651) exited]
[Thread 772 (tid 33554909) exited]
[Thread 1029 (tid 24969489) exited]
[Thread 1286 (tid 18153945) exited]
[Thread 1543 (tid 30736739) exited]

and info threads will look like:

(gdb) info threads
  Id   Target Id                            Frame
* 1    Thread 1 (tid 31326579) ([running])  0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)

This also includes a small change to the testcase gdb.threads/thread_events.exp to make sure the test runs on AIX as well.
2024-05-06 | [gdb/testsuite] Handle ptrace operation not permitted in can_spawn_for_attach | Tom de Vries | 4 | -23/+28

When running the testsuite on a system with kernel.yama.ptrace_scope set to 1, we run into attach failures.

Fix this by recognizing "ptrace: Operation not permitted" in can_spawn_for_attach.

Tested on aarch64-linux and x86_64-linux.

Approved-By: Pedro Alves <pedro@palves.net>
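When diagnosing such failures outside the testsuite, it can help to check the Yama setting directly. The sketch below reads it from /proc/sys/kernel/yama/ptrace_scope. It is not what can_spawn_for_attach does: the patch matches GDB's "ptrace: Operation not permitted" output instead. The helper names are hypothetical. Under Yama, 0 means classic ptrace permissions. 1 restricts attaching to the tracer's descendants, or to processes that explicitly allowed the tracer, and a separately spawned test program is neither.

```python
from pathlib import Path
from typing import Optional

YAMA_PTRACE_SCOPE = Path("/proc/sys/kernel/yama/ptrace_scope")

def read_ptrace_scope(path: Path = YAMA_PTRACE_SCOPE) -> Optional[int]:
    """Return the Yama ptrace_scope value, or None if Yama is not present."""
    try:
        return int(path.read_text().strip())
    except FileNotFoundError:
        return None

def unprivileged_attach_to_non_child_allowed(scope: Optional[int]) -> bool:
    """True if an unprivileged debugger may attach to a process that is not
    its descendant.  None (no Yama) and 0 (classic) allow it; 1 (restricted),
    2 (admin-only) and 3 (no attach) do not."""
    return scope is None or scope == 0
```

For example, `unprivileged_attach_to_non_child_allowed(read_ptrace_scope())` returns False on a system with the setting described above.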
2024-05-03 | [gdb/testsuite] Use save_vars to restore GDBFLAGS | Tom de Vries | 3 | -16/+14

There's a pattern of using:
...
set saved_gdbflags $GDBFLAGS
set GDBFLAGS "$GDBFLAGS ..."
<do something with GDBFLAGS>
set GDBFLAGS $saved_gdbflags
...

Simplify this by using save_vars:
...
save_vars { GDBFLAGS } {
    set GDBFLAGS "$GDBFLAGS ..."
    <do something with GDBFLAGS>
}
...

Tested on x86_64-linux.
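For readers less familiar with Tcl, the save-and-restore idea can be expressed as a Python context manager. This is only an analogue, not the testsuite's implementation. The `save_vars` function here and the dictionary standing in for Tcl globals are hypothetical. One advantage of a scoped helper over the manual pattern is that the restore still runs if the body fails. In the manual pattern, an error in the body skips the final `set`.

```python
import contextlib

@contextlib.contextmanager
def save_vars(namespace, *names):
    """Save namespace[name] for each name and restore the values on exit.
    Names that did not exist beforehand are removed again."""
    missing = object()  # sentinel meaning "was not set"
    saved = {name: namespace.get(name, missing) for name in names}
    try:
        yield
    finally:
        # Runs both on normal exit and when the body raises.
        for name, value in saved.items():
            if value is missing:
                namespace.pop(name, None)
            else:
                namespace[name] = value

# Usage: a dict stands in for the testsuite's global variables.
env = {"GDBFLAGS": "-nx"}
with save_vars(env, "GDBFLAGS"):
    env["GDBFLAGS"] += " -batch"  # modified only inside the block
```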
2024-04-26 | gdb_is_target_remote -> gdb_protocol_is_remote | Pedro Alves | 3 | -3/+3

This is similar to the previous patch, but for gdb_protocol_is_remote.

gdb_is_target_remote and its MI cousin mi_is_target_remote use "maint print target-stack", which is unnecessary when checking whether gdb_protocol is "remote" or "extended-remote" would do. Checking gdb_protocol is more efficient, and can be done before starting GDB and running to main, unlike gdb_is_target_remote/mi_is_target_remote.

This adds a new gdb_protocol_is_remote procedure, and uses it in place of gdb_is_target_remote/mi_is_target_remote throughout.

There are no uses of gdb_is_target_remote/mi_is_target_remote left after this. Those will be eliminated in a following patch.

In some spots, we no longer need to defer the check until after starting GDB, so the patch adjusts accordingly.

Change-Id: I90267c132f942f63426f46dbca0b77dbfdf9d2ef
Approved-By: Tom Tromey <tom@tromey.com>
2024-04-26 | gdb_is_target_native -> gdb_protocol_is_native | Pedro Alves | 1 | -1/+1

gdb_is_target_native uses "maint print target-stack", which is unnecessary when checking whether gdb_protocol is empty would do. Checking gdb_protocol is more efficient, and can be done before starting GDB and running to main, unlike gdb_is_target_native.

This adds a new gdb_protocol_is_native procedure, and uses it in place of gdb_is_target_native.

At first, I thought that we'd end up with a few testcases needing to use gdb_is_target_native still, especially multi-target tests that connect to targets different from the default board target, but no, actually all uses of gdb_is_target_native could be converted.

gdb_is_target_native will be eliminated in a following patch.

In some spots, we no longer need to defer the check until after starting GDB, so the patch adjusts accordingly.

Change-Id: Ia706232dbffac70f9d9740bcb89c609dbee5cee3
Approved-By: Tom Tromey <tom@tromey.com>
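The checks these two patches switch to reduce to string comparisons on the board's gdb_protocol value, which is why they can run before GDB is started. The Python sketch below shows the two predicates as described above. The function signatures are hypothetical: the real procedures are Tcl and obtain the value from the target board settings themselves.

```python
def gdb_protocol_is_native(gdb_protocol: str) -> bool:
    """Native testing: the board sets no gdb_protocol at all."""
    return gdb_protocol == ""

def gdb_protocol_is_remote(gdb_protocol: str) -> bool:
    """Testing over "target remote" or "target extended-remote"."""
    return gdb_protocol in ("remote", "extended-remote")
```

A board using any other protocol value satisfies neither predicate, so tests that care must check for it explicitly.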
2024-04-24 | [gdb/testsuite] Fix gdb.threads/threadcrash.exp for remote host | Tom de Vries | 1 | -5/+3

With test-case gdb.threads/threadcrash.exp using host board local-remote-host and target board remote-gdbserver-on-localhost I run into:
...
(gdb) PASS: gdb.threads/threadcrash.exp: test_gcore: continue to crash
gcore $outputs/gdb.threads/threadcrash/threadcrash.gcore^M
Failed to open '$outputs/gdb.threads/threadcrash/threadcrash.gcore' for output.^M
(gdb) FAIL: gdb.threads/threadcrash.exp: test_gcore: saving gcore
UNSUPPORTED: gdb.threads/threadcrash.exp: test_gcore: couldn't generate gcore file
...

The problem is that the gcore command tries to save a file on a remote host, but the filename is a location on build.

Fix this by using host_standard_output_file.

Tested on x86_64-linux.
2024-04-24 | [gdb/testsuite] Fix gdb.threads/threadcrash.exp with glibc debuginfo | Tom de Vries | 1 | -1/+1

After installing glibc debuginfo, I ran into:
...
FAIL: gdb.threads/threadcrash.exp: test_live_inferior: \
  $thread_count == [llength $test_list]
...

This happens because the clause:
...
-re "^\r\n${hs}main$hs$eol" {
...
which is intended to match only:
...
#1 <hex> in main () at threadcrash.c:423^M
...
also matches "remaining" in:
...
#1 <hex> in __GI___nanosleep (requested_time=<hex>, remaining=<hex>) at \
  nanosleep.c:27^M
...

Fix this by checking for "in main" instead.

Tested on x86_64-linux.
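The false match is easy to reproduce with Python's re, which behaves like Tcl's regexps for this pattern: "main" is a substring of "remaining". The constants below are illustrative. HS and EOL mirror the hs and eol definitions used in this test-case, and the frame lines come from the commit message with `<hex>` kept as a literal placeholder.

```python
import re

HS = r"[^\n]*"      # hs: horizontal space, i.e. anything but a newline
EOL = r"(?=\r\n)"   # eol: look-ahead for the end of the line

OLD = re.compile("^\r\n" + HS + "main" + HS + EOL)
NEW = re.compile("^\r\n" + HS + "in main" + HS + EOL)

MAIN_FRAME = "\r\n#1 <hex> in main () at threadcrash.c:423\r\n"
SLEEP_FRAME = ("\r\n#1 <hex> in __GI___nanosleep (requested_time=<hex>,"
               " remaining=<hex>) at nanosleep.c:27\r\n")

for name, frame in (("main", MAIN_FRAME), ("nanosleep", SLEEP_FRAME)):
    print(name, bool(OLD.search(frame)), bool(NEW.search(frame)))
```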
2024-04-12 | New testcase gdb.threads/leader-exit-attach.exp (PR threads/8153) | Pedro Alves | 1 | -0/+87

Add a new testcase for exercising attaching to a process after its main thread has exited.

This is not possible on Linux, as the kernel does not allow attaching to a zombie task, so the test is kfailed there. It is possible, however, at least on Windows, and was the scenario addressed by the Windows backend fix in https://sourceware.org/legacy-ml/gdb-patches/2003-12/msg00479.html back in 2003, nowadays PR threads/8153.

Passes cleanly on Cygwin. KFAILed on GNU/Linux native and gdbserver.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=8153
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31554
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31555
Change-Id: Ib554f92f68c965bb4603cdf2aadb55ca45ded53b
2024-04-11 | [gdb/testsuite] Fix gdb.threads/access-mem-running-thread-exit.exp with clang | Tom de Vries | 1 | -0/+2

When running test-case gdb.threads/access-mem-running-thread-exit.exp with clang, we run into:
...
(gdb) print global_var = 555^M
No symbol "global_var" in current context.^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: \
  access mem (write to global_var, inf=2, iter=1)
...

The problem is that clang removes the unused variable.

Fix this in the same way as done in commit b4f767131f7 ("Fix gdb.base/align-*.exp and Clang + LTO and AIX GCC"), by incrementing the variable.

Tested on x86_64-linux with gcc and clang.
2024-04-04 | Fix a test failure in gdb.threads/stepi-over-clone.exp | Bernd Edlinger | 1 | -0/+4

When XML support is disabled at compile time, the test case gdb.threads/stepi-over-clone.exp fails with lots of time-outs, which can be annoying. This makes the test case unsupported instead.

Approved-By: Tom Tromey <tom@tromey.com>
2024-03-28 | [gdb/testsuite] Fix test-case gdb.threads/attach-stopped.exp on manjaro linux | Tom de Vries | 1 | -2/+4

When running test-case gdb.threads/attach-stopped.exp on aarch64-linux, using the Manjaro Linux distro, I get:
...
(gdb) thread apply all bt^M
^M
Thread 2 (Thread 0xffff8d8af120 (LWP 278116) "attach-stopped"):^M
#0 0x0000ffff8d964864 in clock_nanosleep () from /usr/lib/libc.so.6^M
#1 0x0000ffff8d969cac in nanosleep () from /usr/lib/libc.so.6^M
#2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6^M
#3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29^M
#4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6^M
#5 0x0000ffff8d99a5dc in ?? () from /usr/lib/libc.so.6^M
^M
Thread 1 (Thread 0xffff8db62020 (LWP 278111) "attach-stopped"):^M
#0 0x0000ffff8d92d2d8 in ?? () from /usr/lib/libc.so.6^M
#1 0x0000ffff8d9324b8 in ?? () from /usr/lib/libc.so.6^M
#2 0x0000aaaade37086c in main () at attach-stopped.c:45^M
(gdb) FAIL: gdb.threads/attach-stopped.exp: threaded: attach2 to stopped bt
...

The problem is that the test-case expects to see start_thread:
...
    gdb_test "thread apply all bt" ".*sleep.*start_thread.*" \
        "$threadtype: attach2 to stopped bt"
...
but lack of symbols makes that impossible.

Fix this by allowing " in ?? () from " as well.

Tested on aarch64-linux.

PR testsuite/31451
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31451
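The commit message does not quote the widened pattern. A hypothetical version, tried in Python against backtrace lines from the log above, shows one detail worth getting right: "??" and "()" are regexp syntax, so they must be escaped to match literally. `re.DOTALL` stands in for Tcl regexps, where `.` also matches newlines.

```python
import re

# Frames #2 to #4 of Thread 2 from the log above, without the ^M markers.
BT = ("#2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6\r\n"
      "#3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29\r\n"
      "#4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6\r\n")

OLD = re.compile(r".*sleep.*start_thread.*", re.DOTALL)
# Hypothetical widened pattern: start_thread, or an unnamed libc frame.
NEW = re.compile(r".*sleep.*(start_thread| in \?\? \(\) from ).*", re.DOTALL)
# Unescaped, "??" is a lazy "?" quantifier on the preceding space and "()"
# is an empty group, so this does not match the literal text.
UNESCAPED = re.compile(r" in ?? () from ")
```

With these definitions, `OLD.match(BT)` is None, `NEW.match(BT)` succeeds, and `UNESCAPED.search(BT)` finds nothing.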
2024-03-25 | gdb: rename unwindonsignal to unwind-on-signal | Andrew Burgess | 2 | -5/+5

We now have unwind-on-timeout and unwind-on-terminating-exception, and then the odd one out unwindonsignal.

I'm not a great fan of these squashed together command names, so in this commit I propose renaming this to unwind-on-signal.

Obviously I've added the hidden alias unwindonsignal so any existing GDB scripts will keep working.

There's one test that I've extended to check that the alias works, but in most of the other test scripts I've changed over to use the new name.

The docs are updated to reference the new name.

Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
2024-03-25 | gdb: introduce unwind-on-timeout setting | Andrew Burgess | 1 | -22/+47

Now that inferior function calls can time out (see the recent introduction of direct-call-timeout and indirect-call-timeout), this commit adds a new setting unwind-on-timeout.

This new setting is just like the existing unwindonsignal and unwind-on-terminating-exception, but the new setting will cause GDB to unwind the stack if an inferior function call times out.

The existing inferior function call timeout tests have been updated to cover the new setting.

Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
2024-03-25 | gdb: add timeouts for inferior function calls | Andrew Burgess | 2 | -0/+335

In the previous commits I have been working on improving inferior function call support. One thing that worries me about using inferior function calls from a conditional breakpoint is: what happens if the inferior function call fails?

If the failure is obvious, e.g. the thread performing the call crashes, or hits a breakpoint, then this case is already well handled, and the error is reported to the user.

But what if the thread performing the inferior call just deadlocks? If the user made the call from a 'print' or 'call' command, then the user might have some expectation of when the function call should complete, and, when this time limit is exceeded, the user will (hopefully) interrupt GDB and regain control of the debug session.

But, when the inferior function call is from a breakpoint condition it is much harder to understand that GDB is deadlocked within an inferior call. Maybe the breakpoint hasn't been hit yet? Or maybe the condition was always false? Or maybe GDB is deadlocked in an inferior call? The only way to know for sure is for the user to periodically interrupt the inferior, check on the state of all the threads, and then continue.

Additionally, the focus of the previous commit was inferior function calls, from a conditional breakpoint, in a multi-threaded inferior. This opens up a whole new set of potential failure conditions. For example, what if the function called relies on interaction with some other thread, and the other thread crashes? Or hits a breakpoint? Given how inferior function calls work (in a synchronous manner), a stop event in some other thread is going to be ignored while the inferior function call is being executed as part of a breakpoint condition, and this means that GDB could get stuck waiting for the original condition thread, which will now never complete.

In this commit I propose a solution to this problem: a timeout.
For targets that support async-mode we can install an event-loop timer before starting the inferior function call. When the timer expires we will stop the thread performing the inferior function call. With this mechanism in place a user can be sure that any inferior call they make will either complete, or time out eventually.

Adding a timer like this is obviously a change in behaviour for the more common 'call' and 'print' uses of inferior function calls, so, in this patch, I propose having two different timers.

One I call the 'direct-call-timeout', which is used for 'call' and 'print' commands. This timeout is by default set to unlimited, which, not surprisingly, means there is no timeout in place.

A second timer, which I've called 'indirect-call-timeout', is used for inferior function calls from breakpoint conditions. This timeout has a default value of 30 seconds. This is a reasonably long time to wait, and hopefully should be enough in most cases to allow the inferior call to complete. An inferior call that takes more than 30 seconds and is installed on a breakpoint condition is really going to slow down the debug session, so hopefully this is not a common use case.

The user is, of course, free to reduce or increase the timeout value, and can always use Ctrl-c to interrupt an inferior function call, but this timeout will ensure that GDB will stop at some point.

The new commands added by this commit are:

    set direct-call-timeout SECONDS
    show direct-call-timeout
    set indirect-call-timeout SECONDS
    show indirect-call-timeout

These new timeouts do depend on async-mode, so, if async-mode is disabled (maint set target-async off), or not supported (e.g. target sim), then the timeout is treated as unlimited (that is, no timeout is set).

For targets that "fake" non-async mode, e.g. Linux native, where non-async mode is really just async mode, but then we park the target in a sigsuspend, we could easily fix things so that the timeouts still work. However, for targets that really are not async aware, like the simulator, fixing things so that timeouts work correctly would be a much bigger task; that effort would be better spent just making the target async-aware. And so, I'm happy for now that this feature will only work on async targets.

The two new show commands will display slightly different text if the current target is a non-async target, which should allow users to understand what's going on.

There's a somewhat random test adjustment needed in gdb.base/help.exp: the test uses a regexp with the apropos command, and expects to find a single result. Turns out the new settings I added also matched the regexp, which broke the test. I've updated the regexp a little to exclude my new settings.

Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
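The rules for which limit applies can be summarised as a small Python function. It models only the selection logic described in this commit, not GDB's event-loop timer. The function and parameter names are hypothetical, and None stands for "unlimited".

```python
from typing import Optional

UNLIMITED: Optional[int] = None  # no timer is installed

def effective_call_timeout(from_breakpoint_condition: bool,
                           target_is_async: bool,
                           direct_call_timeout: Optional[int] = UNLIMITED,
                           indirect_call_timeout: Optional[int] = 30,
                           ) -> Optional[int]:
    """Return the timeout in seconds for an inferior function call, or
    None if the call may run forever."""
    if not target_is_async:
        # Without async-mode no timer can be installed.
        return UNLIMITED
    if from_breakpoint_condition:
        # Calls made while evaluating a breakpoint condition.
        return indirect_call_timeout
    # Calls made directly by 'call' or 'print'.
    return direct_call_timeout
```

With the defaults, only breakpoint-condition calls on async targets get a limit.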
2024-03-25 | gdb: fix b/p conditions with infcalls in multi-threaded inferiors | Andrew Burgess | 6 | -0/+889

This commit fixes bug PR 28942, that is, creating a conditional breakpoint in a multi-threaded inferior, where the breakpoint condition includes an inferior function call.

Currently, when a user tries to create such a breakpoint, GDB will fail with:

  (gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
  Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
  (gdb) continue
  Continuing.
  [New Thread 0x7ffff7c5d700 (LWP 2460150)]
  [New Thread 0x7ffff745c700 (LWP 2460151)]
  [New Thread 0x7ffff6c5b700 (LWP 2460152)]
  [New Thread 0x7ffff645a700 (LWP 2460153)]
  [New Thread 0x7ffff5c59700 (LWP 2460154)]
  Error in testing breakpoint condition:
  Couldn't get registers: No such process.
  An error occurred while in a function called from GDB.
  Evaluation of the expression containing the function
  (return_true) will be abandoned.
  When the function is done executing, GDB will silently stop.
  Selected thread is running.
  (gdb)

Or, in some cases, like this:

  (gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
  Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
  (gdb) continue
  Continuing.
  [New Thread 0x7ffff7c5d700 (LWP 2461106)]
  [New Thread 0x7ffff745c700 (LWP 2461107)]
  ../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
  A problem internal to GDB has been detected,
  further debugging may prove unreliable.

The precise error depends on the exact thread state, so there are race conditions depending on which threads have fully started, and which have not.
But the underlying problem is always the same: when GDB tries to execute the inferior function call from within the breakpoint condition, GDB will, incorrectly, try to resume threads that are already running. GDB doesn't realise that some threads might already be running.

The solution proposed in this patch requires an additional member variable thread_info::in_cond_eval. This flag is set to true (in breakpoint.c) when GDB is evaluating a breakpoint condition.

In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is true, then GDB will only try to resume the current thread, that is, the thread for which the breakpoint condition is being evaluated. This solves the problem of GDB trying to resume threads that are already running.

The next problem is that inferior function calls are assumed to be synchronous, that is, GDB doesn't expect to start an inferior function call in thread #1, then receive a stop from thread #2 for some other, unrelated reason. To prevent GDB from responding to an event from another thread, we update fetch_inferior_event and do_target_wait in infrun.c, so that, when an inferior function call (on behalf of a breakpoint condition) is in progress, we only wait for events from the current thread (the one evaluating the condition).

In do_target_wait I had to change the inferior_matches lambda function, which is used to select which inferior to wait on. Previously the logic was this:

   auto inferior_matches = [&wait_ptid] (inferior *inf)
     {
       return (inf->process_target () != nullptr
               && ptid_t (inf->pid).matches (wait_ptid));
     };

This compares the pid of the inferior against the complete ptid we want to wait on. Before this commit wait_ptid was only ever minus_one_ptid (which is special, and means any process), and so every inferior would match. After this commit, though, wait_ptid might represent a specific thread in a specific inferior. If we compare the pid of the inferior to a specific ptid, then these will not match.
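The mismatch, and the pid-based comparison this commit switches to, can be reproduced with a simplified Python model of ptid_t matching. In this model, a minus_one filter matches everything, a pid-only filter matches every thread of that process, and anything else must match exactly. The class and helper names are hypothetical stand-ins, not GDB's C++ API. Building a ptid from wait_ptid's pid is just one way to express "compare against the pid".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptid:
    """Simplified stand-in for GDB's ptid_t: (pid, lwp, tid)."""
    pid: int
    lwp: int = 0
    tid: int = 0

    def is_pid(self) -> bool:
        # Names a whole process: not null, not minus_one, no thread parts.
        return (self not in (NULL_PTID, MINUS_ONE_PTID)
                and self.lwp == 0 and self.tid == 0)

    def matches(self, filt: "Ptid") -> bool:
        return (filt == MINUS_ONE_PTID                       # any ptid
                or (filt.is_pid() and filt.pid == self.pid)  # same process
                or filt == self)                             # exact match

NULL_PTID = Ptid(0)
MINUS_ONE_PTID = Ptid(-1)

def inferior_matches_before(inf_pid: int, wait_ptid: Ptid) -> bool:
    return Ptid(inf_pid).matches(wait_ptid)

def inferior_matches_after(inf_pid: int, wait_ptid: Ptid) -> bool:
    # Only the process part of wait_ptid matters when picking an inferior.
    return Ptid(inf_pid).matches(Ptid(wait_ptid.pid))
```

Waiting on thread `Ptid(100, 101)` shows the difference: the old check rejects inferior 100, and the new one accepts it.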
The fix is to compare against the pid extracted from the wait_ptid, not against the complete wait_ptid itself.

In fetch_inferior_event, after receiving the event, we only want to stop all the other threads, and call inferior_event_handler with INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint. If we are, then all the other threads should be left doing whatever they were before. The inferior_event_handler call will be performed once the breakpoint condition has finished being evaluated, and GDB decides to stop or not.

The final problem that needs solving relates to GDB's commit-resume mechanism, which allows GDB to collect resume requests into a single packet in order to reduce traffic to a remote target.

The problem is that the commit-resume mechanism will not send any resume requests for an inferior if there are already events pending on the GDB side.

Imagine an inferior with two threads. Both threads hit a breakpoint, maybe the same conditional breakpoint. At this point there are two pending events, one for each thread.

GDB selects one of the events and spots that this is a conditional breakpoint, GDB evaluates the condition.

The condition includes an inferior function call, so GDB sets up for the call and resumes the one thread, the resume request is added to the commit-resume queue.

When the commit-resume queue is committed GDB sees that there is a pending event from another thread, and so doesn't send any resume requests to the actual target, GDB is assuming that when we wait we will select the event from the other thread.

However, as this is an inferior function call for a condition evaluation, we will not select the event from the other thread, we only care about events from the thread that is evaluating the condition - and the resume for this thread was never sent to the target.

And so, GDB hangs, waiting for an event from a thread that was never fully resumed.
To fix this issue I have added the concept of "forcing" the commit-resume queue. When enabling commit resume, if the force flag is true, then any resumes will be committed to the target, even if there are other threads with pending events.

A note on authorship: this patch was based on some work done by Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made some changes to their work in this version.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942

[1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html

Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com>
Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
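The hang and the "force" fix can be illustrated with a toy Python model of a commit-resume queue. It is a deliberately small sketch, not GDB's implementation. The class, its fields and the `force` parameter are hypothetical, and real GDB tracks far more state per thread and per target.

```python
class CommitResumeModel:
    """Toy model: resume requests are queued, and commit() sends them to the
    target unless other threads already have events pending on GDB's side."""

    def __init__(self):
        self.queued = []              # threads whose resume is queued
        self.pending_events = set()   # threads with unprocessed stop events
        self.resumed_on_target = []   # resumes actually sent to the target

    def resume(self, thread):
        self.queued.append(thread)

    def commit(self, force=False):
        if self.pending_events and not force:
            # GDB expects its next wait to return one of the pending events,
            # so the queued resumes are held back.
            return
        self.resumed_on_target.extend(self.queued)
        self.queued.clear()


def condition_call(force):
    """Threads 1 and 2 both hit the breakpoint.  GDB handles thread 1's event,
    whose condition makes an inferior call, so only thread 1 is resumed and
    only thread 1's events are waited for."""
    gdb = CommitResumeModel()
    gdb.pending_events.add(2)    # thread 2's stop is still pending
    gdb.resume(1)                # run thread 1 to perform the call
    gdb.commit(force=force)
    # The call can only complete if thread 1 actually runs on the target.
    return 1 in gdb.resumed_on_target
```

`condition_call(force=False)` returns False, which corresponds to the hang described above. With `force=True`, thread 1's resume reaches the target.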
2024-03-22 | gdb tests: Allow for "LWP" or "process" in thread IDs from info threads | John Baldwin | 18 | -45/+50

Several tests assume that the first word after a thread ID in 'info threads' output is "Thread". However, several targets, such as the FreeBSD and NetBSD native targets, use "LWP" instead. The Linux native target also uses "LWP" if libthread_db is not being used. Targets that do not support threads use "process" as the first word via normal_pid_to_str.

Add a tdlabel_re global variable as a regular-expression for a thread label in `info threads' that matches either "process", "Thread", or "LWP".

Some other tests in the tree don't require a specific word, and some targets may use other first words (e.g. OpenBSD uses "thread" and Ravenscar threads use "Ravenscar Thread").
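The sketch below shows what such a label pattern accepts. The Python regexp approximates the idea behind tdlabel_re and is not its actual definition. The 'info threads' rows are made up for illustration. Matching is case-sensitive, so OpenBSD's lowercase "thread" is not covered, in line with the caveat above.

```python
import re

# Approximation of a thread-label pattern: "process", "Thread" or "LWP".
TDLABEL_RE = r"(?:process|Thread|LWP)"
# Optional current-thread marker, thread number, then the label.
INFO_THREADS_ROW = re.compile(r"^\*?\s*\d+\s+" + TDLABEL_RE + r" ")

# Made-up rows in the style of 'info threads' output.
ROWS = [
    '* 1    Thread 0x7ffff7d85740 (LWP 4242) "prog" main () at prog.c:10',
    "* 1    LWP 4242 main () at prog.c:10",
    "* 1    process 4242 main () at prog.c:10",
    "* 1    thread 4242 main () at prog.c:10",   # OpenBSD-style label
]

for row in ROWS:
    print(bool(INFO_THREADS_ROW.match(row)), row)
```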
2024-03-11 | gdb/testsuite: Reduce gdb.threads/threadcrash.exp reliance on libc symbols | Guinevere Larsen | 1 | -7/+41

The test gdb.threads/threadcrash.exp required GDB to fully unwind and print the names of all functions. However, some of the functions are from the libc library, and so the test implicitly required libc symbols to be available, and would fail otherwise, as was raised in PR gdb/31293.

This commit changes it so we only explicitly check for functions that are not provided by threadcrash.c if they are indeed available.

Tested on arm-linux and x86_64-linux.

Approved-By: Tom de Vries <tdevries@suse.de>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31293
2024-03-11 | gdb/testsuite: Simplify gdb.threads/threadcrash.exp | Tom de Vries | 1 | -29/+60

I noticed in gdb.threads/threadcrash.exp that the usage of test_list is somewhat convoluted.

Simplify the test-case by storing a classification instead of a pattern in test_list.

Tested on arm-linux and x86_64-linux.
2024-03-11 | gdb/testsuite: Use _inferior_thread_count in gdb.threads/threadcrash.exp | Guinevere Larsen | 1 | -21/+2

A Linaro PR [1] reports that the gdb.threads/threadcrash.exp test-case fails to count the number of threads in the inferior:
...
FAIL: gdb.threads/threadcrash.exp: test_gcore: $thread_count == 7
FAIL: gdb.threads/threadcrash.exp: test_gcore: $thread_count == [llength $test_list]
...

Fix this by getting the convenience variable _inferior_thread_count as opposed to calculating it based on the output of "info threads".

Tested on arm-linux and x86_64-linux.

Reviewed-By: Lancelot Six <lancelot.six@amd.com>
Approved-By: Tom de Vries <tdevries@suse.de>

[1] https://linaro.atlassian.net/browse/GNU-1120
2024-03-11 | gdb/testsuite: Fix gdb.threads/threadcrash.exp with check-readmore | Tom de Vries | 1 | -10/+19

With check-readmore, I run into:
...
FAIL: gdb.threads/threadcrash.exp: test_corefile: \
  $thread_count == [llength $test_list]
...

The problem is that the clauses in the gdb_test_multiple for "thread apply all backtrace" are intended to match one line, but can actually match more than one line. Consequently, a match for one type of thread can consume a line that was supposed to match another thread.

For instance, there's this regexp:
...
-re "\[^\n\]*syscall_task .location=SIGNAL_ALT_STACK\[^\n\]*" {
...

It's limited at the end by \[^\n\]*, meaning the match stops at the end of the line.

But it doesn't start with a ^, and consequently can match more than one line.

The "\[^\n\]*" at the start doesn't prevent this: there's an implicit .* at the start of each pattern, unless it's anchored using a ^.

Fix this by rewriting the regexps in a "^\r\n$hs$regexp$hs$eol" style, where:
- hs is: \[^\n\]* (horizontal space), and
- eol is (?=\r\n) (look-ahead end-of-line).

It also turned out to be necessary to drop the -lbl switch, and introduce a corresponding explicit clause. The -lbl clause is placed ALAP (as late as possible), and consequently allowed the default fail clause to trigger.

Tested on arm-linux and x86_64-linux.
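Python's re is close enough to Tcl's regexps here to reproduce the difference between the two styles. An unanchored pattern may start its match on a later line. Because expect discards everything up to the end of the match, the skipped line is lost to the clause that should have matched it. The buffer contents below are made-up backtrace lines. HS and EOL follow the hs and eol definitions above.

```python
import re

HS = r"[^\n]*"       # hs: anything up to the end of the line
EOL = r"(?=\r\n)"    # eol: look-ahead for the line terminator
TASK = r"syscall_task .location=SIGNAL_ALT_STACK"

UNANCHORED = re.compile(HS + TASK + HS)
ANCHORED = re.compile("^\r\n" + HS + TASK + HS + EOL)

# Two made-up backtrace lines, as expect might have them buffered.
LINE0 = "\r\n#0 <hex> in some_other_frame ()"
BUFFER = LINE0 + "\r\n#1 <hex> in syscall_task (location=SIGNAL_ALT_STACK)\r\n"

def consumed(regex, buffer):
    """Return the part of `buffer` expect would discard if `regex` matched,
    or None if it does not match."""
    m = regex.search(buffer)
    return None if m is None else buffer[:m.end()]
```

Here `consumed(UNANCHORED, BUFFER)` swallows line #0 along with line #1. `ANCHORED` only matches once line #0 has been consumed by its own clause.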
2024-03-11 | gdb/testsuite: Reduce indentation in gdb.threads/threadcrash.exp | Tom de Vries | 1 | -58/+58

In test-case gdb.threads/threadcrash.exp we have an unnecessarily indented gdb_test_multiple:
...
        gdb_test_multiple "thread apply all backtrace" \
            "Get thread information" -lbl {
                -re "#\[0-9\]+\\\?\\\?\[^\n\]*" {
...

Fix this by moving the command into a variable, allowing the "gdb_test_multiple ... {" to fit on a single 80 chars line.

Tested on arm-linux and x86_64-linux.
2024-02-26 | gdb: Modify the output of "info breakpoints" and "delete breakpoints" | Tiezhu Yang | 3 | -6/+6

The output of "info breakpoints" includes breakpoints, watchpoints, tracepoints, and catchpoints if they are created, so after "delete breakpoints" the message reporting an empty list should mention all four types.

This commit also changes the output of "delete breakpoints" to make it clear that watchpoints, tracepoints, and catchpoints are also being deleted. This was suggested by Guinevere Larsen, thank you.

  $ make check-gdb TESTS="gdb.base/access-mem-running.exp"
  $ gdb/gdb gdb/testsuite/outputs/gdb.base/access-mem-running/access-mem-running
  [...]
  (gdb) break main
  Breakpoint 1 at 0x12000073c: file /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c, line 32.
  (gdb) watch global_counter
  Hardware watchpoint 2: global_counter
  (gdb) trace maybe_stop_here
  Tracepoint 3 at 0x12000071c: file /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c, line 27.
  (gdb) catch fork
  Catchpoint 4 (fork)
  (gdb) info breakpoints
  Num     Type           Disp Enb Address            What
  1       breakpoint     keep y   0x000000012000073c in main at /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c:32
  2       hw watchpoint  keep y                      global_counter
  3       tracepoint     keep y   0x000000012000071c in maybe_stop_here at /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c:27
          not installed on target
  4       catchpoint     keep y                      fork

Without this patch:

  (gdb) delete breakpoints
  Delete all breakpoints? (y or n) y
  (gdb) info breakpoints
  No breakpoints or watchpoints.
  (gdb) info breakpoints 3
  No breakpoint or watchpoint matching '3'.

With this patch:

  (gdb) delete breakpoints
  Delete all breakpoints, watchpoints, tracepoints, and catchpoints? (y or n) y
  (gdb) info breakpoints
  No breakpoints, watchpoints, tracepoints, or catchpoints.
  (gdb) info breakpoints 3
  No breakpoint, watchpoint, tracepoint, or catchpoint matching '3'.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Approved-by: Kevin Buettner <kevinb@redhat.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
2024-01-24  gdb/testsuite: add test for backtracing for threaded inferiors from a corefile  (Guinevere Larsen; 2 files, -0/+676 lines)
This patch is based on an out-of-tree patch that Fedora has been carrying for a while. It tests whether GDB can properly unwind a threaded program in the following situations:

 * regular threads
 * in a signal handler
 * in a signal handler executing on an alternate stack

In each case, the final frame can be either in a syscall or in an infinite loop.

The test works by running the inferior either until it crashes, to generate a corefile, or until right before the crash. It then takes a backtrace of all threads, to check that every frame can be identified and that GDB reports the threads in the expected order. Finally, it goes thread by thread and collects a large part of each backtrace, to confirm that everything is unwound correctly.

Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Reviewed-By: Luis Machado <luis.machado@arm.com>
Approved-By: Luis Machado <luis.machado@arm.com>
2024-01-19  [gdb/testsuite] Update xfail in gdb.threads/attach-many-short-lived-threads.exp  (Tom de Vries; 1 file, -1/+5 lines)
With test-case gdb.threads/attach-many-short-lived-threads.exp, I run into:
...
(gdb) attach 7773^M
Attaching to program: attach-many-short-lived-threads, process 7773^M
Cannot attach to lwp 7776: Operation not permitted (1)^M
(gdb) PASS: $exp: iter 1: attach
info threads^M
No threads.^M
(gdb) PASS: $exp: iter 1: no new threads
set breakpoint always-inserted on^M
(gdb) PASS: $exp: iter 1: set breakpoint always-inserted on
break break_fn^M
Breakpoint 1 at 0x400b4d: file attach-many-short-lived-threads.c, line 57.^M
(gdb) PASS: $exp: iter 1: break break_fn
continue^M
The program is not being run.^M
(gdb) FAIL: $exp: iter 1: break at break_fn: 1 \
  (the program is no longer running)
...

There's some code in the test-case dealing with a similar warning:
...
	-re "warning: Cannot attach to lwp $decimal: Operation not permitted" {
...

But since commit c6f7f9c80c3 ("Bail out of "attach" if a thread cannot be traced"), the warning has been changed into an error.

Fix the FAIL by updating the test-case to expect an error instead of a warning.

Tested on x86_64-linux.

Approved-By: Tom Tromey <tom@tromey.com>
2024-01-12  Update copyright year range in header of all files managed by GDB  (Andrew Burgess; 255 files, -255/+255 lines)
This commit is the result of the following actions:

  - Running gdb/copyright.py to update all of the copyright headers to include 2024,

  - Manually updating a few files the copyright.py script told me to update; these files had copyright headers embedded within the file,

  - Regenerating gdbsupport/Makefile.in to refresh its copyright date,

  - Using grep to find other files that still mentioned 2023. If these files were updated last year from 2022 to 2023 then I've updated them this year to 2024.

I've probably missed some dates. Feel free to fix them up as you spot them.
2024-01-04  [gdb/testsuite] Handle PAC marker  (Tom de Vries; 2 files, -2/+2 lines)
On aarch64-linux, I run into:
...
FAIL: gdb.base/annota1.exp: backtrace from shlibrary (timeout)
...
due to the PAC marker showing up:
...
^Z^Zframe-address^M
0x000000000041025c [PAC]^M
^Z^Zframe-address-end^M
...

In the docs the marker is documented as follows:
...
When GDB is debugging the AArch64 architecture, and the program is using the v8.3-A feature Pointer Authentication (PAC), then whenever the link register $lr is pointing to an PAC function its value will be masked. When GDB prints a backtrace, any addresses that required unmasking will be postfixed with the marker [PAC]. When using the MI, this is printed as part of the addr_flags field.
...

Update the test-case to allow the PAC marker.

Likewise in a few other test-cases.

While we're at it, rewrite the affected pattern pat_begin in annota1.exp into a more readable form. Likewise for the corresponding pat_end.

Tested on aarch64-linux.

Approved-By: Luis Machado <luis.machado@arm.com>

PR testsuite/31202
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31202
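Here is a minimal Python sketch of what "allowing the PAC marker" means for an address pattern. The real test-cases use Tcl regexps inside expect, and the `ADDR_RE` name and the `parse_frame_address` helper are hypothetical. The pattern accepts a hex address with an optional ` [PAC]` suffix. It also accepts the trailing carriage return that appears as `^M` in gdb.log.

```python
import re

# A hex address, optionally followed by GDB's " [PAC]" marker.  The
# trailing "\r?" tolerates the carriage return shown as "^M" in gdb.log.
ADDR_RE = re.compile(r"(0x[0-9a-f]+)( \[PAC\])?\r?")


def parse_frame_address(line):
    """Return (address, has_pac_marker), or None if `line` is not an address."""
    m = ADDR_RE.fullmatch(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2) is not None
```

Making the marker an optional group lets one pattern cover targets with and without Pointer Authentication. The alternative would be a separate clause for each case.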
2023-12-20  gdb.threads/step-over-thread-exit.exp improvements  (Pedro Alves; 2 files, -24/+119 lines)
This commit makes the following improvements to gdb.threads/step-over-thread-exit.exp:

 - Add a third axis to stepping over the breakpoint with displaced vs inline stepping -- also test with no breakpoint at all.

 - Check that when GDB reports "Command aborted, thread exited.", the selected thread is the thread that exited. This is currently always true on GNU/Linux by coincidence, but a similar testcase on AMD GPU exposed a problem here. Better to make the testcase catch any potential regression.

 - Fix a race that Simon ran into with GDBserver testing:

     (gdb) next
     [New Thread 2143071.2143438]

     Thread 3 "step-over-threa" hit Breakpoint 2, 0x000055555555524e in my_exit_syscall () at .../testsuite/lib/my-syscalls.S:74
     74        SYSCALL (my_exit, __NR_exit)
     (gdb) FAIL: gdb.threads/step-over-thread-exit.exp: displaced-stepping=auto: non-stop=on: target-non-stop=on: schedlock=off: cmd=next: ns_stop_all=0: command aborts when thread exits

   I was not able to reproduce it, but I believe that what happens is the following:

   Once we continue, thread 2 exits, so the main thread unblocks from its pthread_join and spawns a new thread. That new thread may hit the breakpoint at my_exit_syscall very quickly. GDB could then see/process that breakpoint event before the thread exit event for the thread we care about, which would result in the failure seen above.

   The fix here is to not loop and start a new thread at all in the scenario where the race can happen. We only need to loop and spawn new threads when testing with "cmd=continue" and schedlock off, in which case GDB doesn't abort the command when the thread exits.

Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I90c95c32f00630a3f682b1541c23aff52451f9b6
2023-12-18  gdb/testsuite: another attempt to fix gdb.threads/thread-specific-bp.exp  (Andrew Burgess; 1 file, -18/+6 lines)
The gdb.threads/thread-specific-bp.exp test has been a little problematic; see commits:

  commit 89702edd933a5595557bcd9cc4a0dcc3262226d4
  Date:   Thu Mar 9 12:31:26 2023 +0100

      [gdb/testsuite] Fix gdb.threads/thread-specific-bp.exp on native-gdbserver

and

  commit 2e5843d87c4050bf1109921481fb29e1c470827f
  Date:   Fri Nov 19 14:33:39 2021 +0100

      [gdb/testsuite] Fix gdb.threads/thread-specific-bp.exp

But I recently saw a failure in that test, which looked like this:

  ...
  (gdb) PASS: gdb.threads/thread-specific-bp.exp: non_stop=on: thread 1 selected
  continue -a
  Continuing.

  Thread 1 "thread-specific" hit Breakpoint 4, end () at /tmp/binutils-gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/thread-specific-bp.c:29
  29      }
  (gdb) [Thread 0x7ffff7c5c700 (LWP 1552086) exited]
  Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.
  FAIL: gdb.threads/thread-specific-bp.exp: non_stop=on: continue to end (timeout)
  ...

This only crops up (for me) when running on a loaded machine, and even then only sometimes. I've sometimes had to leave the test running in a loop for 10+ minutes in order to see the failure.

The problem is that we use gdb_test_multiple to try and match two patterns:

  (1) The 'Thread-specific breakpoint 3 deleted ....' message, and

  (2) The GDB prompt.

As written in the test, we understand that these patterns can occur in any order, and we have a flag for each pattern. Once both patterns have been seen then we PASS the test.

The problem is that once expect has matched a pattern, everything up to and including the matched text is discarded from the input buffer. Thus, if the input buffer contains:

  <PATTERN 2><PATTERN 1>

then expect will first try to match <PATTERN 1>, which succeeds, and then expect discards the entire input buffer up to the end of <PATTERN 1>. As a result, we will never spot <PATTERN 2>.
Obviously we can't just reorder the patterns within the gdb_test_multiple. The output can legitimately occur in the other order (and most often does), in which case the test would mostly fail and only occasionally pass!

I think the easiest solution here is to have the gdb_test_multiple contain two patterns, each consisting of both parts, but in opposite orders. Thus, for a particular output ordering, only one regexp will match.

With this change in place, I no longer see the intermittent failure.

Approved-By: Tom Tromey <tom@tromey.com>
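The buffer-discarding behaviour described above fits in a few lines of Python. This sketch is a simplified model, not expect itself. It assumes all of GDB's output has already arrived in a single read. Patterns are tried in list order against the whole buffer, as expect does, and everything up to and including the match is dropped. The prompt pattern is left unanchored for simplicity, and all helper names are hypothetical.

```python
import re

# Simplified stand-ins for the two parts of the expected output.
MSG = r"Thread-specific breakpoint 3 deleted"
PROMPT = r"\(gdb\) "


def expect_once(buffer, patterns):
    """One matching step, modelled on expect: try each pattern in the order
    listed, each against the whole buffer; the first pattern that matches
    anywhere wins, and everything up to and including the match is dropped.
    DOTALL makes "." cross newlines, as it does in Tcl regexps.

    Returns (index of the matching pattern or None, remaining buffer)."""
    for i, pat in enumerate(patterns):
        m = re.search(pat, buffer, re.DOTALL)
        if m:
            return i, buffer[m.end():]
    return None, buffer


def flags_approach(output):
    """Old scheme: one clause per part, plus a 'seen' flag for each."""
    seen_msg = seen_prompt = False
    buffer = output
    while not (seen_msg and seen_prompt):
        idx, buffer = expect_once(buffer, [MSG, PROMPT])
        if idx is None:
            return False  # nothing left to match: the real test times out
        if idx == 0:
            seen_msg = True
        else:
            seen_prompt = True
    return True


def combined_approach(output):
    """New scheme: each clause matches both parts, in one of the two orders."""
    idx, _ = expect_once(output, [MSG + ".*" + PROMPT, PROMPT + ".*" + MSG])
    return idx is not None
```

Suppose the prompt arrives before the message. Then `flags_approach` matches the message first, because that clause is listed first, and throws the prompt away with it, so the second step finds nothing. `combined_approach` succeeds for either ordering.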
2023-11-15  Fix gdb.threads/threads-after-exec.exp race  (Pedro Alves; 1 file, -3/+7 lines)
Simon noticed that gdb.threads/threads-after-exec.exp was racy. You can consistently reproduce it (at git hash 319b460545dc79280e2904dcc280057cf71fb753) with:

  $ taskset -c 0 make check TESTS="gdb.threads/threads-after-exec.exp"

gdb.log shows:

  (...)
  Thread 3 "threads-after-e" hit Catchpoint 2 (exec'd .../gdb.threads/threads-after-exec/threads-after-exec), 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2
  (gdb) PASS: gdb.threads/threads-after-exec.exp: continue until exec
  info threads
    Id   Target Id                         Frame
  * 3    process 1443269 "threads-after-e" 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2
  (gdb) FAIL: gdb.threads/threads-after-exec.exp: info threads
  (...)
  maint info linux-lwps
  LWP Ptid          Thread ID
  1443269.1443269.0 1.3
  (gdb) FAIL: gdb.threads/threads-after-exec.exp: maint info linux-lwps

The FAILs happen because the .exp file expects that after the exec, the only thread has GDB thread number 1, but it has number 3 instead.

This is yet another case of zombie leader detection making things a bit fuzzy. In the passing case, we have:

  continue
  Continuing.
  [New Thread 0x7ffff7bff640 (LWP 603183)]
  [Thread 0x7ffff7bff640 (LWP 603183) exited]
  process 603180 is executing new program: .../gdb.threads/threads-after-exec/threads-after-exec

While in the failing case, we have (note the remarks on the right-hand side):

  continue
  Continuing.
  [New Thread 0x7ffff7bff640 (LWP 600205)]
  [Thread 0x7ffff7f95740 (LWP 600202) exited]    <<< gdb deletes leader thread, thread 1.
  [New LWP 600202]                                <<< gdb adds it back -- this is now thread 3.
  [Thread 0x7ffff7bff640 (LWP 600205) exited]
  process 600202 is executing new program: .../threads-after-exec/threads-after-exec

The testcase only has two threads, yet GDB presented the exec for thread 3. This is GDB deleting the leader (the backend detected it was a zombie, due to the exec), and then adding the leader back when it saw the exec event.

I've recorded some thoughts about this in PR gdb/31069.
For now, this commit just makes the testcase cope with a thread number other than 1, as the number is not important for what this test is exercising.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31069
Change-Id: Id80b5c73f09c9e0005efeb494cca5d066ac3bbae
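Here is a minimal sketch of "coping with a thread number other than 1", written in Python rather than the test's Tcl. It captures the thread number from `info threads` instead of hard-coding it. It then checks that `maint info linux-lwps` lists exactly one LWP, the leader, mapped to that same number. The regexps assume the column layout shown in the log above, and `check_single_thread_after_exec` is a hypothetical helper, not the committed change.

```python
import re

# "* N    process PID ..." in "info threads": capture N instead of assuming 1.
INFO_THREADS_RE = re.compile(r"^\*\s+(\d+)\s+process (\d+)\b", re.MULTILINE)
# "PID.LWP.0 INF.THREAD" rows in "maint info linux-lwps".
LWP_ROW_RE = re.compile(r"^(\d+)\.(\d+)\.0\s+(\d+)\.(\d+)\s*$", re.MULTILINE)


def check_single_thread_after_exec(info_threads, linux_lwps):
    """True if exactly one LWP (the leader) remains and it maps to the same
    GDB thread number that "info threads" reports, whatever that number is."""
    m = INFO_THREADS_RE.search(info_threads)
    if m is None:
        return False
    thread_num, pid = int(m.group(1)), int(m.group(2))
    rows = LWP_ROW_RE.findall(linux_lwps)
    if len(rows) != 1:
        return False
    row_pid, row_lwp, _inferior, row_thread = (int(x) for x in rows[0])
    return row_pid == pid and row_lwp == pid and row_thread == thread_num
```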
2023-11-14  [gdb/testsuite] Fix gdb.threads/stepi-over-clone.exp regexp  (Tom de Vries; 1 file, -5/+9 lines)
I ran into the following FAIL:
...
(gdb) PASS: gdb.threads/stepi-over-clone.exp: catch process syscalls
continue^M
Continuing.^M
^M
Catchpoint 2 (call to syscall clone), clone () at \
  ../sysdeps/unix/sysv/linux/x86_64/clone.S:78^M
warning: 78 ../sysdeps/unix/sysv/linux/x86_64/clone.S: \
  No such file or directory^M
(gdb) FAIL: gdb.threads/stepi-over-clone.exp: continue
...

All but one of the regexps in the .exp file use "clone\[23\]?", where the "?" makes plain "clone" acceptable too; the failing case lacked the "?". This commit fixes that case to also use "?".

Furthermore, there are FAILs like this:
...
(gdb) PASS: gdb.threads/stepi-over-clone.exp: third_thread=false: \
  non-stop=on: displaced=off: i=0: continue
stepi^M
[New Thread 0x7ffff7ff8700 (LWP 15301)]^M
Hello from the first thread.^M
78 in ../sysdeps/unix/sysv/linux/x86_64/clone.S^M
(gdb) XXX: Consume the initial command
XXX: Consume new thread line
XXX: Consume first worker thread message
FAIL: gdb.threads/stepi-over-clone.exp: third_thread=false: non-stop=on: \
  displaced=off: i=0: stepi
...
because this output is expected instead:
...
Hello from the first thread.^M
0x00000000004212cd in clone3 ()^M
...

The root cause of the difference is the presence of .debug_line info for clone.

Fix this by updating the relevant regexps.

Tested on x86_64-linux, specifically:
- openSUSE Leap 15.4 (where the FAILs were observed), and
- openSUSE Tumbleweed (where the FAILs were not observed).

Co-Authored-By: Pedro Alves <pedro@palves.net>
Approved-By: Pedro Alves <pedro@palves.net>
Change-Id: I74ca9e7d4cfe6af294fd50e8c509fcbad289b78c
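The two regexp fixes can be sketched in Python for illustration. The Tcl `clone\[23\]?` becomes `clone[23]?`. The alternation that accepts both stop lines, with and without .debug_line info for clone, is a hypothetical pattern, not the one committed.

```python
import re

# Tcl "clone\[23\]?" written as a Python regex: "clone", "clone2" or "clone3".
CLONE_RE = re.compile(r"clone[23]?")

# Hypothetical pattern for the line GDB prints after "stepi" lands in clone:
# a source line when .debug_line info for clone is available, otherwise a
# bare "ADDR in clone3 ()" frame line.
STEPI_STOP_RE = re.compile(
    r"^(?:\d+\s+in \S*clone[23]?\.S"        # with line info
    r"|0x[0-9a-f]+ in clone[23]? \(\))",    # without line info
    re.MULTILINE,
)
```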
2023-11-13  Cancel execution command on thread exit, when stepping, nexting, etc.  (Pedro Alves; 1 file, -26/+55 lines)
If your target has no support for TARGET_WAITKIND_NO_RESUMED events (and no way to support them, such as the yet-unsubmitted AMDGPU target), and you step over thread exit with scheduler-locking on, this is what you get:

  (gdb) n
  [Thread ... exited]
  *hang*

Getting back the prompt by typing Ctrl-C may not even work, since no inferior thread is running to receive the SIGINT. Even if it works, it seems unnecessarily harsh. If you started an execution command for which there's a clear thread of interest (step, next, until, etc.), and that thread disappears, then I think it's more user friendly if GDB just detects the situation and aborts the command, giving back the prompt.

That is what this commit implements. It does this by explicitly requesting the target to report thread exit events whenever the main resumed thread has a thread_fsm. Note that unlike stepping over a breakpoint, we don't need to enable clone events in this case.

With this patch, we get:

  (gdb) n
  [Thread 0x7ffff7d89700 (LWP 3961883) exited]
  Command aborted, thread exited.
  (gdb)

Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I901ab64c91d10830590b2dac217b5264635a2b95
2023-11-13  Testcases for stepping over thread exit syscall (PR gdb/27338)  (Simon Marchi; 4 files, -0/+324 lines)
Add new gdb.threads/step-over-thread-exit.exp and gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcases, exercising stepping over thread exit syscall. These make use of lib/my-syscalls.S to define the exit syscall.

Co-authored-by: Pedro Alves <pedro@palves.net>
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
Change-Id: Ie8b2c5747db99b7023463a897a8390d9e814a9c9
2023-11-13  Don't resume new threads if scheduler-locking is in effect  (Pedro Alves; 2 files, -0/+121 lines)
If scheduler-locking is in effect, e.g., with "set scheduler-locking on", and you step over a function that spawns a new thread, the new thread is allowed to run free, at least until some event is hit, at which point, whether the new thread is re-resumed depends on a number of seemingly random factors. E.g., if the target is all-stop, and the parent thread hits a breakpoint, and GDB decides the breakpoint isn't interesting to report to the user, then the parent thread is resumed, but the new thread is left stopped.

I think that letting the new threads run with scheduler-locking enabled is a defect. This commit fixes that, making use of the new clone events on Linux, and of target_thread_events() on targets where new threads have no connection to the thread that spawned them.

Testcase and documentation changes included.

Approved-By: Eli Zaretskii <eliz@gnu.org>
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ie12140138b37534b7fc1d904da34f0f174aa11ce
2023-11-13  Thread options & clone events (Linux GDBserver)  (Pedro Alves; 1 file, -6/+0 lines)
This patch teaches the Linux GDBserver backend to report clone events to GDB, when GDB has requested them with the GDB_THREAD_OPTION_CLONE thread option, via the new QThreadOptions packet.

This shuffles code in linux_process_target::handle_extended_wait into a more logical order, now that we have to handle and potentially report all of fork/vfork/clone.

Rename lwp_info::fork_relative -> lwp_info::relative, as the field is no longer only about (v)fork.

With this, gdb.threads/stepi-over-clone.exp now cleanly passes against GDBserver, so remove the native-target-only requirement from that testcase.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I3a19bc98801ec31e5c6fdbe1ebe17df855142bb2
2023-11-13  Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED  (Pedro Alves; 2 files, -0/+485 lines)
(A good chunk of the problem statement in the commit log below is Andrew's, adjusted for a different solution, and for covering displaced stepping too. The testcase is mostly Andrew's too.)

This commit addresses bugs gdb/19675 and gdb/27830, which are about stepping over a breakpoint set at a clone syscall instruction; one is about displaced stepping, and the other about in-line stepping.

Currently, when a new thread is created through a clone syscall, GDB sets the new thread running. With 'continue' this makes sense (assuming no schedlock):

 - all-stop mode, user issues 'continue', all threads are set running, a newly created thread should also be set running.

 - non-stop mode, user issues 'continue', other pre-existing threads are not affected, but as the new thread is (sort-of) a child of the thread the user asked to run, it makes sense that the new threads should be created in the running state.

Similarly, if we are stopped at the clone syscall, and there's no software breakpoint at this address, then the current behaviour is fine:

 - all-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). While stepping the thread of interest all the other threads will be allowed to continue. A newly created thread will be set running, and then stopped once the thread of interest has completed its step.

 - non-stop mode, user issues 'stepi', stepping will be done in place (as there's no breakpoint to step over). Other threads might be running or stopped, but as with the continue case above, the new thread will be created running. The only possible issue here is that the new thread will be left running after the initial thread has completed its stepi. The user would need to manually select the thread and interrupt it; this might not be what the user expects. However, this is not something this commit tries to change.
The problem then is what happens when we try to step over a clone syscall if there is a breakpoint at the syscall address.

 - For both all-stop and non-stop modes, with in-line stepping:

    + user issues 'stepi',
    + [non-stop mode only] GDB stops all threads. In all-stop mode all threads are already stopped.
    + GDB removes s/w breakpoint at syscall address,
    + GDB single steps just the thread of interest, all other threads are left stopped,
    + New thread is created running,
    + Initial thread completes its step,
    + [non-stop mode only] GDB resumes all threads that it previously stopped.

   There are two problems in the in-line stepping scenario above:

    1. The new thread might pass through the same code that the initial thread is in (i.e. the clone syscall code), in which case it will fail to hit the breakpoint in clone, as this was removed so the first thread can single step,

    2. The new thread might trigger some other stop event before the initial thread reports its step completion. If this happens we end up triggering an assertion, as GDB assumes that only the thread being stepped should stop. The assert looks like this:

         infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.

 - For both all-stop and non-stop modes, with displaced stepping:

    + user issues 'stepi',
    + GDB starts the displaced step, moves thread's PC to the out-of-line scratch pad, maybe adjusts registers,
    + GDB single steps the thread of interest, [non-stop mode only] all other threads are left as they were, either running or stopped. In all-stop, all other threads are left stopped.
    + New thread is created running,
    + Initial thread completes its step, GDB re-adjusts its PC, restores/releases scratchpad,
    + [non-stop mode only] GDB resumes the thread, now past its breakpoint.
    + [all-stop mode only] GDB resumes all threads.

   There is one problem with the displaced stepping scenario above:

    3. When the parent thread completed its step, GDB adjusted its PC, but did not adjust the child's PC, so the new child thread will continue execution in the scratch pad, invoking undefined behavior. If you're lucky, you see a crash. If unlucky, the inferior gets silently corrupted.

What is needed is for GDB to have more control over whether the new thread is created running or not. Issue #1 above requires that the new thread not be allowed to run until the breakpoint has been reinserted. The only way to guarantee this is if the new thread is held in a stopped state until the single step has completed. Issue #3 above requires that GDB is informed of when a thread clones itself, and of what the child's ptid is, so that GDB can fix up both the parent and the child.

When looking for solutions to this problem I considered how GDB handles fork/vfork, as these have some of the same issues. The main difference between fork/vfork and clone is that the clone events are not reported back to core GDB. Instead, the clone event is handled automatically in the target code and the child thread is immediately set running.

Note we have support for requesting thread creation events out of the target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported for the new/child thread. That would be sufficient to address in-line stepping (issue #1), but not displaced stepping (issue #3). To handle displaced stepping, we need an event that is reported to the _parent_ of the clone, as the information about the displaced step is associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED includes no indication of which thread is the parent that spawned the new child. In fact, for some targets, such as Windows, it would be impossible to know which thread that was, as thread creation there doesn't work by "cloning".

The solution implemented here is to model clone on fork/vfork, and introduce a new TARGET_WAITKIND_THREAD_CLONED event.
This event is similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except that we end up with a new thread in the same process, instead of a new thread of a new process. Like FORKED and VFORKED, THREAD_CLONED waitstatuses have a child_ptid property, and the child is held stopped until GDB explicitly resumes it. This addresses the in-line stepping case (issues #1 and #2).

The infrun code that handles displaced stepping fixup for the child after a fork/vfork event is thus reused for THREAD_CLONED, with some minimal conditions added, addressing the displaced stepping case (issue #3).

The native Linux backend is adjusted to unconditionally report TARGET_WAITKIND_THREAD_CLONED events to the core.

Following the follow_fork model in core GDB, we introduce a target_follow_clone target method, which is responsible for making the new clone child visible to the rest of GDB.

Subsequent patches will add clone events support to the remote protocol and gdbserver.

displaced_step_in_progress_thread becomes unused with this patch, but a new use will reappear later in the series. To avoid deleting it and re-adding it, this patch marks it with attribute unused, and the later patch removes the attribute again. We need to do this because the function is static, and with no callers the compiler would warn (an error with -Werror), breaking the build.

This adds a new gdb.threads/stepi-over-clone.exp testcase, which exercises stepping over a clone syscall, with displaced stepping vs inline stepping, and all-stop vs non-stop. We already test stepping over clone syscalls with gdb.base/step-over-syscall.exp, but this test uses pthreads, while the other test uses raw clone, and this one is more thorough. The testcase passes on native GNU/Linux, but fails against GDBserver. GDBserver will be fixed by a later patch in the series.
Co-authored-by: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
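The core of issue #3 is that a displaced-stepping fixup has to be applied to both threads that executed the copied clone instruction. The Python sketch below models that under stated assumptions. The copied instruction is a syscall whose fixup is a plain offset relocation, such as x86-64's 2-byte `syscall`. The addresses are made up. `Thread`, `displaced_fixup`, and `on_thread_cloned` are hypothetical names, not GDB's own.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    ptid: int
    pc: int


def displaced_fixup(pc, scratch_addr, orig_addr):
    """Map a PC just past the copied instruction in the scratch pad back to
    the matching address past the original instruction.  A plain offset
    relocation is assumed, which suits a syscall instruction but not, say,
    a PC-relative branch."""
    return orig_addr + (pc - scratch_addr)


def on_thread_cloned(parent, child, scratch_addr, orig_addr):
    """Both threads executed the displaced clone syscall, so both PCs point
    into the scratch pad.  Fixing up only the parent would leave the child
    running from the scratch pad (issue #3)."""
    for thread in (parent, child):
        thread.pc = displaced_fixup(thread.pc, scratch_addr, orig_addr)


ORIG, SCRATCH, INSN_LEN = 0x401000, 0x7F0000, 2  # made-up example values
parent = Thread(ptid=1, pc=SCRATCH + INSN_LEN)
child = Thread(ptid=2, pc=SCRATCH + INSN_LEN)
on_thread_cloned(parent, child, SCRATCH, ORIG)
```

The handler only works because the event is reported to the parent and carries the child's identity, which TARGET_WAITKIND_THREAD_CREATED does not provide.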
2023-11-13  gdb/linux: Delete all other LWPs immediately on ptrace exec event  (Pedro Alves; 2 files, -0/+113 lines)
I noticed that on an Ubuntu 20.04 system, with a later patch in this series applied ("Step over clone syscall w/ breakpoint, TARGET_WAITKIND_THREAD_CLONED"), gdb.threads/step-over-exec.exp passed cleanly, but we still ended up with four new unexpected GDB core dumps:

		 === gdb Summary ===

  # of unexpected core files      4
  # of expected passes            48

That patch makes the pre-existing gdb.threads/step-over-exec.exp testcase (almost silently) expose a latent problem in gdb/linux-nat.c, resulting in a GDB crash when:

  #1 - a non-leader thread execs
  #2 - the post-exec program stops somewhere
  #3 - you kill the inferior

Instead of doing #3 directly, the testcase just returns, which ends up in gdb_exit, tearing down GDB, which kills the inferior, and is thus equivalent to #3 above.

For example (with that patch applied):

  $ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
  ...
  (top-gdb) r
  ...
  (gdb) b main
  ...
  (gdb) r
  ...
  Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
  69        argv0 = argv[0];
  (gdb) c
  Continuing.
  [New Thread 0x7ffff7d89700 (LWP 2506975)]
  Other going in exec.
  Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
  process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd

  Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
  28        foo ();
  (gdb) k
  ...
  Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
  393         return m_suspend.waitstatus_pending_p;
  (top-gdb) bt
  #0  0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
  #1  0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
  #2  0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
  #3  0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
  #4  0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
  #5  0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
  #6  0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
  #7  0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
  #8  0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
  ...

The root of the problem is that when a non-leader LWP execs, it just changes its tid to the tgid, replacing the pre-exec leader thread, becoming the new leader. There's no thread exit event for the execing thread. It's as if the old pre-exec LWP vanishes without trace. The ptrace man page says:

  "PTRACE_O_TRACEEXEC (since Linux 2.5.46)
	Stop the tracee at the next execve(2).
	A waitpid(2) by the tracer will return a status value such that

	  status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))

	If the execing thread is not a thread group leader, the thread ID is reset to thread group leader's ID before this stop. Since Linux 3.0, the former thread ID can be retrieved with PTRACE_GETEVENTMSG."

When the core of GDB processes an exec event, it deletes all the threads of the inferior. But that is too late -- deleting the thread does not delete the corresponding LWP, so we end up leaving the pre-exec non-leader LWP stale in the LWP list. That's what leads to the crash above -- linux_nat_target::kill iterates over all LWPs, and after the patch in question, that code will look for the corresponding thread_info for each LWP. For the pre-exec non-leader LWP still listed, it won't find one.

This patch fixes it, by deleting the pre-exec non-leader LWP (and thread) from the LWP/thread lists as soon as we get an exec event out of ptrace.

GDBserver does not need an equivalent fix, because it is already doing this as a side effect of mourning the pre-exec process, in gdbserver/linux-low.cc:

  else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
    {
      ...
      /* Delete the execing process and all its threads.  */
      mourn (proc);
      switch_to_thread (nullptr);

The crash with gdb.threads/step-over-exec.exp is not observable on newer systems, which postdate the glibc change to move "libpthread.so" internals to "libc.so.6". On those systems, right after the exec, GDB traps a load event for "libc.so.6", which leads to GDB trying to open libthread_db for the post-exec inferior, and on such systems that succeeds. When we load libthread_db, we call linux_stop_and_wait_all_lwps, which, as the name suggests, stops all lwps, and then waits to see their stops. While doing this, GDB detects that the pre-exec stale LWP is gone, and deletes it.
If we use "catch exec" to stop right at the exec before the "libc.so.6" load event ever happens, and issue "kill" right there, then GDB crashes on newer systems as well. So instead of tweaking gdb.threads/step-over-exec.exp to cover the fix, add a new gdb.threads/threads-after-exec.exp testcase that uses "catch exec".

The test also uses the new "maint info linux-lwps" command if testing on Linux native, which also exposes the stale LWP problem with an unfixed GDB.

Also tweak a comment in infrun.c:follow_exec referring to how linux-nat.c used to behave, as it would become stale otherwise.

Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
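The status value quoted from the man page, and the pruning this patch performs, can be modelled in Python. The constants are the Linux values: `SIGTRAP` is 5 and `PTRACE_EVENT_EXEC` is 4. The decoding follows glibc's `WIFSTOPPED`/`WSTOPSIG` bit layout. `handle_exec_event` is a hypothetical toy model of the fix, not linux-nat.c code.

```python
# Linux values: SIGTRAP from <signal.h>, PTRACE_EVENT_EXEC from
# <linux/ptrace.h>.
SIGTRAP = 5
PTRACE_EVENT_EXEC = 4


def make_exec_stop_status():
    """Build the waitpid() status the man page describes:
    status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)).
    The low byte 0x7f is what marks a stopped (not exited) child."""
    return ((SIGTRAP | (PTRACE_EVENT_EXEC << 8)) << 8) | 0x7F


def ptrace_event(status):
    """Return the PTRACE_EVENT_* code of a ptrace event stop, else None.
    Follows the glibc layout: WIFSTOPPED(s) is (s & 0xff) == 0x7f,
    WSTOPSIG(s) is (s >> 8) & 0xff, and the event sits at status >> 16."""
    if (status & 0xFF) != 0x7F:
        return None
    if ((status >> 8) & 0xFF) != SIGTRAP:
        return None
    return (status >> 16) or None


def handle_exec_event(lwps, tgid):
    """Toy model of the fix.  `lwps` maps (pid, lwp) -> GDB thread number.
    After the exec, the execing thread already owns the leader's id (tgid),
    so every other LWP of that process is stale: drop it immediately
    rather than leaving it for later code to trip over."""
    stale = [key for key in lwps if key[0] == tgid and key[1] != tgid]
    for key in stale:
        del lwps[key]
```

In this model, a later "kill" that walks `lwps` only sees live LWPs. This mirrors why linux_nat_target::kill no longer finds a stale entry without a thread_info.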
2023-11-08  gdb: call update_thread_list after completing an inferior call  (Andrew Burgess; 2 files, -0/+263 lines)
I noticed that if GDB is using a remote or extended-remote target, and an inferior call causes a new thread to appear or an existing thread to exit, then these events are not reported to the user.

The problem is that for these targets GDB relies on a call to update_thread_list to learn about changes to the inferior's thread list. If GDB doesn't pass through the normal stop code, then GDB will not call update_thread_list, and so will not report changes in the thread list.

This commit adds an additional update_thread_list call, after which thread events are correctly reported.
2023-11-08  gdb: call update_thread_list for $_inferior_thread_count function  (Andrew Burgess; 2 files, -0/+255 lines)
I noticed that sometimes the value returned by $_inferior_thread_count can become out of sync with the actual thread count of the inferior, and will disagree with the number of threads reported by 'info threads'. This commit fixes this issue.

The cause of the problem is that 'info threads' includes a call to update_thread_list (this can be seen in print_thread_info_1 in thread.c), while $_inferior_thread_count doesn't include a similar call (see the function inferior_thread_count_make_value, also in thread.c).

Of course, this is only a problem when GDB is running on a target that relies on update_thread_list calls to learn about new threads, e.g. remote or extended-remote targets. Native targets generally learn about new threads as soon as they appear and will not have this problem.

I ran into this issue when writing a test for the next commit, which uses inferior function calls to add and remove threads from an inferior. For testing this commit, though, I've made use of non-stop mode and asynchronous inferior execution. By reading the inferior state I can know when a new thread has been created, at which point I can print $_inferior_thread_count while the inferior is still running. This is important: if I stop the inferior, then GDB will pass through an update_thread_list call in the normal stop code, which will synchronise the thread list, after which $_inferior_thread_count will report the correct value.

With this change in place, $_inferior_thread_count is now correct.
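The mismatch can be modelled with a cached thread list that is only refreshed on request. In this hypothetical Python sketch, `RemoteThreadList` stands in for a remote target. `info_threads` refreshes before reading, as print_thread_info_1 does. The two `inferior_thread_count_*` functions show the behaviour before and after the fix. None of these names are GDB's.

```python
class RemoteThreadList:
    """Toy stand-in for a remote/extended-remote target: GDB's view of the
    threads (`known`) only changes when update_thread_list() is called."""

    def __init__(self):
        self.actual = {1}  # threads that really exist in the inferior
        self.known = {1}   # GDB's cached thread list

    def update_thread_list(self):
        self.known = set(self.actual)


def info_threads(target):
    target.update_thread_list()  # as print_thread_info_1 does
    return sorted(target.known)


def inferior_thread_count_before(target):
    return len(target.known)     # no refresh: may be stale


def inferior_thread_count_after(target):
    target.update_thread_list()  # the fix: refresh first
    return len(target.known)
```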
2023-10-26 | gdb: handle main thread exiting during detach | Andrew Burgess | 2 files, -0/+201
Overview
========

Consider the following situation: GDB is in non-stop mode, and the main thread is running while a second thread is stopped. The user has the second thread selected as the current thread and asks GDB to detach. At the exact moment of detach the main thread exits.

This situation currently causes crashes, assertion failures, and unexpected errors to be reported from GDB for both native and remote targets.

This commit addresses this situation for native and remote targets. There are a number of different fixes, but all are required in order to get this functionality working correctly for native and remote targets.

Native Linux Target
===================

For the native Linux target, detaching is handled in the function linux_nat_target::detach. In here we call stop_wait_callback for each thread, and it is this callback that will spot that the main thread has exited.

GDB then detaches from everything except the main thread by calling detach_callback.

After this the first problem is this assert:

  /* Only the initial process should be left right now. */
  gdb_assert (num_lwps (pid) == 1);

The num_lwps call will return 0 as the main thread has exited and all of the other threads have now been detached. I fix this by changing the assert to allow for 0 or 1 lwps at this point. As the 0 case can only happen in non-stop mode, the assert becomes:

  gdb_assert (num_lwps (pid) == 1
              || (target_is_non_stop_p () && num_lwps (pid) == 0));

The next problem is that we do:

  main_lwp = find_lwp_pid (ptid_t (pid));

and then proceed assuming that main_lwp is not nullptr.

In the case that the main thread has exited, though, main_lwp will be nullptr. However, we only need main_lwp so that GDB can detach from the thread. If the main thread has exited, and GDB has already detached from every other thread, then GDB has finished detaching. It can skip the calls that try to detach from the main thread, and then tell the user that the detach was a success.
For Remote Targets
==================

On remote targets there are two problems.

The first is that when the exit occurs during the early phase of the detach, we see the stop notification arrive while GDB is removing the breakpoints ahead of the detach. The 'set debug remote on' trace looks like this:

  [remote] Sending packet: $z0,7f1648fe0241,1#35
  [remote] Notification received: Stop:W0;process:2a0ac8
  # At this point an unpatched gdbserver segfaults, and the connection
  # is broken. A patched gdbserver continues as below...
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff00a8,1#68
  [remote] Packet received: E01
  [remote] Sending packet: $z0,7f1648ff132f,1#6b
  [remote] Packet received: E01
  [remote] Sending packet: $D;2a0ac8#3e
  [remote] Packet received: E01

I was originally running into segmentation faults from within gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This function calls current_process() and then dereferences the result to find the breakpoint list. However, in our case, the current process has already exited, and so the current_process() call returns nullptr. At the point of failure, the gdbserver backtrace looks like this:

  #0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
  #1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
  #2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
  #3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
  ...

The problem is that, as a result of non-stop being on, the process exiting is only reported back to GDB after the request to remove a breakpoint has been sent. Clearly gdbserver can't actually remove this breakpoint (the process has already exited), so I think the best solution is for gdbserver just to report an error, which is what I've done.
The second problem I ran into was on the GDB side. The process has already exited, but GDB has not yet acknowledged the exit event, so the detach (the 'D' packet in the above trace) fails. This was being reported to the user with a 'Can't detach process' error. As the test actually calls detach from Python code, this error was then becoming a Python exception.

The detach has clearly returned an error, so having GDB throw an error might be fine. In this case, though, I think there's a good argument that the remote error can be ignored. If GDB tries to detach and gets back an error, and there's a pending exit event for the pid we tried to detach, then GDB can just ignore the error and pretend the detach worked fine.

We could possibly check for a pending exit event before sending the detach packet. However, I believe it might be possible (in non-stop mode) for the stop notification to arrive after the detach is sent, but before gdbserver has started processing the detach. In that case we would still need to check for pending stop events after seeing the detach fail, so I figure there's no point having two checks. We just send the detach request, and if it fails, check to see if the process has already exited.

Testing
=======

In order to test this issue I needed to ensure that the exit event arrives at the same time as the detach call. The window of opportunity for getting the exit to arrive is so small that I've never managed to trigger this in real use. I originally spotted this issue while working on another patch, which did manage to trigger it.

However, if we trigger both the exit and the detach from a single Python function, then we never return to GDB's event loop. As such, GDB never processes the exit event, and the first time GDB gets a chance to see the exit is during the detach call. That is the approach I've taken for testing this patch.
Tested-By: Kevin Buettner <kevinb@redhat.com>
Approved-By: Kevin Buettner <kevinb@redhat.com>
2023-10-06 | process-dies-while-detaching.exp: Exit early if GDB misses sync breakpoint | Thiago Jung Bauermann | 1 file, -5/+5
I'm seeing a lot of variability in the failures of gdb.threads/process-dies-while-detaching.exp on aarch64-linux. On this platform, a problem yet to be investigated causes GDB to miss the _exit breakpoint. What happens next is random because after missing that breakpoint, GDB is out of sync with the inferior. This causes the tests following that point in the testcase to fail in a random way. In this scenario it's better to exit the testcase early to avoid random results in the testsuite. We are relying on gdb_continue_to_breakpoint to return the result of gdb_test_multiple. This is already the case because in Tcl the return value of a function is the return value of the last command it runs. But change gdb_continue_to_breakpoint to explicitly return this value, to make it clear this is the intended behaviour. Tested on aarch64-linux. Tested-By: Guinevere Larsen <blarsen@redhat.com> Approved-By: Andrew Burgess <aburgess@redhat.com>
2023-09-28 | gdb/testsuite: Add relative versus absolute LD_LIBRARY_PATH test | Kevin Buettner | 3 files, -0/+145
At one time, circa 2006, there was a bug, which was presumably fixed without adding a test case. If you provided some relative path to the shared library, such as with

  export LD_LIBRARY_PATH=.

then gdb would fail to match the shared library name during the TLS lookup. I think there may have been a bit more to it than is provided by that explanation, since the test also takes care to split the debug info into a separate file.

In any case, this commit is based on one of Red Hat's really old local patches. I've attempted to update it and remove a fair amount of cruft, hopefully without losing any critical elements from the test.

Testing on Fedora 38 (correctly) shows 1 unsupported test for native-gdbserver and 5 PASSes for the native target as well as native-extended-gdbserver.

In his review of v1 of this patch, Lancelot Six observed that 'thread_local' could be used in place of '__thread' in the C source files. However, it only became available in the C standard with C11, so I used additional_flags=-std=c11 for compiling both the shared object and the main program.

Also, while testing with CC_FOR_TARGET=clang, I found that 'additional_flags=-Wl,-soname=${binsharedbase}' caused clang to complain that this linker flag was unused when compiling the source file, so I moved this linker option to 'ldflags='.

My testing for this v2 patch shows the same results as with v1, but I've done additional testing with CC_FOR_TARGET=clang as well. The results are the same as when gcc is used.

Co-Authored-by: Jan Kratochvil <jan@jankratochvil.net>
Reviewed-By: Lancelot Six <lancelot.six@amd.com>
2023-09-27 | Adjust gdb.thread/pthreads.exp for Cygwin | Pedro Alves | 1 file, -7/+7
The Cygwin runtime spawns a few extra threads, so using hardcoded thread numbers in tests rarely works correctly. Thankfully, this testcase already records the ids of the important threads in globals. It just so happens that they are not used in a few tests. This commit fixes that.

With this, the test passes cleanly on Cygwin [1]. It still passes cleanly on x86-64 GNU/Linux.

[1] - with system GDB. Upstream GDB is missing a couple of patches Cygwin carries downstream.

Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: I01bf71fcb44ceddea8bd16b933b10b964749a6af
2023-09-27 | In gdb.threads/pthreads.c, handle pthread_attr_setscope ENOTSUP | Pedro Alves | 1 file, -1/+2
On Cygwin, I see:

 (gdb) PASS: gdb.threads/pthreads.exp: break thread1
 continue
 Continuing.
 pthread_attr_setscope 1: Not supported (134)
 [Thread 3732.0x265c exited with code 1]
 [Thread 3732.0x2834 exited with code 1]
 [Thread 3732.0x2690 exited with code 1]
 Program terminated with signal SIGHUP, Hangup.
 The program no longer exists.
 (gdb) FAIL: gdb.threads/pthreads.exp: Continue to creation of first thread
 ...

and then a set of cascading failures.

Fix this by treating ENOTSUP the same way as if PTHREAD_SCOPE_SYSTEM were not defined. I.e., ignore ENOTSUP errors, and proceed with testing.

Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: Iea68ff8b9937570726154f36610c48ef96101871
2023-09-27 | Fix gdb.threads/pthreads.exp error handling/printing | Pedro Alves | 1 file, -8/+22
On Cygwin, I noticed:

 (gdb) PASS: gdb.threads/pthreads.exp: break thread1
 continue
 Continuing.
 pthread_attr_setscope 1: No error
 [Thread 8732.0x28f8 exited with code 1]
 [Thread 8732.0xb50 exited with code 1]
 [Thread 8732.0x17f8 exited with code 1]
 Program terminated with signal SIGHUP, Hangup.
 The program no longer exists.
 (gdb) FAIL: gdb.threads/pthreads.exp: Continue to creation of first thread

Note "No error" in "pthread_attr_setscope 1: No error". That is a bug in the test. It is using perror, but that prints errno, while the pthread functions return the error directly.

Fix all cases of the same problem, by adding a new print_error function and using it. We now get:

 ...
 pthread_attr_setscope 1: Not supported (134)
 ...

Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: I972ebc931b157bc0f9084e6ecd8916a5e39238f5