Fix all trailing-text-in-parentheses duplicates exposed by previous patch.
Tested on x86_64-linux and aarch64-linux.
|
|
When running test-case gdb.threads/leader-exit-attach.exp with target board
native-extended-gdbserver I run into:
...
(gdb) KFAIL: $exp: attach (PRMS: gdb/31555)
print $_inferior_thread_count^M
$1 = 0^M
(gdb) KPASS: $exp: get valueof "$_inferior_thread_count" (PRMS server/31554)
...
The PR mentioned in the KPASS, PR31554, was fixed by commit f1fc8dc2dcc
("Fix "attach" failure handling with GDBserver"), and consequently the PR is
closed.
Fix this by removing the corresponding kfail.
Tested on x86_64-linux.
|
|
With test-case gdb.threads/leader-exit-attach.exp and check-read1, I run into:
...
(gdb) attach 18591^M
Attaching to program: leader-exit-attach, process 18591^M
warning: process 18591 is a zombie - the process has already terminatedKFAIL: $exp: attach (PRMS: gdb/31555)
^M
ptrace: Operation not permitted.^M
(gdb) FAIL: $exp: get valueof "$_inferior_thread_count"
...
The problem is that the gdb_test_multiple in the test-case doesn't consume the
prompt in all clauses:
...
gdb_test_multiple "attach $testpid" "attach" {
-re "Attaching to process $testpid failed.*" {
# GNU/Linux gdbserver. Linux ptrace does not let you attach
# to zombie threads.
setup_kfail "gdb/31555" *-*-linux*
fail $gdb_test_name
}
-re "warning: process $testpid is a zombie - the process has already terminated.*" {
# Native GNU/Linux. Linux ptrace does not let you attach to
# zombie threads.
setup_kfail "gdb/31555" *-*-linux*
fail $gdb_test_name
}
-re "Attaching to program: $escapedbinfile, process $testpid.*$gdb_prompt $" {
pass $gdb_test_name
set attached 1
}
}
...
Fix this by using -wrap in the first two clauses.
While we're at it, also use -wrap in the third clause.
Tested on x86_64-linux.
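The effect of the unconsumed prompt can be modelled outside the testsuite. The sketch below is an illustration in Python rather than Tcl/expect: it feeds GDB's output one character at a time, as check-read1 does, and compares a clause ending in .* with a clause that also requires the prompt. The helper first_match_read1 is hypothetical, and the wrapped form is an assumption about roughly what -wrap expands to, not a copy of lib/gdb.exp.

```python
import re

def first_match_read1(pattern, output):
    """Grow the buffer one character at a time and stop at the first match.

    Returns (consumed, leftover): the text up to the end of the match, and
    the output that has not been read yet at that point.
    """
    regex = re.compile(pattern, re.DOTALL)
    for i in range(1, len(output) + 1):
        m = regex.search(output[:i])
        if m:
            return output[:m.end()], output[m.end():]
    return None, output

output = ("warning: process 18591 is a zombie - the process has already "
          "terminated\r\nptrace: Operation not permitted.\r\n(gdb) ")

# Clause as in the original test-case: ends in .* and never asks for the prompt.
plain = (r"warning: process 18591 is a zombie - "
         r"the process has already terminated.*")
# Assumed shape of a wrapped clause: the same pattern followed by the prompt.
wrapped = "(" + plain + r")\r\n\(gdb\) $"

_, left_plain = first_match_read1(plain, output)
_, left_wrapped = first_match_read1(wrapped, output)

# With the plain clause the prompt is still unread, so the next command sees it.
print(repr(left_plain))
# With the wrapped clause nothing is left over.
print(repr(left_wrapped))
```

In this model the plain clause fires as soon as "terminated" has been read, which is consistent with the KFAIL appearing in the middle of the warning line in the log above.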
|
|
On fedora rawhide, I ran into:
...
(gdb) continue^M
Continuing.^M
^M
Catchpoint 2 (call to syscall clone3), 0x000000000042097d in __clone3 ()^M
(gdb) FAIL: gdb.threads/stepi-over-clone.exp: continue
...
Fix this by updating a regexp to also recognize __clone3.
Tested on x86_64-linux.
Tested-By: Guinevere Larsen <blarsen@redhat.com>
|
|
gdb.threads/attach-many-short-lived-threads.exp
When running test-case gdb.threads/attach-many-short-lived-threads.exp, I run
regularly into PR26286:
...
(gdb) continue^M
Continuing.^M
[LWP ... exited]^M
...
[LWP ... exited]^M
^M
Program terminated with signal SIGTRAP, Trace/breakpoint trap.^M
The program no longer exists.^M
(gdb) FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: \
break at break_fn: 1
...
Add a kfail for this, such that we have:
...
(gdb) KFAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 9: \
break at break_fn: 1 (PRMS: threads/26286)
...
Reviewed-By: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Tested on x86_64-linux.
|
|
While working on another patch I needed to pass -Wl,-soname,NAME as a
compiler flag. I initially looked for other tests that did this, and
found a few examples, so I copied what they did.
But when I checked the gdb.log file I noticed that we were actually
getting -Wl,-soname passed twice.
I tracked the repeated option to 'proc gdb_compile_shlib_1' in
lib/gdb.exp. It turns out that we always add -Wl,-soname when
compiling a shared library.
Here's an example of a build command from gdb.base/prelink.exp:
builtin_spawn -ignore SIGHUP gcc -fno-stack-protector \
/tmp/build/gdb/testsuite/outputs/gdb.base/prelink/prelink-lib.c.o \
-fdiagnostics-color=never -shared -g \
-Wl,-soname,prelink.so -Wl,-soname,prelink.so -lm \
-o /tmp/build/gdb/testsuite/outputs/gdb.base/prelink/prelink.so
Notice that '-Wl,-soname,prelink.so' is repeated.
I believe that every place where a test adds '-Wl,-soname,NAME' as
a build option is unnecessary.
In this commit I propose we remove them all.
As part of this change I've switched from calling gdb_compile_shlib
directly to calling build_executable with the 'shlib' flag.
I've tested with gcc and clang and see no changes in the test results
after this commit. All the compile commands still have -Wl,-soname
added, but now it's only added once, from within lib/gdb.exp.
There should be no change in what is tested after this commit.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
On AIX, when a thread exited, GDB did not report the thread exit event and
kept the terminated threads in its thread list.
With terminated threads present, the output of the info threads command looks like:
(gdb) info threads
Id Target Id Frame
* 1 Thread 1 (tid 26607979, running) 0xd0611d70 in _p_nsleep () from /usr/lib/libpthreads.a(_shr_xpg5.o)
2 Thread 258 (tid 30998799, finished) aix-thread: ptrace (52, 30998799) returned -1 (errno = 3 The process does not exist.)
Note that the frame of the finished thread is not displayed correctly.
The reason is that on AIX GDB was not managing thread states. In particular, it did not know
when a thread terminated: in sync_threadlists () the pbuf and gbuf lists remained the same even though certain threads had exited.
This patch fixes that.
Some of the UI output is also changed.
When a thread is created or exits, the output on AIX is now similar to Linux, showing both user and kernel thread information:
[New Thread 258 (tid 32178533)]
[New Thread 515 (tid 30343651)]
[New Thread 772 (tid 33554909)]
[New Thread 1029 (tid 24969489)]
[New Thread 1286 (tid 18153945)]
[New Thread 1543 (tid 30736739)]
[Thread 258 (tid 32178533) exited]
[Thread 515 (tid 30343651) exited]
[Thread 772 (tid 33554909) exited]
[Thread 1029 (tid 24969489) exited]
[Thread 1286 (tid 18153945) exited]
[Thread 1543 (tid 30736739) exited]
and the output of info threads now looks like:
(gdb) info threads
Id Target Id Frame
* 1 Thread 1 (tid 31326579) ([running]) 0xd0611d70 in _p_nsleep () from /usr/lib/libpthread.a(_shr_xpg5.o)
This patch also makes a small change to the test-case gdb.threads/thread_events.exp so that it runs on AIX as well.
|
|
When running the testsuite on a system with kernel.yama.ptrace_scope set to 1,
we run into attach failures.
Fix this by recognizing "ptrace: Operation not permitted" in
can_spawn_for_attach.
Tested on aarch64-linux and x86_64-linux.
Approved-By: Pedro Alves <pedro@palves.net>
|
|
There's a pattern of using:
...
set saved_gdbflags $GDBFLAGS
set GDBFLAGS "$GDBFLAGS ..."
<do something with GDBFLAGS>
set GDBFLAGS $saved_gdbflags
...
Simplify this by using save_vars:
...
save_vars { GDBFLAGS } {
set GDBFLAGS "$GDBFLAGS ..."
<do something with GDBFLAGS>
}
...
Tested on x86_64-linux.
|
|
This is similar to the previous patch, but for gdb_protocol_is_remote.
gdb_is_target_remote and its MI cousin mi_is_target_remote use "maint
print target-stack", which is unnecessary when checking whether
gdb_protocol is "remote" or "extended-remote" would do. Checking
gdb_protocol is more efficient, and can be done before starting GDB
and running to main, unlike gdb_is_target_remote/mi_is_target_remote.
This adds a new gdb_protocol_is_remote procedure, and uses it in place
of gdb_is_target_remote/mi_is_target_remote throughout.
There are no uses of gdb_is_target_remote/mi_is_target_remote left
after this. Those will be eliminated in a following patch.
In some spots, we no longer need to defer the check until after
starting GDB, so the patch adjusts accordingly.
Change-Id: I90267c132f942f63426f46dbca0b77dbfdf9d2ef
Approved-By: Tom Tromey <tom@tromey.com>
|
|
gdb_is_target_native uses "maint print target-stack", which is
unnecessary when checking whether gdb_protocol is empty would do.
Checking gdb_protocol is more efficient, and can be done before
starting GDB and running to main, unlike gdb_is_target_native.
This adds a new gdb_protocol_is_native procedure, and uses it in place
of gdb_is_target_native.
At first, I thought that we'd end up with a few testcases needing to
use gdb_is_target_native still, especially multi-target tests that
connect to targets different from the default board target, but no,
actually all uses of gdb_is_target_native could be converted.
gdb_is_target_native will be eliminated in a following patch.
In some spots, we no longer need to defer the check until after
starting GDB, so the patch adjusts accordingly.
Change-Id: Ia706232dbffac70f9d9740bcb89c609dbee5cee3
Approved-By: Tom Tromey <tom@tromey.com>
|
|
With test-case gdb.threads/threadcrash.exp using host board local-remote-host
and target board remote-gdbserver-on-localhost I run into:
...
(gdb) PASS: gdb.threads/threadcrash.exp: test_gcore: continue to crash
gcore $outputs/gdb.threads/threadcrash/threadcrash.gcore^M
Failed to open '$outputs/gdb.threads/threadcrash/threadcrash.gcore' for output.^M
(gdb) FAIL: gdb.threads/threadcrash.exp: test_gcore: saving gcore
UNSUPPORTED: gdb.threads/threadcrash.exp: test_gcore: couldn't generate gcore file
...
The problem is that the gcore command tries to save a file on a remote host,
but the filename is a location on build.
Fix this by using host_standard_output_file.
Tested on x86_64-linux.
|
|
After installing glibc debuginfo, I ran into:
...
FAIL: gdb.threads/threadcrash.exp: test_live_inferior: \
$thread_count == [llength $test_list]
...
This happens because the clause:
...
-re "^\r\n${hs}main$hs$eol" {
...
which is intended to match only:
...
#1 <hex> in main () at threadcrash.c:423^M
...
also matches "remaining" in:
...
#1 <hex> in __GI___nanosleep (requested_time=<hex>, remaining=<hex>) at \
nanosleep.c:27^M
...
Fix this by checking for "in main" instead.
Tested on x86_64-linux.
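The over-match is easy to reproduce in isolation. A minimal sketch in Python, using the re module rather than Tcl's regexp engine, with hs translated from the test-case and the <hex> placeholders kept as literal text; the helper name line_matches is hypothetical:

```python
import re

# hs mirrors the test-case's "horizontal space": any run of non-newline characters.
hs = r"[^\n]*"

def line_matches(keyword, line):
    """Return True if hs + keyword + hs matches the whole line."""
    return re.fullmatch(hs + re.escape(keyword) + hs, line) is not None

main_frame = "#1 <hex> in main () at threadcrash.c:423"
libc_frame = ("#1 <hex> in __GI___nanosleep (requested_time=<hex>, "
              "remaining=<hex>) at nanosleep.c:27")

# The bare keyword also hits the "main" inside "remaining".
print(line_matches("main", main_frame), line_matches("main", libc_frame))
# Requiring "in main" accepts the real frame and rejects the libc frame.
print(line_matches("in main", main_frame), line_matches("in main", libc_frame))
```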
|
|
Add a new testcase for exercising attaching to a process after its
main thread has exited.
This is not possible on Linux, the kernel does not allow attaching to
a zombie task, so the test is kfailed there. It is possible however
on Windows at least, and was the scenario addressed by the Windows
backend fix in
https://sourceware.org/legacy-ml/gdb-patches/2003-12/msg00479.html,
nowadays PR threads/8153, back in 2003.
Passes cleanly on Cygwin.
KFAILed on GNU/Linux native and gdbserver.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=8153
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31554
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31555
Change-Id: Ib554f92f68c965bb4603cdf2aadb55ca45ded53b
|
|
When running test-case gdb.threads/access-mem-running-thread-exit.exp with
clang, we run into:
...
(gdb) print global_var = 555^M
No symbol "global_var" in current context.^M
(gdb) FAIL: gdb.threads/access-mem-running-thread-exit.exp: all-stop: \
access mem (write to global_var, inf=2, iter=1)
...
The problem is that clang removes the unused variable.
Fix this in the same way as done in commit b4f767131f7
("Fix gdb.base/align-*.exp and Clang + LTO and AIX GCC"), by incrementing the
variable.
Tested on x86_64-linux with gcc and clang.
|
|
When the XML support was disabled at compile time,
the test case gdb.threads/stepi-over-clone.exp fails
with lots of time-outs, which can be annoying.
This makes the test case unsupported instead.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
When running test-case gdb.threads/attach-stopped.exp on aarch64-linux, using
the manjaro linux distro, I get:
...
(gdb) thread apply all bt^M
^M
Thread 2 (Thread 0xffff8d8af120 (LWP 278116) "attach-stopped"):^M
#0 0x0000ffff8d964864 in clock_nanosleep () from /usr/lib/libc.so.6^M
#1 0x0000ffff8d969cac in nanosleep () from /usr/lib/libc.so.6^M
#2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6^M
#3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29^M
#4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6^M
#5 0x0000ffff8d99a5dc in ?? () from /usr/lib/libc.so.6^M
^M
Thread 1 (Thread 0xffff8db62020 (LWP 278111) "attach-stopped"):^M
#0 0x0000ffff8d92d2d8 in ?? () from /usr/lib/libc.so.6^M
#1 0x0000ffff8d9324b8 in ?? () from /usr/lib/libc.so.6^M
#2 0x0000aaaade37086c in main () at attach-stopped.c:45^M
(gdb) FAIL: gdb.threads/attach-stopped.exp: threaded: attach2 to stopped bt
...
The problem is that the test-case expects to see start_thread:
...
gdb_test "thread apply all bt" ".*sleep.*start_thread.*" \
"$threadtype: attach2 to stopped bt"
...
but lack of symbols makes that impossible.
Fix this by allowing " in ?? () from " as well.
Tested on aarch64-linux.
PR testsuite/31451
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31451
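For illustration, the relaxed check can be sketched in Python. The pattern named new below is an assumption about the shape of the fix (an alternation between start_thread and the unresolved-frame text); the regexp actually committed may differ. The backtrace is trimmed from the log above.

```python
import re

# Part of the "Thread 2" backtrace from the log, without symbols for libc.
bt = ("#2 0x0000ffff8d969b68 in sleep () from /usr/lib/libc.so.6\r\n"
      "#3 0x0000aaaade370828 in func (arg=0x0) at attach-stopped.c:29\r\n"
      "#4 0x0000ffff8d930aec in ?? () from /usr/lib/libc.so.6\r\n")

# Original expectation: "sleep" followed somewhere later by "start_thread".
# DOTALL lets "." cross line boundaries, as it does in the expect pattern.
old = re.compile(r".*sleep.*start_thread.*", re.DOTALL)

# Hypothetical relaxed form: also accept an unresolved frame after "sleep".
new = re.compile(r".*sleep.*(start_thread| in \?\? \(\) from ).*", re.DOTALL)

print(old.search(bt) is not None)
print(new.search(bt) is not None)
```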
|
|
We now have unwind-on-timeout and unwind-on-terminating-exception, and
then the odd one out unwindonsignal.
I'm not a great fan of these squashed together command names, so in
this commit I propose renaming this to unwind-on-signal.
Obviously I've added the hidden alias unwindonsignal so any existing
GDB scripts will keep working.
There's one test that I've extended to test the alias works, but in
most of the other test scripts I've changed over to use the new name.
The docs are updated to reference the new name.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
|
|
Now that inferior function calls can timeout (see the recent
introduction of direct-call-timeout and indirect-call-timeout), this
commit adds a new setting unwind-on-timeout.
This new setting is just like the existing unwindonsignal and
unwind-on-terminating-exception, but the new setting will cause GDB to
unwind the stack if an inferior function call times out.
The existing inferior function call timeout tests have been updated to
cover the new setting.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
|
|
In the previous commits I have been working on improving inferior
function call support. One thing that worries me about using inferior
function calls from a conditional breakpoint is: what happens if the
inferior function call fails?
If the failure is obvious, e.g. the thread performing the call
crashes, or hits a breakpoint, then this case is already well handled,
and the error is reported to the user.
But what if the thread performing the inferior call just deadlocks?
If the user made the call from a 'print' or 'call' command, then the
user might have some expectation of when the function call should
complete, and, when this time limit is exceeded, the user
will (hopefully) interrupt GDB and regain control of the debug
session.
But, when the inferior function call is from a breakpoint condition it
is much harder to understand that GDB is deadlocked within an inferior
call. Maybe the breakpoint hasn't been hit yet? Or maybe the
condition was always false? Or maybe GDB is deadlocked in an inferior
call? The only way to know for sure is for the user to periodically
interrupt the inferior, check on the state of all the threads, and
then continue.
Additionally, the focus of the previous commit was inferior function
calls, from a conditional breakpoint, in a multi-threaded inferior.
This opens up a whole new set of potential failure conditions. For
example, what if the function called relies on interaction with some
other thread, and the other thread crashes? Or hits a breakpoint?
Given how inferior function calls work (in a synchronous manner), a
stop event in some other thread is going to be ignored while the
inferior function call is being executed as part of a breakpoint
condition, and this means that GDB could get stuck waiting for the
original condition thread, which will now never complete.
In this commit I propose a solution to this problem. A timeout. For
targets that support async-mode we can install an event-loop timer
before starting the inferior function call. When the timer expires we
will stop the thread performing the inferior function call. With this
mechanism in place a user can be sure that any inferior call they make
will either complete, or timeout eventually.
Adding a timer like this is obviously a change in behaviour for the
more common 'call' and 'print' uses of inferior function calls, so, in
this patch, I propose having two different timers. One I call the
'direct-call-timeout', which is used for 'call' and 'print' commands.
This timeout is by default set to unlimited, which, not surprisingly,
means there is no timeout in place.
A second timer, which I've called 'indirect-call-timeout', is used for
inferior function calls from breakpoint conditions. This timeout has
a default value of 30 seconds. This is a reasonably long time to
wait, and hopefully should be enough in most cases to allow the
inferior call to complete. An inferior call that takes more than 30
seconds, which is installed on a breakpoint condition is really going
to slow down the debug session, so hopefully this is not a common use
case.
The user is, of course, free to reduce, or increase the timeout value,
and can always use Ctrl-c to interrupt an inferior function call, but
this timeout will ensure that GDB will stop at some point.
The new commands added by this commit are:
set direct-call-timeout SECONDS
show direct-call-timeout
set indirect-call-timeout SECONDS
show indirect-call-timeout
These new timeouts do depend on async-mode, so, if async-mode is
disabled (maint set target-async off), or not supported (e.g. target
sim), then the timeout is treated as unlimited (that is, no timeout is
set).
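The rules above can be summarised as a small decision function. This is a toy model in Python, not GDB code: the function name effective_timeout and the settings dictionary are hypothetical, and only the defaults (unlimited and 30 seconds) come from the description above.

```python
import math

UNLIMITED = math.inf  # stands in for the "unlimited" setting value

def effective_timeout(call_kind, settings, target_is_async):
    """Return the timeout in seconds for an inferior call; inf means no timer."""
    if not target_is_async:
        # Without async-mode no event-loop timer can be installed.
        return UNLIMITED
    if call_kind == "direct":      # 'call' and 'print' commands
        return settings["direct-call-timeout"]
    return settings["indirect-call-timeout"]   # breakpoint conditions

defaults = {"direct-call-timeout": UNLIMITED, "indirect-call-timeout": 30}

print(effective_timeout("direct", defaults, target_is_async=True))
print(effective_timeout("indirect", defaults, target_is_async=True))
print(effective_timeout("indirect", defaults, target_is_async=False))
```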
For targets that "fake" non-async mode, e.g. Linux native, where
non-async mode is really just async mode, but then we park the target
in a sigsuspend, we could easily fix things so that the timeouts still
work, however, for targets that really are not async aware, like the
simulator, fixing things so that timeouts work correctly would be a
much bigger task - that effort would be better spent just making the
target async-aware. And so, I'm happy for now that this feature will
only work on async targets.
The two new show commands will display slightly different text if the
current target is a non-async target, which should allow users to
understand what's going on.
There's a somewhat random test adjustment needed in gdb.base/help.exp,
the test uses a regexp with the apropos command, and expects to find a
single result. Turns out the new settings I added also matched the
regexp, which broke the test. I've updated the regexp a little to
exclude my new settings.
Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
|
|
This commit fixes bug PR 28942, that is, creating a conditional
breakpoint in a multi-threaded inferior, where the breakpoint
condition includes an inferior function call.
Currently, when a user tries to create such a breakpoint, then GDB
will fail with:
(gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
(gdb) continue
Continuing.
[New Thread 0x7ffff7c5d700 (LWP 2460150)]
[New Thread 0x7ffff745c700 (LWP 2460151)]
[New Thread 0x7ffff6c5b700 (LWP 2460152)]
[New Thread 0x7ffff645a700 (LWP 2460153)]
[New Thread 0x7ffff5c59700 (LWP 2460154)]
Error in testing breakpoint condition:
Couldn't get registers: No such process.
An error occurred while in a function called from GDB.
Evaluation of the expression containing the function
(return_true) will be abandoned.
When the function is done executing, GDB will silently stop.
Selected thread is running.
(gdb)
Or, in some cases, like this:
(gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
(gdb) continue
Continuing.
[New Thread 0x7ffff7c5d700 (LWP 2461106)]
[New Thread 0x7ffff745c700 (LWP 2461107)]
../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
The precise error depends on the exact thread state; so there are race
conditions depending on which threads have fully started, and which
have not. But the underlying problem is always the same; when GDB
tries to execute the inferior function call from within the breakpoint
condition, GDB will, incorrectly, try to resume threads that are
already running - GDB doesn't realise that some threads might already
be running.
The solution proposed in this patch requires an additional member
variable thread_info::in_cond_eval. This flag is set to true (in
breakpoint.c) when GDB is evaluating a breakpoint condition.
In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is
true, then GDB will only try to resume the current thread, that is,
the thread for which the breakpoint condition is being evaluated.
This solves the problem of GDB trying to resume threads that are
already running.
The next problem is that inferior function calls are assumed to be
synchronous, that is, GDB doesn't expect to start an inferior function
call in thread #1, then receive a stop from thread #2 for some other,
unrelated reason. To prevent GDB responding to an event from another
thread, we update fetch_inferior_event and do_target_wait in infrun.c,
so that, when an inferior function call (on behalf of a breakpoint
condition) is in progress, we only wait for events from the current
thread (the one evaluating the condition).
In do_target_wait I had to change the inferior_matches lambda
function, which is used to select which inferior to wait on.
Previously the logic was this:
auto inferior_matches = [&wait_ptid] (inferior *inf)
{
return (inf->process_target () != nullptr
&& ptid_t (inf->pid).matches (wait_ptid));
};
This compares the pid of the inferior against the complete ptid we
want to wait on. Before this commit wait_ptid was only ever
minus_one_ptid (which is special, and means any process), and so every
inferior would match.
After this commit though wait_ptid might represent a specific thread
in a specific inferior. If we compare the pid of the inferior to a
specific ptid then these will not match. The fix is to compare
against the pid extracted from the wait_ptid, not against the complete
wait_ptid itself.
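The mismatch can be illustrated with a simplified model of the matching rules. The Ptid class below is a hypothetical Python stand-in for ptid_t that keeps only pid and lwp; it is meant to show why a bare pid does not match a thread-specific filter, not to reproduce GDB's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ptid:
    """Simplified stand-in for ptid_t: (pid, lwp), with the tid omitted."""
    pid: int
    lwp: int = 0

    def is_pid(self):
        # A pid-only ptid names a whole process rather than one thread.
        return self != MINUS_ONE and self.lwp == 0

    def matches(self, wait_filter):
        if wait_filter == MINUS_ONE:      # wildcard: any process
            return True
        if wait_filter.is_pid() and self.pid == wait_filter.pid:
            return True
        return self == wait_filter        # otherwise require exact equality

MINUS_ONE = Ptid(-1)

inferior_pid = 1234
wait_ptid = Ptid(1234, 5678)   # a specific thread in that inferior

# Old check: the inferior's bare pid against the complete wait_ptid.
old = Ptid(inferior_pid).matches(wait_ptid)
# New check: compare against only the pid extracted from wait_ptid.
new = Ptid(inferior_pid).matches(Ptid(wait_ptid.pid))
print(old, new)
```

With the wildcard filter both forms match, which is why the problem only appears once wait_ptid can name a specific thread.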
In fetch_inferior_event, after receiving the event, we only want to
stop all the other threads, and call inferior_event_handler with
INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint.
If we are, then all the other threads should be left doing whatever
they were before. The inferior_event_handler call will be performed
once the breakpoint condition has finished being evaluated, and GDB
decides to stop or not.
The final problem that needs solving relates to GDB's commit-resume
mechanism, which allows GDB to collect resume requests into a single
packet in order to reduce traffic to a remote target.
The problem is that the commit-resume mechanism will not send any
resume requests for an inferior if there are already events pending on
the GDB side.
Imagine an inferior with two threads. Both threads hit a breakpoint,
maybe the same conditional breakpoint. At this point there are two
pending events, one for each thread.
GDB selects one of the events and spots that this is a conditional
breakpoint, GDB evaluates the condition.
The condition includes an inferior function call, so GDB sets up for
the call and resumes the one thread, the resume request is added to
the commit-resume queue.
When the commit-resume queue is committed GDB sees that there is a
pending event from another thread, and so doesn't send any resume
requests to the actual target, GDB is assuming that when we wait we
will select the event from the other thread.
However, as this is an inferior function call for a condition
evaluation, we will not select the event from the other thread, we
only care about events from the thread that is evaluating the
condition - and the resume for this thread was never sent to the
target.
And so, GDB hangs, waiting for an event from a thread that was never
fully resumed.
To fix this issue I have added the concept of "forcing" the
commit-resume queue. When enabling commit resume, if the force flag
is true, then any resumes will be committed to the target, even if
there are other threads with pending events.
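The hang and the fix can be captured in a toy model. This Python sketch is an assumption-laden simplification: commit_resumes and its arguments are hypothetical names, and the real mechanism works per target and per inferior rather than on plain lists.

```python
def commit_resumes(queued, pending_events, force=False):
    """Return the resume requests that are actually sent to the target."""
    if pending_events and not force:
        # An event is already pending on the GDB side, so the resumes are
        # held back on the assumption that the pending event is handled next.
        return []
    return list(queued)

queued = ["thread 1"]                  # the thread evaluating the condition
pending = ["thread 2 hit breakpoint"]  # event from another thread, not yet handled

# Without forcing, thread 1 is never resumed and GDB waits for it forever.
print(commit_resumes(queued, pending))
# With the force flag, the resume reaches the target despite the pending event.
print(commit_resumes(queued, pending, force=True))
```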
A note on authorship: this patch was based on some work done by
Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made
some changes to their work in this version.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942
[1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html
Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com>
Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
|
|
Several tests assume that the first word after a thread ID in 'info
threads' output is "Thread". However, several targets use "LWP"
instead such as the FreeBSD and NetBSD native targets. The Linux
native target also uses "LWP" if libthread_db is not being used.
Targets that do not support threads use "process" as the first word
via normal_pid_to_str.
Add a tdlabel_re global variable as a regular-expression for a thread
label in `info threads' that matches either "process", "Thread", or
"LWP".
Some other tests in the tree don't require a specific word, and
some targets may use other first words (e.g. OpenBSD uses "thread"
and Ravenscar threads use "Ravenscar Thread").
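As an illustration of how such a label pattern is used, here is a sketch in Python. The value given to tdlabel_re and the surrounding row pattern are assumptions based on the description above, and the sample rows are made up for the example rather than taken from a GDB log.

```python
import re

# Assumed value: an alternation over the three labels named above.
tdlabel_re = r"(process|Thread|LWP)"

# Hypothetical 'info threads' row pattern: current-thread marker, id, label.
row = re.compile(r"^[* ] +\d+ +" + tdlabel_re + r" ")
# What a test that hardcodes the label would effectively check.
thread_only = re.compile(r"^[* ] +\d+ +Thread ")

rows = [
    '* 1    Thread 0x7ffff7c5d700 (LWP 2460150) "a.out" main ()',
    "  2    LWP 100123 of process 4321 func ()",
    "* 1    process 4321 main ()",
]

for line in rows:
    print(bool(row.match(line)), bool(thread_only.match(line)))
```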
|
|
The test gdb.threads/threadcrash.exp required GDB to fully unwind and
print the names of all functions. However, some of the functions are
from the libc library, and so the test implicitly demanded libc symbols
to be available, and would fail otherwise, as was raised in PR
gdb/31293.
This commit changes it so we only explicitly check for functions that
are not provided by threadcrash.c if they are indeed available.
Tested on arm-linux and x86_64-linux.
Approved-By: Tom de Vries <tdevries@suse.de>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31293
|
|
I noticed in gdb.threads/threadcrash.exp that the usage of test_list is
somewhat convoluted.
Simplify the test-case by storing a classification instead of a pattern in
test_list.
Tested on arm-linux and x86_64-linux.
|
|
A linaro PR [1] reports that the gdb.threads/threadcrash.exp test-case fails
to count the number of threads in the inferior:
...
FAIL: gdb.threads/threadcrash.exp: test_gcore: $thread_count == 7
FAIL: gdb.threads/threadcrash.exp: test_gcore: $thread_count == [llength $test_list]
...
Fix this by getting the convenience variable _inferior_thread_count as opposed
to calculating it based on the output of "info threads".
Tested on arm-linux and x86_64-linux.
Reviewed-By: Lancelot Six <lancelot.six@amd.com>
Approved-By: Tom de Vries <tdevries@suse.de>
[1] https://linaro.atlassian.net/browse/GNU-1120
|
|
With check-readmore, I run into:
...
FAIL: gdb.threads/threadcrash.exp: test_corefile: \
$thread_count == [llength $test_list]
...
The problem is that the clauses in the gdb_test_multiple for
"thread apply all backtrace" intend to match one line, but actually can
match more than one line, and consequently a match for one type of thread can
consume a line that was supposed to match another thread.
For instance, there's this regexp:
...
-re "\[^\n\]*syscall_task .location=SIGNAL_ALT_STACK\[^\n\]*" {
...
It's limited at the end by \[^\n\]*, meaning the match stops at the end of the
line.
But it doesn't start with a ^, and consequently can match more than one line.
The "\[^\n\]*" at the start doesn't prevent this, there's an implicit .* at
the start of each pattern, unless it's anchored using a ^.
Fix this by rewriting the regexps in a "^\r\n$hs$regexp$hs$eol" style, where:
- hs is: \[^\n\]* (horizontal space), and
- eol is (?=\r\n) (look-ahead end-of-line).
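The difference between the two styles can be seen in a small experiment. The sketch below uses Python's re module instead of expect, with hs and eol translated from the definitions above; the two-line buffer is made up for the example.

```python
import re

hs = r"[^\n]*"       # horizontal space, as defined above
eol = r"(?=\r\n)"    # look-ahead end-of-line: checks for \r\n without consuming it
core = r"syscall_task .location=SIGNAL_ALT_STACK"

# Two backtrace lines in the buffer; the first one belongs to another clause.
buffer = ("\r\n#0 <hex> in loop ()"
          "\r\n#0 <hex> in syscall_task (location=SIGNAL_ALT_STACK)\r\n")

unanchored = re.compile(hs + core + hs)
anchored = re.compile(r"^\r\n" + hs + core + hs + eol)

# Unanchored: the search skips ahead to the second line. In expect, everything
# up to the end of the match is dropped, including the "loop" line.
m = unanchored.search(buffer)
consumed = buffer[:m.end()]
print("loop" in consumed)

# Anchored: only the line at the front of the buffer is considered, so the
# clause does not jump over the "loop" line.
print(anchored.match(buffer) is None)
```

Once the "loop" line has been consumed by its own clause, the anchored pattern matches the remaining line and leaves the trailing \r\n in place for the next clause.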
It also turned out to be necessary to drop the -lbl switch, and introduce a
corresponding explicit clause. The -lbl clause is placed ALAP, and
consequently allowed the default fail clause to trigger.
Tested on arm-linux and x86_64-linux.
|
|
In test-case gdb.threads/threadcrash.exp we have an unnecessarily indented
gdb_test_multiple:
...
gdb_test_multiple "thread apply all backtrace" \
"Get thread information" -lbl {
-re "#\[0-9\]+\\\?\\\?\[^\n\]*" {
...
Fix this by moving the command into a variable, allowing the
"gdb_test_multiple ... {" to fit on a single 80 chars line.
Tested on arm-linux and x86_64-linux.
|
|
The output of "info breakpoints" lists breakpoints, watchpoints,
tracepoints, and catchpoints if any have been created, so after
"delete breakpoints" the empty-list message printed by
"info breakpoints" should mention all four types.
The confirmation prompt of "delete breakpoints" should also change to make it
clear that watchpoints, tracepoints, and catchpoints are being
deleted too. This was suggested by Guinevere Larsen, thank you.
$ make check-gdb TESTS="gdb.base/access-mem-running.exp"
$ gdb/gdb gdb/testsuite/outputs/gdb.base/access-mem-running/access-mem-running
[...]
(gdb) break main
Breakpoint 1 at 0x12000073c: file /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c, line 32.
(gdb) watch global_counter
Hardware watchpoint 2: global_counter
(gdb) trace maybe_stop_here
Tracepoint 3 at 0x12000071c: file /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c, line 27.
(gdb) catch fork
Catchpoint 4 (fork)
(gdb) info breakpoints
Num Type Disp Enb Address What
1 breakpoint keep y 0x000000012000073c in main at /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c:32
2 hw watchpoint keep y global_counter
3 tracepoint keep y 0x000000012000071c in maybe_stop_here at /home/loongson/gdb.git/gdb/testsuite/gdb.base/access-mem-running.c:27
not installed on target
4 catchpoint keep y fork
Without this patch:
(gdb) delete breakpoints
Delete all breakpoints? (y or n) y
(gdb) info breakpoints
No breakpoints or watchpoints.
(gdb) info breakpoints 3
No breakpoint or watchpoint matching '3'.
With this patch:
(gdb) delete breakpoints
Delete all breakpoints, watchpoints, tracepoints, and catchpoints? (y or n) y
(gdb) info breakpoints
No breakpoints, watchpoints, tracepoints, or catchpoints.
(gdb) info breakpoints 3
No breakpoint, watchpoint, tracepoint, or catchpoint matching '3'.
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Approved-by: Kevin Buettner <kevinb@redhat.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
|
|
This patch is based on an out-of-tree patch that fedora has been
carrying for a while. It tests if GDB is able to properly unwind a
threaded program in the following situations:
* regular threads
* in a signal handler
* in a signal handler executing on an alternate stack
And the final frame can either be in a syscall or in an infinite loop.
The test works by running the inferior until a crash to generate a
corefile, or until right before the crash. It then runs a backtrace on
all threads to check whether any frame can't be identified, and to determine the order of
the threads in GDB. Finally, it goes thread by thread and tries to
collect a large part of the backtrace, to confirm that everything is
being unwound correctly.
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Reviewed-By: Luis Machado <luis.machado@arm.com>
Approved-By: Luis Machado <luis.machado@arm.com>
|
|
With test-case gdb.threads/attach-many-short-lived-threads.exp, I run into:
...
(gdb) attach 7773^M
Attaching to program: attach-many-short-lived-threads, process 7773^M
Cannot attach to lwp 7776: Operation not permitted (1)^M
(gdb) PASS: $exp: iter 1: attach
info threads^M
No threads.^M
(gdb) PASS: $exp: iter 1: no new threads
set breakpoint always-inserted on^M
(gdb) PASS: $exp: iter 1: set breakpoint always-inserted on
break break_fn^M
Breakpoint 1 at 0x400b4d: file attach-many-short-lived-threads.c, line 57.^M
(gdb) PASS: $exp: iter 1: break break_fn
continue^M
The program is not being run.^M
(gdb) FAIL: $exp: iter 1: break at break_fn: 1 \
(the program is no longer running)
...
There's some code in the test-case dealing with a similar warning:
...
-re "warning: Cannot attach to lwp $decimal: Operation not permitted" {
...
But since commit c6f7f9c80c3 ("Bail out of "attach" if a thread cannot be
traced"), the warning has been changed into an error.
Fix the FAIL by updating the test-case to expect an error instead of a
warning.
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
This commit is the result of the following actions:
- Running gdb/copyright.py to update all of the copyright headers to
include 2024,
- Manually updating a few files the copyright.py script told me to
update, these files had copyright headers embedded within the
file,
- Regenerating gdbsupport/Makefile.in to refresh its copyright
date,
- Using grep to find other files that still mentioned 2023. If
these files were updated last year from 2022 to 2023 then I've
updated them this year to 2024.
I'm sure I've probably missed some dates. Feel free to fix them up as
you spot them.
|
|
On aarch64-linux, I run into:
...
FAIL: gdb.base/annota1.exp: backtrace from shlibrary (timeout)
...
due to the PAC marker showing up:
...
^Z^Zframe-address^M
0x000000000041025c [PAC]^M
^Z^Zframe-address-end^M
...
In the docs the marker is documented as follows:
...
When GDB is debugging the AArch64 architecture, and the program is using the
v8.3-A feature Pointer Authentication (PAC), then whenever the link register
$lr is pointing to an PAC function its value will be masked. When GDB prints
a backtrace, any addresses that required unmasking will be postfixed with the
marker [PAC]. When using the MI, this is printed as part of the addr_flags
field.
...
Update the test-case to allow the PAC marker.
Likewise in a few other test-cases.
While we're at it, rewrite the affected pattern pat_begin in annota1.exp into
a more readable form. Likewise for the corresponding pat_end.
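As a rough illustration of what "allowing the PAC marker" means, here is a
Python sketch. It is not the Tcl pattern used in annota1.exp (that pattern
is not reproduced here); FRAME_ADDRESS_RE and is_frame_address are
hypothetical names, and the exact address format is an assumption based on
the output shown above.

```python
import re

# Hypothetical pattern, not the one from annota1.exp: a hex address,
# optionally followed by the " [PAC]" marker described in the docs.
FRAME_ADDRESS_RE = re.compile(r"0x[0-9a-f]+( \[PAC\])?")


def is_frame_address(line):
    """Return True if the whole line is an address, with or without [PAC]."""
    return FRAME_ADDRESS_RE.fullmatch(line) is not None


if __name__ == "__main__":
    for sample in ("0x000000000041025c [PAC]", "0x000000000041025c"):
        print(sample, is_frame_address(sample))
```

Making the marker optional keeps the pattern valid on targets that never
print it.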
Tested on aarch64-linux.
Approved-By: Luis Machado <luis.machado@arm.com>
PR testsuite/31202
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31202
|
|
This commit makes the following improvements to
gdb.threads/step-over-thread-exit.exp:
- Add a third axis to stepping over the breakpoint with displaced vs
inline stepping -- also test with no breakpoint at all.
- Check that when GDB reports "Command aborted, thread exited.", the
selected thread is the thread that exited. This is always true
currently on GNU/Linux by coincidence, but a similar testcase on AMD
GPU exposed a problem here. Better make the testcase catch any
potential regression.
- Fixes a race that Simon ran into with GDBserver testing.
(gdb) next
[New Thread 2143071.2143438]
Thread 3 "step-over-threa" hit Breakpoint 2, 0x000055555555524e in my_exit_syscall () at .../testsuite/lib/my-syscalls.S:74
74 SYSCALL (my_exit, __NR_exit)
(gdb) FAIL: gdb.threads/step-over-thread-exit.exp: displaced-stepping=auto: non-stop=on: target-non-stop=on: schedlock=off: cmd=next: ns_stop_all=0: command aborts when thread exits
I was not able to reproduce it, but I believe that what happens is
the following:
Once we continue, the thread 2 exits, and the main thread thus
unblocks from its pthread_join, and spawns a new thread. That new
thread may hit the breakpoint at my_exit_syscall very quickly. GDB
could then see/process that breakpoint event before the thread exit
event for the thread we care about, which would result in the
failure seen above.
The fix here is to not loop and start a new thread at all in the
scenario where the race can happen. We only need to loop and spawn
new threads when testing with "cmd=continue" and schedlock off, in
which case GDB doesn't abort the command when the thread exits.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I90c95c32f00630a3f682b1541c23aff52451f9b6
|
|
The gdb.threads/thread-specific-bp.exp test has been a little
problematic, see commits:
commit 89702edd933a5595557bcd9cc4a0dcc3262226d4
Date: Thu Mar 9 12:31:26 2023 +0100
[gdb/testsuite] Fix gdb.threads/thread-specific-bp.exp on native-gdbserver
and
commit 2e5843d87c4050bf1109921481fb29e1c470827f
Date: Fri Nov 19 14:33:39 2021 +0100
[gdb/testsuite] Fix gdb.threads/thread-specific-bp.exp
But I recently saw a test failure for that test, which looked like
this:
...
(gdb) PASS: gdb.threads/thread-specific-bp.exp: non_stop=on: thread 1 selected
continue -a
Continuing.
Thread 1 "thread-specific" hit Breakpoint 4, end () at /tmp/binutils-gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/thread-specific-bp.c:29
29 }
(gdb) [Thread 0x7ffff7c5c700 (LWP 1552086) exited]
Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.
FAIL: gdb.threads/thread-specific-bp.exp: non_stop=on: continue to end (timeout)
...
This only crops up (for me) when running on a loaded machine, and
still only occurs sometimes. I've had to leave the test running in a
loop for 10+ minutes sometimes in order to see the failure.
The problem is that we use gdb_test_multiple to try and match two
patterns:
(1) The 'Thread-specific breakpoint 3 deleted ....' message, and
(2) The GDB prompt.
As written in the test, we understand that these patterns can occur in
any order, and we have a flag for each pattern. Once both patterns
have been seen then we PASS the test.
The problem is that once expect has matched a pattern, everything up
to, and including the matched text is discarded from the input
buffer. Thus, if the input buffer contains:
<PATTERN 2><PATTERN 1>
Then expect will first try to match <PATTERN 1>, which succeeds, and
then expect discards the entire input buffer up to the end of the
<PATTERN 1>. As a result, we will never spot <PATTERN 2>.
Obviously we can't just reorder the patterns within the
gdb_test_multiple, as the output can legitimately (and most often
does) occur in the other order, in which case the test would mostly
fail, and only occasionally pass!
I think the easiest solution here is just to have the
gdb_test_multiple contain two patterns. Each pattern consists of the
two parts, but in the alternative orders; thus, for a particular
output configuration, only one regexp will match. With this change in
place, I no longer see the intermittent failure.
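The buffer-discarding behaviour described above can be modelled in a few
lines of Python. This is only a simulation of the idea, not Tcl/expect and
not the code in the test-case: expect_once, run_two_flags and run_combined
are hypothetical helpers, and the sample outputs are shortened from the log
above.

```python
import re

MSG = r"Thread-specific breakpoint 3 deleted"
PROMPT = r"\(gdb\) "

# Usual order: message first, then the prompt.
USUAL = ("Thread-specific breakpoint 3 deleted - thread 2 no longer "
         "in the thread list.\n(gdb) ")
# Racy order from the failing log: prompt first, then the message.
RACY = ("(gdb) [Thread 0x7ffff7c5c700 (LWP 1552086) exited]\n"
        "Thread-specific breakpoint 3 deleted - thread 2 no longer "
        "in the thread list.\n")


def expect_once(buffer, patterns):
    """Try patterns in listed order; on a match, drop everything up to
    and including the matched text. Returns (index, leftover buffer)."""
    for index, pattern in enumerate(patterns):
        match = re.search(pattern, buffer)
        if match:
            return index, buffer[match.end():]
    return None, buffer


def run_two_flags(buffer):
    """Old approach: two separate patterns, one 'seen' flag for each."""
    seen = [False, False]
    while not all(seen):
        index, buffer = expect_once(buffer, [MSG, PROMPT])
        if index is None:
            return False  # nothing left to match: the test times out
        seen[index] = True
    return True


def run_combined(buffer):
    """New approach: each pattern holds both parts, in one of the orders."""
    patterns = [MSG + r"[\s\S]*" + PROMPT, PROMPT + r"[\s\S]*" + MSG]
    index, _ = expect_once(buffer, patterns)
    return index is not None


if __name__ == "__main__":
    print("two flags:", run_two_flags(USUAL), run_two_flags(RACY))
    print("combined: ", run_combined(USUAL), run_combined(RACY))
```

In the racy order the first pattern matches the message and the prompt
that preceded it is thrown away with the rest of the buffer, so the
two-flag loop never completes.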
Approved-By: Tom Tromey <tom@tromey.com>
|
|
Simon noticed that gdb.threads/threads-after-exec.exp was racy. You
can consistently reproduce it (at git hash
319b460545dc79280e2904dcc280057cf71fb753), with:
$ taskset -c 0 make check TESTS="gdb.threads/threads-after-exec.exp"
gdb.log shows:
(...)
Thread 3 "threads-after-e" hit Catchpoint 2 (exec'd .../gdb.threads/threads-after-exec/threads-after-exec), 0x00007ffff7fe3290
in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) PASS: gdb.threads/threads-after-exec.exp: continue until exec
info threads
Id Target Id Frame
* 3 process 1443269 "threads-after-e" 0x00007ffff7fe3290 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) FAIL: gdb.threads/threads-after-exec.exp: info threads
(...)
maint info linux-lwps
LWP Ptid Thread ID
1443269.1443269.0 1.3
(gdb) FAIL: gdb.threads/threads-after-exec.exp: maint info linux-lwps
The FAILs happen because the .exp file expects that after the exec,
the only thread has GDB thread number 1, but it has instead 3.
This is yet another case of zombie leader detection making things a
bit fuzzy.
In the passing case, we have:
continue
Continuing.
[New Thread 0x7ffff7bff640 (LWP 603183)]
[Thread 0x7ffff7bff640 (LWP 603183) exited]
process 603180 is executing new program: .../gdb.threads/threads-after-exec/threads-after-exec
While in the failing case, we have (note remarks on the rhs):
continue
Continuing.
[New Thread 0x7ffff7bff640 (LWP 600205)]
[Thread 0x7ffff7f95740 (LWP 600202) exited] <<< gdb deletes leader thread, thread 1.
[New LWP 600202] <<< gdb adds it back -- this is now thread 3.
[Thread 0x7ffff7bff640 (LWP 600205) exited]
process 600202 is executing new program: .../threads-after-exec/threads-after-exec
The testcase only has two threads, yet GDB presented the exec for
thread 3. This is GDB deleting the leader (the backend detected it
was zombie, due to the exec), and then adding the leader back when it
saw the exec event.
I've recorded some thoughts about this in PR gdb/31069.
For now, this commit just makes the testcase cope with the non-one
thread number, as the number is not important for what this test is
exercising.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31069
Change-Id: Id80b5c73f09c9e0005efeb494cca5d066ac3bbae
|
|
I ran into the following FAIL:
...
(gdb) PASS: gdb.threads/stepi-over-clone.exp: catch process syscalls
continue^M
Continuing.^M
^M
Catchpoint 2 (call to syscall clone), clone () at \
../sysdeps/unix/sysv/linux/x86_64/clone.S:78^M
warning: 78 ../sysdeps/unix/sysv/linux/x86_64/clone.S: \
No such file or directory^M
(gdb) FAIL: gdb.threads/stepi-over-clone.exp: continue
...
All but one of the regexps in the .exp file use "clone\[23\]?" with "?" to
also accept "clone", except the failing case. This commit fixes that
case to also use "?".
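To make the effect of the "?" concrete, here is a small Python sketch. In
the Tcl source the brackets are backslash-escaped, so the underlying
regexp is clone[23]?. The surrounding "call to syscall" text and the
names WITH_Q and WITHOUT_Q are assumptions made for illustration, not the
patterns from the .exp file.

```python
import re

# Illustrative patterns only; the real ones live in stepi-over-clone.exp.
WITH_Q = re.compile(r"call to syscall clone[23]?\)")
WITHOUT_Q = re.compile(r"call to syscall clone[23]\)")

CLONE_LINE = "Catchpoint 2 (call to syscall clone), clone () at"
CLONE3_LINE = "Catchpoint 2 (call to syscall clone3), 0x000000000042097d"


def accepts(pattern, line):
    """Return True if the pattern matches somewhere in the line."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    for line in (CLONE_LINE, CLONE3_LINE):
        print(accepts(WITH_Q, line), accepts(WITHOUT_Q, line))
```

Without the "?" the digit is mandatory, so a plain "clone" catchpoint
line is rejected.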
Furthermore, there are FAILs like this:
...
(gdb) PASS: gdb.threads/stepi-over-clone.exp: third_thread=false: \
non-stop=on: displaced=off: i=0: continue
stepi^M
[New Thread 0x7ffff7ff8700 (LWP 15301)]^M
Hello from the first thread.^M
78 in ../sysdeps/unix/sysv/linux/x86_64/clone.S^M
(gdb) XXX: Consume the initial command
XXX: Consume new thread line
XXX: Consume first worker thread message
FAIL: gdb.threads/stepi-over-clone.exp: third_thread=false: non-stop=on: \
displaced=off: i=0: stepi
...
because this output is expected instead:
...
Hello from the first thread.^M
0x00000000004212cd in clone3 ()^M
...
The root cause for the difference is the presence of .debug_line info for
clone.
Fix this by updating the relevant regexps.
Tested on x86_64-linux, specifically:
- openSUSE Leap 15.4 (where the FAILs were observed), and
- openSUSE Tumbleweed (where the FAILs were not observed).
Co-Authored-By: Pedro Alves <pedro@palves.net>
Approved-By: Pedro Alves <pedro@palves.net>
Change-Id: I74ca9e7d4cfe6af294fd50e8c509fcbad289b78c
|
|
If your target has no support for TARGET_WAITKIND_NO_RESUMED events
(and no way to support them, such as the yet-unsubmitted AMDGPU
target), and you step over thread exit with scheduler-locking on, this
is what you get:
(gdb) n
[Thread ... exited]
*hang*
Getting back the prompt by typing Ctrl-C may not even work, since no
inferior thread is running to receive the SIGINT. Even if it works,
it seems unnecessarily harsh. If you started an execution command for
which there's a clear thread of interest (step, next, until, etc.),
and that thread disappears, then I think it's more user friendly if
GDB just detects the situation and aborts the command, giving back the
prompt.
That is what this commit implements. It does this by explicitly
requesting the target to report thread exit events whenever the main
resumed thread has a thread_fsm. Note that unlike stepping over a
breakpoint, we don't need to enable clone events in this case.
With this patch, we get:
(gdb) n
[Thread 0x7ffff7d89700 (LWP 3961883) exited]
Command aborted, thread exited.
(gdb)
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I901ab64c91d10830590b2dac217b5264635a2b95
|
|
Add new gdb.threads/step-over-thread-exit.exp and
gdb.threads/step-over-thread-exit-while-stop-all-threads.exp
testcases, exercising stepping over thread exit syscall. These make
use of lib/my-syscalls.S to define the exit syscall.
Co-authored-by: Pedro Alves <pedro@palves.net>
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
Change-Id: Ie8b2c5747db99b7023463a897a8390d9e814a9c9
|
|
If scheduler-locking is in effect, e.g., with "set scheduler-locking
on", and you step over a function that spawns a new thread, the new
thread is allowed to run free, at least until some event is hit, at
which point, whether the new thread is re-resumed depends on a number
of seemingly random factors. E.g., if the target is all-stop, and the
parent thread hits a breakpoint, and GDB decides the breakpoint isn't
interesting to report to the user, then the parent thread is resumed,
but the new thread is left stopped.
I think that letting the new threads run with scheduler-locking
enabled is a defect. This commit fixes that, making use of the new
clone events on Linux, and of target_thread_events() on targets where
new threads have no connection to the thread that spawned them.
Testcase and documentation changes included.
Approved-By: Eli Zaretskii <eliz@gnu.org>
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ie12140138b37534b7fc1d904da34f0f174aa11ce
|
|
This patch teaches the Linux GDBserver backend to report clone events
to GDB, when GDB has requested them with the GDB_THREAD_OPTION_CLONE
thread option, via the new QThreadOptions packet.
This shuffles code in linux_process_target::handle_extended_wait
around to a more logical order when we now have to handle and
potentially report all of fork/vfork/clone.
Rename lwp_info::fork_relative -> lwp_info::relative as the field is
no longer only about (v)fork.
With this, gdb.threads/stepi-over-clone.exp now cleanly passes against
GDBserver, so remove the native-target-only requirement from that
testcase.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I3a19bc98801ec31e5c6fdbe1ebe17df855142bb2
|
|
(A good chunk of the problem statement in the commit log below is
Andrew's, adjusted for a different solution, and for covering
displaced stepping too. The testcase is mostly Andrew's too.)
This commit addresses bugs gdb/19675 and gdb/27830, which are about
stepping over a breakpoint set at a clone syscall instruction; one is
about displaced stepping, and the other about in-line stepping.
Currently, when a new thread is created through a clone syscall, GDB
sets the new thread running. With 'continue' this makes sense
(assuming no schedlock):
- all-stop mode, user issues 'continue', all threads are set running,
a newly created thread should also be set running.
- non-stop mode, user issues 'continue', other pre-existing threads
are not affected, but as the new thread is (sort-of) a child of the
thread the user asked to run, it makes sense that the new threads
should be created in the running state.
Similarly, if we are stopped at the clone syscall, and there's no
software breakpoint at this address, then the current behaviour is
fine:
- all-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). While stepping the thread
of interest all the other threads will be allowed to continue. A
newly created thread will be set running, and then stopped once the
thread of interest has completed its step.
- non-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). Other threads might be
running or stopped, but as with the continue case above, the new
thread will be created running. The only possible issue here is
that the new thread will be left running after the initial thread
has completed its stepi. The user would need to manually select
the thread and interrupt it, which might not be what the user
expects. However, this is not something this commit tries to
change.
The problem then is what happens when we try to step over a clone
syscall if there is a breakpoint at the syscall address.
- For both all-stop and non-stop modes, with in-line stepping:
+ user issues 'stepi',
+ [non-stop mode only] GDB stops all threads. In all-stop mode all
threads are already stopped.
+ GDB removes s/w breakpoint at syscall address,
+ GDB single steps just the thread of interest, all other threads
are left stopped,
+ New thread is created running,
+ Initial thread completes its step,
+ [non-stop mode only] GDB resumes all threads that it previously
stopped.
There are two problems in the in-line stepping scenario above:
1. The new thread might pass through the same code that the initial
thread is in (i.e. the clone syscall code), in which case it will
fail to hit the breakpoint in clone as this was removed so the
first thread can single step,
2. The new thread might trigger some other stop event before the
initial thread reports its step completion. If this happens we
end up triggering an assertion as GDB assumes that only the
thread being stepped should stop. The assert looks like this:
infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
- For both all-stop and non-stop modes, with displaced stepping:
+ user issues 'stepi',
+ GDB starts the displaced step, moves thread's PC to the
out-of-line scratch pad, maybe adjusts registers,
+ GDB single steps the thread of interest, [non-stop mode only] all
other threads are left as they were, either running or stopped.
In all-stop, all other threads are left stopped.
+ New thread is created running,
+ Initial thread completes its step, GDB re-adjusts its PC,
restores/releases scratchpad,
+ [non-stop mode only] GDB resumes the thread, now past its
breakpoint.
+ [all-stop mode only] GDB resumes all threads.
There is one problem with the displaced stepping scenario above:
3. When the parent thread completed its step, GDB adjusted its PC,
but did not adjust the child's PC, thus that new child thread
will continue execution in the scratch pad, invoking undefined
behavior. If you're lucky, you see a crash. If unlucky, the
inferior gets silently corrupted.
What is needed is for GDB to have more control over whether the new
thread is created running or not. Issue #1 above requires that the
new thread not be allowed to run until the breakpoint has been
reinserted. The only way to guarantee this is if the new thread is
held in a stopped state until the single step has completed. Issue #3
above requires that GDB is informed of when a thread clones itself,
and of what is the child's ptid, so that GDB can fixup both the parent
and the child.
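Issue #3 can be pictured with a toy model in Python. This is a heavily
simplified sketch, not GDB's code: real displaced-step fixup is
architecture-specific, while here the fixup is only a PC relocation. All
names (fixup_pc, finish_displaced_step) and all addresses are made up for
illustration.

```python
def fixup_pc(pc, scratch_addr, insn_addr):
    """Map a PC inside the scratch pad back to the original code."""
    return insn_addr + (pc - scratch_addr)


def finish_displaced_step(thread_pcs, tids_to_fix, scratch_addr, insn_addr):
    """Return a copy of thread_pcs with the listed threads relocated."""
    fixed = dict(thread_pcs)
    for tid in tids_to_fix:
        fixed[tid] = fixup_pc(fixed[tid], scratch_addr, insn_addr)
    return fixed


if __name__ == "__main__":
    scratch, insn = 0x1000, 0x400B4D  # made-up addresses
    # After the syscall, parent (1) and new child (2) both sit just past
    # the copied instruction, inside the scratch pad.
    pcs = {1: scratch + 2, 2: scratch + 2}
    only_parent = finish_displaced_step(pcs, [1], scratch, insn)
    both = finish_displaced_step(pcs, [1, 2], scratch, insn)
    print({t: hex(p) for t, p in only_parent.items()})
    print({t: hex(p) for t, p in both.items()})
```

Fixing only the parent leaves the child's PC in the scratch pad, which is
why GDB needs to learn the child's ptid from the clone event.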
When looking for solutions to this problem I considered how GDB
handles fork/vfork as these have some of the same issues. The main
difference between fork/vfork and clone is that the clone events are
not reported back to core GDB. Instead, the clone event is handled
automatically in the target code and the child thread is immediately
set running.
Note we have support for requesting thread creation events out of the
target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported
for the new/child thread. That would be sufficient to address in-line
stepping (issue #1), but not for displaced-stepping (issue #3). To
handle displaced-stepping, we need an event that is reported to the
_parent_ of the clone, as the information about the displaced step is
associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED
includes no indication of which thread is the parent that spawned the
new child. In fact, for some targets, like e.g., Windows, it would be
impossible to know which thread that was, as thread creation there
doesn't work by "cloning".
The solution implemented here is to model clone on fork/vfork, and
introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is
similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except
that we end up with a new thread in the same process, instead of a new
thread of a new process. Like FORKED and VFORKED, THREAD_CLONED
waitstatuses have a child_ptid property, and the child is held stopped
until GDB explicitly resumes it. This addresses the in-line stepping
case (issues #1 and #2).
The infrun code that handles displaced stepping fixup for the child
after a fork/vfork event is thus reused for THREAD_CLONED, with some
minimal conditions added, addressing the displaced stepping case
(issue #3).
The native Linux backend is adjusted to unconditionally report
TARGET_WAITKIND_THREAD_CLONED events to the core.
Following the follow_fork model in core GDB, we introduce a
target_follow_clone target method, which is responsible for making the
new clone child visible to the rest of GDB.
Subsequent patches will add clone events support to the remote
protocol and gdbserver.
displaced_step_in_progress_thread becomes unused with this patch, but
a new use will reappear later in the series. To avoid deleting it and
re-adding it, this patch marks it with attribute unused, and a
later patch removes the attribute again. We need to do this because
the function is static, and with no callers, the compiler would warn,
(error with -Werror), breaking the build.
This adds a new gdb.threads/stepi-over-clone.exp testcase, which
exercises stepping over a clone syscall, with displaced stepping vs
inline stepping, and all-stop vs non-stop. We already test stepping
over clone syscalls with gdb.base/step-over-syscall.exp, but this test
uses pthreads, while the other test uses raw clone, and this one is
more thorough. The testcase passes on native GNU/Linux, but fails
against GDBserver. GDBserver will be fixed by a later patch in the
series.
Co-authored-by: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
|
|
I noticed that on an Ubuntu 20.04 system, after a following patch
("Step over clone syscall w/ breakpoint,
TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
was passing cleanly, but still, we'd end up with four new unexpected
GDB core dumps:
=== gdb Summary ===
# of unexpected core files 4
# of expected passes 48
That said patch is making the pre-existing
gdb.threads/step-over-exec.exp testcase (almost silently) expose a
latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
#1 - a non-leader thread execs
#2 - the post-exec program stops somewhere
#3 - you kill the inferior
Instead of #3 directly, the testcase just returns, which ends up in
gdb_exit, tearing down GDB, which kills the inferior, and is thus
equivalent to #3 above.
Vis (after said patch is applied):
$ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
...
(top-gdb) r
...
(gdb) b main
...
(gdb) r
...
Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
69 argv0 = argv[0];
(gdb) c
Continuing.
[New Thread 0x7ffff7d89700 (LWP 2506975)]
Other going in exec.
Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
28 foo ();
(gdb) k
...
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
393 return m_suspend.waitstatus_pending_p;
(top-gdb) bt
#0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
#1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
#2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
#3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
#4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
#5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
#6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
#7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
#8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
...
The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread,
becoming the new leader. There's no thread exit event for the execing
thread. It's as if the old pre-exec LWP vanishes without trace. The
ptrace man page says:
"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
Stop the tracee at the next execve(2). A waitpid(2) by the
tracer will return a status value such that
status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
If the execing thread is not a thread group leader, the thread
ID is reset to thread group leader's ID before this stop.
Since Linux 3.0, the former thread ID can be retrieved with
PTRACE_GETEVENTMSG."
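The status check quoted from the man page can be worked through
numerically. The Python sketch below hard-codes the Linux values SIGTRAP
= 5 and PTRACE_EVENT_EXEC = 4 and assumes the usual Linux wait status
layout, where a stopped tracee has 0x7f in the low byte. The function
names are hypothetical; this is not code from GDB.

```python
SIGTRAP = 5            # Linux signal number
PTRACE_EVENT_EXEC = 4  # Linux ptrace event code


def make_exec_stop_status():
    """Build the wait status a tracer would see for an exec event stop."""
    return ((SIGTRAP | (PTRACE_EVENT_EXEC << 8)) << 8) | 0x7F


def is_exec_stop(status):
    """Apply the test from the man page to a wait status."""
    return status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXEC << 8))


if __name__ == "__main__":
    status = make_exec_stop_status()
    print(hex(status), is_exec_stop(status))
```

An ordinary SIGTRAP stop carries no event code in the upper bits, so the
same test rejects it.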
When the core of GDB processes an exec event, it deletes all the
threads of the inferior. But, that is too late -- deleting the thread
does not delete the corresponding LWP, so we end up leaving the pre-exec
non-leader LWP stale in the LWP list. That's what leads to the crash
above -- linux_nat_target::kill iterates over all LWPs, and after the
patch in question, that code will look for the corresponding
thread_info for each LWP. For the pre-exec non-leader LWP still
listed, it won't find one.
This patch fixes it, by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.
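The bookkeeping problem can be sketched as a toy model in Python. It only
tracks two sets of ids and is not a rendering of linux-nat.c: after_exec
and lwps_without_thread are hypothetical names, and the ids 100 (leader)
and 101 (execing thread) are made up.

```python
def after_exec(lwps, tgid, former_tid, delete_stale_lwp):
    """Return (lwps, threads) as they look once the exec is processed.

    The execing thread takes over the leader's id (tgid). The core drops
    all threads of the inferior and keeps one for the post-exec program.
    Only when delete_stale_lwp is set is the pre-exec LWP entry removed.
    """
    lwps = set(lwps)
    if delete_stale_lwp and former_tid != tgid:
        lwps.discard(former_tid)
    threads = {tgid}
    return lwps, threads


def lwps_without_thread(lwps, threads):
    """LWP ids for which a thread lookup would come back empty."""
    return sorted(lwps - threads)


if __name__ == "__main__":
    for fixed in (False, True):
        lwps, threads = after_exec({100, 101}, 100, 101, fixed)
        print(fixed, lwps_without_thread(lwps, threads))
```

In the unfixed case the stale id is still in the LWP set with no matching
thread, which is the situation the kill path trips over.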
GDBserver does not need an equivalent fix, because it is already doing
this, as side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:
else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
{
...
/* Delete the execing process and all its threads. */
mourn (proc);
switch_to_thread (nullptr);
The crash with gdb.threads/step-over-exec.exp is not observable on
newer systems, which postdate the glibc change to move "libpthread.so"
internals to "libc.so.6", because right after the exec, GDB traps a
load event for "libc.so.6", which leads to GDB trying to open
libthread_db for the post-exec inferior, and, on such systems that
succeeds. When we load libthread_db, we call
linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
lwps, and then waits to see their stops. While doing this, GDB
detects that the pre-exec stale LWP is gone, and deletes it.
If we use "catch exec" to stop right at the exec before the
"libc.so.6" load event ever happens, and issue "kill" right there,
then GDB crashes on newer systems as well. So instead of tweaking
gdb.threads/step-over-exec.exp to cover the fix, add a new
gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
The test also uses the new "maint info linux-lwps" command if testing
on Linux native, which also exposes the stale LWP problem with an
unfixed GDB.
Also tweak a comment in infrun.c:follow_exec referring to how
linux-nat.c used to behave, as it would become stale otherwise.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
|
|
I noticed that if GDB is using a remote or extended-remote target,
then, if an inferior call caused a new thread to appear, or for an
existing thread to exit, then these events are not reported to the
user.
The problem is that for these targets GDB relies on a call to
update_thread_list to learn about changes to the inferior's thread
list.
If GDB doesn't pass through the normal stop code then GDB will not
call update_thread_list, and so will not report changes in the thread
list.
This commit adds an additional update_thread_list call, after which
thread events are correctly reported.
|
|
I noticed that sometimes the value returned by $_inferior_thread_count
can become out of sync with the actual thread count of the inferior,
and will disagree with the number of threads reported by 'info
threads'. This commit fixes this issue.
The cause of the problem is that 'info threads' includes a call to
update_thread_list, as can be seen in print_thread_info_1 in
thread.c, while $_inferior_thread_count doesn't include a similar
call, see the function inferior_thread_count_make_value also in
thread.c.
Of course, this is only a problem when GDB is running on a target that
relies on update_thread_list calls to learn about new threads,
e.g. remote or extended-remote targets. Native targets generally
learn about new threads as soon as they appear and will not have this
problem.
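The mismatch can be pictured with a toy model in Python. It assumes the
fix amounts to refreshing the thread list before counting, which the
message implies but does not spell out. ToyRemoteSession and its methods
are hypothetical stand-ins, not GDB's API.

```python
class ToyRemoteSession:
    """GDB's cached view of a remote target's threads (toy model)."""

    def __init__(self, target_threads):
        self.target_threads = list(target_threads)  # truth on the target
        self.known_threads = list(target_threads)   # GDB's cached list

    def update_thread_list(self):
        """Re-sync the cached list with the target."""
        self.known_threads = list(self.target_threads)

    def info_threads(self):
        """Like 'info threads': always refreshes first."""
        self.update_thread_list()
        return list(self.known_threads)

    def thread_count(self, refresh):
        """Like $_inferior_thread_count, with or without a refresh."""
        if refresh:
            self.update_thread_list()
        return len(self.known_threads)


if __name__ == "__main__":
    session = ToyRemoteSession([1])
    session.target_threads.append(2)  # new thread while inferior runs
    print(session.thread_count(refresh=False))
    print(session.thread_count(refresh=True))
```

A native target would correspond to a model where the cached list is
updated as soon as the thread appears, so the stale count never shows up.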
I ran into this issue when writing a test for the next commit which
uses inferior function calls to add and remove threads from an
inferior. But for testing I've made use of non-stop mode and
asynchronous inferior execution; by reading the inferior state I can
know when a new thread has been created, at which point I can print
$_inferior_thread_count while the inferior is still running. This is
important: if I stop the inferior then GDB will pass through an
update_thread_list call in the normal stop code, which will
synchronise the thread list, after which $_inferior_thread_count will
report the correct value.
With this change in place $_inferior_thread_count is now correct.
|
|
Overview
========
Consider the following situation: GDB is in non-stop mode, the main
thread is running while a second thread is stopped. The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.
This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.
This commit addresses this situation for native and remote targets.
There are a number of different fixes, but all are required in order
to get this functionality working correctly for native and remote
targets.
Native Linux Target
===================
For the native Linux target, detaching is handled in the function
linux_nat_target::detach. In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.
GDB then detaches from everything except the main thread by calling
detach_callback.
After this the first problem is this assert:
/* Only the initial process should be left right now. */
gdb_assert (num_lwps (pid) == 1);
The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached. I fix this by changing
the assert to allow for 0 or 1 lwps at this point. As the 0 case can
only happen in non-stop mode, the assert becomes:
gdb_assert (num_lwps (pid) == 1
|| (target_is_non_stop_p () && num_lwps (pid) == 0));
The next problem is that we do:
main_lwp = find_lwp_pid (ptid_t (pid));
and then proceed assuming that main_lwp is not nullptr. In the case
that the main thread has exited though, main_lwp will be nullptr.
However, we only need main_lwp so that GDB can detach from the
thread. If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.
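The control flow described above can be sketched as a toy model in
Python. It mirrors the relaxed assert and the skipped main-thread detach,
but it is not a translation of linux_nat_target::detach; finish_detach
and the returned action strings are made up for illustration.

```python
def finish_detach(remaining_lwps, pid, non_stop):
    """Model the tail of a detach once the other threads are detached.

    remaining_lwps is the set of LWP ids still known for the process.
    """
    # Relaxed check: 0 LWPs is acceptable only in non-stop mode.
    assert len(remaining_lwps) == 1 or (non_stop and len(remaining_lwps) == 0)

    actions = []
    if pid in remaining_lwps:
        # The main thread still exists, so detach from it as before.
        actions.append("detach main lwp %d" % pid)
    # Otherwise the main thread already exited and nothing is left to do.
    actions.append("detach succeeded")
    return actions


if __name__ == "__main__":
    print(finish_detach({4242}, 4242, non_stop=True))
    print(finish_detach(set(), 4242, non_stop=True))
```

The empty-set case is the scenario from the overview: the main thread
exited during the detach, so only the success report remains.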
For Remote Targets
==================
On remote targets there are two problems.
First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach. The 'set debug remote on' trace
looks like this:
[remote] Sending packet: $z0,7f1648fe0241,1#35
[remote] Notification received: Stop:W0;process:2a0ac8
# At this point an unpatched gdbserver segfaults, and the connection
# is broken. A patched gdbserver continues as below...
[remote] Packet received: E01
[remote] Sending packet: $z0,7f1648ff00a8,1#68
[remote] Packet received: E01
[remote] Sending packet: $z0,7f1648ff132f,1#6b
[remote] Packet received: E01
[remote] Sending packet: $D;2a0ac8#3e
[remote] Packet received: E01
I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This
function calls current_process() and then dereferences the result to
find the breakpoint list.
However, in our case, the current process has already exited, and so
the current_process() call returns nullptr. At the point of failure,
the gdbserver backtrace looks like this:
#0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
#1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
#2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
#3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
...
The problem is that, as a result of non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent. Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.
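The shape of that fix can be sketched in C as below.  This is only a
toy model under stated assumptions: struct toy_process and
toy_delete_breakpoint are hypothetical names, and the exact place
where gdbserver performs the check is not shown here.

```c
#include <stddef.h>

/* Hypothetical stand-in for gdbserver's per-process state.  */
struct toy_process
{
  int pid;
};

/* Toy version of a breakpoint removal request.  CURRENT is the current
   process, or NULL if it has already exited.  Returns 0 on success and
   1 on failure; a caller could map the failure to an error reply such
   as the E01 seen in the trace above.  */
int
toy_delete_breakpoint (const struct toy_process *current,
                       unsigned long addr)
{
  (void) addr;  /* The address is irrelevant to this sketch.  */

  /* Without this check the code would go on to dereference CURRENT in
     order to find the breakpoint list, which is the crash described
     above.  */
  if (current == NULL)
    return 1;

  /* A real implementation would look up and remove the breakpoint.  */
  return 0;
}
```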
The second problem I ran into was on the GDB side.  As the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails. This was being
reported to the user with a 'Can't detach process' error. As the test
actually calls detach from Python code, this error was then becoming a
Python exception.
Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.
We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach. In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.
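The decision described above reduces to a small piece of logic,
sketched here in C.  The enum and function names are hypothetical and
this is not the code in GDB's remote target; it only shows the order
of the checks.

```c
#include <stdbool.h>

enum toy_detach_result
{
  TOY_DETACH_OK,
  TOY_DETACH_ERROR
};

/* Decide how to treat the reply to a detach request.  REPLY_IS_ERROR
   says whether the remote answered with an error.  EXIT_EVENT_PENDING
   says whether there is a pending exit event for the pid we tried to
   detach.  The pending event is only consulted after a failure, so a
   single check covers an exit that raced with the detach request.  */
enum toy_detach_result
toy_handle_detach_reply (bool reply_is_error, bool exit_event_pending)
{
  if (!reply_is_error)
    return TOY_DETACH_OK;

  /* The process is already gone, so treat the detach as having
     worked.  */
  if (exit_event_pending)
    return TOY_DETACH_OK;

  return TOY_DETACH_ERROR;
}
```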
Testing
=======
In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call. The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.
However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop.  As such,
GDB never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call. And so that is the
approach I've taken for testing this patch.
Tested-By: Kevin Buettner <kevinb@redhat.com>
Approved-By: Kevin Buettner <kevinb@redhat.com>
|
|
I'm seeing a lot of variability in the failures of
gdb.threads/process-dies-while-detaching.exp on aarch64-linux. On this
platform, a problem yet to be investigated causes GDB to miss the _exit
breakpoint. What happens next is random because after missing that
breakpoint, GDB is out of sync with the inferior. This causes the tests
following that point in the testcase to fail in a random way.
In this scenario it's better to exit the testcase early to avoid random
results in the testsuite.
We are relying on gdb_continue_to_breakpoint to return the result of
gdb_test_multiple. This is already the case because in Tcl the return
value of a function is the return value of the last command it runs. But
change gdb_continue_to_breakpoint to explicitly return this value, to make
it clear this is the intended behaviour.
Tested on aarch64-linux.
Tested-By: Guinevere Larsen <blarsen@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
At one time, circa 2006, there was a bug, which was presumably fixed
without adding a test case:
If you provided some relative path to the shared library, such as
with
export LD_LIBRARY_PATH=.
then gdb would fail to match the shared library name during the
TLS lookup.
I think there may have been a bit more to it than is provided by that
explanation, since the test also takes care to split the debug info
into a separate file.
In any case, this commit is based on one of Red Hat's really old
local patches. I've attempted to update it and remove a fair amount
of cruft, hopefully without losing any critical elements from the
test.
Testing on Fedora 38 (correctly) shows 1 unsupported test for
native-gdbserver and 5 PASSes for the native target as well as
native-extended-gdbserver.
In his review of v1 of this patch, Lancelot SIX observed that
'thread_local' could be used in place of '__thread' in the C source
files. But it only became available via the standard in C11, so I
used additional_flags=-std=c11 for compiling both the shared object
and the main program.
Also, while testing with CC_FOR_TARGET=clang, I found that
'additional_flags=-Wl,-soname=${binsharedbase}' caused clang
to complain that this linker flag was unused when compiling
the source file, so I moved this linker option to 'ldflags='.
My testing for this v2 patch shows the same results as with v1,
but I've done additional testing with CC_FOR_TARGET=clang as
well. The results are the same as when gcc is used.
Co-Authored-by: Jan Kratochvil <jan@jankratochvil.net>
Reviewed-By: Lancelot Six <lancelot.six@amd.com>
|
|
The Cygwin runtime spawns a few extra threads, so using hardcoded
thread numbers in tests rarely works correctly. Thankfully, this
testcase already records the ids of the important threads in globals.
It just so happens that they are not used in a few tests. This commit
fixes that.
With this, the test passes cleanly on Cygwin [1]. Still passes cleanly on
x86-64 GNU/Linux.
[1] - with system GDB. Upstream GDB is missing a couple of patches
Cygwin carries downstream.
Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: I01bf71fcb44ceddea8bd16b933b10b964749a6af
|
|
On Cygwin, I see:
(gdb) PASS: gdb.threads/pthreads.exp: break thread1
continue
Continuing.
pthread_attr_setscope 1: Not supported (134)
[Thread 3732.0x265c exited with code 1]
[Thread 3732.0x2834 exited with code 1]
[Thread 3732.0x2690 exited with code 1]
Program terminated with signal SIGHUP, Hangup.
The program no longer exists.
(gdb) FAIL: gdb.threads/pthreads.exp: Continue to creation of first thread
... and then a set of cascading failures.
Fix this by treating ENOTSUP the same way as if PTHREAD_SCOPE_SYSTEM
were not defined. I.e., ignore ENOTSUP errors, and proceed with
testing.
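A minimal sketch of that policy in C follows.  The helper name
toy_scope_error_is_fatal is hypothetical and is not taken from the
test source; it only shows which return values are tolerated.

```c
#include <errno.h>
#include <stdbool.h>

/* RES is the value returned by pthread_attr_setscope.  Zero means
   success.  ENOTSUP means the requested scope is not supported, which
   is treated the same way as PTHREAD_SCOPE_SYSTEM not being defined:
   ignore it and carry on.  Any other value is a real failure.  */
bool
toy_scope_error_is_fatal (int res)
{
  return res != 0 && res != ENOTSUP;
}
```

A caller would pass in the return value of pthread_attr_setscope and
only report an error and exit when the helper returns true.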
Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: Iea68ff8b9937570726154f36610c48ef96101871
|
|
On Cygwin, I noticed:
(gdb) PASS: gdb.threads/pthreads.exp: break thread1
continue
Continuing.
pthread_attr_setscope 1: No error
[Thread 8732.0x28f8 exited with code 1]
[Thread 8732.0xb50 exited with code 1]
[Thread 8732.0x17f8 exited with code 1]
Program terminated with signal SIGHUP, Hangup.
The program no longer exists.
(gdb) FAIL: gdb.threads/pthreads.exp: Continue to creation of first thread
Note "No error" in "pthread_attr_setscope 1: No error". That is a bug
in the test. It is using perror, but that prints errno, while the
pthread functions return the error directly. Fix all cases of the
same problem, by adding a new print_error function and using it.
We now get:
...
pthread_attr_setscope 1: Not supported (134)
...
Approved-By: Tom Tromey <tom@tromey.com>
Change-Id: I972ebc931b157bc0f9084e6ecd8916a5e39238f5
|