native-gdbserver
With test-case gdb.threads/multiple-successive-infcall.exp and target board
native-gdbserver I run into:
...
(gdb) continue^M
Continuing.^M
[New Thread 758.759]^M
^M
Thread 1 "multiple-succes" hit Breakpoint 2, main () at \
multiple-successive-infcall.c:97^M
97 thread_ids[tid] = tid + 2; /* prethreadcreationmarker */^M
(gdb) FAIL: gdb.threads/multiple-successive-infcall.exp: thread=5: \
created new thread
...
The problem is that the new thread message doesn't match the regexp, which
expects something like this instead:
...
[New Thread 0x7ffff746e700 (LWP 570)]^M
...
Fix this by accepting this form of new thread message.
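For illustration only, here is a Python sketch of a pattern that accepts both forms of the new thread message shown above. The actual fix is a Tcl regexp in the test-case. The name NEW_THREAD_RE is hypothetical, and the two alternatives cover only the shapes seen in these logs.

```python
import re

# Hypothetical pattern covering the two shapes seen in the logs:
#   [New Thread 0x7ffff746e700 (LWP 570)]   (native)
#   [New Thread 758.759]                    (native-gdbserver)
NEW_THREAD_RE = re.compile(
    r"\[New Thread (?:0x[0-9a-f]+ \(LWP \d+\)|\d+\.\d+)\]"
)


def is_new_thread_message(line: str) -> bool:
    """Return True if LINE contains either form of the new thread message."""
    return NEW_THREAD_RE.search(line) is not None


print(is_new_thread_message("[New Thread 758.759]"))                   # True
print(is_new_thread_message("[New Thread 0x7ffff746e700 (LWP 570)]"))  # True
```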
Tested on x86_64-linux.
|
|
With test-case gdb.threads/thread-specific-bp.exp and target board
native-gdbserver I run into:
...
(gdb) PASS: gdb.threads/thread-specific-bp.exp: non_stop=off: thread 1 selected
continue^M
Continuing.^M
Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.^M
^M
Thread 1 "thread-specific" hit Breakpoint 4, end () at \
thread-specific-bp.c:29^M
29 }^M
(gdb) FAIL: gdb.threads/thread-specific-bp.exp: non_stop=off: \
continue to end (timeout)
...
The problem is that the test-case tries to match the "[Thread ... exited]"
message which we do see with native testing:
...
Continuing.^M
[Thread 0x7ffff746e700 (LWP 7047) exited]^M
Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.^M
...
The fact that the message is missing was reported as PR remote/30129.
We could add a KFAIL for this, but the functionality the test-case is trying
to test has nothing to do with the message, so it should pass. I only added
matching of the message in commit 2e5843d87c4 ("[gdb/testsuite] Fix
gdb.threads/thread-specific-bp.exp") to handle a race, not realizing doing so
broke testing on native-gdbserver.
Fix this by matching the "Thread-specific breakpoint $decimal deleted" message
instead.
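Below is a minimal Python sketch of the matching strategy. It keys on the breakpoint-deleted line, which appears in both logs above, and does not depend on the "[Thread ... exited]" line. The helper name is hypothetical; the real change is a Tcl regexp in the test-case.

```python
import re
from typing import Optional

# Hypothetical pattern: match the line printed with both target boards.
BP_DELETED_RE = re.compile(r"Thread-specific breakpoint (\d+) deleted")


def deleted_breakpoint_number(output: str) -> Optional[int]:
    """Return the number of the deleted thread-specific breakpoint, if any."""
    m = BP_DELETED_RE.search(output)
    return int(m.group(1)) if m else None


native = (
    "[Thread 0x7ffff746e700 (LWP 7047) exited]\n"
    "Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.\n"
)
gdbserver = (
    "Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.\n"
)
print(deleted_breakpoint_number(native), deleted_breakpoint_number(gdbserver))  # 3 3
```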
Tested on x86_64-linux.
|
|
Fix test-cases for target board remote-gdbserver-on-localhost by using
gdb_remote_download.
Tested on x86_64-linux.
|
|
With test-case gdb.server/unittest.exp and a build with --disable-unit-tests I
get:
...
(gdb) builtin_spawn /data/vries/gdb/leap-15-4/build/gdbserver/gdbserver \
--selftest^M
Selftests have been disabled for this build.^M
UNSUPPORTED: gdb.server/unittest.exp: unit tests
...
but with target board remote-stdio-gdbserver I get instead:
...
(gdb) builtin_spawn /usr/bin/ssh -t -l vries localhost \
/data/vries/gdb/leap-15-4/build/gdbserver/gdbserver --selftest^M
Selftests have been disabled for this build.^M
Connection to localhost closed.^M^M
FAIL: gdb.server/unittest.exp: unit tests
...
Fix this by making the regexp less strict.
Tested on x86_64-linux.
|
|
With test-case gdb.server/unittest.exp and target board remote-stdio-gdbserver
I run into:
...
(gdb) builtin_spawn /usr/bin/ssh -t -l vries localhost /usr/bin/gdbserver \
--selftest^M
Selftests have been disabled for this build.^M
UNSUPPORTED: gdb.server/unittest.exp: unit tests
...
due to using the system gdbserver /usr/bin/gdbserver rather than the one from
the build.
Fix this by removing the hard-coding of /usr/bin/gdbserver in
remote-stdio-gdbserver, allowing find_gdbserver to do its work, such that we
have instead:
...
(gdb) builtin_spawn /usr/bin/ssh -t -l vries localhost \
/data/vries/gdb/leap-15-4/build/gdbserver/gdbserver --selftest^M
Running selftest remote_memory_tagging.^M
Ran 1 unit tests, 0 failed^M
Connection to localhost closed.^M^M
PASS: gdb.server/unittest.exp: unit tests
...
Tested on x86_64-linux.
|
|
Fix test-case gdb.server/sysroot.exp with target board
remote-gdbserver-on-localhost, by:
- using gdb_remote_download, and
- disabling the "local" scenario for remote host.
Tested on x86_64-linux.
|
|
Test-case gdb.server/multi-ui-errors.exp fails for target board
remote-gdbserver-on-localhost with REMOTE_TARGET_USERNAME=remote-target:
...
(gdb) PASS: gdb.server/multi-ui-errors.exp: interact with GDB's main UI
Executing on target: kill -9 6447 (timeout = 300)
builtin_spawn [open ...]^M
XYZ1ZYX
sh: line 0: kill: (6447) - Operation not permitted
...
The problem is that the kill command:
...
remote_exec target "kill -9 $gdbserver_pid"
...
intended to kill gdbserver instead tries to kill the ssh client session in
which the gdbserver runs, and fails because it's trying as the remote target
user (remote-target on localhost) to kill a pid owned by the build user
($USER on localhost).
Fix this by getting the gdbserver pid using the ppid trick from
server-kill.exp.
Likewise in gdb.server/server-kill-python.exp.
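As I understand it, the ppid trick relies on the inferior's parent process being the gdbserver that spawned it. A getppid () call made in the inferior therefore yields the gdbserver PID on the target, not the PID of the ssh client on the build side. Here is a small Python sketch of that parent/child relationship, with a child process standing in for the inferior. The helper name is hypothetical.

```python
import os
import subprocess
import sys


def child_reported_parent_pid() -> int:
    """Launch a child that prints getppid(); return the value it printed."""
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


# The child's parent is this process, just as the inferior's parent is the
# gdbserver process that launched it.
print(child_reported_parent_pid() == os.getpid())
```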
Tested on x86_64-linux.
|
|
In commit 80dc83fd0e7 ("gdb/remote: handle target dying just before a stepi")
an observation is made that test-case gdb.server/server-kill.exp claims to
kill gdbserver, but actually kills the inferior. Consequently, the commit
adds testing of killing gdbserver alongside.
The problem is that:
- the original observation is incorrect (possibly caused by misreading getppid
as getpid)
- consequently, the test-case doesn't test killing the inferior, instead it
tests killing gdbserver twice
- the method to get the gdbserver PID added in the commit doesn't work
for target board remote-gdbserver-on-localhost, it returns the
PID of the ssh client session instead.
Fixing the method for getting the inferior PID gives us fails, and there's no
evidence that killing the inferior ever worked.
So, fix this by reverting the commit and just killing gdbserver, using the
original method of getting the gdbserver PID which does work for target board
remote-gdbserver-on-localhost.
Tested on x86_64-linux.
|
|
Test-case gdb.server/connect-with-no-symbol-file.exp fails with target board
remote-gdbserver-on-localhost.
The problem is here:
...
set target_exec [gdb_remote_download target $binfile.bak $binfile]
...
A "gdb_remote_download target" copies from build to target. So $binfile is
assumed to be a target path, but it's actually a build path.
Fix this by:
- first copying $binfile.bak to $binfile, and
- simply doing [gdb_remote_download target $binfile].
Then, $binfile.bak is created here:
...
# Make sure we have the original symbol file in a safe place to copy from.
gdb_remote_download host $binfile $binfile.bak
...
and since "gdb_remote_download host" copies from build to host, $binfile.bak
is assumed to be a host path, but it's actually a build path. This happens to
cause no problems in this configuration (because build == host), but it would
for a remote host configuration.
So let's fix this by making build rather than host the "safe place to copy
from".
Tested on x86_64-linux.
|
|
The test fails on Power 10 with the RHEL9 distro. It also fails on
Power 9.
The test sets a breakpoint in main that stops at line:
a = 9; /* start here */. The test then sets a breakpoint at the same
line where it wants to start the test and does a continue. GDB does not
stop again on the line where it is already stopped, but rather continues to
the end of the program.
Initialize variable A to zero so the break on main will stop before setting
a breakpoint on line a = 9; /* start here */.
Make the match on the breakpoint number generic.
Patch has been tested on Power 10 with RHEL 9, Power 10 with Ubuntu 22.04,
and Power 9 with Fedora 36 with no regression failures.
|
|
Fix test-case gdb.threads/execl.exp on target board
remote-gdbserver-on-localhost using gdb_remote_download.
Tested on x86_64-linux.
|
|
Now that index cache files are written in the background, one test in
index-cache.exp is racy -- it assumes that the cache file will have
been written during startup.
This patch fixes the problem by introducing a new maintenance command
to wait for all pending writes to the index cache.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
|
|
Fix test-case gdb.base/skip-solib.exp for target board
remote-gdbserver-on-localhost using gdb_load_shlib.
Tested on x86_64-linux.
|
|
In test-case gdb.base/skip-solib.exp the linking against a shared library is
done manually:
...
if {[gdb_compile "${binfile_main}.o" "${binfile_main}" executable \
[list debug "additional_flags=-L$testobjdir" \
"additional_flags=-l${test}" \
"ldflags=-Wl,-rpath=$testobjdir"]] != ""} {
...
Instead, use the shlib gdb_compile option such that we simply have:
...
[list debug shlib=$binfile_lib]] != ""} {
...
Tested on x86_64-linux.
|
|
remote target
Fix test-case gdb.base/fork-no-detach-follow-child-dlopen.exp for target board
remote-gdbserver-on-localhost.exp by using gdb_download_shlib and gdb_locate_shlib.
Tested on x86_64-linux.
|
|
With test-case gdb.base/break-probes.exp and target board
remote-gdbserver-on-localhost (using REMOTE_TARGET_USERNAME) we run into some
failures.
Fix these by adding the missing gdb_download_shlib and gdb_locate_shlib.
Tested on x86_64-linux.
|
|
remote-gdbserver-on-localhost
Fix test-case gdb.dwarf2/dw2-zero-range.exp for target board
remote-gdbserver-on-localhost using gdb_load_shlib.
Tested on x86_64-linux.
|
|
remote-gdbserver-on-localhost
With test-case gdb.base/signals-state-child.exp on target board
remote-gdbserver-on-localhost I run into:
...
builtin_spawn /usr/bin/ssh -t -l remote-target localhost \
$outputs/gdb.base/signals-state-child/signals-state-child-standalone^M
bash: $outputs/gdb.base/signals-state-child/signals-state-child-standalone: \
Permission denied^M
Connection to localhost closed.^M^M
FAIL: gdb.base/signals-state-child.exp: collect standalone signals state
...
The problem is that we're trying to run an executable on the target board using
a host path.
After fixing this by downloading the exec to the target board, we run into:
...
builtin_spawn /usr/bin/ssh -t -l remote-target localhost \
signals-state-child-standalone^M
bash: signals-state-child-standalone: command not found^M
Connection to localhost closed.^M^M
FAIL: gdb.base/signals-state-child.exp: collect standalone signals state
...
Fix this by using an absolute path name for the exec on the target board.
The dejagnu proc standard_file does not support op == "absolute" for target
boards, so add an implementation in remote-gdbserver-on-localhost.exp.
Also:
- fix a PATH-in-test-name issue
- cleanup gdb.txt and standalone.txt on target board
Tested on x86_64-linux.
|
|
remote-gdbserver-on-localhost
Test-case gdb.cp/breakpoint-shlib-func.exp fails with target board
remote-gdbserver-on-localhost.
Fix this by adding the missing gdb_load_shlib.
Tested on x86_64-linux.
|
|
On AIX, the debugger cannot access vector registers before they
are first used by the inferior. Hence, change the test case so that
on AIX the variable 'x' accesses some vector registers; other targets
are not affected by this change.
|
|
When running test-cases gdb.mi/*.exp with target board
remote-gdbserver-on-localhost, we run into a few fails.
Fix these (and make things more similar to the gdb.exp procs) by:
- factoring out mi_load_shlib out of mi_load_shlibs
- making mi_load_shlib use gdb_download_shlib, like
gdb_load_shlib
- factoring out mi_locate_shlib out of mi_load_shlib
- making mi_locate_shlib check for mi_spawn_id, like
gdb_locate_shlib
- using gdb_download_shlib and mi_locate_shlib in the test-cases.
Tested on x86_64-linux, with and without target board
remote-gdbserver-on-localhost.
|
|
Use a pthread_barrier to ensure the child thread is started before
the main thread gets to the first breakpoint.
|
|
Use a pthread_barrier to ensure all threads are started before
proceeding to the breakpoint where info threads output is checked.
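Both commits above use the same pattern. Here is a rough Python analogue, with threading.Barrier standing in for pthread_barrier_t; the function name is hypothetical. The main thread can only get past the barrier once every worker has reached it, so anything it inspects afterwards sees all workers running.

```python
import threading
from typing import List


def start_workers_with_barrier(n_workers: int) -> List[int]:
    """Start N_WORKERS threads; return the ids seen once all are running."""
    started: List[int] = []
    lock = threading.Lock()
    # One party per worker plus one for the main thread.
    barrier = threading.Barrier(n_workers + 1)
    release = threading.Event()

    def worker(i: int) -> None:
        with lock:
            started.append(i)
        barrier.wait()   # announce "I am running"
        release.wait()   # stay alive, like a thread parked in the test

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    barrier.wait()              # the "breakpoint" is only reached after this
    snapshot = sorted(started)  # every worker has started by now
    release.set()
    for t in threads:
        t.join()
    return snapshot
```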
|
|
- Some OS's use a different syscall for exit(). For example, the
BSD's use SYS_exit rather than SYS_exit_group. Update the C source
file and the expect script to support SYS_exit as an alternative to
SYS_exit_group.
- The cross-arch syscall number tests are all Linux-specific with
hardcoded syscall numbers specific to Linux kernels. Skip these
tests on non-Linux systems. FreeBSD kernels for example use the
same system call numbers on all platforms, so the test is also not
relevant on FreeBSD.
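Here is a Python sketch of the two adjustments. It assumes the expected output only needs to name the syscall. It accepts either exit syscall name, and runs the cross-arch number tests only when the target OS is Linux. The helper names are hypothetical and do not come from the test-case.

```python
import re
from typing import Optional

# Accept either exit syscall: Linux uses exit_group, the BSDs use exit.
EXIT_SYSCALL_RE = re.compile(r"\b(exit_group|exit)\b")


def exit_syscall_name(text: str) -> Optional[str]:
    """Return the exit syscall named in TEXT, or None."""
    m = EXIT_SYSCALL_RE.search(text)
    return m.group(1) if m else None


def run_cross_arch_syscall_tests(target_os: str) -> bool:
    """The hardcoded syscall numbers are Linux-specific; skip elsewhere."""
    return target_os.startswith("linux")
```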
|
|
Setting the stack size to 2*PTHREAD_STACK_MIN actually lowered the
stack size on FreeBSD rather than raising it, causing non-main threads
in the test program to overflow their stacks and crash. Double the
existing stack size rather than assuming that the initial stack size
is PTHREAD_STACK_MIN.
|
|
The orig_rax pseudo-register is Linux-specific and isn't relevant to
this test. The fs_base and gs_base registers are also not treated as
system registers in other OS ABIs. This allows the test to pass on
FreeBSD.
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
Test-case gdb.base/gdb-caching-proc.exp doesn't really test gdb, but it tests
the gdb_caching_procs in the testsuite, so it belongs in gdb.testsuite rather
than gdb.base.
Move test-case gdb.base/gdb-caching-proc.exp to gdb.testsuite, renaming it to
gdb.testsuite/gdb-caching-proc-consistency.exp to not clash with
recently added gdb.testsuite/gdb-caching-proc.exp.
Tested on x86_64-linux.
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
Test-case gdb.base/morestack.exp contains:
...
require {have_compile_flag -fsplit-stack}
...
and I want to cache the result of have_compile_flag.
Currently gdb_caching_proc doesn't allow args, so I could add:
...
gdb_caching_proc have_compile_flag_fsplit_stack {
return [have_compile_flag -fsplit-stack]
}
...
and then use that proc instead, but I find this cumbersome and
maintenance-unfriendly.
Instead, allow args in a gdb_caching_proc, such that I can simply do:
...
-proc have_compile_flag { flag } {
+gdb_caching_proc have_compile_flag { flag } {
...
Note that gdb_caching_procs with args do not work with the
gdb.base/gdb-caching-procs.exp test-case, so those procs are skipped.
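Caching per argument list can be sketched in Python as a decorator keyed on the argument tuple. This is an analogue of gdb_caching_proc, not its Tcl implementation. functools.lru_cache(maxsize=None) would do the same job; the explicit dict just makes the keying visible.

```python
import functools
from typing import Any, Callable, Dict, List, Tuple


def caching_proc(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Cache FN's result separately for each distinct argument tuple."""
    cache: Dict[Tuple[Any, ...], Any] = {}

    @functools.wraps(fn)
    def wrapper(*args: Any) -> Any:
        if args not in cache:
            cache[args] = fn(*args)  # first call with these args: compute
        return cache[args]           # later calls: reuse the cached result

    return wrapper


probes: List[str] = []  # records which flags were actually probed


@caching_proc
def have_compile_flag(flag: str) -> bool:
    # Stand-in for an expensive compile test; only -fsplit-stack "works".
    probes.append(flag)
    return flag == "-fsplit-stack"
```

Calling have_compile_flag ("-fsplit-stack") twice probes only once, while a different flag gets its own cache entry.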
Tested on x86_64-linux.
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
A regular tcl proc with no args looks like:
...
proc foo {} {
return 1
}
...
but a gdb_caching_proc deviates from that syntax by dropping the explicit no
args bit:
...
gdb_caching_proc foo {
return 1
}
...
Make the gdb_caching_proc use the same syntax as regular procs, such that we
have instead:
...
gdb_caching_proc foo {} {
return 1
}
...
Tested on x86_64-linux.
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
Add test-case gdb.testsuite/gdb-caching-proc.exp that exercises
gdb_caching_proc.
Tested on x86_64-linux.
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
The DAP stackTrace implementation did not fully account for frames
without debuginfo. Requesting a stack trace that includes such frames would
yield a result like:
{"request_seq": 5, "type": "response", "command": "stackTrace", "success": false, "message": "'NoneType' object has no attribute 'filename'", "seq": 11}
This patch fixes the problem by adding another check for None.
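Here is a Python sketch of the shape of the fix, using hypothetical stand-ins for the gdb frame and symtab objects. In the DAP specification, a StackFrame's source is optional, and line and column are 0 when there is no source. A frame without debuginfo can therefore still be reported.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Symtab:
    """Hypothetical stand-in for the symtab of a frame's source location."""
    filename: str


@dataclass
class FrameInfo:
    """Hypothetical stand-in for a frame plus its source location."""
    frame_id: int
    name: str
    pc: int
    symtab: Optional[Symtab]  # None for frames without debuginfo
    line: int = 0


def dap_stack_frame(frame: FrameInfo) -> dict:
    """Build a DAP StackFrame body, tolerating a missing symtab."""
    result = {
        "id": frame.frame_id,
        "name": frame.name,
        "line": 0,
        "column": 0,
        "instructionPointerReference": hex(frame.pc),
    }
    # Reading frame.symtab.filename unconditionally is what produces
    # "'NoneType' object has no attribute 'filename'" for such frames.
    if frame.symtab is not None:
        result["source"] = {"path": frame.symtab.filename}
        result["line"] = frame.line
    return result
```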
|
|
When running gdb.base/bg-exec-sigint-bp-cond.exp when SHELL is dash,
rather than bash, I get:
c&^M
Continuing.^M
(gdb) sh: 1: kill: Illegal option -S^M
^M
Breakpoint 2, foo () at /home/jenkins/smarchi/binutils-gdb/build/gdb/testsuite/../../../gdb/testsuite/gdb.base/bg-exec-sigint-bp-cond.c:23^M
23 return 0;^M
FAIL: gdb.base/bg-exec-sigint-bp-cond.exp: no force memory write: SIGINT does not interrupt background execution (timeout)
This is because it uses the kill command built into the dash shell, and
using the SIG prefix with kill does not work with dash's kill. The
difference is listed in the documentation for bash's POSIX-correct mode
[1]:
The kill builtin does not accept signal names with a ‘SIG’ prefix.
Replace SIGINT with INT in that test.
By grepping, I found two other instances (gdb.base/sigwinch-notty.exp
and gdb.threads/detach-step-over.exp). Those were not problematic on my
system though. Since they are done through remote_exec, they don't go
through the shell and therefore invoke /bin/kill. On my Arch Linux,
it's:
$ /bin/kill --version
kill from util-linux 2.38.1 (with: sigqueue, pidfd)
and on my Ubuntu:
$ /bin/kill --version
kill from procps-ng 3.3.17
These two implementations accept "-SIGINT". But according to the POSIX
spec [2], the kill utility should recognize the signal name without the
SIG prefix (if it recognizes them with the SIG prefix, it's an
extension):
-s signal_name
Specify the signal to send, using one of the symbolic names defined
in the <signal.h> header. Values of signal_name shall be recognized
in a case-independent fashion, without the SIG prefix. In addition,
the symbolic name 0 shall be recognized, representing the signal
value zero. The corresponding signal shall be sent instead of SIGTERM.
-signal_name
[XSI] [Option Start]
Equivalent to -s signal_name. [Option End]
So, just in case some /bin/kill implementation happens to not recognize
the SIG prefixes, change these two other calls to remove the SIG
prefix.
[1] https://www.gnu.org/software/bash/manual/html_node/Bash-POSIX-Mode.html
[2] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html
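Here is a Python sketch that builds a kill command line in the POSIX form. It derives the name from Python's signal module and drops the SIG prefix. The helper name is hypothetical; the commit itself simply replaces SIGINT with INT in the test-cases.

```python
import signal
from typing import List


def kill_command(sig: signal.Signals, pid: int) -> List[str]:
    """Build a kill(1) argv that names the signal without the SIG prefix."""
    name = sig.name  # e.g. "SIGINT"
    if name.startswith("SIG"):
        name = name[len("SIG"):]
    # POSIX requires "-s INT" to work; "-INT" is the XSI shorthand.
    return ["kill", "-s", name, str(pid)]


print(kill_command(signal.SIGINT, 1234))  # ['kill', '-s', 'INT', '1234']
```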
Change-Id: I81ccedd6c9428ab63b9261813f1905a18941f8da
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
Use "set always-read-ctf on" instead of --strip-debug in the ctf test-cases.
Tested on x86_64-linux.
|
|
Simon pointed out that the recent patch to add half-float support to
'x/f' caused a couple of regressions in long_long.exp. This patch
fixes these by updating the expected results.
|
|
Using 'x/hf' should print bytes as float16, but instead it currently
prints as an integer. I tracked this down to a missing case in
float_type_from_length.
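The missing case can be illustrated with a Python sketch of a length-to-float-format mapping, where a 2-byte length selects IEEE half precision (the struct "e" format). The table and function are hypothetical and only mirror the idea behind float_type_from_length. Lengths with no float type fall back to an integer, which is what happened to 2-byte values before the fix.

```python
import struct

# Hypothetical length -> struct format table; "e" is IEEE 754 binary16.
FLOAT_FORMAT_BY_LENGTH = {2: "e", 4: "f", 8: "d"}


def decode_float(raw: bytes, little_endian: bool = True):
    """Decode RAW as a float of matching size, else fall back to an integer."""
    order = "<" if little_endian else ">"
    fmt = FLOAT_FORMAT_BY_LENGTH.get(len(raw))
    if fmt is None:
        # No float type of this length: print as an integer instead.
        return int.from_bytes(raw, "little" if little_endian else "big")
    return struct.unpack(order + fmt, raw)[0]


print(decode_float(b"\x00\x3c"))  # 1.0 (0x3c00 is 1.0 in binary16)
```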
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30161
Approved-By: Simon Marchi <simon.marchi@efficios.com>
|
|
[ This is a simplified rewrite of an earlier submission "[RFC][gdb/symtab] Add
maint set symbol-read-order", submitted here (
https://sourceware.org/pipermail/gdb-patches/2022-September/192044.html
). ]
With the test-case included in this patch, we run into:
...
(gdb) file dwarf2-and-ctf
(gdb) print var_ctf^M
'var_ctf' has unknown type; cast it to its declared type^M
...
The problem is that the executable contains both ctf and dwarf2, so the ctf
info (which contains the type information about var_ctf) is ignored.
GDB has support for handling multiple debug formats, but the common use case
for ctf is to be used when dwarf2 is not present, and gdb reflects that,
assuming that by reading ctf in addition there won't be any extra information,
so it's not worth the additional cycles and memory.
Add a new command "set/show always-read-ctf on/off", that when on forces
unconditional reading of ctf, allowing us to do:
...
(gdb) set always-read-ctf on
(gdb) file dwarf2-and-ctf
(gdb) print var_ctf^M
$2 = 2^M
...
The setting is off by default, preserving current behaviour.
A bit of background on the relevance of reading order: the formats have a
priority relationship between them, where reading earlier means lower
priority. By reading the format with the most detail last, we ensure it has
the highest priority, which makes sure that in case there is overlapping info,
the most detailed info is found. This explains the current reading order of
mdebug, stabs and dwarf2.
Add the unconditional reading of ctf before dwarf2, because it's less detailed
than dwarf2. The conditional reading of ctf is still done after the attempt to
read dwarf2, necessarily so because we only know whether there's dwarf2 after
we've tried to read it.
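Here is a toy Python model of the priority rule. It makes the simplifying assumption that each format contributes a name-to-type map and that a later read overrides an earlier one for overlapping names. GDB's symbol lookup is more involved than a dict merge; the model only shows why ctf is read before dwarf2.

```python
from typing import Dict, List, Tuple


def merge_debug_info(readers: List[Tuple[str, Dict[str, str]]]) -> Dict[str, str]:
    """Merge name->type maps in read order; later readers take priority."""
    merged: Dict[str, str] = {}
    for _fmt, info in readers:
        merged.update(info)  # a later format overrides overlapping names
    return merged


ctf = {"var_ctf": "int", "shared": "int (from ctf)"}
dwarf2 = {"shared": "int (from dwarf2)"}

# Unconditional ctf read goes first, so dwarf2 wins where both have info,
# while names that only ctf describes (var_ctf) still get a type.
result = merge_debug_info([("ctf", ctf), ("dwarf2", dwarf2)])
print(result["shared"], "/", result["var_ctf"])  # int (from dwarf2) / int
```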
The new command allows us to replace uses of -Wl,--strip-debug added in commit
908a926ec4e ("[gdb/testsuite] Fix ctf test-cases on openSUSE Tumbleweed") by
uses of "set always-read-ctf on", but I've left that for another commit.
Tested on x86_64-linux.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
The copyright years in the ROCm files (e.g. solib-rocm.c) are wrong,
they end in 2022 instead of 2023. I suppose because I posted (or at
least prepared) the patches in 2022 but merged them in 2023, and forgot
to update the year. I found a bunch of other files that are in the same
situation. Fix them all up.
Change-Id: Ia55f5b563606c2ba6a89046f22bc0bf1c0ff2e10
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
I ran into:
...
(gdb) PASS: gdb.python/py-record-btrace.exp: function call: \
python print(c.prev)
python print(c == c.next.prev)^M
Traceback (most recent call last):^M
File "<string>", line 1, in <module>^M
AttributeError: 'NoneType' object has no attribute 'prev'^M
Error while executing Python code.^M
(gdb) FAIL: gdb.python/py-record-btrace.exp: function call: \
python print(c == c.next.prev)
...
due to having only 4 insn instead of 100:
...
python print(len(insn))^M
4^M
...
This could be caused by the same hw bug as we already have an xfail for, so
expand the xfail matching.
Tested on x86_64-linux.
PR testsuite/30185
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30185
Approved-By: Markus T. Metzger <markus.t.metzger@intel.com>
|
|
When debugging GDB, I find it a bit tedious to inspect htab_t objects.
It is possible to find the entries by poking at the fields, but it's
annoying to do each time. I think a pretty printer would help. Add a
basic one to gdb-gdb.py.
The pretty printer advertises itself as "array-like", and the result
looks like:
(top-gdb) p bfcache
$3 = htab_t with 3 elements = {0x6210003252a0, 0x62100032caa0, 0x62100033baa0}
The htab_t itself doesn't know about the type of pointed objects. But
it's easy enough to cast the addresses to the right type to use them:
(top-gdb) print *((btrace_frame_cache *) 0x6210003252a0)
$6 = {tp = 0x61700002ed80, frame = 0x6210003251e0, bfun = 0x62000000b390}
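The core of such a printer is walking the slot array and skipping unused slots. The sketch below assumes libiberty's conventions, where HTAB_EMPTY_ENTRY is 0 and HTAB_DELETED_ENTRY is 1 in hashtab.h. It is a standalone Python sketch over plain integers rather than gdb.Value objects, and it is not the printer added to gdb-gdb.py.

```python
from typing import Iterable, Iterator

HTAB_EMPTY_ENTRY = 0    # mirrors libiberty hashtab.h
HTAB_DELETED_ENTRY = 1  # mirrors libiberty hashtab.h


def live_entries(slots: Iterable[int]) -> Iterator[int]:
    """Yield the slot values that hold real elements."""
    for slot in slots:
        if slot not in (HTAB_EMPTY_ENTRY, HTAB_DELETED_ENTRY):
            yield slot


def summarize(slots: Iterable[int]) -> str:
    """Render roughly what the array-like printer output looks like."""
    entries = list(live_entries(slots))
    body = ", ".join(hex(e) for e in entries)
    return f"htab_t with {len(entries)} elements = {{{body}}}"
```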
Change-Id: Ia692e3555fe7a117b7ec087840246b1260a704c6
Reviewed-By: Tom Tromey <tom@tromey.com>
|
|
On powerpc64le-linux, I run into two timeouts:
...
FAIL: gdb.python/py-breakpoint.exp: test_watchpoints: \
Test watchpoint write (timeout)
FAIL: gdb.python/py-breakpoint.exp: test_bkpt_internal: \
Test watchpoint write (timeout)
...
In this case, hw watchpoints are not supported, and using sw watchpoints
is slow.
Most of the time is spent in handling a try-catch, which triggers a malloc. I
think this bit is more relevant for the "catch throw" part of the test-case,
so fix the timeouts by setting the watchpoints after the try-catch.
Tested on x86_64-linux and powerpc64le-linux.
|
|
On x86_64-linux, I have:
...
(gdb) watch -location y^M
Hardware watchpoint 2: -location y^M
(gdb) PASS: gdb.rust/watch.exp: watch -location y
...
but on powerpc64le-linux, I run into:
...
(gdb) watch -location y^M
Watchpoint 2: -location y^M
(gdb) FAIL: gdb.rust/watch.exp: watch -location y
...
due to the regexp matching "Hardware watchpoint" but not "Watchpoint":
...
gdb_test "watch -location y" ".*watchpoint .* -location .*"
...
Fix this by making the regexp less restrictive.
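Here is a quick Python check of the original regexp against both outputs, plus one possible less restrictive variant. The [Ww] form is an assumption for illustration and not necessarily the regexp the commit uses. Tcl's regexp and Python's re agree on these simple constructs.

```python
import re

ORIGINAL = r".*watchpoint .* -location .*"
# One possible less restrictive variant (illustrative only).
RELAXED = r".*[Ww]atchpoint .* -location .*"

x86 = "Hardware watchpoint 2: -location y"
ppc = "Watchpoint 2: -location y"

for name, pattern in (("original", ORIGINAL), ("relaxed", RELAXED)):
    print(name, bool(re.search(pattern, x86)), bool(re.search(pattern, ppc)))
# original True False
# relaxed True True
```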
Tested on x86_64-linux and powerpc64le-linux.
|
|
Background
----------
When a thread-specific breakpoint is deleted as a result of the
specific thread exiting, the function remove_threaded_breakpoints is
called, which sets the disposition of the breakpoint to
disp_del_at_next_stop and sets the breakpoint number to 0. Setting
the breakpoint number to zero has the effect of hiding the breakpoint
from the user. We also print a message indicating that the breakpoint
has been deleted.
It was brought to my attention during a review of another patch[1]
that setting a breakpoint's number to zero will suppress the MI
breakpoint-deleted notification for that breakpoint, and indeed, this
can be seen to be true, in delete_breakpoint, if the breakpoint number
is zero, then GDB will not notify the breakpoint_deleted observer.
It seems wrong that a user-created, thread-specific breakpoint will
have a =breakpoint-created notification, but will not have a
=breakpoint-deleted notification. I suspect that this is a bug.
[1] https://sourceware.org/pipermail/gdb-patches/2023-February/196560.html
The First Problem
-----------------
During my initial testing I wanted to see how GDB handled the
breakpoint after its number was set to zero. To do this I created
the testcase gdb.threads/thread-bp-deleted.exp. This test creates a
worker thread, which immediately exits. After the worker thread has
exited the main thread spins in a loop.
In GDB I break once the worker thread has been created and place a
thread-specific breakpoint, then use 'continue&' to resume the
inferior in non-stop mode. The worker thread then exits, but the main
thread never stops - instead it sits in the spin. I then tried to use
'maint info breakpoints' to see what GDB thought of the
thread-specific breakpoint.
Unfortunately, GDB crashed like this:
(gdb) continue&
Continuing.
(gdb) [Thread 0x7ffff7c5d700 (LWP 1202458) exited]
Thread-specific breakpoint 3 deleted - thread 2 no longer in the thread list.
maint info breakpoints
... snip some output ...
Fatal signal: Segmentation fault
----- Backtrace -----
0x5ffb62 gdb_internal_backtrace_1
../../src/gdb/bt-utils.c:122
0x5ffc05 _Z22gdb_internal_backtracev
../../src/gdb/bt-utils.c:168
0x89965e handle_fatal_signal
../../src/gdb/event-top.c:964
0x8997ca handle_sigsegv
../../src/gdb/event-top.c:1037
0x7f96f5971b1f ???
/usr/src/debug/glibc-2.30-2-gd74461fa34/nptl/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0xe602b0 _Z15print_thread_idP11thread_info
../../src/gdb/thread.c:1439
0x5b3d05 print_one_breakpoint_location
../../src/gdb/breakpoint.c:6542
0x5b462e print_one_breakpoint
../../src/gdb/breakpoint.c:6702
0x5b5354 breakpoint_1
../../src/gdb/breakpoint.c:6924
0x5b58b8 maintenance_info_breakpoints
../../src/gdb/breakpoint.c:7009
... etc ...
As the thread-specific breakpoint is set to disp_del_at_next_stop, and
GDB hasn't stopped yet, the breakpoint still exists in the global
breakpoint list.
The breakpoint will not show in 'info breakpoints' as its number is
zero, but it will show in 'maint info breakpoints'.
As GDB prints the breakpoint, the thread-id for the breakpoint is
printed as part of the 'stop only in thread ...' line. Printing the
thread-id involves calling find_thread_global_id to convert the global
thread-id into a thread_info*. Then calling print_thread_id to
convert the thread_info* into a string.
The problem is that find_thread_global_id returns nullptr as the
thread for the thread-specific breakpoint has exited. The
print_thread_id assumes it will be passed a non-nullptr. As a result
GDB crashes.
In this commit I've added an assert to print_thread_id (gdb/thread.c)
to check that the pointer passed in is not nullptr. This assert would
have triggered in the above case before GDB crashed.
MI Notifications: The Dangers Of Changing A Breakpoint's Number
---------------------------------------------------------------
Currently the delete_breakpoint function doesn't trigger the
breakpoint_deleted observer for any breakpoint with the number zero.
There is a comment explaining why this is the case in the code; it's
something about watchpoints. But I did consider just removing the 'is
the number zero' guard and always triggering the breakpoint_deleted
observer, figuring that I'd then fix the watchpoint issue some other
way.
But I realised this wasn't going to be good enough. When the MI
notification was delivered the number would be zero, so any frontend
parsing the notifications would not be able to match the
=breakpoint-deleted notification to the earlier =breakpoint-created
notification.
What this means is that, at the point the breakpoint_deleted observer
is called, the breakpoint's number must be correct.
MI Notifications: The Dangers Of Delaying Deletion
--------------------------------------------------
The test I used to expose the above crash also brought another problem
to my attention. In the above test we used 'continue&' to resume,
after which a thread exited, but the inferior didn't stop. Recreating
the same test in the MI looks like this:
-break-insert -p 2 main
^done,bkpt={number="2",type="breakpoint",disp="keep",...<snip>...}
(gdb)
-exec-continue
^running
*running,thread-id="all"
(gdb)
~"[Thread 0x7ffff7c5d700 (LWP 987038) exited]\n"
=thread-exited,id="2",group-id="i1"
~"Thread-specific breakpoint 2 deleted - thread 2 no longer in the thread list.\n"
At this point we have a single thread left, which is still
running:
-thread-info
^done,threads=[{id="1",target-id="Thread 0x7ffff7c5eb80 (LWP 987035)",name="thread-bp-delet",state="running",core="4"}],current-thread-id="1"
(gdb)
Notice that we got the =thread-exited notification from GDB as soon as
the thread exited. We also saw the CLI line from GDB, the line
explaining that breakpoint 2 was deleted. But, as expected, we didn't
see the =breakpoint-deleted notification.
I say "as expected" because the number was set to zero. But, even if
the number was not set to zero we still wouldn't see the
notification. The MI notification is driven by the breakpoint_deleted
observer, which is only called when we actually delete the breakpoint,
which is only done the next time GDB stops.
Now, maybe this is fine. The notification is delivered a little
late. But remember, by setting the number to zero the breakpoint will
be hidden from the user, for example, the breakpoint is removed from
the MI's -break-info command output.
This means that GDB is in a position where the breakpoint doesn't show
up in the breakpoint table, but a =breakpoint-deleted notification has
not yet been sent out. This doesn't seem right to me.
What this means is that, when the thread exits, we should immediately
be sending out the =breakpoint-deleted notification. We should not
wait for GDB to next stop before sending the notification.
The Solution
------------
My proposed solution is this; in remove_threaded_breakpoints, instead
of setting the disposition to disp_del_at_next_stop and setting the
number to zero, we now just call delete_breakpoint directly.
The notification will now be sent out immediately; as soon as the
thread exits.
As the number has not changed when delete_breakpoint is called, the
notification will have the correct number.
And as the breakpoint is immediately removed from the breakpoint list,
we no longer need to worry about 'maint info breakpoints' trying to
print the thread-id for an exited thread.
My only concern is that calling delete_breakpoint directly seems so
obvious that I wonder why the original patch (that added
remove_threaded_breakpoints) didn't take this approach. This code was
added in commit 49fa26b0411d, but the commit message offers no clues
to why this approach was taken, and the original email thread offers
no insights either[2]. There are no test regressions after making
this change, so I'm hopeful that this is going to be fine.
[2] https://sourceware.org/pipermail/gdb-patches/2013-September/106493.html
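Here is a toy Python model (not GDB's C++) of the ordering the solution relies on. The deleted observers run while the breakpoint still has its number, and the breakpoint is unlinked immediately, so nothing is left for 'maint info breakpoints' to trip over. The class and method names are hypothetical; they only mirror delete_breakpoint and remove_threaded_breakpoints.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Breakpoint:
    number: int
    thread: Optional[int] = None  # global thread id, if thread-specific


class BreakpointTable:
    """Toy model of the global breakpoint list (hypothetical names)."""

    def __init__(self) -> None:
        self.breakpoints: List[Breakpoint] = []
        self.deleted_observers: List[Callable[[Breakpoint], None]] = []

    def delete_breakpoint(self, bp: Breakpoint) -> None:
        # Observers (e.g. an MI =breakpoint-deleted emitter) run while
        # bp.number still holds the user-visible number.
        for observer in self.deleted_observers:
            observer(bp)
        self.breakpoints.remove(bp)

    def remove_threaded_breakpoints(self, thread_id: int) -> None:
        # Delete right away instead of zeroing the number and waiting
        # for the next stop.
        for bp in [b for b in self.breakpoints if b.thread == thread_id]:
            print(f"Thread-specific breakpoint {bp.number} deleted - "
                  f"thread {thread_id} no longer in the thread list.")
            self.delete_breakpoint(bp)
```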
The Complication
----------------
Of course, it couldn't be that simple.
The script gdb.python/py-finish-breakpoint.exp had some regressions
during testing.
The problem was with the FinishBreakpoint.out_of_scope callback
implementation. This callback is supposed to trigger whenever the
FinishBreakpoint goes out of scope; and this includes when the thread
for the breakpoint exits.
The problem I ran into is the Python FinishBreakpoint implementation.
Specifically, after this change I was losing some of the out_of_scope
calls.
The problem is that the out_of_scope call (the one I'm interested in) is
triggered from the inferior_exit observer. Before my change the
observers were called in this order:
thread_exit
inferior_exit
breakpoint_deleted
The inferior_exit would trigger the out_of_scope call.
After my change the breakpoint_deleted notification (for
thread-specific breakpoints) occurs earlier, as soon as the
thread-exits, so now the order is:
thread_exit
breakpoint_deleted
inferior_exit
Currently, after the breakpoint_deleted call the Python object
associated with the breakpoint is released, so, when we get to the
inferior_exit observer, there's no longer a Python object to call the
out_of_scope method on.
My solution is to follow the model for how bpfinishpy_pre_stop_hook
and bpfinishpy_post_stop_hook are called; this is done from
gdbpy_breakpoint_cond_says_stop in py-breakpoint.c.
I've now added a new bpfinishpy_pre_delete_hook, which is called from
gdbpy_breakpoint_deleted in py-breakpoint.c, and from this new hook
function I check and where needed call the out_of_scope method.
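The effect of the reordering, and why a pre-delete hook avoids it, can be illustrated with a toy Python model. This is not GDB code: the observer names come from the text above, but the FinishBp class, the run function and the order lists are hypothetical, and the model assumes the breakpoint is always out of scope by the time it is deleted.

```python
# Toy model of the observer ordering; not GDB code.
OLD_ORDER = ["thread_exit", "inferior_exit", "breakpoint_deleted"]
NEW_ORDER = ["thread_exit", "breakpoint_deleted", "inferior_exit"]


class FinishBp:
    """Hypothetical stand-in for a FinishBreakpoint and its Python object."""

    def __init__(self):
        self.py_object_alive = True
        self.out_of_scope_calls = 0

    def out_of_scope(self):
        self.out_of_scope_calls += 1


def run(order, bp, use_pre_delete_hook):
    """Deliver the notifications in ORDER to BP and return BP."""
    notified = False  # out_of_scope should fire at most once
    for event in order:
        if event == "breakpoint_deleted":
            # The (modelled) pre-delete hook runs while the Python object
            # still exists; this model assumes BP is out of scope by now.
            if use_pre_delete_hook and not notified:
                bp.out_of_scope()
                notified = True
            bp.py_object_alive = False  # the Python object is released
        elif event == "inferior_exit":
            # The original path needs the Python object to still be alive.
            if bp.py_object_alive and not notified:
                bp.out_of_scope()
                notified = True
    return bp


print(run(OLD_ORDER, FinishBp(), False).out_of_scope_calls)  # 1
print(run(NEW_ORDER, FinishBp(), False).out_of_scope_calls)  # 0: call lost
print(run(NEW_ORDER, FinishBp(), True).out_of_scope_calls)   # 1
```

With the new ordering and no hook, inferior_exit arrives after the Python object is gone, so the call is silently dropped; calling out_of_scope from the delete path restores it.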
With this fix in place I now see the
gdb.python/py-finish-breakpoint.exp test fully passing again.
Testing
-------
Tested on x86-64/Linux with unix, native-gdbserver, and
native-extended-gdbserver boards.
New tests have been added to cover all the cases I've discussed above.
Approved-By: Pedro Alves <pedro@palves.net>
|
|
I currently see this failure when running the gdb.mi/mi-pending.exp
test using the native-extended-remote board:
-break-insert -f -c x==4 mi-pendshr.c:pendfunc2
&"No source file named mi-pendshr.c.\n"
^done,bkpt={number="2",type="breakpoint",disp="keep",enabled="y",addr="<PENDING>",pending="mi-pendshr.c:pendfunc2",cond="x==4",evaluated-by="host",times="0",original-location="mi-pendshr.c:pendfunc2"}
(gdb)
FAIL: gdb.mi/mi-pending.exp: MI pending breakpoint on mi-pendshr.c:pendfunc2 if x==4 (unexpected output)
The failure is caused by the 'evaluated-by="host"' string, which only
appears in the output when the test is run using the
native-extended-remote board.
I could fix this by just updating the pattern in
gdb.mi/mi-pending.exp, but I have instead updated mi-pending.exp to
make more use of the support procs in mi-support.exp. This did
require making a couple of adjustments to mi-support.exp, but I think
the result is that mi-pending.exp is now easier to read, and I see no
failures with native-extended-remote anymore.
One of the test names has changed after this work. I think the old
test name was wrong: it described a breakpoint as pending when the
breakpoint was not pending, which I suspect was a copy & paste error.
But there are no changes to what is actually being tested after this
patch.
Approved-By: Pedro Alves <pedro@palves.net>
|
|
I noticed that several tests included copy & pasted code to run the
'maint show target-non-stop' command, and then switch based on the
result.
In this commit I factor this code out into a helper proc in
lib/gdb.exp, and update all the places I could find that used this
pattern to make use of the helper proc.
There should be no change in what is tested after this commit.
Reviewed-By: Pedro Alves <pedro@palves.net>
|
|
Introduce foreach_mi_ui_mode, a helper proc which can be used when
tests are going to be repeated once with the MI in the main UI, and
once with the MI on a separate UI.
The proc is used like this:
foreach_mi_ui_mode VAR {
# BODY
}
The BODY will be run twice, once with VAR set to 'main' and once with
VAR set to 'separate'. Inside BODY we can then change the behaviour
based on the current UI mode.
The point of this proc is that we sometimes shouldn't run the separate
UI tests (when gdb_debug_enabled is true), and this proc hides all
this logic. If the separate UI mode should not be used then BODY will
be run just once with VAR set to 'main'.
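The control flow that foreach_mi_ui_mode hides can be sketched as a Python generator. This only models the behaviour described above; the real proc is Tcl, and the mi_ui_modes name and its gdb_debug_enabled parameter are hypothetical.

```python
from typing import Iterator


def mi_ui_modes(gdb_debug_enabled: bool) -> Iterator[str]:
    """Yield the UI modes a test BODY should be run under."""
    yield "main"
    # The separate-UI variant is skipped when GDB debugging is enabled.
    if not gdb_debug_enabled:
        yield "separate"


for mode in mi_ui_modes(gdb_debug_enabled=False):
    # BODY goes here; it can branch on MODE.
    print("running BODY with VAR =", mode)
```

Keeping the skip logic in one place means individual tests never need to check gdb_debug_enabled themselves.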
I've updated two tests that can make use of this helper proc. I'm
going to add another similar test in a later commit.
There should be no change to what is tested with this commit.
Approved-By: Pedro Alves <pedro@palves.net>
|
|
The mi_clean_restart proc calls the mi_gdb_start proc passing no
arguments.
In this commit I add an extra (optional) argument to the
mi_clean_restart proc, and pass this through to mi_gdb_start.
The benefit of this is that we can now use mi_clean_restart when we
also want to pass the 'separate-mi-tty' or 'separate-inferior-tty'
flags to mi_gdb_start, which avoids having to duplicate the
contents of mi_clean_restart in different tests.
I've updated the obvious places where this new functionality can be
used, and I'm seeing no test regressions.
Reviewed-By: Pedro Alves <pedro@palves.net>
|
|
Building on the previous commit, now that the breakpoint related
support functions in lib/mi-support.exp can help create the
patterns for thread-specific breakpoints, make use of this
functionality for gdb.mi/mi-nsmoribund.exp and gdb.mi/mi-pending.exp.
There should be no changes in what is tested after this commit.
Reviewed-By: Pedro Alves <pedro@palves.net>
|
|
When creating a thread-specific breakpoint with a single location, the
'thread' field would be repeated in the MI output. This can be seen
in two existing tests gdb.mi/mi-nsmoribund.exp and
gdb.mi/mi-pending.exp, e.g.:
(gdb)
-break-insert -p 1 bar
^done,bkpt={number="1",type="breakpoint",disp="keep",
enabled="y",
addr="0x000000000040110a",func="bar",
file="/tmp/mi-thread-specific-bp.c",
fullname="/tmp/mi-thread-specific-bp.c",
line="32",thread-groups=["i1"],
thread="1",thread="1", <================ DUPLICATION!
times="0",original-location="bar"}
I know we need to be careful when adjusting MI output, but I'm hopeful
that in this case, as the field is duplicated and its contents are
always identical, we might get away with removing one of the
duplicates.
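Duplicated fields like this can be spotted mechanically by counting the top-level field names in the tuple. The helper below is a hypothetical Python sketch, not part of lib/mi-support.exp or of any MI parsing library; it assumes its input is the text between the outer braces of a single MI tuple such as bkpt={...}, and it only understands quoted strings (with backslash escapes), lists and nested tuples.

```python
from collections import Counter
from typing import List


def top_level_keys(tuple_body: str) -> List[str]:
    """Return the field names at the top level of an MI tuple body."""
    keys = []
    depth = 0              # nesting level of [...] and {...}
    in_string = False
    escaped = False
    start = 0              # index where the current field begins
    expecting_key = True
    for i, ch in enumerate(tuple_body):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
        elif ch in "]}":
            depth -= 1
        elif depth == 0 and ch == "=" and expecting_key:
            keys.append(tuple_body[start:i].strip())
            expecting_key = False
        elif depth == 0 and ch == ",":
            start = i + 1
            expecting_key = True
    return keys


def duplicate_keys(tuple_body: str) -> List[str]:
    """Return each top-level field name that appears more than once."""
    counts = Counter(top_level_keys(tuple_body))
    return [key for key, n in counts.items() if n > 1]


bkpt = ('number="1",type="breakpoint",thread-groups=["i1"],'
        'thread="1",thread="1",times="0"')
print(duplicate_keys(bkpt))  # ['thread']
```

Fields inside nested lists or tuples (for example the per-location tuples of a multi-location breakpoint) are deliberately ignored, so only the outer record is checked.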
The change in GDB is a fairly trivial condition change.
We did have a couple of tests that contained the duplicate fields in
their expected output, but given there was no comment pointing out
this oddity either in the GDB code, or in the test, I suspect this was
more a case of copying whatever output GDB produced and using that as
the expected results. I've updated these tests to remove the
duplication.
I've updated lib/mi-support.exp to provide support for building
breakpoint patterns that contain the thread field, and I've made use
of this in a new test I've added that is just about creating
thread-specific breakpoints and checking the results. The two tests I
mentioned above as being updated could also use the new
lib/mi-support.exp functionality, but I'm going to do that in a later
patch; this way it is clear what changes I'm actually proposing to
make to the expected output.
As I said, I hope that frontends will be able to handle this change,
but I still think it's worth adding a NEWS entry; that way, if someone
runs into problems, there's a chance they can figure out what's going
on.
This should not impact CLI output at all.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Pedro Alves <pedro@palves.net>
|
|
Hannes filed a bug showing a crash, where a pretty-printer written in
Python could cause a use-after-free. He sent a patch, but I thought a
different approach was needed.
In a much earlier patch (see bug #12533), we changed the Python code
to release new values from the value chain when constructing a
gdb.Value. The rationale for this is that if you write a command that
does a lot of computations in a loop, all the values will be kept live
by the value chain, resulting in gdb using a large amount of memory.
However, suppose a value is passed to Python from some code in gdb
that needs to use the value after the call into Python. In this
scenario, value_to_value_object will still release the value -- and
because gdb code doesn't generally keep strong references to values (a
consequence of the ancient decision to use the value chain to avoid
memory management), this will result in a use-after-free.
This scenario can happen, as it turns out, when a value is passed to
Python for pretty-printing. Now, normally this route boxes the value
via value_to_value_object_no_release, avoiding the problematic release
from the value chain. However, if you then call Value.cast, the
underlying value API might return the same value, which is then
released from the chain.
This patch fixes the problem by changing how value boxing is done.
value_to_value_object no longer removes a value from the chain.
Instead, every spot in gdb that might construct new values uses a
scoped_value_mark to ensure that the requirements of bug #12533 are
met. And, because incoming values aren't ever released from the chain
(the Value.cast one comes earlier on the chain than the
scoped_value_mark), the bug can no longer occur. (Note that many
spots in the Python layer already take this approach, so not many
places needed to be touched.)
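The difference between the two boxing strategies can be shown with a toy reference-counting model in Python. Everything here is hypothetical and greatly simplified: Value, ValueChain, PyValue, box_old and box_new stand in for struct value, the value chain, a gdb.Value object and the old and new behaviour of value_to_value_object, and the freed flag stands in for deallocation.

```python
from contextlib import contextmanager


class Value:
    """Toy reference-counted value; 'freed' stands in for deallocation."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True


class ValueChain:
    """Toy value chain: every entry owns one reference."""

    def __init__(self):
        self.values = []

    def add(self, v):
        v.incref()
        self.values.append(v)
        return v

    def release(self, v):
        # Remove V from the chain; the caller inherits the chain's reference.
        self.values.remove(v)
        return v


@contextmanager
def scoped_value_mark(chain):
    """Drop the chain's reference to every value added inside the block."""
    mark = len(chain.values)
    try:
        yield
    finally:
        while len(chain.values) > mark:
            chain.values.pop().decref()


class PyValue:
    """Stand-in for a gdb.Value object owning one reference."""

    def __init__(self, v):
        self.v = v

    def dealloc(self):
        self.v.decref()


def box_old(chain, v):
    # Old behaviour: take V off the chain and steal its reference.
    return PyValue(chain.release(v))


def box_new(chain, v):
    # New behaviour: add a reference and leave V on CHAIN (unused here).
    v.incref()
    return PyValue(v)


def pretty_print_then_use(box):
    """Return True if gdb would touch freed memory after the Python call."""
    chain = ValueChain()
    v = chain.add(Value("v"))   # gdb keeps no strong reference of its own
    with scoped_value_mark(chain):
        cast = box(chain, v)    # Value.cast handed back the same value
    cast.dealloc()              # the Python object goes away
    return v.freed              # gdb carries on using V here


print(pretty_print_then_use(box_old))  # True: use-after-free
print(pretty_print_then_use(box_new))  # False
```

Because the incoming value was added to the chain before the mark, the scoped_value_mark never drops it, while temporaries created inside the block are still released on exit, which is what the bug #12533 requirement needs.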
In the future I think we should replace the use of raw "value *" with
value_ref_ptr pretty much everywhere. This will ensure lifetime
safety throughout gdb.
The test case in this patch comes from Hannes' original patch. I only
made a trivial ("require") change to it. However, while this fails
for him, I can't make it fail on this machine; nevertheless, he tried
my patch and reported the bug as being fixed.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30044
|
|
With:
- catch a fork in thread 1
- select thread 2
- set follow-fork child
- next
... follow_fork notices that thread 1 had last stopped for a fork
which hasn't been followed yet, and because thread 1 is not the
current thread, GDB aborts the execution command, presenting the stop
in thread 1.
That makes sense, as only the forking thread (thread 1) survives in
the child, so it is better to stop and let the user decide how to proceed.
However, with:
- catch a fork in thread 1
- select thread 2
- set follow-fork parent << note difference here
- next
... GDB does the same: follow_fork notices that thread 1 had last
stopped for a fork which hasn't been followed yet, and because thread
1 is not the current thread, GDB aborts the execution command,
presenting the stop in thread 1.
Aborting/stopping in this case doesn't make sense to me. As we're
following the parent, thread 2 will still continue to exist in the
parent. What the child does after we've followed the parent shouldn't
matter -- it can go on running free, be detached, etc., depending on
"set schedule-multiple", "set detach-on-fork", etc. That does not
influence the execution command that the user issued for the parent
thread.
So this patch changes GDB in that direction -- in follow_fork, if
following the parent, and we've switched threads meanwhile, switch
back to the unfollowed thread, follow it (stay with the parent), and
don't abort/stop. If we're following a fork (as opposed to vfork),
then switch back again to the thread that the user was trying to
resume. If following a vfork, however, keep the vforking thread
selected, as we will need to see a vfork_done event first, before we
can resume any other thread.
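The resulting decision can be summarised as a small Python function. This is a paraphrase of the rules described above, not GDB's follow_fork code; the Action names and the string arguments are hypothetical, and the function only covers the case where the not-yet-followed fork belongs to a thread other than the current one.

```python
from enum import Enum


class Action(Enum):
    ABORT = "abort the command and present the stop in the forking thread"
    RESUME_FORKING_THREAD = "keep the vforking thread selected and resume it"
    RESUME_USER_THREAD = "switch back to the thread the user resumed"


def follow_fork_action(follow_mode: str, fork_kind: str) -> Action:
    """Decision after this patch when the pending fork is in another thread."""
    if fork_kind not in ("fork", "vfork"):
        raise ValueError(f"unknown fork kind: {fork_kind}")
    if follow_mode == "child":
        # Only the forking thread survives in the child; let the user decide.
        return Action.ABORT
    if follow_mode != "parent":
        raise ValueError(f"unknown follow-fork mode: {follow_mode}")
    # Following the parent: follow the fork and stay with the parent ...
    if fork_kind == "vfork":
        # ... but the vfork_done event must be seen before others can run.
        return Action.RESUME_FORKING_THREAD
    # ... and for a plain fork, carry on with the user's thread.
    return Action.RESUME_USER_THREAD


for mode in ("child", "parent"):
    for kind in ("fork", "vfork"):
        print(mode, kind, "->", follow_fork_action(mode, kind).name)
```

Before this patch, every combination in this situation behaved like Action.ABORT.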
As I was working on this, I managed to end up calling target_resume
for a solo-thread resume (to collect the vfork_done event), with
scope_ptid pointing at the vfork parent thread, and inferior_ptid
pointing to the vfork child. For a solo-thread resume, the scope_ptid
argument to target_resume must be the same as inferior_ptid. The mistake
was caught by the assertion in target_resume, like so:
...
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [1722839.1722839.0] at 0x5555555553c3
[infrun] do_target_resume: resume_ptid=1722839.1722939.0, step=0, sig=GDB_SIGNAL_0
../../src/gdb/target.c:2661: internal-error: target_resume: Assertion `inferior_ptid.matches (scope_ptid)' failed.
...
but I think it doesn't hurt to catch such a mistake earlier, hence the
change in internal_resume_ptid.
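The assertion relies on ptid matching. As a hedged sketch, the toy Ptid class below re-creates the matching rule in Python and is an assumption rather than GDB's C++ ptid_t: a ptid matches a scope if the scope is the minus-one wildcard, a pid-only ptid for the same process, or exactly equal. Fed the ptids from the log above, the current thread does not match the resume ptid, which is the condition the assertion rejects.

```python
from typing import NamedTuple


class Ptid(NamedTuple):
    """Toy (pid, lwp, tid) triple, shown in infrun logs as pid.lwp.tid."""
    pid: int
    lwp: int = 0
    tid: int = 0

    def is_pid(self) -> bool:
        # A pid-only ptid (not null, not minus-one) names a whole process.
        return (self not in (NULL_PTID, MINUS_ONE_PTID)
                and self.lwp == 0 and self.tid == 0)

    def matches(self, scope: "Ptid") -> bool:
        return (scope == MINUS_ONE_PTID                        # any thread
                or (scope.is_pid() and self.pid == scope.pid)  # one process
                or self == scope)                              # one thread


NULL_PTID = Ptid(0)
MINUS_ONE_PTID = Ptid(-1)

current = Ptid(1722839, 1722839)      # "current thread [1722839.1722839.0]"
resume = Ptid(1722839, 1722939)       # "resume_ptid=1722839.1722939.0"
print(current.matches(resume))         # False: the assertion fires
print(current.matches(Ptid(1722839)))  # True: whole-process scope
```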
Change-Id: I896705506a16d2488b1bfb4736315dd966f4e412
|