Age | Commit message (Collapse) | Author | Files | Lines |
|
This patch teaches the Linux GDBserver backend to report clone events
to GDB, when GDB has requested them with the GDB_THREAD_OPTION_CLONE
thread option, via the new QThreadOptions packet.
This shuffles code in linux_process_target::handle_extended_wait
around to a more logical order when we now have to handle and
potentially report all of fork/vfork/clone.
Raname lwp_info::fork_relative -> lwp_info::relative as the field is
no longer only about (v)fork.
With this, gdb.threads/stepi-over-clone.exp now cleanly passes against
GDBserver, so remove the native-target-only requirement from that
testcase.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I3a19bc98801ec31e5c6fdbe1ebe17df855142bb2
|
|
(A good chunk of the problem statement in the commit log below is
Andrew's, adjusted for a different solution, and for covering
displaced stepping too. The testcase is mostly Andrew's too.)
This commit addresses bugs gdb/19675 and gdb/27830, which are about
stepping over a breakpoint set at a clone syscall instruction, one is
about displaced stepping, and the other about in-line stepping.
Currently, when a new thread is created through a clone syscall, GDB
sets the new thread running. With 'continue' this makes sense
(assuming no schedlock):
- all-stop mode, user issues 'continue', all threads are set running,
a newly created thread should also be set running.
- non-stop mode, user issues 'continue', other pre-existing threads
are not affected, but as the new thread is (sort-of) a child of the
thread the user asked to run, it makes sense that the new threads
should be created in the running state.
Similarly, if we are stopped at the clone syscall, and there's no
software breakpoint at this address, then the current behaviour is
fine:
- all-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). While stepping the thread
of interest all the other threads will be allowed to continue. A
newly created thread will be set running, and then stopped once the
thread of interest has completed its step.
- non-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). Other threads might be
running or stopped, but as with the continue case above, the new
thread will be created running. The only possible issue here is
that the new thread will be left running after the initial thread
has completed its stepi. The user would need to manually select
the thread and interrupt it, this might not be what the user
expects. However, this is not something this commit tries to
change.
The problem then is what happens when we try to step over a clone
syscall if there is a breakpoint at the syscall address.
- For both all-stop and non-stop modes, with in-line stepping:
+ user issues 'stepi',
+ [non-stop mode only] GDB stops all threads. In all-stop mode all
threads are already stopped.
+ GDB removes s/w breakpoint at syscall address,
+ GDB single steps just the thread of interest, all other threads
are left stopped,
+ New thread is created running,
+ Initial thread completes its step,
+ [non-stop mode only] GDB resumes all threads that it previously
stopped.
There are two problems in the in-line stepping scenario above:
1. The new thread might pass through the same code that the initial
thread is in (i.e. the clone syscall code), in which case it will
fail to hit the breakpoint in clone as this was removed so the
first thread can single step,
2. The new thread might trigger some other stop event before the
initial thread reports its step completion. If this happens we
end up triggering an assertion as GDB assumes that only the
thread being stepped should stop. The assert looks like this:
infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
- For both all-stop and non-stop modes, with displaced stepping:
+ user issues 'stepi',
+ GDB starts the displaced step, moves thread's PC to the
out-of-line scratch pad, maybe adjusts registers,
+ GDB single steps the thread of interest, [non-stop mode only] all
other threads are left as they were, either running or stopped.
In all-stop, all other threads are left stopped.
+ New thread is created running,
+ Initial thread completes its step, GDB re-adjusts its PC,
restores/releases scratchpad,
+ [non-stop mode only] GDB resumes the thread, now past its
breakpoint.
+ [all-stop mode only] GDB resumes all threads.
There is one problem with the displaced stepping scenario above:
3. When the parent thread completed its step, GDB adjusted its PC,
but did not adjust the child's PC, thus that new child thread
will continue execution in the scratch pad, invoking undefined
behavior. If you're lucky, you see a crash. If unlucky, the
inferior gets silently corrupted.
What is needed is for GDB to have more control over whether the new
thread is created running or not. Issue #1 above requires that the
new thread not be allowed to run until the breakpoint has been
reinserted. The only way to guarantee this is if the new thread is
held in a stopped state until the single step has completed. Issue #3
above requires that GDB is informed of when a thread clones itself,
and of what is the child's ptid, so that GDB can fixup both the parent
and the child.
When looking for solutions to this problem I considered how GDB
handles fork/vfork as these have some of the same issues. The main
difference between fork/vfork and clone is that the clone events are
not reported back to core GDB. Instead, the clone event is handled
automatically in the target code and the child thread is immediately
set running.
Note we have support for requesting thread creation events out of the
target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported
for the new/child thread. That would be sufficient to address in-line
stepping (issue #1), but not for displaced-stepping (issue #3). To
handle displaced-stepping, we need an event that is reported to the
_parent_ of the clone, as the information about the displaced step is
associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED
includes no indication of which thread is the parent that spawned the
new child. In fact, for some targets, like e.g., Windows, it would be
impossible to know which thread that was, as thread creation there
doesn't work by "cloning".
The solution implemented here is to model clone on fork/vfork, and
introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is
similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except
that we end up with a new thread in the same process, instead of a new
thread of a new process. Like FORKED and VFORKED, THREAD_CLONED
waitstatuses have a child_ptid property, and the child is held stopped
until GDB explicitly resumes it. This addresses the in-line stepping
case (issues #1 and #2).
The infrun code that handles displaced stepping fixup for the child
after a fork/vfork event is thus reused for THREAD_CLONE, with some
minimal conditions added, addressing the displaced stepping case
(issue #3).
The native Linux backend is adjusted to unconditionally report
TARGET_WAITKIND_THREAD_CLONED events to the core.
Following the follow_fork model in core GDB, we introduce a
target_follow_clone target method, which is responsible for making the
new clone child visible to the rest of GDB.
Subsequent patches will add clone events support to the remote
protocol and gdbserver.
displaced_step_in_progress_thread becomes unused with this patch, but
a new use will reappear later in the series. To avoid deleting it and
readding it back, this patch marks it with attribute unused, and the
latter patch removes the attribute again. We need to do this because
the function is static, and with no callers, the compiler would warn,
(error with -Werror), breaking the build.
This adds a new gdb.threads/stepi-over-clone.exp testcase, which
exercises stepping over a clone syscall, with displaced stepping vs
inline stepping, and all-stop vs non-stop. We already test stepping
over clone syscalls with gdb.base/step-over-syscall.exp, but this test
uses pthreads, while the other test uses raw clone, and this one is
more thorough. The testcase passes on native GNU/Linux, but fails
against GDBserver. GDBserver will be fixed by a later patch in the
series.
Co-authored-by: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
|
|
I noticed that on an Ubuntu 20.04 system, after a following patch
("Step over clone syscall w/ breakpoint,
TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
was passing cleanly, but still, we'd end up with four new unexpected
GDB core dumps:
=== gdb Summary ===
# of unexpected core files 4
# of expected passes 48
That said patch is making the pre-existing
gdb.threads/step-over-exec.exp testcase (almost silently) expose a
latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
#1 - a non-leader thread execs
#2 - the post-exec program stops somewhere
#3 - you kill the inferior
Instead of #3 directly, the testcase just returns, which ends up in
gdb_exit, tearing down GDB, which kills the inferior, and is thus
equivalent to #3 above.
Vis (after said patch is applied):
$ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
...
(top-gdb) r
...
(gdb) b main
...
(gdb) r
...
Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
69 argv0 = argv[0];
(gdb) c
Continuing.
[New Thread 0x7ffff7d89700 (LWP 2506975)]
Other going in exec.
Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
28 foo ();
(gdb) k
...
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
393 return m_suspend.waitstatus_pending_p;
(top-gdb) bt
#0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
#1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
#2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
#3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
#4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
#5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
#6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
#7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
#8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
...
The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread,
becoming the new leader. There's no thread exit event for the execing
thread. It's as if the old pre-exec LWP vanishes without trace. The
ptrace man page says:
"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
Stop the tracee at the next execve(2). A waitpid(2) by the
tracer will return a status value such that
status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
If the execing thread is not a thread group leader, the thread
ID is reset to thread group leader's ID before this stop.
Since Linux 3.0, the former thread ID can be retrieved with
PTRACE_GETEVENTMSG."
When the core of GDB processes an exec events, it deletes all the
threads of the inferior. But, that is too late -- deleting the thread
does not delete the corresponding LWP, so we end leaving the pre-exec
non-leader LWP stale in the LWP list. That's what leads to the crash
above -- linux_nat_target::kill iterates over all LWPs, and after the
patch in question, that code will look for the corresponding
thread_info for each LWP. For the pre-exec non-leader LWP still
listed, won't find one.
This patch fixes it, by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.
GDBserver does not need an equivalent fix, because it is already doing
this, as side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:
else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
{
...
/* Delete the execing process and all its threads. */
mourn (proc);
switch_to_thread (nullptr);
The crash with gdb.threads/step-over-exec.exp is not observable on
newer systems, which postdate the glibc change to move "libpthread.so"
internals to "libc.so.6", because right after the exec, GDB traps a
load event for "libc.so.6", which leads to GDB trying to open
libthread_db for the post-exec inferior, and, on such systems that
succeeds. When we load libthread_db, we call
linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
lwps, and then waits to see their stops. While doing this, GDB
detects that the pre-exec stale LWP is gone, and deletes it.
If we use "catch exec" to stop right at the exec before the
"libc.so.6" load event ever happens, and issue "kill" right there,
then GDB crashes on newer systems as well. So instead of tweaking
gdb.threads/step-over-exec.exp to cover the fix, add a new
gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
The test also uses the new "maint info linux-lwps" command if testing
on Linux native, which also exposes the stale LWP problem with an
unfixed GDB.
Also tweak a comment in infrun.c:follow_exec referring to how
linux-nat.c used to behave, as it would become stale otherwise.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
|
|
I noticed that if GDB is using a remote or extended-remote target,
then, if an inferior call caused a new thread to appear, or for an
existing thread to exit, then these events are not reported to the
user.
The problem is that for these targets GDB relies on a call to
update_thread_list to learn about changes to the inferior's thread
list.
If GDB doesn't pass through the normal stop code then GDB will not
call update_thread_list, and so will not report changes in the thread
list.
This commit adds an additional update_thread_list call, after which
thread events are correctly reported.
|
|
I noticed that sometimes the value returned by $_inferior_thread_count
can become out of sync with the actual thread count of the inferior,
and will disagree with the number of threads reported by 'info
threads'. This commit fixes this issue.
The cause of the problem is that 'info threads' includes a call to
update_thread_list, this can be seen in print_thread_info_1 in
thread.c, while $_inferior_thread_count doesn't include a similar
call, see the function inferior_thread_count_make_value also in
thread.c.
Of course, this is only a problem when GDB is running on a target that
relies on update_thread_list calls to learn about new threads,
e.g. remote or extended-remote targets. Native targets generally
learn about new threads as soon as they appear and will not have this
problem.
I ran into this issue when writing a test for the next commit which
uses inferior function calls to add an remove threads from an
inferior. But for testing I've made use of non-stop mode and
asynchronous inferior execution; by reading the inferior state I can
know when a new thread has been created, at which point I can print
$_inferior_thread_count while the inferior is still running. This is
important, if I stop the inferior then GDB will pass through an
update_thread_list call in the normal stop code, which will
synchronise the thread list, after which $_inferior_thread_count will
report the correct value.
With this change in place $_inferior_thread_count is now correct.
|
|
Add a new command completer function for the disassemble command.
There are two things that this completion function changes. First,
after the previous commit, the new function calls skip_over_slash_fmt,
which means that hitting tab after entering a /OPT flag now inserts a
space ready to start typing the address to disassemble at:
(gdb) disassemble /r<TAB>
(gdb) disassemble /r <CURSOR>
But also, we now get symbol completion after a /OPT option set,
previously this would do nothing:
(gdb) disassemble /r mai<TAB>
But now:
(gdb) disassemble /r mai<TAB>
(gdb) disassemble /r main <CURSOR>
Which was my main motivation for working on this commit.
However, I have made a second change in the completion function.
Currently, the disassemble command calls the generic
location_completer function, however, the disassemble docs say:
Note that the 'disassemble' command's address arguments are specified
using expressions in your programming language (*note Expressions:
Expressions.), not location specs (*note Location Specifications::).
So, for example, if you want to disassemble function 'bar' in file
'foo.c', you must type 'disassemble 'foo.c'::bar' and not 'disassemble
foo.c:bar'.
And indeed, if I try:
(gdb) disassemble hello.c:main
No symbol "hello" in current context.
(gdb) disassemble hello.c::main
No symbol "hello" in current context.
(gdb) disassemble 'hello.c'::main
Dump of assembler code for function main:
... snip ...
But, if I do this:
(gdb) disassemble hell<TAB>
(gdb) disassemble hello.c:<CURSOR>
which is a consequence of using the location_completer function. So
in this commit, after calling skip_over_slash_fmt, I forward the bulk
of the disassemble command completion to expression_completer. Now
when I try this:
(gdb) disassemble hell<TAB>
gives nothing, which I think is an improvement. There is one slight
disappointment, if I do:
(gdb) disassemble 'hell<TAB>
I still get nothing. I had hoped that this would expand to:
'hello.c':: but I guess this is a limitation of the current
expression_completer implementation, however, I don't think this is a
regression, the previous expansion was just wrong. Fixing
expression_completer is out of scope for this commit.
I've added some disassembler command completion tests, and also a test
that disassembling using 'FILE'::FUNC syntax works, as I don't think
that is tested anywhere.
|
|
The disassembler gained a new /b flag in this commit:
commit d4ce49b7ac077a9882d6a5e689e260300045ca88
Date: Tue Jun 21 20:23:35 2022 +0100
gdb: disassembler opcode display formatting
The /b and /r flags result in the instruction opcodes displayed in
different formats, so it's not possible to have both at the same
time. Currently the /b flag overrides the /r flag.
We have a similar situation with the /m and /s flags, but here, if the
user tries to use both flags then they will get an error.
I think the error is clearer, so in this commit I propose that we add
an error if /r and /b are both used.
Obviously this change breaks backwards compatibility. I don't have a
compelling argument for why we should make the change beyond my
feeling that it was a mistake not to add this error from the start,
and that the new behaviour is better.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
|
|
In AIX unused or constant variables are collected as garbage by the linker and in the dwarf dump
an address with all f's in hexadecimal are assigned. Hence the testcase fails with many failures stating
it cannot access memory.
This patch is a small change to get it working in AIX as well.
|
|
Recently added test-case gdb.dwarf2/dw2-gas-workaround.exp:
- passes when gdb is configured using $(cd ../src; pwd)/configure, but
- fails when using ../src/configure.
Fix this by making the matching more precise:
...
- -re -wrap "$objdir.*" {
+ -re -wrap "name_for_id = $objdir/$srcfile\r\n.*" {
...
such that we only fail on the line:
...
[symtab-create] start_subfile: name = dw2-lines.c, name_for_id = \
/data/vries/gdb/leap-15-4/build/gdb/testsuite/dw2-lines.c^M
...
Reported-By: Carl Love <cel@us.ibm.com>
|
|
While working on background reading of DWARF, I came across the
DWZ-reading code. This code can query the user (via the debuginfod
support) -- something that cannot be done off the main thread.
Looking into it, I realized that this code can be run much earlier,
avoiding this problem. Digging a bit deeper, I also found a
discrepancy here between how the DWARF reader works in "readnow" mode
as compared to the normal modes.
This patch cleans this up by trying to read the DWZ file earlier, and
also by having the DWARF reader convert any exception here into a
warning. This unifies the various cases, but also makes it so that
errors do not prevent gdb from continuing on to the extent possible.
Regression tested on x86-64 Fedora 38.
|
|
When running test-case gdb.tui/tui-layout-asm-short-prog.exp on AlmaLinux 9.2
ppc64le, I run into:
...
FAIL: gdb.tui/tui-layout-asm-short-prog.exp: check asm box contents
...
The problem is that we get:
...
7 [ No Assembly Available ]
...
because tui_get_begin_asm_address doesn't succeed.
In more detail, tui_get_begin_asm_address calls:
...
find_line_pc (sal.symtab, sal.line, &addr);
...
with:
...
(gdb) p *sal.symtab
$5 = {next = 0x130393c0, m_compunit = 0x130392f0, m_linetable = 0x0,
filename = "tui-layout-asm-short-prog.S",
filename_for_id = "$gdb/build/gdb/testsuite/tui-layout-asm-short-prog.S",
m_language = language_asm, fullname = 0x0}
(gdb) p sal.line
$6 = 1
...
The problem is the filename_for_id which is the source file prefixed with the
compilation dir rather than the source dir.
This is due to faulty debug info generated by gas, PR28629:
...
<1a> DW_AT_name : tui-layout-asm-short-prog.S
<1e> DW_AT_comp_dir : $gdb/build/gdb/testsuite
<22> DW_AT_producer : GNU AS 2.35.2
...
The DW_AT_name is relative, and it's relative to the DW_AT_comp_dir entry,
making the effective name $gdb/build/gdb/testsuite/tui-layout-asm-short-prog.S.
The bug is fixed starting version 2.38, where we get instead:
...
<1a> DW_AT_name :
$gdb/src/gdb/testsuite/gdb.tui/tui-layout-asm-short-prog.S
<1e> DW_AT_comp_dir : $gdb/build/gdb/testsuite
<22> DW_AT_producer : GNU AS 2.38
...
Work around the faulty debug info by constructing the filename_for_id using
the second directory from the directory table in the .debug_line header:
...
The Directory Table (offset 0x22, lines 2, columns 1):
Entry Name
0 $gdb/build/gdb/testsuite
1 $gdb/src/gdb/testsuite/gdb.tui
...
Note that the used gas contains a backport of commit 3417bfca676 ("GAS:
DWARF-5: Ensure that the 0'th entry in the directory table contains the
current working directory."), because directory 0 is correct. With the
unpatched 2.35.2 release the directory 0 entry is incorrect: it's a copy of
entry 1.
Add a dwarf assembly test-case that reflects the debug info as generated by
unpatched gas 2.35.2.
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
This patch implements the DAP setVariable request.
setVariable is a bit odd in that it specifies the variable to modify
by passing in the variable's container and the name of the variable.
This approach can't handle variable shadowing (there are a couple of
open DAP bugs on this topic), so this patch renames duplicates to
avoid the problem.
|
|
Add a gdb.Value.bytes attribute. This attribute contains the bytes of
the value (assuming the complete bytes of the value are available).
If the bytes of the gdb.Value are not available then accessing this
attribute raises an exception.
The bytes object returned from gdb.Value.bytes is cached within GDB so
that the same bytes object is returned each time. The bytes object is
created on-demand though to reduce unnecessary work.
For some values we can of course obtain the same information by
reading inferior memory based on gdb.Value.address and
gdb.Value.type.sizeof, however, not every value is in memory, so we
don't always have an address.
The gdb.Value.bytes attribute will convert any value to a bytes
object, so long as the contents are available. The value can be one
created purely in Python code, the value could be in a register,
or (of course) the value could be in memory.
The Value.bytes attribute can also be assigned too. Assigning to this
attribute is similar to calling Value.assign, the value of the
underlying value is updated within the inferior. The value assigned
to Value.bytes must be a buffer which contains exactly the correct
number of bytes (i.e. unlike value creation, we don't allow oversized
buffers).
To support this assignment like behaviour I've factored out the core
of valpy_assign. I've also updated convert_buffer_and_type_to_value
so that it can (for my use case) check the exact buffer length.
The restrictions for when the Value.bytes can or cannot be written too
are exactly the same as for Value.assign.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=13267
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Tom Tromey <tom@tromey.com>
|
|
Overview
========
Consider the following situation, GDB is in non-stop mode, the main
thread is running while a second thread is stopped. The user has the
second thread selected as the current thread and asks GDB to detach.
At the exact moment of detach the main thread exits.
This situation currently causes crashes, assertion failures, and
unexpected errors to be reported from GDB for both native and remote
targets.
This commit addresses this situation for native and remote targets.
There are a number of different fixes, but all are required in order
to get this functionality working correct for native and remote
targets.
Native Linux Target
===================
For the native Linux target, detaching is handled in the function
linux_nat_target::detach. In here we call stop_wait_callback for each
thread, and it is this callback that will spot that the main thread
has exited.
GDB then detaches from everything except the main thread by calling
detach_callback.
After this the first problem is this assert:
/* Only the initial process should be left right now. */
gdb_assert (num_lwps (pid) == 1);
The num_lwps call will return 0 as the main thread has exited and all
of the other threads have now been detached. I fix this by changing
the assert to allow for 0 or 1 lwps at this point. As the 0 case can
only happen in non-stop mode, the assert becomes:
gdb_assert (num_lwps (pid) == 1
|| (target_is_non_stop_p () && num_lwps (pid) == 0));
The next problem is that we do:
main_lwp = find_lwp_pid (ptid_t (pid));
and then proceed assuming that main_lwp is not nullptr. In the case
that the main thread has exited though, main_lwp will be nullptr.
However, we only need main_lwp so that GDB can detach from the
thread. If the main thread has exited, and GDB has already detached
from every other thread, then GDB has finished detaching, GDB can skip
the calls that try to detach from the main thread, and then tell the
user that the detach was a success.
For Remote Targets
==================
On remote targets there are two problems.
First is that when the exit occurs during the early phase of the
detach, we see the stop notification arrive while GDB is removing the
breakpoints ahead of the detach. The 'set debug remote on' trace
looks like this:
[remote] Sending packet: $z0,7f1648fe0241,1#35
[remote] Notification received: Stop:W0;process:2a0ac8
# At this point an unpatched gdbserver segfaults, and the connection
# is broken. A patched gdbserver continues as below...
[remote] Packet received: E01
[remote] Sending packet: $z0,7f1648ff00a8,1#68
[remote] Packet received: E01
[remote] Sending packet: $z0,7f1648ff132f,1#6b
[remote] Packet received: E01
[remote] Sending packet: $D;2a0ac8#3e
[remote] Packet received: E01
I was originally running into Segmentation Faults, from within
gdbserver/mem-break.cc, in the function find_gdb_breakpoint. This
function calls current_process() and then dereferences the result to
find the breakpoint list.
However, in our case, the current process has already exited, and so
the current_process() call returns nullptr. At the point of failure,
the gdbserver backtrace looks like this:
#0 0x00000000004190e4 in find_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:982
#1 0x000000000041930d in delete_gdb_breakpoint (z_type=48 '0', addr=4198762, kind=1) at ../../src/gdbserver/mem-break.cc:1093
#2 0x000000000042d8db in process_serial_event () at ../../src/gdbserver/server.cc:4372
#3 0x000000000042dcab in handle_serial_event (err=0, client_data=0x0) at ../../src/gdbserver/server.cc:4498
...
The problem is that, as a result non-stop being on, the process
exiting is only reported back to GDB after the request to remove a
breakpoint has been sent. Clearly gdbserver can't actually remove
this breakpoint -- the process has already exited -- so I think the
best solution is for gdbserver just to report an error, which is what
I've done.
The second problem I ran into was on the gdb side, as the process has
already exited, but GDB has not yet acknowledged the exit event, the
detach -- the 'D' packet in the above trace -- fails. This was being
reported to the user with a 'Can't detach process' error. As the test
actually calls detach from Python code, this error was then becoming a
Python exception.
Though clearly the detach has returned an error, and so, maybe, having
GDB throw an error would be fine, I think in this case, there's a good
argument that the remote error can be ignored -- if GDB tries to
detach and gets back an error, and if there's a pending exit event for
the pid we tried to detach, then just ignore the error and pretend the
detach worked fine.
We could possibly check for a pending exit event before sending the
detach packet, however, I believe that it might be possible (in
non-stop mode) for the stop notification to arrive after the detach is
sent, but before gdbserver has started processing the detach. In this
case we would still need to check for pending stop events after seeing
the detach fail, so I figure there's no point having two checks -- we
just send the detach request, and if it fails, check to see if the
process has already exited.
Testing
=======
In order to test this issue I needed to ensure that the exit event
arrives at the same time as the detach call. The window of
opportunity for getting the exit to arrive is so small I've never
managed to trigger this in real use -- I originally spotted this issue
while working on another patch, which did manage to trigger this
issue.
However, if we trigger both the exit and the detach from a single
Python function then we never return to GDB's event loop, as such GDB
never processes the exit event, and so the first time GDB gets a
chance to see the exit is during the detach call. And so that is the
approach I've taken for testing this patch.
Tested-By: Kevin Buettner <kevinb@redhat.com>
Approved-By: Kevin Buettner <kevinb@redhat.com>
|
|
If we make writing an index-cache entry very slow by doing this in
index_cache::store:
...
try
{
+ sleep (15);
index_cache_debug ("writing index cache for objfile %s",
bfd_get_filename (per_bfd->obfd));
...
we run into:
...
FAIL: gdb.dwarf2/per-bfd-sharing.exp: \
couldn't remove files in temporary cache dir
...
The FAIL happens because there is no index-cache entry in the cache dir.
The problem is that gdb is killed (by gdb_exit) before the index-cache entry
is written.
Fix this by using "maint wait-for-index-cache".
Tested on x86_64-linux.
PR testsuite/30528
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30528
|
|
Clang doesn't use CFA information for variable locations. This makes it
so software breakpoints get a false hit when rbp gets popped, causing
a FAIL in gdb.python/py-watchpoint.exp. Since this is nothing wrong with
GDB itself, add an xfail to reduce noise.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
The test gdb.python/py-explore-cc.exp was showing one unexpected
failure. This was due to how clang mapped instructions to lines,
resulting in the inferior seemingly stopping at a different location.
This patch adds a nop line in the relevant location so we don't need to
add XFAILs for existing clang releases, if this gets solved in future
versions.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
When printing a value, I think the history reference -- the "$1" in
the output -- should be styled using the "variable" style. This patch
implements this.
|
|
multi-inferior-gpu.exp
The gdb.rocm/multi-inferior-gpu.exp testcase uses a "continue $thread"
command, but this is incorrect. If "continue" is given an argument, it
sets the ignore count of the breakpoint the thread stopped at.
For this testcase it does not really matter since the breakpoint is not
meant to be hit anymore, so whatever the ignore count is won't influence
the outcome of the test. It is worth fixing nevertheless.
Change-Id: I0eb674d5529cdeb9e808b74870a29b6077265737
Approved-By: Simon Marchi <simon.marchi@efficios.com>
|
|
Functions of the hip runtime returning a hipError_t can be marked
nodiscard depending on the configuration[1] (when compiled with C++17).
This patch makes sure that we always check the value returned by
hipDeviceSynchronize and friends, and print an error message when
appropriate. This avoid a wall of warnings when running the testsuite
if the compiler defaults to using C++17.
It is always a good practice to check the return values anyway.
[1] https://github.com/ROCm-Developer-Tools/HIP/blob/docs/5.7.1/include/hip/hip_runtime_api.h#L203-L218
Change-Id: I2a819a8ac45f4bcf814efe9a2ff12c6a7ad22f97
Approved-By: Simon Marchi <simon.marchi@efficios.com>
|
|
When running test-case gdb.base/jit-bfd-name.exp, I run into:
...
ERROR: tcl error sourcing gdb/testsuite/gdb.base/jit-bfd-name.exp.
ERROR: can't read "start": no such variable
...
The problem is that commit c96ceed9dce ("gdb: include the end address in
in-memory bfd filenames") introduced a use of variable start, but not a
definition.
Fix this by adding the missing definition.
Tested on x86_64-linux.
|
|
Commit
66984afd29e gdb: include the base address in in-memory bfd filenames
added the base address to in-memory bfd filenames. Also add the end
address to allow dumping the in-memory bfd using the 'dump memory'
command.
|
|
A pretty-printer's 'children' method may return values other than a
gdb.Value -- it may return any value that can be converted to a
gdb.Value.
I noticed that this case did not work for DAP. This patch fixes the
problem.
|
|
Andry pointed out that the DAP code did not properly handle
gdb.LazyString results from a pretty-printer, yielding:
TypeError: Object of type LazyString is not JSON serializable
This patch fixes the problem, partly with a small patch in varref.py,
but mainly by implementing tp_str for LazyString.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
|
|
Andry noticed that given a DAP setExpression request, where the
expression to set is a register, DAP will return the wrong value -- it
will return the old value, not the updated one.
This happens because gdb.Value.assign (which was recently added for
DAP) does not update the value.
In this patch, I chose to have the assign method update the Value
in-place. It's also possible to have it return a new value, but this
didn't seem very useful to me.
|
|
Andry Ogorodnik, a co-worker, noticed that multiple "scopes" requests
with the same frame would yield different variableReference values in
the response.
This patch adds a regression test for this, and adds a scope cache in
scopes.py, ensuring that multiple identical requests will get the same
response.
Tested-By: Alexandra Petlanova Hajkova <ahajkova@redhat.com>
|
|
When using glibc debuginfo generated with gas 2.39, we run into PR gas/29517:
...
$ gdb -q -batch a.out -ex start -ex "p (char *)strstr (\"haha\", \"ah\")"
Temporary breakpoint 1 at 0x40051b: file hello.c, line 6.
Temporary breakpoint 1, main () at hello.c:6
6 printf ("hello\n");
Invalid cast.
...
while without glibc debuginfo installed we get the expected result:
...
$n = 0x7ffff7daa1b1 "aha"
...
and likewise with glibc debuginfo generated with gas 2.40.
The strstr ifunc resolves to __strstr_sse2_unaligned. The problem is that gas
generates dwarf that states that the return type is void:
...
<1><3e1e58>: Abbrev Number: 2 (DW_TAG_subprogram)
<3e1e59> DW_AT_name : __strstr_sse2_unaligned
<3e1e5d> DW_AT_external : 1
<3e1e5e> DW_AT_low_pc : 0xbbd2e
<3e1e66> DW_AT_high_pc : 0xbc1c3
...
while the return type should be a DW_TAG_unspecified_type, as is the case
with gas 2.40.
We can still use the workaround of casting to another function type for both
__strstr_sse2_unaligned:
...
(gdb) p ((char * (*) (const char *, const char *))__strstr_sse2_unaligned) \
("haha", "ah")
$n = 0x7ffff7daa211 "aha"
...
and strstr (which requires using *strstr to dereference the ifunc before we
cast):
...
gdb) p ((char * (*) (const char *, const char *))*strstr) ("haha", "ah")
$n = 0x7ffff7daa251 "aha"
...
but that's a bit cumbersome to use.
Work around this in the dwarf reader, such that we have instead:
...
(gdb) p (char *)strstr ("haha", "ah")
$n = 0x7ffff7daa1b1 "aha"
...
This also requires fixing producer_is_gcc to stop returning true for
producer "GNU AS 2.39.0".
Tested on x86_64-linux.
Approved-By: Andrew Burgess <aburgess@redhat.com>
PR symtab/30911
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30911
|
|
Since commit 1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, we have an assertion in
displaced_step_buffers::copy_insn_closure_by_addr that makes sure a closure
is available whenever we have a match between the provided address argument and
the buffer address.
That is fine, but the report in PR30872 shows this assertion triggering when
it really shouldn't. After some investigation, here's what I found out.
The 32-bit Arm architecture is the only one that calls
gdbarch_displaced_step_copy_insn_closure_by_addr directly, and that's because
32-bit Arm needs to figure out the thumb state of the original instruction
that we displaced-stepped through the displaced-step buffer.
Before the assertion was put in place by commit
1e5ccb9c5ff4fd8ade4a8694676f99f4abf2d679, there was the possibility of
getting nullptr back, which meant we were not doing a displaced-stepping
operation.
Now, with the assertion in place, this is running into issues.
It looks like displaced_step_buffers::copy_insn_closure_by_addr is
being used to return a couple different answers depending on the
state we're in:
1 - If we are actively displaced-stepping, then copy_insn_closure_by_addr
is supposed to return a valid closure for us, so we can determine the
thumb mode.
2 - If we are not actively displaced-stepping, then copy_insn_closure_by_addr
should return nullptr to signal that there isn't any displaced-step buffers
in use, because we don't have a valid closure (but we should always have
this).
Since the displaced-step buffers are always allocated, but not always used,
that means the buffers will always contain data. In particular, the buffer
addr field cannot be used to determine if the buffer is active or not.
For instance, we cannot set the buffer addr field to 0x0, as that can be a
valid PC in some cases.
My understanding is that the current_thread field should be a good candidate
to signal that a particular displaced-step buffer is active or not. If it is
nullptr, we have no threads using that buffer to displaced-step. Otherwise,
it is an active buffer in use by a particular thread.
The following fix modifies the displaced_step_buffers::copy_insn_closure_by_addr
function so we only attempt to return a closure if the buffer has an assigned
current_thread and if the buffer address matches the address argument.
Alternatively, I think we could use a function to answer the question of
whether we're actively displaced-stepping (so we have an active buffer) or
not.
I've also added a testcase that exercises the problem. It should reproduce
reliably on Arm, as that is the only architecture that faces this problem
at the moment.
Regression-tested on Ubuntu 20.04. OK?
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30872
Approved-By: Simon Marchi <simon.marchi@efficios.com>
|
|
Simon pointed out that the new file-then-restart.exp test fails with
the extended-remote target board.
The problem is that the test suite doesn't use gdb_file_cmd -- which
handles things like "set remote exec-file". This patch changes
gdb_file_cmd to make the "kill" command optional, and then switches
the test case to use it.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30933
Approved-By: Simon Marchi <simon.marchi@efficios.com>
|
|
This commit adds a new Python function, gdb.notify_mi, that can be used
to emit custom async notification to MI channel. This can be used, among
other things, to implement notifications about events MI does not support,
such as remote connection closed or register change.
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
This thread:
https://inbox.sourceware.org/gdb-patches/20231003195338.334948-1-thiago.bauermann@linaro.org/
pointed out that within gdb.base/maint.exp, some regexps within a
gdb_test_multiple were failing to match a complete line, while later
regexps within the gdb_test_multiple made use of the '^' anchor, and
so assumed that earlier lines had been completely matched and removed
from expect's buffer.
When testing with READ1 set this assumption was failing.
Fix this by extending the offending patterns with a trailing '\r\n'.
Tested-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
|
|
Now that the GDB 14 branch has been created,
this commit bumps the version number in gdb/version.in to
15.0.50.DATE-git
For the record, the GDB 14 branch was created
from commit 8f12a1a841cd0c447de7a5a0f134a0efece73588.
Also, as a result of the version bump, the following changes
have been made in gdb/testsuite:
* gdb.base/default.exp: Change $_gdb_major to 15.
|
|
On x86_64-linux, with test-case gdb.arch/i386-signal.exp I run into:
...
builtin_spawn -ignore SIGHUP gcc -fno-stack-protector i386-signal.c \
-fdiagnostics-color=never -fno-pie -g -no-pie -lm -o i386-signal^M
/tmp/cc2xydTG.s: Assembler messages:^M
/tmp/cc2xydTG.s:50: Error: operand size mismatch for `push'^M
compiler exited with status 1
output is:
/tmp/cc2xydTG.s: Assembler messages:^M
/tmp/cc2xydTG.s:50: Error: operand size mismatch for `push'^M
gdb compile failed, /tmp/cc2xydTG.s: Assembler messages:
/tmp/cc2xydTG.s:50: Error: operand size mismatch for `push'
UNTESTED: gdb.arch/i386-signal.exp: failed to compile
...
This is with gas 2.41, it compiles without problems with gas 2.40. Some more
strict checking was added in commit 5cc007751cd ("x86: further adjust
extend-to-32bit-address conditions"). This may or may not be a gas regression
( https://sourceware.org/pipermail/binutils/2023-October/129818.html ).
The offending bit is:
...
" push $sigframe\n"
...
which refers to a function:
...
" .globl sigframe\n"
"sigframe:\n"
...
The test-case passes with target board unix/-m32.
Make the test-case work by using pushq instead of push for the
is_amd64_regs_target case.
Tested on x86_64-linux, with target boards:
- unix/-m64 (is_amd64_regs_target == 1), and
- unix/-m32 (is_amd64_regs_target == 0),
PR testsuite/30928
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30928
|
|
I'm seeing a lot of variability in the failures of
gdb.threads/process-dies-while-detaching.exp on aarch64-linux. On this
platform, a problem yet to be investigated causes GDB to miss the _exit
breakpoint. What happens next is random because after missing that
breakpoint, GDB is out of sync with the inferior. This causes the tests
following that point in the testcase to fail in a random way.
In this scenario it's better to exit the testcase early to avoid random
results in the testsuite.
We are relying on gdb_continue_to_breakpoint to return the result of
gdb_test_multiple. This is already the case because in Tcl the return
value of a function is the return value of the last command it runs. But
change gdb_continue_to_breakpoint to explicitly return this value, to make
it clear this is the intended behaviour.
Tested on aarch64-linux.
Tested-By: Guinevere Larsen <blarsen@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
The last few commits resolved the KFAILs in gdb.base/args.exp. With
those out of the way we can clean up this test script a little.
In this commit I have:
- Stopped passing 'nowarnings' flag when building the source file.
I see no reason why this source should issue a warning,
- Moved setup of GDBFLAGS into args_test proc, callers that passed a
newline needed a small tweak, and also the matching code needs
updating for newline handling, but I think this is nicer, the
argument lists are now given just once,
- Updated comment on args_test,
- Updated other comments.
There should be no change in what is tested after this commit.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
Similarly to how single quotes were mishandled, which was fixed two
commits ago, this commit fixes handling of newlines in arguments
passed to gdbserver.
We already had a test that covered this, gdb.base/args.exp, which,
when run with the native-extended-gdbserver board contained several
KFAIL covering this situation.
In this commit I remove the unnecessary, attempt to quote incoming
newlines within arguments, and do some minimal cleanup of the related
code. There is additional cleanup that can be done, but I'm leaving
that for the next commit.
Then I've removed the KFAIL from the test case, and performed some
minimal cleanup there too.
After this commit the gdb.base/args.exp is 100% passing with the
native-extended-gdbserver board file.
During review I was pointed to this older series:
https://inbox.sourceware.org/gdb-patches/20211022071933.3478427-1-m.weghorn@posteo.de/
which also includes this fix as part of a larger set of changes. I'm
giving a Co-Authored-By credit to the author of that original series.
I believe this smaller fix brings some benefits on its own, though the
original series does offer additional improvements. Once this is
merged I'll take a look at rebasing and resubmitting the original series.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27989
Co-Authored-By: Michael Weghorn <m.weghorn@posteo.de>
Approved-By: Tom Tromey <tom@tromey.com>
|
|
When I posted the previous patch for review Andreas Schwab pointed out
that passing a trailing empty argument also doesn't work.
The fix for this is in the same area of code as the previous patch,
but is sufficiently different that I felt it deserved a patch of its
own.
I noticed that passing arguments containing single quotes to gdbserver
didn't work correctly:
gdb -ex 'set sysroot' --args /tmp/show-args
Reading symbols from /tmp/show-args...
(gdb) target extended-remote | gdbserver --once --multi - /tmp/show-args
Remote debugging using | gdbserver --once --multi - /tmp/show-args
stdin/stdout redirected
Process /tmp/show-args created; pid = 176054
Remote debugging using stdio
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007ffff7fd3110 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) set args abc ""
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/show-args \'
stdin/stdout redirected
Process /tmp/show-args created; pid = 176088
2 args are:
/tmp/show-args
abc
Done.
[Inferior 1 (process 176088) exited normally]
(gdb) target native
Done. Use the "run" command to start a process.
(gdb) run
Starting program: /tmp/show-args \'
2 args are:
/tmp/show-args
abc
Done.
[Inferior 1 (process 176095) exited normally]
(gdb) q
The 'shows-args' program used here just prints the arguments passed to
the inferior.
Notice that when starting the inferior using the extended-remote
target there is only a single argument 'abc', while when using the
native target there is a second argument, the blank line, representing
the empty argument.
The problem here is that the vRun packet coming from GDB looks like
this (I've removing the trailing checksum):
$vRun;PROGRAM_NAME;616263;
If we compare this to a packet with only a single argument and no
trailing empty argument:
$vRun;PROGRAM_NAME;616263
Notice the lack of the trailing ';' character here.
The problem is that gdbserver processes this string in a loop. At
each point we maintain a pointer to the character just after a ';',
and then we process everything up to either the next ';' character, or
to the end of the string.
We break out of this loop when the character we start with (in that
loop iteration) is the null-character. This means in the trailing
empty argument case, we abort the loop before doing anything with the
empty argument.
In this commit I've updated the loop, we now break out using a 'break'
statement at the end of the loop if the (sub-)string we just processed
was empty, with this change we now notice the trailing empty
argument.
I've updated the test case to cover this issue.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
I noticed that passing arguments containing single quotes to gdbserver
didn't work correctly:
gdb -ex 'set sysroot' --args /tmp/show-args
Reading symbols from /tmp/show-args...
(gdb) target extended-remote | gdbserver --once --multi - /tmp/show-args
Remote debugging using | gdbserver --once --multi - /tmp/show-args
stdin/stdout redirected
Process /tmp/show-args created; pid = 176054
Remote debugging using stdio
Reading symbols from /lib64/ld-linux-x86-64.so.2...
(No debugging symbols found in /lib64/ld-linux-x86-64.so.2)
0x00007ffff7fd3110 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) set args \'
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /tmp/show-args \'
stdin/stdout redirected
Process /tmp/show-args created; pid = 176088
2 args are:
/tmp/show-args
\'
Done.
[Inferior 1 (process 176088) exited normally]
(gdb) target native
Done. Use the "run" command to start a process.
(gdb) run
Starting program: /tmp/show-args \'
2 args are:
/tmp/show-args
'
Done.
[Inferior 1 (process 176095) exited normally]
(gdb) q
The 'shows-args' program used here just prints the arguments passed to
the inferior.
Notice that when starting the inferior using the extended-remote
target the second argument is "\'", while when running using native
target the argument is "'". The second of these is correct, the \'
used with the "set args" command is just to show GDB that the single
quote is not opening an argument string.
It turns out that the extra backslash is injected on the gdbserver
side when gdbserver processes the arguments that GDB passes it, the
code that does this was added as part of this much larger commit:
commit 2090129c36c7e582943b7d300968d19b46160d84
Date: Thu Dec 22 21:11:11 2016 -0500
Share fork_inferior et al with gdbserver
In this commit I propose removing the specific code that adds what I
believe is a stray backslash. I've extended an existing test to cover
this case, and I now see identical behaviour when using an
extended-remote target as with the native target.
This partially fixes PR gdb/27989, though there are still some issues
with newline handling which I'll address in a later commit.
During review I was pointed to this older series:
https://inbox.sourceware.org/gdb-patches/20211022071933.3478427-1-m.weghorn@posteo.de/
which also includes this fix as part of a larger set of changes. I'm
giving a Co-Authored-By credit to the author of that original series.
I believe this smaller fix brings some benefits on its own, though the
original series does offer additional improvements. Once this is
merged I'll take a look at rebasing and resubmitting the original series.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27989
Co-Authored-By: Michael Weghorn <m.weghorn@posteo.de>
Approved-By: Tom Tromey <tom@tromey.com>
|
|
When using a remote target, it is possible to tell GDB that the
executable to be debugged is located on the remote machine, like this:
(gdb) target extended-remote :54321
... snip ...
(gdb) file target:/tmp/hello.x
Reading /tmp/hello.x from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /tmp/hello.x from remote target...
Reading symbols from target:/tmp/hello.x...
(gdb)
So far so good. However, when we try to start the inferior we run
into a small problem:
(gdb) set remote exec-file /tmp/hello.x
(gdb) start
`target:/tmp/hello.x' has disappeared; keeping its symbols.
Temporary breakpoint 1 at 0x401198: file /tmp/hello.c, line 18.
Starting program: target:/tmp/hello.x
... snip ...
Temporary breakpoint 1, main () at /tmp/hello.c:18
18 printf ("Hello World\n");
(gdb)
Notice this line:
`target:/tmp/hello.x' has disappeared; keeping its symbols.
That's wrong, the executable hasn't been removed, GDB just doesn't
know how to check if the remote file has changed, and so falls back to
assuming that the file has been removed.
In this commit I update reread_symbols to use bfd_stat instead of
a direct stat call, this adds support for target: files, and fixes the
problem.
This change was proposed before in this commit:
https://inbox.sourceware.org/gdb-patches/20200114210956.25115-3-tromey@adacore.com/
However, that patch never got merged, and seemed to get stuck
discussing issues around gnulib stat vs system stat as used by BFD.
I didn't 100% understand the issues discussed in that thread, however,
I think the problem with the previous thread related to the changes in
gdb_bfd.c, rather than to the change in symfile.c. As such, I think
this change might be acceptable, my reasoning is:
- the objfile::mtime field is set by a call to bfd_get_mtime (see
objfiles.c), which calls bfd_stat under the hood. This will end
up using the system stat,
- In symfile.c we currently call stat directly, which will call the
gnulib stat, which, if I understand the above thread correctly,
might give a different result to the system stat in some cases,
- By switching to using bfd_stat in symfile.c we should now be
consistently calling the system stat.
There is another issue that came up during testing that this commit
fixes. Consider this GDB session:
$ gdb -q
(gdb) target extended-remote | ./gdbserver/gdbserver --multi --once -
Remote debugging using | ./gdbserver/gdbserver --multi --once -
Remote debugging using stdio
(gdb) file /tmp/hello.x
Reading symbols from /tmp/hello.x...
(gdb) set remote exec-file /tmp/hello.x
(gdb) start
... snip ...
(gdb) load
`system-supplied DSO at 0x7ffff7fcf000' has disappeared; keeping its symbols.
Loading section .interp, size 0x1c lma 0x4002a8
... snip ...
Start address 0x0000000000401050, load size 2004
Transfer rate: 326 KB/sec, 87 bytes/write.
Notice this line:
`system-supplied DSO at 0x7ffff7fcf000' has disappeared; keeping its symbols.
We actually see the same output, for the same reasons, when using a
native target, like this:
$ gdb -q
(gdb) file /tmp/hello.x
Reading symbols from /tmp/hello.x...
(gdb) start
... snip ...
(gdb) load
`system-supplied DSO at 0x7ffff7fcf000' has disappeared; keeping its symbols.
You can't do that when your target is `native'
(gdb)
In both cases this line appears because load_command (symfile.c) calls
reread_symbols, and reread_symbols loops over every currently loaded
objfile and tries to check if the file has changed on disk by calling
stat.
However, the `system-supplied DSO at 0x7ffff7fcf000' is an in-memory
BFD, the filename for this BFD is literally the string
'system-supplied DSO at 0x7ffff7fcf000'.
Before this commit GDB would try to use the system 'stat' call to stat
the file `system-supplied DSO at 0x7ffff7fcf000', which obviously
fails; there's no file with that name (usually). As a consequence of
the stat failing GDB prints the ' .... has disappeared ...' line.
Initially, all this commit did was switch from using 'stat' to using
'bfd_stat'. Calling bfd_stat on an in-memory BFD works just fine,
however, BFD just fills the 'struct stat' buffer with zeros (except
for the file size), see memory_bstat in bfd/bfdio.c.
However, there is a bit of a weirdness about in-memory BFDs. When
they are initially created the libbfd caches an mtime within the bfd
object, this is done in bfd_from_remote_memory (elfcode.h), the cached
mtime is the time at which the in-memory BFD is created.
What this means is that when GDB creates the in-memory BFD, and we
call bfd_get_mtime(), the value returned, which GDB caches within
objfile::mtime is the creation time of the in-memory BFD. But, when
this patch changes to use bfd_stat() we now get back 0, and so we
believe that the in-memory BFD has changed. This is a change in
behaviour.
To avoid this change in behaviour, in this commit, I propose that we
always skip in-memory BFDs in reread_symbols. This preserves the
behaviour from before this commit -- mostly.
As I'm not specifically checking for, and then skipping, in-memory
BFDs, we no longer see this line:
`system-supplied DSO at 0x7ffff7fcf000' has disappeared; keeping its symbols.
Which I think is an improvement.
Co-Authored-By: Tom Tromey <tromey@adacore.com>
Approved-By: Tom Tromey <tom@tromey.com>
|
|
This started with me running into this comment in symfile.c:
/* FIXME, should use print_sys_errmsg but it's not filtered. */
gdb_printf (_("`%ps' has disappeared; keeping its symbols.\n"),
styled_string (file_name_style.style (), filename));
In this particular case I think I disagree with the comment; I think
the output should be a warning rather than just a message printed to
gdb_stdout, I think when the executable, or some other objfile that is
currently being debugged, disappears from disk, this is likely an
unexpected situation, and worth warning the user about.
So, in theory, I could just call print_sys_errmsg and remove the
comment, but that would mean loosing the filename styling in the
output... so in the end I remove the comment and updated the code to
call warning.
But that got me looking at print_sys_errmsg and how it's used.
Currently the function takes a string and an errno, and prints, to
stderr, the string followed by the result of calling strerror on the
errno.
In some places the string passed to print_sys_errmsg is just a
filename, and this is used when something goes wrong. In these cases,
I think calling warning rather than gdb_printf to gdb_stderr, would be
better, and in fact, in a couple of places we manually print a
"warning" prefix, and then call print_sys_errmsg. And so, for these
users I have added a new function warning_filename_and_errno, which
takes a filename, which is printed with styling, and an errno, which
is passed through strerror and the resulting string printed. This new
function calls warning to print its output. I then updated some of
the print_sys_errmsg users to use this new function.
Some other users of print_sys_errmsg are also emitting what is clearly
a warning, however, the string being passed in is more than just a
filename, so the new warning_filename_and_errno function can't be
used, it would style the whole string. For these users I have
switched to calling warning directly, this allows me to style the
warning message correctly.
Finally, in inflow.c there is one last call to print_sys_errmsg, in
this case I just inlined the definition of print_sys_errmsg. This is
a really weird case, as after printing this message GDB just does a
hard exit. This is pretty old code, dating back to the initial GDB
import, I guess it should be updated to call error() maybe, but I'm
reluctant to make this change as part of this commit, just in case
there's some reason why we can't throw an error at this point.
With that done there are now no users of print_sys_errmsg, and so the
old function can be removed.
While I was doing all of the above I added some additional filename
styling in soure.c, this is in an else block where the if contained
the print_sys_errmsg call, so these felt related.
And finally, while I was updating the uses of print_sys_errmsg in
procfs.c, I noticed that we used a static errmsg buffer to format some
error strings. As the above changes got rid of one of the users of
errmsg I also removed the other two users, and the static buffer.
There were a couple of tests that depended on the existing output
message format that needed updating. In one case we gained an extra
'warning: ' prefix, and in the other 'Warning: ' becomes 'warning: ',
I think in both cases the new output is an improvement.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
Some gdb.base/fileio.exp tests expect the inferior to not have write
access to some files. If the test is being run as root, this is never
possible. This commit adds a way to identify if the user is root and
xfails the tests that expect no write access.
Approved-By: Tom de Vries <tdevries@suse.de>
|
|
Reusing the SME tests, this patch introduces additional tests to exercise
reading/writing ZT0, availability of the register set, signal context reading
for ZT0 and also core file generation.
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
|
|
Add 5 SVE/SME tests to exercise all the new features like reading/writing
registers, pseudo-registers, signal frames and core files.
- Sanity check for SME: Gives a brief smoke test to make sure the most basic
of features are working correctly.
- ZA unavailability tests: Validates the behavior/content of the ZA register
is correct when no payload is available. It also exercises changing the
vector lengths.
- ZA availability tests: These tests exercise reading/writing to all the
possible ZA pseudo-registers, and validates the state is correct.
- Core file tests: Validates that core file reading and writing works
correctly and that all state dumped/loaded is sane. This is exercised for
both Linux Kernel core files and gcore core files.
- Signal frame tests: Validates the correct restoration of SME/SVE/FPSIMD
values across signal frames.
Since some of these tests are very lengthy and take a little while to run
(under QEMU at the moment), I decided to parallelize them into smaller
chunks so we can throw some more CPU power at them so they run faster.
I'd still like to add a few more tests to give the testsuite more coverage
in the areas of SME/SVE. Hopefully in the near future that will happen.
Just a reminder that these SME tests are currently unsupported when gdb is
connected to a remote target. That's because the RSP doesn't support
communicating changes in vector lenghts mid-execution, so gdb will always
get wrong state from the remote target.
Co-Authored-By: Ezra Sitorus <ezra.sitorus@arm.com>
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
|
|
Reformat a Python file with black after this commit:
commit 59912fb2d22f8a4bb0862487f12a5cc65b6a013f
Date: Tue Sep 19 11:45:36 2023 +0100
gdb: add Python events for program space addition and removal
There should be no functional change with this commit.
|
|
Yesterday I pushed a patch to fix a small oversight in the DAP code
that caused an instructionReference to be an array instead of a
string.
This patch adds a test case for that regression. This required
exposing the TON form of the response -- something I mentioned might
be necessary when this code was changed to return just the Tcl form.
I tested this by backing out yesterday's bug fix and verifying that a
failure is seen.
|
|
Following on Tom de Vries' work in the other array-printers, this
patch changes val_print_packed_array_elements to also avoid allocating
too many values when printing an Ada packed array.
|
|
gdb.base/jit-reader-simple.exp regex
I see this failure:
FAIL: gdb.base/jit-reader-simple.exp: standalone: change addr: initial run: maint info breakpoints shows jit breakpoint
The jit breakpoint expected by the test is there, it's just that the
number of spaces doesn't match what the test expects, after "jit
events":
-2 jit events keep y 0x0000555555555119 <__jit_debug_register_code> inf 1
Fix that by relaxing the regex a bit.
Change-Id: Ia3b04e6d5978399d940fd1a590f95f15275ca7ac
|
|
When running test-case gdb.ada/import.exp with gcc 7, most test fail:
...
FAIL: gdb.ada/import.exp: print imported_var_ada
FAIL: gdb.ada/import.exp: print local_imported_var
FAIL: gdb.ada/import.exp: print pkg.imported_var_ada
FAIL: gdb.ada/import.exp: print pkg.exported_var_ada
FAIL: gdb.ada/import.exp: print exported_var_ada
FAIL: gdb.ada/import.exp: gdb_breakpoint: set breakpoint at pkg.imported_func_ada
FAIL: gdb.ada/import.exp: gdb_breakpoint: set breakpoint at imported_func_ada
FAIL: gdb.ada/import.exp: gdb_breakpoint: set breakpoint at local_imported_func
...
When running with gcc 8 or 9, only 2 tests fail:
...
FAIL: gdb.ada/import.exp: gdb_breakpoint: set breakpoint at pkg.imported_func_ada
FAIL: gdb.ada/import.exp: gdb_breakpoint: set breakpoint at imported_func_ada
...
The test-case passes fully with gcc 10, 11, 12 and 13.
Debug info for pragma import seems to not have been supported before gcc 8, so
require that version.
The two FAILs with gcc 8 and 9 seem to be due to problems in debug info. Add
an xfail for these.
Tested on x86_64-linux.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
With gcc 13.2.1, I run into a cluster of fails:
...
FAIL: gdb.ada/str_binop_equal.exp: print my_str = "ABCD"
FAIL: gdb.ada/widewide.exp: print my_wws = " helo"
FAIL: gdb.ada/widewide.exp: print my_ws = "wide"
...
The problem is that the debug info contains information about function
ada.strings.maps."=", and gdb uses it to implement the comparison.
The function is supposed to compare two char sets, not strings, so gdb
shouldn't use it. This is PR ada/30908.
I don't see the same problem with gcc 7.5.0, because the exec doesn't contain
the debug info for the function, because the corresponding object is not
linked in. Adter adding "with Ada.Strings.Maps; use Ada.Strings.Maps;" to
gdb.ada/widewide/foo.adb I run into the same problem with gcc 7.5.0.
Add KFAILs for the PR.
Tested on x86_64-linux:
- openSUSE Leap 15.4 (using gcc 7.5.0), and
- openSUSE Tumbleweed (using gcc 13.2.1).
Approved-By: Tom Tromey <tom@tromey.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30908
|
|
Initially I just wanted a Python event for when GDB removes a program
space, I'm writing a Python extension that caches information for each
program space, and need to know when I should discard entries for a
particular program space.
But, it seemed easy enough to also add an event for when GDB adds a
new program space, so I went ahead and added both new events.
Of course, we don't currently have an observable for program space
addition or removal, so I first needed to add these. After that it's
pretty simple to add two new Python events and have these trigger.
The two new event registries are:
events.new_progspace
events.free_progspace
These emit NewProgspaceEvent and FreeProgspaceEvent objects
respectively, each of these new event types has a 'progspace'
attribute that contains the relevant gdb.Progspace object.
There's a couple of things to be mindful of.
First, it is not possible to catch the NewProgspaceEvent for the very
first program space, the one that is created when GDB first starts, as
this program space is created before any Python scripts are sourced.
In order to allow this event to be caught we would need to defer
creating the first program space, and as a consequence the first
inferior, until some later time. But, existing scripts could easily
depend on there being an initial inferior, so I really don't think we
should change that -- and so, we end up with the consequence that we
can't catch the event for the first program space.
The second, I think minor, issue, is that GDB doesn't clean up its
program spaces upon exit -- or at least, they are not cleaned up
before Python is shut down. As a result, any program spaces in use at
the time GDB exits don't generate a FreeProgspaceEvent. I'm not
particularly worried about this for my use case, I'm using the event
to ensure that a cache doesn't hold stale entries within a single GDB
session. It's also easy enough to add a Python at-exit callback which
can do any final cleanup if needed.
Finally, when testing, I did hit a slightly weird issue with some of
the remote boards (e.g. remote-stdio-gdbserver). As a consequence of
this issue I see some output like this in the gdb.log:
(gdb) PASS: gdb.python/py-progspace-events.exp: inferior 1
step
FreeProgspaceEvent: <gdb.Progspace object at 0x7fb7e1d19c10>
warning: cannot close "target:/lib64/libm.so.6": Cannot execute this command while the target is running.
Use the "interrupt" command to stop the target
and then try again.
warning: cannot close "target:/lib64/libc.so.6": Cannot execute this command while the target is running.
Use the "interrupt" command to stop the target
and then try again.
warning: cannot close "target:/lib64/ld-linux-x86-64.so.2": Cannot execute this command while the target is running.
Use the "interrupt" command to stop the target
and then try again.
do_parent_stuff () at py-progspace-events.c:41
41 ++global_var;
(gdb) PASS: gdb.python/py-progspace-events.exp: step
The 'FreeProgspaceEvent ...' line is expected, that's my test Python
extension logging the event. What isn't expected are all the blocks
like:
warning: cannot close "target:/lib64/libm.so.6": Cannot execute this command while the target is running.
Use the "interrupt" command to stop the target
and then try again.
It turns out that this has nothing to do with my changes, this is just
a consequence of reading files over the remote protocol. The test
forks a child process which GDB stays attached too. When the child
exits, GDB cleans up by calling prune_inferiors, which in turn can
result in GDB trying to close some files that are open because of the
inferior being deleted.
If the prune_inferiors call occurs when the remote target is
running (and in non-async mode) then GDB will try to send a fileio
packet while the remote target is waiting for a stop reply, and the
remote target will throw an error, see remote_target::putpkt_binary in
remote.c for details.
I'm going to look at fixing this, but, as I said, this is nothing to
do with this change, I just mention it because I ended up needing to
account for these warning messages in one of my tests, and it all
looks a bit weird.
Approved-By: Tom Tromey <tom@tromey.com>
Reviewed-By: Eli Zaretskii <eliz@gnu.org>
|