Age | Commit message (Collapse) | Author | Files | Lines |
|
In a record session, when we move backward, GDB switches from normal
execution to simulation. Moving forward again, the emulation continues
until the end of the reverse history. When the end is reached, the
execution stops, and a warning message is shown. This message has been
modified to indicate that the forward emulation has reached the end, but
the execution can continue as normal, and the recording will also continue.
Before this patch, the warning message shown in that case was the same as
in the reverse case. This meant that when the end of history was reached in
either backward or forward emulation, the same message was displayed:
"No more reverse-execution history."
This message has changed for these two cases. Backward emulation:
"Reached end of recorded history; stopping.
Backward execution from here not possible."
Forward emulation:
"Reached end of recorded history; stopping.
Following forward execution will be added to history."
The reason for this change is that the initial message was deceiving, for
the forward case, making the user believe that forward debugging could not
continue.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31224
Reviewed-By: Markus T. Metzger <markus.t.metzger@intel.com> (btrace)
Approved-By: Guinevere Larsen <blarsen@redhat.com>
|
|
Commit bf2813aff8f2988ad3d53e819a0415abf295c91f introduced some logic to
not refresh the step frame id if it detects that the inferior is reverse
stepping out of a recursive call, so that we would still print frame
information once the inferior stops.
However, that logic was overly specific, and wouldn't be hit for
inferiors compiled with clang because clang adds line table entries that
aren't statements, making process_event_stop_test go through a different
branch on the relevant if statement.
Fix this by not making the code that detects "reversing out of a
recursion" an else clause to the previous if, but a standalone if block.
Approved-by: Kevin Buettner <kevinb@redhat.com>
|
|
Consider the following test-case:
...
$ cat -n test.c
1 int var;
2
3 int
4 foo (void)
5 {
6 var = 1;
7 #include "test.h"
8 }
9
10 int
11 main ()
12 {
13 return foo ();
14 }
$ cat -n test.h
1 return 1;
$ gcc test.c -g
...
When stepping through the test-case, gdb doesn't make it explicit that line 1
is not in test.c:
...
Temporary breakpoint 1, main () at test.c:13
13 return foo ();
(gdb) step
foo () at test.c:6
6 var = 1;
(gdb) n
1 return 1;
(gdb)
8 }
(gdb)
...
which makes it easy to misinterpret the output.
This is with the default "print frame-info" == auto, with documented
behaviour [1]:
...
stepi will switch between source-line and source-and-location depending on the
program counter.
...
What is actually implemented is that source-line is used unless stepping into
or out of a function.
The problem can be worked around by using
"set print frame-info source-and-location", but that's a bit verbose.
Instead, change the behaviour of "print frame-info" == auto to also use
source-and-location when stepping into another file, which gets us:
...
(gdb) n
foo () at test.h:1
1 return 1;
...
Tested on x86_64-linux.
Reviewed-By: Kevin Buettner <kevinb@redhat.com>
Reviewed-By: Kévin Le Gouguec <legouguec@adacore.com>
PR gdb/32011
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=32011
[1] https://sourceware.org/gdb/current/onlinedocs/gdb.html/Print-Settings.html#index-set-print-frame_002dinfo
|
|
Make the current program space reference bubble up one level. Pass
`current_program_space` everywhere, except in some cases where we can
get the pspace another way, and it's relatively obvious that it's the
same as the current program space.
Change-Id: Id86b79f1e44f92a398f49d137d57457174dfa96d
Approved-By: Tom Tromey <tom@tromey.com>
Reviewed-By: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
|
|
The `no_shared_libraries` function is currently used to implement the
`nosharedlibrary` command, but it also used internally by other
functions. This does not make a very good internal API.
Add the `no_shared_libraries_command` function to implement the CLI
command. Remove the unused parameters from `no_shared_libraries`.
Remove the `from_tty` parameter of `target_pre_inferior`, since it's now
unused.
Change-Id: I4fcba5ee1e0f7d250aab1a7b62b9ea16265fe962
Approved-By: Tom Tromey <tom@tromey.com>
Reviewed-By: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
|
|
Remove some includes reported as unused by clangd. Add some includes in
other files that were previously relying on the transitive include.
Change-Id: Ibdd0a998b04d21362a20d0ca8e5267e21e2e133e
|
|
Since the frame variable is now a frame_info_ptr, the issue
with the dangling frame pointer is apparently no longer there.
So remove the re-fetch code and the corresponding meanwhile
misleading comments.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
Remove the gdbcmd.h, which is reported as unused by clangd. Add
cli/cli-cmds.h instead, to get access to `cmdlist` and friends.
Change-Id: Ic0c60d2f6d3618f1bd9fd80b95ffd7c33c692a04
|
|
Now that defs.h, server.h and common-defs.h are included via the
`-include` option, it is no longer necessary for source files to include
them. Remove all the inclusions of these files I could find. Update
the generation scripts where relevant.
Change-Id: Ia026cff269c1b7ae7386dd3619bc9bb6a5332837
Approved-By: Pedro Alves <pedro@palves.net>
|
|
This commit fixes bug PR 28942, that is, creating a conditional
breakpoint in a multi-threaded inferior, where the breakpoint
condition includes an inferior function call.
Currently, when a user tries to create such a breakpoint, then GDB
will fail with:
(gdb) break infcall-from-bp-cond-single.c:61 if (return_true ())
Breakpoint 2 at 0x4011fa: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-single.c, line 61.
(gdb) continue
Continuing.
[New Thread 0x7ffff7c5d700 (LWP 2460150)]
[New Thread 0x7ffff745c700 (LWP 2460151)]
[New Thread 0x7ffff6c5b700 (LWP 2460152)]
[New Thread 0x7ffff645a700 (LWP 2460153)]
[New Thread 0x7ffff5c59700 (LWP 2460154)]
Error in testing breakpoint condition:
Couldn't get registers: No such process.
An error occurred while in a function called from GDB.
Evaluation of the expression containing the function
(return_true) will be abandoned.
When the function is done executing, GDB will silently stop.
Selected thread is running.
(gdb)
Or, in some cases, like this:
(gdb) break infcall-from-bp-cond-simple.c:56 if (is_matching_tid (arg, 1))
Breakpoint 2 at 0x401194: file /tmp/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/infcall-from-bp-cond-simple.c, line 56.
(gdb) continue
Continuing.
[New Thread 0x7ffff7c5d700 (LWP 2461106)]
[New Thread 0x7ffff745c700 (LWP 2461107)]
../../src.release/gdb/nat/x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
The precise error depends on the exact thread state; so there's race
conditions depending on which threads have fully started, and which
have not. But the underlying problem is always the same; when GDB
tries to execute the inferior function call from within the breakpoint
condition, GDB will, incorrectly, try to resume threads that are
already running - GDB doesn't realise that some threads might already
be running.
The solution proposed in this patch requires an additional member
variable thread_info::in_cond_eval. This flag is set to true (in
breakpoint.c) when GDB is evaluating a breakpoint condition.
In user_visible_resume_ptid (infrun.c), when the in_cond_eval flag is
true, then GDB will only try to resume the current thread, that is,
the thread for which the breakpoint condition is being evaluated.
This solves the problem of GDB trying to resume threads that are
already running.
The next problem is that inferior function calls are assumed to be
synchronous, that is, GDB doesn't expect to start an inferior function
call in thread #1, then receive a stop from thread #2 for some other,
unrelated reason. To prevent GDB responding to an event from another
thread, we update fetch_inferior_event and do_target_wait in infrun.c,
so that, when an inferior function call (on behalf of a breakpoint
condition) is in progress, we only wait for events from the current
thread (the one evaluating the condition).
In do_target_wait I had to change the inferior_matches lambda
function, which is used to select which inferior to wait on.
Previously the logic was this:
auto inferior_matches = [&wait_ptid] (inferior *inf)
{
return (inf->process_target () != nullptr
&& ptid_t (inf->pid).matches (wait_ptid));
};
This compares the pid of the inferior against the complete ptid we
want to wait on. Before this commit wait_ptid was only ever
minus_one_ptid (which is special, and means any process), and so every
inferior would match.
After this commit though wait_ptid might represent a specific thread
in a specific inferior. If we compare the pid of the inferior to a
specific ptid then these will not match. The fix is to compare
against the pid extracted from the wait_ptid, not against the complete
wait_ptid itself.
In fetch_inferior_event, after receiving the event, we only want to
stop all the other threads, and call inferior_event_handler with
INF_EXEC_COMPLETE, if we are not evaluating a conditional breakpoint.
If we are, then all the other threads should be left doing whatever
they were before. The inferior_event_handler call will be performed
once the breakpoint condition has finished being evaluated, and GDB
decides to stop or not.
The final problem that needs solving relates to GDB's commit-resume
mechanism, which allows GDB to collect resume requests into a single
packet in order to reduce traffic to a remote target.
The problem is that the commit-resume mechanism will not send any
resume requests for an inferior if there are already events pending on
the GDB side.
Imagine an inferior with two threads. Both threads hit a breakpoint,
maybe the same conditional breakpoint. At this point there are two
pending events, one for each thread.
GDB selects one of the events and spots that this is a conditional
breakpoint, GDB evaluates the condition.
The condition includes an inferior function call, so GDB sets up for
the call and resumes the one thread, the resume request is added to
the commit-resume queue.
When the commit-resume queue is committed GDB sees that there is a
pending event from another thread, and so doesn't send any resume
requests to the actual target, GDB is assuming that when we wait we
will select the event from the other thread.
However, as this is an inferior function call for a condition
evaluation, we will not select the event from the other thread, we
only care about events from the thread that is evaluating the
condition - and the resume for this thread was never sent to the
target.
And so, GDB hangs, waiting for an event from a thread that was never
fully resumed.
To fix this issue I have added the concept of "forcing" the
commit-resume queue. When enabling commit resume, if the force flag
is true, then any resumes will be committed to the target, even if
there are other threads with pending events.
A note on authorship: this patch was based on some work done by
Natalia Saiapova and Tankut Baris Aktemur from Intel[1]. I have made
some changes to their work in this version.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28942
[1] https://sourceware.org/pipermail/gdb-patches/2020-October/172454.html
Co-authored-by: Natalia Saiapova <natalia.saiapova@intel.com>
Co-authored-by: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
|
|
This reverts commit ac0d67ed1dcf470bad6a3bc4800c2ddc9bedecca.
There was nothing wrong with the commit which I'm reverting here, but
it removed some functionality that will be needed for a later commit;
that is, the ability for GDB to ask for events from a specific ptid_t
via the do_target_wait function.
In a follow up commit, this functionality will be used to implement
inferior function calls in multi-threaded inferiors.
This is not a straight revert of the above commit. Reverting the
above commit replaces a 'nullptr' with 'NULL', I've gone in and
changed that, preserving the 'nullptr'.
Reviewed-By: Tankut Baris Aktemur <tankut.baris.aktemur@intel.com>
Tested-By: Luis Machado <luis.machado@arm.com>
Tested-By: Keith Seitz <keiths@redhat.com>
|
|
We currently pass frames to function by value, as `frame_info_ptr`.
This is somewhat expensive:
- the size of `frame_info_ptr` is 64 bytes, which is a bit big to pass
by value
- the constructors and destructor link/unlink the object in the global
`frame_info_ptr::frame_list` list. This is an `intrusive_list`, so
it's not so bad: it's just assigning a few points, there's no memory
allocation as if it was `std::list`, but still it's useless to do
that over and over.
As suggested by Tom Tromey, change many function signatures to accept
`const frame_info_ptr &` instead of `frame_info_ptr`.
Some functions reassign their `frame_info_ptr` parameter, like:
void
the_func (frame_info_ptr frame)
{
for (; frame != nullptr; frame = get_prev_frame (frame))
{
...
}
}
I wondered what to do about them, do I leave them as-is or change them
(and need to introduce a separate local variable that can be
re-assigned). I opted for the later for consistency. It might not be
clear why some functions take `const frame_info_ptr &` while others take
`frame_info_ptr`. Also, if a function took a `frame_info_ptr` because
it did re-assign its parameter, I doubt that we would think to change it
to `const frame_info_ptr &` should the implementation change such that
it doesn't need to take `frame_info_ptr` anymore. It seems better to
have a simple rule and apply it everywhere.
Change-Id: I59d10addef687d157f82ccf4d54f5dde9a963fd0
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
By inspection, I believe that breakpoint_init_inferior doesn't call
anything that relies on the current program space or inferior. So,
add an inferior parameter, to make the current inferior / program space
references bubble up one level.
Change-Id: Ib07b7a6d360e324f6ae1aa502dd314b8cce421b7
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
Make the current_program_space reference bubble up one level.
Change-Id: Idc8ed78d23bf3bb2969f6963d8cc049f26901c29
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
This code was probably needed before we had reinflatable
frame_info_ptrs, it's not necessary anymore.
Change-Id: I5474c6081ee1e39624c9266b05dbe01351a130b5
Approved-By: Tom Tromey <tom@tromey.com>
|
|
This changes lookup_symbol and associated APIs to accept
domain_search_flags rather than a domain_enum.
Note that this introduces some new constants to Python and Guile. I
chose to break out the documentation patch for this, because the
internals here do not change until a later patch, and it seemed
simpler to patch the docs just once, rather than twice.
|
|
A recent(ish) change in gdb/infrun.c made process_event_stop_test load
debug information where it would not have done so previously. The
change is:
commit bf2813aff8f2988ad3d53e819a0415abf295c91f
AuthorDate: Fri Sep 1 13:47:32 2023 +0200
CommitDate: Mon Nov 20 10:54:03 2023 +0100
gdb/record: print frame information when exiting a recursive call
Currently, when GDB is reverse stepping out of a function into the same
function due to a recursive call, it doesn't print frame information, as
reported by PR record/29178. This happens because when the inferior
leaves the current frame, GDB decides to refresh the step information,
clobbering the original step_frame_id, making it impossible to figure
out later on that the frame has been changed.
This commit changes GDB so that, if we notice we're in this exact
situation, we won't refresh the step information.
Because of implementation details, this change can cause some debug
information to be read when it normally wouldn't before, which showed up
as a regression on gdb.dwarf2/dw2-out-of-range-end-of-seq. Since that
isn't a problem, the test was changed to allow for the new output.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29178
Although there is nothing wrong with this change in principle, it
happens to break most of the tests in gdb/testsuite/gdb.rocm/*.exp.
This is because those tests do rely on GDB not loading debug
information. This is necessary because the debug information produced
for AMDGPU code is using DWARF extensions which are not supported by GDB
at this point.
In this patch, I propose to use a lazy loading mechanism so the frame_id
for the current frame is only computed when required instead of when
entering process_event_stop_test. The lazy_loader class is currently
defined locally in infrun.c, but if it turns out to be useful elsewhere,
it could go somewhere under gdbsupport.
This patch should restore the behavior GDB had before
bf2813aff8f2988ad3d53e819a0415abf295c91f when it comes to load debug
info.
Another approach could have been to revert
fb84fbf8a51f5be2e78765508ebd9753af96b492 (gdb/infrun: simplify
process_event_stop_test) and adjust the implementation of
bf2813aff8f2988ad3d53e819a0415abf295c91f (gdb/record: print frame
information when exiting a recursive call). However, I think that the
lazy loading works well with the simplification done recently, so I went
down that route.
Regression tested on x86_64-linux (Ubuntu 22.04) with AMDGPU support.
Change-Id: Ib63a162128130d1786a77c98623e9e3dcbc363b7
Approved-by: Kevin Buettner <kevinb@redhat.com>
|
|
Introduce new ui_file buffering_file to temporarily collect output
written to gdb_std* output streams during print_thread, print_frame_info
and print_stop_event.
This ensures that output during these functions is not interrupted
by debuginfod progress messages.
With the addition of deferred debuginfo downloading it is possible
for download progress messages to print during these events.
Without any intervention we can end up with poorly formatted output:
(gdb) backtrace
[...]
#8 0x00007fbe8af7d7cf in pygi_invoke_c_callable (Downloading separate debug info for /lib64/libpython3.11.so.1.0
function_cache=0x561221b224d0, state=<optimized out>...
To fix this we buffer writes to gdb_std* output streams while allowing
debuginfod progress messages to skip the buffers and print to the
underlying output streams immediately. Buffered output is then written
to the output streams. This ensures that progress messages print first,
followed by uninterrupted frame/thread/stop info:
(gdb) backtrace
[...]
Downloading separate debug info for /lib64/libpython3.11.so.1.0
#8 0x00007fbe8af7d7cf in pygi_invoke_c_callable (function_cache=0x561221b224d0, state=<optimized out>...
Co-Authored-By: Andrew Burgess <aburgess@redhat.com>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
This commit is the result of the following actions:
- Running gdb/copyright.py to update all of the copyright headers to
include 2024,
- Manually updating a few files the copyright.py script told me to
update, these files had copyright headers embedded within the
file,
- Regenerating gdbsupport/Makefile.in to refresh it's copyright
date,
- Using grep to find other files that still mentioned 2023. If
these files were updated last year from 2022 to 2023 then I've
updated them this year to 2024.
I'm sure I've probably missed some dates. Feel free to fix them up as
you spot them.
|
|
When running GDB's testsuite on aarch64-linux/Ubuntu 20.04 (also spotted on
the ppc backend), there are failures in gdb.reverse/solib-precsave.exp and
gdb.reverse/solib-reverse.exp.
The failure happens around the following code:
38 b[1] = shr2(17); /* middle part two */
40 b[0] = 6; b[1] = 9; /* generic statement, end part two */
42 shr1 ("message 1\n"); /* shr1 one */
Normal execution:
- step from line 38 will land on line 40.
- step from line 40 will land on line 42.
Reverse execution:
- step from line 42 will land on line 40.
- step from line 40 will land on line 40.
- step from line 40 will land on line 38.
The problem here is that line 40 contains two contiguous but distinct
PC ranges in the line table, like so:
Line 40 - [0x7ec ~ 0x7f4]
Line 40 - [0x7f4 ~ 0x7fc]
The two distinct ranges are generated because GCC started outputting source
column information, which GDB doesn't take into account at the moment.
When stepping forward from line 40, we skip both of these ranges and land on
line 42. When stepping backward from line 42, we stop at the start PC of the
second (or first, going backwards) range of line 40.
Since we've reached ecs->event_thread->control.step_range_start, we stop
stepping backwards.
The above issues were fixed by introducing a new function that looks for
adjacent PC ranges for the same line, until we notice a line change. Then
we take that as the start PC of the range. The new start PC for the range
is used for the control.step_range_start when setting up a step range.
The test case gdb.reverse/map-to-same-line.exp is added to test the fix
for the above reverse step issues.
Patch has been tested on PowerPC, X86 and AArch64 with no regressions.
|
|
With AMD GPU debugging, I noticed that when stepping over a breakpoint
placed on top of the s_endpgm instruction inline (displaced=off), GDB
would behave differently -- it wouldn't print the wave exit. E.g:
With displaced stepping, or no breakpoint at all:
stepi
[AMDGPU Wave 1:4:1:1 (0,0,0)/0 exited]
Command aborted, thread exited.
(gdb)
With inline stepping:
stepi
Command aborted, thread exited.
(gdb)
In the cases we see the "exited" notification, handle_thread_exit is
what first called delete_thread on the exiting thread, which is
non-silent.
With inline stepping, however, handle_thread_exit ends up in
update_thread_list (via restart_threads) before any delete_thread
call. Thus, amd_dbgapi_target::update_thread_list notices that the
wave is gone and deletes it with delete_thread_silent.
This commit fixes it, by making handle_thread_exited call
set_thread_exited (with the default silent=false) early, which emits
the user-visible notification.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I22ab3145e18d07c99dace45576307b9f9d5d966f
|
|
displaced_step_finish can be called with event_status.kind ==
TARGET_WAITKIND_THREAD_EXITED, and in that case it is not possible to
get at the already-exited thread's registers.
This patch moves the get_thread_regcache calls to branches that
actually need it, where we know the thread is still alive.
It also adds an assertion to get_thread_regcache, to help catching
these broken cases sooner.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I63b5eacb3e02a538fc5087c270d8025adfda88c3
|
|
While making step over thread exit work properly on AMDGPU, I noticed
that if there's a breakpoint on top of the exit syscall, and,
displaced stepping is off, then when GDB reports "Command aborted,
thread exited.", GDB also switches focus to a random thread, instead
of leaving the exited thread as selected:
(gdb) thread
[Current thread is 6, lane 0 (AMDGPU Lane 1:4:1:1/0 (0,0,0)[0,0,0])]
(gdb) si
Command aborted, thread exited.
(gdb) thread
[Current thread is 5 (Thread 0x7ffff626f640 (LWP 3248392))]
(gdb)
The previous patch extended gdb.threads/step-over-thread-exit.exp to
exercise this on GNU/Linux (on the CPU side), and there, after that
"si", we always end up with the exiting thread as selected even
without this fix, but that's just a concidence, there's a code path
that happens to select the exiting thread for an unrelated reason.
This commit add the explict switch, fixing the latent problem for
GNU/Linux, and the actual problem on AMDGPU. I wrote a gdb.rocm/
testcase for this, but it can't be upstreamed yet, until more pieces
of the DWARF machinery are upstream as well.
Approved-By: Simon Marchi <simon.marchi@efficios.com>
Change-Id: I6ff57a79514ac0142bba35c749fe83d53d9e4e51
|
|
The previous commit describes PR gdb/30547, a segfault when running test-case
gdb.base/vfork-follow-parent.exp on powerpc64 (likewise on s390x).
The root cause for the segmentation fault is that linux_is_uclinux gives an
incorrect result: it returns true instead of false.
So, why does linux_is_uclinux:
...
int
linux_is_uclinux (void)
{
CORE_ADDR dummy;
return (target_auxv_search (AT_NULL, &dummy) > 0
&& target_auxv_search (AT_PAGESZ, &dummy) == 0);
...
return true?
This is because ppc_linux_target_wordsize returns 4 instead of 8, causing
ppc_linux_nat_target::auxv_parse to misinterpret the auxv vector.
So, why does ppc_linux_target_wordsize:
...
int
ppc_linux_target_wordsize (int tid)
{
int wordsize = 4;
/* Check for 64-bit inferior process. This is the case when the host is
64-bit, and in addition the top bit of the MSR register is set. */
long msr;
errno = 0;
msr = (long) ptrace (PTRACE_PEEKUSER, tid, PT_MSR * 8, 0);
if (errno == 0 && ppc64_64bit_inferior_p (msr))
wordsize = 8;
return wordsize;
}
...
return 4?
Specifically, we get this result because because tid == 0, so we get
errno == ESRCH.
The tid == 0 is caused by the switch_to_no_thread in
handle_vfork_child_exec_or_exit:
...
/* Switch to no-thread while running clone_program_space, so
that clone_program_space doesn't want to read the
selected frame of a dead process. */
scoped_restore_current_thread restore_thread;
switch_to_no_thread ();
inf->pspace = new program_space (maybe_new_address_space ());
...
but moving the maybe_new_address_space call to before that gives us the
same result. The tid is no longer 0, but we still get ESRCH because the
thread has exited.
Fix this in handle_vfork_child_exec_or_exit by doing the
maybe_new_address_space call in the context of the vfork parent.
Tested on top of trunk on x86_64-linux and ppc64le-linux.
Tested on top of gdb-14-branch on ppc64-linux.
Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca>
PR gdb/30547
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
|
|
When running test-case gdb.base/vfork-follow-parent.exp on powerpc64 (likewise
on s390x), I run into:
...
(gdb) PASS: gdb.base/vfork-follow-parent.exp: \
exec_file=vfork-follow-parent-exit: target-non-stop=on: non-stop=off: \
resolution_method=schedule-multiple: print unblock_parent = 1
continue^M
Continuing.^M
Reading symbols from vfork-follow-parent-exit...^M
^M
^M
Fatal signal: Segmentation fault^M
----- Backtrace -----^M
0x1027d3e7 gdb_internal_backtrace_1^M
src/gdb/bt-utils.c:122^M
0x1027d54f _Z22gdb_internal_backtracev^M
src/gdb/bt-utils.c:168^M
0x1057643f handle_fatal_signal^M
src/gdb/event-top.c:889^M
0x10576677 handle_sigsegv^M
src/gdb/event-top.c:962^M
0x3fffa7610477 ???^M
0x103f2144 for_each_block^M
src/gdb/dcache.c:199^M
0x103f235b _Z17dcache_invalidateP13dcache_struct^M
src/gdb/dcache.c:251^M
0x10bde8c7 _Z24target_dcache_invalidatev^M
src/gdb/target-dcache.c:50^M
...
or similar.
The root cause for the segmentation fault is that linux_is_uclinux gives an
incorrect result: it should always return false, given that we're running on a
regular linux system, but instead it returns first true, then false.
In more detail, the segmentation fault happens as follows:
- a program space with an address space is created
- a second program space is about to be created. maybe_new_address_space
is called, and because linux_is_uclinux returns true, maybe_new_address_space
returns false, and no new address space is created
- a second program space with the same address space is created
- a program space is deleted. Because linux_is_uclinux now returns false,
gdbarch_has_shared_address_space (current_inferior ()->arch ()) returns
false, and the address space is deleted
- when gdb uses the address space of the remaining program space, we run into
the segfault, because the address space is deleted.
Hardcoding linux_is_uclinux to false makes the test-case pass.
We leave addressing the root cause for the following commit in this series.
For now, prevent the segmentation fault by making the address space a refcounted
object.
This was already suggested here [1]:
...
A better solution might be to have the address spaces be reference counted
...
Tested on top of trunk on x86_64-linux and ppc64le-linux.
Tested on top of gdb-14-branch on ppc64-linux.
Co-Authored-By: Simon Marchi <simon.marchi@polymtl.ca>
PR gdb/30547
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=30547
[1] https://sourceware.org/pipermail/gdb-patches/2023-October/202928.html
|
|
Make it return a bool and adjust a few comparisons where it's used.
Change-Id: Ic77d23b0dcfcfc9195dfe65e4c7ff9cf3229f6fb
|
|
Since GDB now requires C++17, we don't need the internally maintained
gdb::optional implementation. This patch does the following replacing:
- gdb::optional -> std::optional
- gdb::in_place -> std::in_place
- #include "gdbsupport/gdb_optional.h" -> #include <optional>
This change has mostly been done automatically. One exception is
gdbsupport/thread-pool.* which did not use the gdb:: prefix as it
already lives in the gdb namespace.
Change-Id: I19a92fa03e89637bab136c72e34fd351524f65e9
Approved-By: Tom Tromey <tom@tromey.com>
Approved-By: Pedro Alves <pedro@palves.net>
|
|
The previous commit introduced some local variables to make some if
statements simpler. This commit uses them more liberally throughout the
process_event_stop_test in order to simplify the function a little more.
No functional changes are expected.
Approved-By: Tom Tromey <tom@tromey.com>
|
|
Currently, when GDB is reverse stepping out of a function into the same
function due to a recursive call, it doesn't print frame information, as
reported by PR record/29178. This happens because when the inferior
leaves the current frame, GDB decides to refresh the step information,
clobbering the original step_frame_id, making it impossible to figure
out later on that the frame has been changed.
This commit changes GDB so that, if we notice we're in this exact
situation, we won't refresh the step information.
Because of implementation details, this change can cause some debug
information to be read when it normally wouldn't before, which showed up
as a regression on gdb.dwarf2/dw2-out-of-range-end-of-seq. Since that
isn't a problem, the test was changed to allow for the new output.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29178
Approved-By: Tom Tromey <tom@tromey.com>
|
|
A simple refactor to make the reference to current_program_space bubble
up one level. No behavior changes expected.
Change-Id: I237cf2f45ae73c35bcb433ce40e3c03cef6b87e2
|
|
Remove get_current_regcache, inlining the call to get_thread_regcache in
callers. When possible, pass the right thread_info object known from
the local context. Otherwise, fall back to passing `inferior_thread ()`.
This makes the reference to global context bubble up one level, a small
step towards the long term goal of reducing the number of references to
global context (or rather, moving those references as close as possible
to the top of the call tree).
No behavior change expected.
Change-Id: Ifa6980c88825d803ea586546b6b4c633c33be8d6
|
|
While looking at the regcache code, I noticed that the address space
(passed to regcache when constructing it, and available through
regcache::aspace) wasn't relevant for the regcache itself. Callers of
regcache::aspace use that method because it appears to be a convenient
way of getting the address space for a thread, if you already have the
regcache. But there is always another way to get the address space, as
the callers pretty much always know which thread they are dealing with.
The regcache code itself doesn't use the address space.
This patch removes anything related to address_space from the regcache
code, and updates callers to get it from the thread in context. This
removes a bit of unnecessary complexity from the regcache code.
The current get_thread_arch_regcache function gets an address_space for
the given thread using the target_thread_address_space function (which
calls the target_ops::thread_address_space method). This suggest that
there might have been the intention of supporting per-thread address
spaces. But digging through the history, I did not find any such case.
Maybe this method was just added because we needed a way to get an
address space from a ptid (because constructing a regcache required an
address space), and this seemed like the right way to do it, I don't
know.
The only implementations of thread_address_space and
process_stratum_target::thread_address_space and
linux_nat_target::thread_address_space, which essentially just return
the inferior's address space. And thread_address_space is only used in
the current get_thread_arch_regcache, which gets removed. So, I think
that the thread_address_space target method can be removed, and we can
assume that it's fine to use the inferior's address space everywhere.
Callers of regcache::aspace are updated to get the address space from
the relevant inferior, either using some context they already know
about, or in last resort using the current global context.
So, to summarize:
- remove everything in regcache related to address spaces
- in particular, remove get_thread_arch_regcache, and rename
get_thread_arch_aspace_regcache to get_thread_arch_regcache
- remove target_ops::thread_address_space, and
target_thread_address_space
- adjust all users of regcache::aspace to get the address space another
way
Change-Id: I04fd41b22c83fe486522af7851c75bcfb31c88c7
|
|
If your target has no support for TARGET_WAITKIND_NO_RESUMED events
(and no way to support them, such as the yet-unsubmitted AMDGPU
target), and you step over thread exit with scheduler-locking on, this
is what you get:
(gdb) n
[Thread ... exited]
*hang*
Getting back the prompt by typing Ctrl-C may not even work, since no
inferior thread is running to receive the SIGINT. Even if it works,
it seems unnecessarily harsh. If you started an execution command for
which there's a clear thread of interest (step, next, until, etc.),
and that thread disappears, then I think it's more user friendly if
GDB just detects the situation and aborts the command, giving back the
prompt.
That is what this commit implements. It does this by explicitly
requesting the target to report thread exit events whenever the main
resumed thread has a thread_fsm. Note that unlike stepping over a
breakpoint, we don't need to enable clone events in this case.
With this patch, we get:
(gdb) n
[Thread 0x7ffff7d89700 (LWP 3961883) exited]
Command aborted, thread exited.
(gdb)
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I901ab64c91d10830590b2dac217b5264635a2b95
|
|
If scheduler-locking is in effect, e.g., with "set scheduler-locking
on", and you step over a function that spawns a new thread, the new
thread is allowed to run free, at least until some event is hit, at
which point, whether the new thread is re-resumed depends on a number
of seemingly random factors. E.g., if the target is all-stop, and the
parent thread hits a breakpoint, and GDB decides the breakpoint isn't
interesting to report to the user, then the parent thread is resumed,
but the new thread is left stopped.
I think that letting the new threads run with scheduler-locking
enabled is a defect. This commit fixes that, making use of the new
clone events on Linux, and of target_thread_events() on targets where
new threads have no connection to the thread that spawned them.
Testcase and documentation changes included.
Approved-By: Eli Zaretskii <eliz@gnu.org>
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ie12140138b37534b7fc1d904da34f0f174aa11ce
|
|
Running the
gdb.threads/step-over-thread-exit-while-stop-all-threads.exp testcase
added later in the series against gdbserver, after the
TARGET_WAITKIND_NO_RESUMED fix from the following patch, would run
into an infinite loop in stop_all_threads, leading to a timeout:
FAIL: gdb.threads/step-over-thread-exit-while-stop-all-threads.exp: displaced-stepping=off: target-non-stop=on: iter 0: continue (timeout)
The is really a latent bug, and it is about the fact that
stop_all_threads stops listening to events from a target as soon as it
sees a TARGET_WAITKIND_NO_RESUMED, ignoring that
TARGET_WAITKIND_NO_RESUMED may be delayed. handle_no_resumed knows
how to handle delayed no-resumed events, but stop_all_threads was
never taught to.
In more detail, here's what happens with that testcase:
#1 - Multiple threads report breakpoint hits to gdb.
#2 - gdb picks one events, and it's for thread 1. All other stops are
left pending. thread 1 needs to move past a breakpoint, so gdb
stops all threads to start an inline step over for thread 1.
While stopping threads, some of the threads that were still
running report events that are also left pending.
#2 - gdb steps thread 1
#3 - Thread 1 exits while stepping (it steps over an exit syscall),
gdbserver reports thread exit for thread 1
#4 - Thread 1 was the last resumed thread, so gdbserver also reports
no-resumed:
[remote] Notification received: Stop:w0;p3445d0.3445d3
[remote] Sending packet: $vStopped#55
[remote] Packet received: N
[remote] Sending packet: $vStopped#55
[remote] Packet received: OK
#5 - gdb processes the thread exit for thread 1, finishes the step
over and restarts threads.
#6 - gdb picks the next event to process out of one of the resumed
threads with pending events:
[infrun] random_resumed_with_pending_wait_status: Found 32 events, selecting #11
#7 - This is again a breakpoint hit and the breakpoint needs to be
stepped over too, so gdb starts a step-over dance again.
#8 - We reach stop_all_threads, which finds that some threads need to
be stopped.
#9 - wait_one finally consumes the no-resumed event queue by #4.
Seeing this, wait_one disable target async, to stop listening for
events out of the remote target.
#10 - We still haven't seen all the stops expected, so
stop_all_threads tries another iteration.
#11 - Because the remote target is no longer async, and there are no
other targets, wait_one return no-resumed immediately without
polling the remote target.
#12 - We still haven't seen all the stops expected, so
stop_all_threads tries another iteration. goto #11, looping
forever.
Fix this by explicitly enabling/re-enabling target async on targets
that can async, before waiting for stops.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ie3ffb0df89635585a6631aa842689cecc989e33f
|
|
GDB doesn't handle correctly the case where a thread steps over a
breakpoint (using either in-line or displaced stepping), and the
executed instruction causes the thread to exit.
Using the test program included later in the series, this is what it
looks like with displaced-stepping, on x86-64 Linux, where we have two
displaced-step buffers:
$ ./gdb -q -nx --data-directory=data-directory build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit -ex "b my_exit_syscall" -ex r
Reading symbols from build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit...
Breakpoint 1 at 0x123c: file src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 68.
Starting program: build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
[New Thread 0x7ffff7c5f640 (LWP 2915510)]
[Switching to Thread 0x7ffff7c5f640 (LWP 2915510)]
Thread 2 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68
68 syscall
(gdb) c
Continuing.
[New Thread 0x7ffff7c5f640 (LWP 2915524)]
[Thread 0x7ffff7c5f640 (LWP 2915510) exited]
[Switching to Thread 0x7ffff7c5f640 (LWP 2915524)]
Thread 3 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68
68 syscall
(gdb) c
Continuing.
[New Thread 0x7ffff7c5f640 (LWP 2915616)]
[Thread 0x7ffff7c5f640 (LWP 2915524) exited]
[Switching to Thread 0x7ffff7c5f640 (LWP 2915616)]
Thread 4 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68
68 syscall
(gdb) c
Continuing.
... hangs ...
The first two times we do "continue", we displaced-step the syscall
instruction that causes the thread to exit. When the thread exits,
the main thread, waiting on pthread_join, is unblocked. It spawns a
new thread, which hits the breakpoint on the syscall again. However,
infrun was never notified that the displaced-stepping threads are done
using the displaced-step buffer, so now both buffers are marked as
used. So when we do the third continue, there are no buffers
available to displaced-step the syscall, so the thread waits forever
for its turn.
When trying the same but with in-line step over (displaced-stepping
disabled):
$ ./gdb -q -nx --data-directory=data-directory \
build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit \
-ex "b my_exit_syscall" -ex "set displaced-stepping off" -ex r
Reading symbols from build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit...
Breakpoint 1 at 0x123c: file src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S, line 68.
Starting program: build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/step-over-thread-exit/step-over-thread-exit
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
[New Thread 0x7ffff7c5f640 (LWP 2928290)]
[Switching to Thread 0x7ffff7c5f640 (LWP 2928290)]
Thread 2 "step-over-threa" hit Breakpoint 1, my_exit_syscall () at src/binutils-gdb/gdb/testsuite/lib/my-syscalls.S:68
68 syscall
(gdb) c
Continuing.
[Thread 0x7ffff7c5f640 (LWP 2928290) exited]
No unwaited-for children left.
(gdb) i th
Id Target Id Frame
1 Thread 0x7ffff7c60740 (LWP 2928285) "step-over-threa" 0x00007ffff7f7c9b7 in __pthread_clockjoin_ex () from /usr/lib/libpthread.so.0
The current thread <Thread ID 2> has terminated. See `help thread'.
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7c60740 (LWP 2928285))]
#0 0x00007ffff7f7c9b7 in __pthread_clockjoin_ex () from /usr/lib/libpthread.so.0
(gdb) c
Continuing.
^C^C
... hangs ...
The "continue" causes an in-line step to occur, meaning the main
thread is stopped while we step the syscall. The stepped thread exits
when executing the syscall, the linux-nat target notices there are no
more resumed threads to be waited for, so returns
TARGET_WAITKIND_NO_RESUMED, which causes the prompt to return. But
infrun never clears the in-line step over info. So if we try
continuing the main thread, GDB doesn't resume it, because it thinks
there's an in-line step in progress that we need to wait for to
finish, and we are stuck there.
To fix this, infrun needs to be informed when a thread doing a
displaced or in-line step over exits. We can do that with the new
target_set_thread_options mechanism which is optimal for only enabling
exit events of the thread that needs it; or, if that is not supported,
by using target_thread_events, which enables thread exit events for
all threads. This is done by this commit.
This patch then modifies handle_inferior_event in infrun.c to clean up
any step-over the exiting thread might have been doing at the time of
the exit. The cases to consider are:
- the exiting thread was doing an in-line step-over with an all-stop
target
- the exiting thread was doing an in-line step-over with a non-stop
target
- the exiting thread was doing a displaced step-over with a non-stop
target
The displaced-stepping buffer implementation in displaced-stepping.c
is modified to account for the fact that it's possible that we
"finish" a displaced step after a thread exit event. The buffer that
the exiting thread was using is marked as available again and the
original instructions under the scratch pad are restored. However, it
skips applying the fixup, which wouldn't make sense since the thread
does not exist anymore.
Another case that needs handling is if a displaced-stepping thread
exits, and the event is reported while we are in stop_all_threads. We
should call displaced_step_finish in the handle_one function, in that
case. It was already called in other code paths, just not the "thread
exited" path.
This commit doesn't make infrun ask the target to report the
TARGET_WAITKIND_THREAD_EXITED events yet, that'll be done later in the
series.
Note that "stop_print_frame = false;" line is moved to normal_stop,
because TARGET_WAITKIND_THREAD_EXITED can also end up with the event
transmorphed into TARGET_WAITKIND_NO_RESUMED. Moving it to
normal_stop keeps it centralized.
Co-authored-by: Simon Marchi <simon.marchi@efficios.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I745c6955d7ef90beb83bcf0ff1d1ac8b9b6285a5
|
|
When stepping over a breakpoint with displaced stepping, GDB needs to
be informed if the stepped thread exits, otherwise the displaced
stepping buffer that was allocated to that thread leaks, and this can
result in deadlock, with other threads waiting for their turn to
displaced step, but their turn never comes.
Similarly, when stepping over a breakpoint in line, GDB also needs to
be informed if the stepped thread exits, so that is can clear the step
over state and re-resume threads.
This commit makes it possible for GDB to ask the target to report
thread exit events for a given thread, using the new "thread options"
mechanism introduced by a previous patch.
This only adds the core bits. Following patches in the series will
teach the Linux backends (native & gdbserver) to handle the
GDB_THREAD_OPTION_EXIT option, and then a later patch will make use of
these thread exit events to clean up displaced stepping and inline
stepping state properly.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I96b719fdf7fee94709e98bb3a90751d8134f3a38
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27338
|
|
Currently, infrun assumes that when TARGET_WAITKIND_THREAD_EXITED is
reported, the corresponding GDB thread has already been removed from
the GDB thread list.
Later in the series, that will no longer work, as infrun will need to
refer to the thread's thread_info when it processes
TARGET_WAITKIND_THREAD_EXITED.
As preparation, this patch makes deleting the GDB thread
responsibility of infrun, instead of the target.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I013d87f61ffc9aaca49f0d6ce2a43e3ea69274de
|
|
A previous patch taught GDB about a new TARGET_WAITKIND_THREAD_CLONED
event kind, and made the Linux target report clone events.
A following patch will teach Linux GDBserver to do the same thing.
However, for remote debugging, it wouldn't be ideal for GDBserver to
report every clone event to GDB, when GDB only cares about such events
in some specific situations. Reporting clone events all the time
would be potentially chatty. We don't enable thread create/exit
events all the time for the same reason. Instead we have the
QThreadEvents packet. QThreadEvents is target-wide, though.
This patch makes GDB instead explicitly request that the target
reports clone events or not, on a per-thread basis.
In order to be able to do that with GDBserver, we need a new remote
protocol feature. Since a following patch will want to enable thread
exit events on per-thread basis too, the packet introduced here is
more generic than just for clone events. It lets you enable/disable a
set of options at once, modelled on Linux ptrace's PTRACE_SETOPTIONS.
IOW, this commit introduces a new QThreadOptions packet, that lets you
specify a set of per-thread event options you want to enable. The
packet accepts a list of options/thread-id pairs, similarly to vCont,
processed left to right, with the options field being a number
interpreted as a bit mask of options. The only option defined in this
commit is GDB_THREAD_OPTION_CLONE (0x1), which ask the remote target
to report clone events. Another patch later in the series will
introduce another option.
For example, this packet sets option "1" (clone events) on thread
p1000.2345:
QThreadOptions;1:p1000.2345
and this clears options for all threads of process 1000, and then sets
option "1" (clone events) on thread p1000.2345:
QThreadOptions;0:p1000.-1;1:p1000.2345
This clears options of all threads of all processes:
QThreadOptions;0
The target reports the set of supported options by including
"QThreadOptions=<supported options>" in its qSupported response.
infrun is then tweaked to enable GDB_THREAD_OPTION_CLONE when stepping
over a breakpoint.
Unlike PTRACE_SETOPTIONS, fork/vfork/clone children do NOT inherit
their parent's thread options. This is so that GDB can send e.g.,
"QThreadOptions;0;1:TID" without worrying about threads it doesn't
know about yet.
Documentation for this new remote protocol feature is included in a
documentation patch later in the series.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: Ie41e5093b2573f14cf6ac41b0b5804eba75be37e
|
|
(A good chunk of the problem statement in the commit log below is
Andrew's, adjusted for a different solution, and for covering
displaced stepping too. The testcase is mostly Andrew's too.)
This commit addresses bugs gdb/19675 and gdb/27830, which are about
stepping over a breakpoint set at a clone syscall instruction, one is
about displaced stepping, and the other about in-line stepping.
Currently, when a new thread is created through a clone syscall, GDB
sets the new thread running. With 'continue' this makes sense
(assuming no schedlock):
- all-stop mode, user issues 'continue', all threads are set running,
a newly created thread should also be set running.
- non-stop mode, user issues 'continue', other pre-existing threads
are not affected, but as the new thread is (sort-of) a child of the
thread the user asked to run, it makes sense that the new threads
should be created in the running state.
Similarly, if we are stopped at the clone syscall, and there's no
software breakpoint at this address, then the current behaviour is
fine:
- all-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). While stepping the thread
of interest all the other threads will be allowed to continue. A
newly created thread will be set running, and then stopped once the
thread of interest has completed its step.
- non-stop mode, user issues 'stepi', stepping will be done in place
(as there's no breakpoint to step over). Other threads might be
running or stopped, but as with the continue case above, the new
thread will be created running. The only possible issue here is
that the new thread will be left running after the initial thread
has completed its stepi. The user would need to manually select
the thread and interrupt it, this might not be what the user
expects. However, this is not something this commit tries to
change.
The problem then is what happens when we try to step over a clone
syscall if there is a breakpoint at the syscall address.
- For both all-stop and non-stop modes, with in-line stepping:
+ user issues 'stepi',
+ [non-stop mode only] GDB stops all threads. In all-stop mode all
threads are already stopped.
+ GDB removes s/w breakpoint at syscall address,
+ GDB single steps just the thread of interest, all other threads
are left stopped,
+ New thread is created running,
+ Initial thread completes its step,
+ [non-stop mode only] GDB resumes all threads that it previously
stopped.
There are two problems in the in-line stepping scenario above:
1. The new thread might pass through the same code that the initial
thread is in (i.e. the clone syscall code), in which case it will
fail to hit the breakpoint in clone as this was removed so the
first thread can single step,
2. The new thread might trigger some other stop event before the
initial thread reports its step completion. If this happens we
end up triggering an assertion as GDB assumes that only the
thread being stepped should stop. The assert looks like this:
infrun.c:5899: internal-error: int finish_step_over(execution_control_state*): Assertion `ecs->event_thread->control.trap_expected' failed.
- For both all-stop and non-stop modes, with displaced stepping:
+ user issues 'stepi',
+ GDB starts the displaced step, moves thread's PC to the
out-of-line scratch pad, maybe adjusts registers,
+ GDB single steps the thread of interest, [non-stop mode only] all
other threads are left as they were, either running or stopped.
In all-stop, all other threads are left stopped.
+ New thread is created running,
+ Initial thread completes its step, GDB re-adjusts its PC,
restores/releases scratchpad,
+ [non-stop mode only] GDB resumes the thread, now past its
breakpoint.
+ [all-stop mode only] GDB resumes all threads.
There is one problem with the displaced stepping scenario above:
3. When the parent thread completed its step, GDB adjusted its PC,
but did not adjust the child's PC, thus that new child thread
will continue execution in the scratch pad, invoking undefined
behavior. If you're lucky, you see a crash. If unlucky, the
inferior gets silently corrupted.
What is needed is for GDB to have more control over whether the new
thread is created running or not. Issue #1 above requires that the
new thread not be allowed to run until the breakpoint has been
reinserted. The only way to guarantee this is if the new thread is
held in a stopped state until the single step has completed. Issue #3
above requires that GDB is informed of when a thread clones itself,
and of what is the child's ptid, so that GDB can fixup both the parent
and the child.
When looking for solutions to this problem I considered how GDB
handles fork/vfork as these have some of the same issues. The main
difference between fork/vfork and clone is that the clone events are
not reported back to core GDB. Instead, the clone event is handled
automatically in the target code and the child thread is immediately
set running.
Note we have support for requesting thread creation events out of the
target (TARGET_WAITKIND_THREAD_CREATED). However, those are reported
for the new/child thread. That would be sufficient to address in-line
stepping (issue #1), but not for displaced-stepping (issue #3). To
handle displaced-stepping, we need an event that is reported to the
_parent_ of the clone, as the information about the displaced step is
associated with the clone parent. TARGET_WAITKIND_THREAD_CREATED
includes no indication of which thread is the parent that spawned the
new child. In fact, for some targets, like e.g., Windows, it would be
impossible to know which thread that was, as thread creation there
doesn't work by "cloning".
The solution implemented here is to model clone on fork/vfork, and
introduce a new TARGET_WAITKIND_THREAD_CLONED event. This event is
similar to TARGET_WAITKIND_FORKED and TARGET_WAITKIND_VFORKED, except
that we end up with a new thread in the same process, instead of a new
thread of a new process. Like FORKED and VFORKED, THREAD_CLONED
waitstatuses have a child_ptid property, and the child is held stopped
until GDB explicitly resumes it. This addresses the in-line stepping
case (issues #1 and #2).
The infrun code that handles displaced stepping fixup for the child
after a fork/vfork event is thus reused for THREAD_CLONE, with some
minimal conditions added, addressing the displaced stepping case
(issue #3).
The native Linux backend is adjusted to unconditionally report
TARGET_WAITKIND_THREAD_CLONED events to the core.
Following the follow_fork model in core GDB, we introduce a
target_follow_clone target method, which is responsible for making the
new clone child visible to the rest of GDB.
Subsequent patches will add clone events support to the remote
protocol and gdbserver.
displaced_step_in_progress_thread becomes unused with this patch, but
a new use will reappear later in the series. To avoid deleting it and
readding it back, this patch marks it with attribute unused, and the
latter patch removes the attribute again. We need to do this because
the function is static, and with no callers, the compiler would warn,
(error with -Werror), breaking the build.
This adds a new gdb.threads/stepi-over-clone.exp testcase, which
exercises stepping over a clone syscall, with displaced stepping vs
inline stepping, and all-stop vs non-stop. We already test stepping
over clone syscalls with gdb.base/step-over-syscall.exp, but this test
uses pthreads, while the other test uses raw clone, and this one is
more thorough. The testcase passes on native GNU/Linux, but fails
against GDBserver. GDBserver will be fixed by a later patch in the
series.
Co-authored-by: Andrew Burgess <aburgess@redhat.com>
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=19675
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=27830
Change-Id: I95c06024736384ae8542a67ed9fdf6534c325c8e
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
|
|
I noticed that on an Ubuntu 20.04 system, after a following patch
("Step over clone syscall w/ breakpoint,
TARGET_WAITKIND_THREAD_CLONED"), the gdb.threads/step-over-exec.exp
was passing cleanly, but still, we'd end up with four new unexpected
GDB core dumps:
=== gdb Summary ===
# of unexpected core files 4
# of expected passes 48
That said patch is making the pre-existing
gdb.threads/step-over-exec.exp testcase (almost silently) expose a
latent problem in gdb/linux-nat.c, resulting in a GDB crash when:
#1 - a non-leader thread execs
#2 - the post-exec program stops somewhere
#3 - you kill the inferior
Instead of #3 directly, the testcase just returns, which ends up in
gdb_exit, tearing down GDB, which kills the inferior, and is thus
equivalent to #3 above.
Vis (after said patch is applied):
$ gdb --args ./gdb /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true
...
(top-gdb) r
...
(gdb) b main
...
(gdb) r
...
Breakpoint 1, main (argc=1, argv=0x7fffffffdb88) at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec.c:69
69 argv0 = argv[0];
(gdb) c
Continuing.
[New Thread 0x7ffff7d89700 (LWP 2506975)]
Other going in exec.
Exec-ing /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
process 2506769 is executing new program: /home/pedro/gdb/build/gdb/testsuite/outputs/gdb.threads/step-over-exec/step-over-exec-execr-thread-other-diff-text-segs-true-execd
Thread 1 "step-over-exec-" hit Breakpoint 1, main () at /home/pedro/gdb/build/gdb/testsuite/../../../src/gdb/testsuite/gdb.threads/step-over-exec-execd.c:28
28 foo ();
(gdb) k
...
Thread 1 "gdb" received signal SIGSEGV, Segmentation fault.
0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
393 return m_suspend.waitstatus_pending_p;
(top-gdb) bt
#0 0x000055555574444c in thread_info::has_pending_waitstatus (this=0x0) at ../../src/gdb/gdbthread.h:393
#1 0x0000555555a884d1 in get_pending_child_status (lp=0x5555579b8230, ws=0x7fffffffd130) at ../../src/gdb/linux-nat.c:1345
#2 0x0000555555a8e5e6 in kill_unfollowed_child_callback (lp=0x5555579b8230) at ../../src/gdb/linux-nat.c:3564
#3 0x0000555555a92a26 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::operator()(gdb::fv_detail::erased_callable, lwp_info*) const (this=0x0, ecall=..., args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:284
#4 0x0000555555a92a51 in gdb::function_view<int (lwp_info*)>::bind<int, lwp_info*>(int (*)(lwp_info*))::{lambda(gdb::fv_detail::erased_callable, lwp_info*)#1}::_FUN(gdb::fv_detail::erased_callable, lwp_info*) () at ../../src/gdb/../gdbsupport/function-view.h:278
#5 0x0000555555a91f84 in gdb::function_view<int (lwp_info*)>::operator()(lwp_info*) const (this=0x7fffffffd210, args#0=0x5555579b8230) at ../../src/gdb/../gdbsupport/function-view.h:247
#6 0x0000555555a87072 in iterate_over_lwps(ptid_t, gdb::function_view<int (lwp_info*)>) (filter=..., callback=...) at ../../src/gdb/linux-nat.c:864
#7 0x0000555555a8e732 in linux_nat_target::kill (this=0x55555653af40 <the_amd64_linux_nat_target>) at ../../src/gdb/linux-nat.c:3590
#8 0x0000555555cfdc11 in target_kill () at ../../src/gdb/target.c:911
...
The root of the problem is that when a non-leader LWP execs, it just
changes its tid to the tgid, replacing the pre-exec leader thread,
becoming the new leader. There's no thread exit event for the execing
thread. It's as if the old pre-exec LWP vanishes without trace. The
ptrace man page says:
"PTRACE_O_TRACEEXEC (since Linux 2.5.46)
Stop the tracee at the next execve(2). A waitpid(2) by the
tracer will return a status value such that
status>>8 == (SIGTRAP | (PTRACE_EVENT_EXEC<<8))
If the execing thread is not a thread group leader, the thread
ID is reset to thread group leader's ID before this stop.
Since Linux 3.0, the former thread ID can be retrieved with
PTRACE_GETEVENTMSG."
When the core of GDB processes an exec events, it deletes all the
threads of the inferior. But, that is too late -- deleting the thread
does not delete the corresponding LWP, so we end leaving the pre-exec
non-leader LWP stale in the LWP list. That's what leads to the crash
above -- linux_nat_target::kill iterates over all LWPs, and after the
patch in question, that code will look for the corresponding
thread_info for each LWP. For the pre-exec non-leader LWP still
listed, won't find one.
This patch fixes it, by deleting the pre-exec non-leader LWP (and
thread) from the LWP/thread lists as soon as we get an exec event out
of ptrace.
GDBserver does not need an equivalent fix, because it is already doing
this, as side effect of mourning the pre-exec process, in
gdbserver/linux-low.cc:
else if (event == PTRACE_EVENT_EXEC && cs.report_exec_events)
{
...
/* Delete the execing process and all its threads. */
mourn (proc);
switch_to_thread (nullptr);
The crash with gdb.threads/step-over-exec.exp is not observable on
newer systems, which postdate the glibc change to move "libpthread.so"
internals to "libc.so.6", because right after the exec, GDB traps a
load event for "libc.so.6", which leads to GDB trying to open
libthread_db for the post-exec inferior, and, on such systems that
succeeds. When we load libthread_db, we call
linux_stop_and_wait_all_lwps, which, as the name suggests, stops all
lwps, and then waits to see their stops. While doing this, GDB
detects that the pre-exec stale LWP is gone, and deletes it.
If we use "catch exec" to stop right at the exec before the
"libc.so.6" load event ever happens, and issue "kill" right there,
then GDB crashes on newer systems as well. So instead of tweaking
gdb.threads/step-over-exec.exp to cover the fix, add a new
gdb.threads/threads-after-exec.exp testcase that uses "catch exec".
The test also uses the new "maint info linux-lwps" command if testing
on Linux native, which also exposes the stale LWP problem with an
unfixed GDB.
Also tweak a comment in infrun.c:follow_exec referring to how
linux-nat.c used to behave, as it would become stale otherwise.
Reviewed-By: Andrew Burgess <aburgess@redhat.com>
Change-Id: I21ec18072c7750f3a972160ae6b9e46590376643
|
|
Overview of issues fixed by the patch.
The primary issue this patch fixes is the DWARF register mapping for
Linux. The changes in ppc-linux-tdep.c fix the DWARF register mapping
issues. The register mapping issue is responsible for two of the
five regression bugs seen in gdb.base/store.exp.
Once the register mapping was fixed, an underlying issue with the unwinding
of the signal trampoline in common-code in ifrun.c was found. This
underlying bug is best described by Ulrich in the following description.
The unwinder bug shows up on platforms where the kernel uses a trampoline
to dispatch "calls to" the signal handler (not just *returns from* the
signal handler). Many platforms use a trampoline for signal return, and
that is working fine, but the only platform I'm (Ulrich) aware of that
uses a trampoline for signal handler calls is (recent kernels for)
PowerPC. I believe the rationale for using a trampoline here
is to improve performance by avoiding unbalancing of the
branch predictor's call/return stack.
However, on PowerPC the bug is dormant as well as it is hidden
by *another* bug that prevents correct unwinding out of the
signal trampoline. This is because the custom CFI for the
trampoline uses a register number (VSCR) that is not ever used
by compiler-generated CFI, and that particular register is
mapped to an invalid number by the current PowerPC DWARF mapper.
The underlying unwinder bug is exposed by the "new" regression failures
in gdb.base/sigstep.exp. These failures were previously masked by
the fact that GDB was not seeing a valid frame when it tried to unwind
the frames. The sigstep.exp test is specifically testing stepping into
a signal handler. With the correct DWARF register mapping in place,
specifically the VSCR mapping, the signal trampoline code now unwinds to a
valid frame exposing the pre-existing bug in how the signal handler on
PowerPC works. The one line change infrun.c fixes the exiting bug in
the common-code for platforms that use a trampoline to dispatch calls
to the signal handler by not stopping in the SIGTRAMP_FRAME.
Detailed description of the DWARF register mapping fix.
The PowerPC DWARF register mapping is the same for the .eh_frame and
.debug_frame on Linux. PowerPC uses different mapping for .eh_frame and
.debug_frame on other operating systems. The current GDB support for
mapping the DWARF registers in rs6000_linux_dwarf2_reg_to_regnum and
rs6000_adjust_frame_regnum file gdb/rs6000-tdep.c is not correct for Linux.
The files have some legacy mappings for spe_acc, spefscr, EV which was
removed from GCC in 2017.
This patch adds a two new functions rs6000_linux_dwarf2_reg_to_regnum,
and rs6000_linux_adjust_frame_regnum in file gdb/ppc-linux-tdep.c to handle
the DWARF register mappings on Linux. Function
rs6000_linux_dwarf2_reg_to_regnum is installed for both gdb_dwarf_to_regnum
and gdbarch_stab_reg_to_regnum since the mappings are the same.
The ppc_linux_init_abi function in gdb/ppc-linux-tdep.c is updated to
call set_gdbarch_dwarf2_reg_to_regnum map the new function
rs6000_linux_dwarf2_reg_to_regnum for the architecture. Similarly,
dwarf2_frame_set_adjust_regnum is called to map
rs6000_linux_adjust_frame_regnum into the architecture.
Additional detail on the signal handling fix.
The specific sequence of events for handling a signal on most
architectures is as follows:
1) Some code is running when a signal arrives.
2) The kernel handles the signal and dispatches to the handler.
...
However on PowerPC the sequence of events is:
1) Some code is running when a signal arrives.
2) The kernel handles the signal and dispatches to the trampoline.
3) The trampoline performs a normal function call to the handler.
...
We want the "nexti" to step into, not over, signal handlers invoked by
the kernel. This is the case for most platforms as the kernel puts a
signal trampoline frame onto the stack to handle proper return after the
handler. However, on some platforms such as PowerPC, the kernel actually
uses a trampoline to handle *invocation* of the handler. We do not
want GDB to stop in the SIGTRAMP_FRAME. The issue is fixed in function
process_event_stop_test by adding a check that the frame is not a
SIGTRAMP_FRAME to the if statement to stop in a subroutine call. This
prevents GDB from erroneously detecting the trampoline invocation as a
subroutine call.
This patch fixes two regression test failures in gdb.base/store.exp.
The patch then fixes an exposed, dormant, signal handling issue that
is exposed in the signal handling test gdb.base/sigstep.exp.
The patch has been tested on Power 8 LE/BE, Power 9 LE/BE, Power 10 with
no new regressions. Note, only two of the five failures in store.exp
are fixed. The remaining three failures are fixed in a following
patch.
|
|
This function is just a wrapper around the current inferior's gdbarch.
I find that having that wrapper just obscures where the arch is coming
from, and that it's often used as "I don't know which arch to use so
I'll use this magical target_gdbarch function that gets me an arch" when
the arch should in fact come from something in the context (a thread,
objfile, symbol, etc). I think that removing it and inlining
`current_inferior ()->arch ()` everywhere will make it a bit clearer
where that arch comes from and will trigger people into reflecting
whether this is the right place to get the arch or not.
Change-Id: I79f14b4e4934c88f91ca3a3155f5fc3ea2fadf6b
Reviewed-By: John Baldwin <jhb@FreeBSD.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
Make the inferior's gdbarch field private, and add getters and setters.
This helped me by allowing putting breakpoints on set_arch to know when
the inferior's arch was set. A subsequent patch in this series also
adds more things in set_arch.
Change-Id: I0005bd1ef4cd6b612af501201cec44e457998eec
Reviewed-By: John Baldwin <jhb@FreeBSD.org>
Approved-By: Andrew Burgess <aburgess@redhat.com>
|
|
After the previous commit, exit_inferior_1 no longer makes use of the
silent parameter. This commit removes this parameter and cleans up
the callers.
After doing this exit_inferior_1, exit_inferior, and
exit_inferior_silent are all equivalent, so rename exit_inferior_1 to
exit_inferior and delete exit_inferior_silent, update all the callers.
Also I spotted the declaration exit_inferior_num_silent in inferior.h,
but this function is not defined anywhere, so I deleted the
declaration.
There should be no user visible changes after this commit.
|
|
It was pointed out on the mailing list[1] that after this commit:
commit b1e0126ec56e099d753c20e91a9f8623aabd6b46
Date: Wed Jun 21 14:18:54 2023 +0100
gdb: don't resume vfork parent while child is still running
the test gdb.base/vfork-follow-parent.exp now has some failures when
run with the native-gdbserver or native-extended-gdbserver boards:
FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to end of inferior 2 (timeout)
FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: inferior 1 (timeout)
FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: print unblock_parent = 1 (timeout)
FAIL: gdb.base/vfork-follow-parent.exp: resolution_method=schedule-multiple: continue to break_parent (timeout)
The reason that these failures don't show up when run on the standard
unix board is that the test is only run in the default operating mode,
so for Linux this will be all-stop on top of non-stop.
If we adjust the test script so that it runs in the default mode and
with target-non-stop turned off, then we see the same failures on the
unix board. This commit includes this change.
The way that the test is written means that it is not (currently)
possible to turn on non-stop mode and have the test still work, so
this commit does not do that.
I have also updated the test script so that the vfork child performs
an exec as well as the current exit. Exec and exit are the two ways
in which a vfork child can release the vfork parent, so testing both
of these cases is useful I think.
In this test the inferior performs a vfork and the vfork-child
immediately exits. The vfork-parent will wait for the vfork-child and
then blocks waiting for gdb. Once gdb has released the vfork-parent,
the vfork-parent also exits.
In the test that fails, GDB sets 'detach-on-fork off' and then runs to
the vfork. At this point the test tries to just "continue", but this
fails as the vfork-parent is still selected, and the parent can't
continue until the vfork-child completes. As the vfork-child is
stopped by GDB the parent will never stop once resumed, so GDB refuses
to resume it.
The test script then sets 'schedule-multiple on' and once again
continues. This time GDB, in theory, resumes both the parent and the
child, the parent will be held blocked by the kernel, but the child
will run until it exits, and which point GDB stops again, this time
with inferior 2, the newly exited vfork-child, selected.
What happens after this in the test script is irrelevant as far as
this failure is concerned.
To understand why the test started failing we should consider the
behaviour of four different cases:
1. All-stop-on-non-stop before commit b1e0126ec56e,
2. All-stop-on-non-stop after commit b1e0126ec56e,
3. All-stop-on-all-stop before commit b1e0126ec56e, and
4. All-stop-on-all-stop after commit b1e0126ec56e.
Only case #4 is failing after commit b1e0126ec56e, but I think the
other cases are interesting because, (a) they inform how we might fix
the regression, and (b) it turns out the behaviour of #2 changed too
with the commit, but the change was harmless.
For #1 All-stop-on-non-stop before commit b1e0126ec56e, what happens
is:
1. GDB calls proceed with the vfork-parent selected, as schedule
multiple is on user_visible_resume_ptid returns -1 (everything)
as the resume_ptid (see proceed function),
2. As this is all-stop-on-non-stop, every thread is resumed
individually, so GDB tries to resume both the vfork-parent and the
vfork-child, both of which succeed,
3. The vfork-parent is held stopped by the kernel,
4. The vfork-child completes (exits) at which point the GDB sees the
EXITED event for the vfork-child and the VFORK_DONE event for the
vfork-parent,
5. At this point we might take two paths depending on which event
GDB handles first, if GDB handles the VFORK_DONE first then:
(a) As GDB is controlling both parent and child the VFORK_DONE is
ignored (see handle_vfork_done), the vfork-parent will be
resumed,
(b) GDB processes the EXITED event, selects the (now defunct)
vfork-child, and stops, returning control to the user.
Alternatively, if GDB selects the EXITED event first then:
(c) GDB processes the EXITED event, selects the (now defunct)
vfork-child, and stops, returning control to the user.
(d) At some future time the user resumes the vfork-parent, at
which point the VFORK_DONE is reported to GDB, however, GDB
is ignoring the VFORK_DONE (see handle_vfork_done), so the
parent is resumed.
For case #2, all-stop-on-non-stop after commit b1e0126ec56e, the
important difference is in step (2) above, now, instead of resuming
both the vfork-parent and the vfork-child, only the vfork-child is
resumed. As such, when we get to step (5), only a single event, the
EXITED event is reported.
GDB handles the EXITED just as in (5)(c), then, later, when the user
resumes the vfork-parent, the VFORKED_DONE is immediately delivered
from the kernel, but this is ignored just as in (5)(d), and so,
though the pattern of when the vfork-parent is resumed changes, the
overall pattern of which events are reported and when, doesn't
actually change. In fact, by not resuming the vfork-parent, the order
of events (in this test) is now deterministic, which (maybe?) is a
good thing.
If we now consider case #3, all-stop-on-all-stop before commit
b1e0126ec56e, then what happens is:
1. GDB calls proceed with the vfork-parent selected, as schedule
multiple is on user_visible_resume_ptid returns -1 (everything)
as the resume_ptid (see proceed function),
2. As this is all-stop-on-all-stop, the resume is passed down to the
linux-nat target, the vfork-parent is the event thread, while the
vfork-child is a sibling of the event thread,
3. In linux_nat_target::resume, GDB calls linux_nat_resume_callback
for all threads, this causes the vfork-child to be resumed. Then
in linux_nat_target::resume, the event thread, the vfork-parent,
is also resumed.
4. The vfork-parent is held stopped by the kernel,
5. The vfork-child completes (exits) at which point the GDB sees the
EXITED event for the vfork-child and the VFORK_DONE event for the
vfork-parent,
6. We are now in a situation identical to step (5) as for
all-stop-on-non-stop above, GDB selects one of the events to
handle, and whichever we select the user sees the correct
behaviour.
And so, finally, we can consider #4, all-stop-on-all-stop after commit
b1e0126ec56e, this is the case that started failing.
We start out just like above, in proceed, the resume_ptid is
-1 (resume everything), due to schedule multiple being on. And just
like above, due to the target being all-stop, we call
proceed_resume_thread_checked just once, for the current thread,
which, remember, is the vfork-parent thread.
The change in commit b1e0126ec56e was to avoid resuming a vfork-parent
thread, read the commit message for the justification for this change.
However, this means that GDB now rejects resuming the vfork-parent in
this case, which means that nothing gets resumed! Obviously, if
nothing resumes, then nothing will ever stop, and so GDB appears to
hang.
I considered a couple of solutions which, in the end, I didn't go
with, these were:
1. Move the vfork-parent check out of proceed_resume_thread_checked,
and place it in proceed, but only on the all-stop-on-non-stop
path, this should still address the issue seen in b1e0126ec56e,
but would avoid the issue seen here. I rejected this just
because it didn't feel great to split the checks that exist in
proceed_resume_thread_checked like this,
2. Extend the condition in proceed_resume_thread_checked by adding a
target_is_non_stop_p check. This would have the same effect as
idea 1, but leaves all the checks in the same place, which I
think would be better, but this still just didn't feel right to
me, and so,
What I noticed was that for the all-stop-on-non-stop, after commit
b1e0126ec56e, we only resumed the vfork-child, and this seems fine.
The vfork-parent isn't going to run anyway (the kernel will hold it
back), so if feels like we there's no harm in just waiting for the
child to complete, and then resuming the parent.
So then I started looking at follow_fork, which is called from the top
of proceed. This function already has the task of switching between
the parent and child based on which the user wishes to follow. So, I
wondered, could we use this to switch to the vfork-child in the case
that we are attached to both?
Turns out this is pretty simple to do.
Having done that, now the process is for all-stop-on-all-stop after
commit b1e0126ec56e, and with this new fix is:
1. GDB calls proceed with the vfork-parent selected, but,
2. In follow_fork, and follow_fork_inferior, GDB switches the
selected thread to be that of the vfork-child,
3. Back in proceed user_visible_resume_ptid returns -1 (everything)
as the resume_ptid still, but now,
4. When GDB calls proceed_resume_thread_checked, the vfork-child is
the current selected thread, this is not a vfork-parent, and so
GDB allows the proceed to continue to the linux-nat target,
5. In linux_nat_target::resume, GDB calls linux_nat_resume_callback
for all threads, this does not resume the vfork-parent (because
it is a vfork-parent), and then the vfork-child is resumed as
this is the event thread,
At this point we are back in the same situation as for
all-stop-on-non-stop after commit b1e0126ec56e, that is, the
vfork-child is resumed, while the vfork-parent is held stopped by
GDB.
Eventually the vfork-child will exit or exec, at which point the
vfork-parent will be resumed.
[1] https://inbox.sourceware.org/gdb-patches/3e1e1db0-13d9-dd32-b4bb-051149ae6e76@simark.ca/
|
|
This commit replaces this earlier commit:
commit 2e411b8c68eb2b035b31d5b00d940d4be1a0928b
Date: Fri Oct 14 14:53:15 2022 +0100
gdb: don't always print breakpoint location after failed condition check
and is a result of feedback received here[1].
The original commit addressed a problem where, if a breakpoint
condition included an inferior function call, and if the inferior
function call failed, then GDB would announce the stop twice. Here's
an example of GDB's output before the above commit that shows the
problem being addressed:
(gdb) break foo if (some_func ())
Breakpoint 1 at 0x40111e: file bpcond.c, line 11.
(gdb) r
Starting program: /tmp/bpcond
Program received signal SIGSEGV, Segmentation fault.
0x0000000000401116 in some_func () at bpcond.c:5
5 return *p;
Error in testing condition for breakpoint 1:
The program being debugged stopped while in a function called from GDB.
Evaluation of the expression containing the function
(some_func) will be abandoned.
When the function is done executing, GDB will silently stop.
Breakpoint 1, 0x0000000000401116 in some_func () at bpcond.c:5
5 return *p;
(gdb)
The original commit addressed this issue in breakpoint.c, by spotting
that the $pc had changed while evaluating the breakpoint condition,
and inferring from this that GDB must have stopped elsewhere.
However, the way in which the original commit suppressed the second
stop announcement was to set bpstat::print to true -- this tells GDB
not to print the frame during the stop announcement, and for the CLI
this is fine, however, it was pointed out that for the MI this still
isn't really enough. Below is an example from an MI session after the
above commit was applied, this shows the problem with the above
commit:
-break-insert -c "cond_fail()" foo
^done,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040111e",func="foo",file="/tmp/mi-condbreak-fail.c",line="30",thread-groups=["i1"],cond="cond_fail()",times="0",original-location="foo"}
(gdb)
-exec-run
=thread-group-started,id="i1",pid="2636270"
=thread-created,id="1",group-id="i1"
=library-loaded,id="/lib64/ld-linux-x86-64.so.2",target-name="/lib64/ld-linux-x86-64.so.2",host-name="/lib64/ld-linux-x86-64.so.2",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7fd3110",to="0x00007ffff7ff2bb4"}]
^running
*running,thread-id="all"
(gdb)
=library-loaded,id="/lib64/libm.so.6",target-name="/lib64/libm.so.6",host-name="/lib64/libm.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7e59390",to="0x00007ffff7ef4f98"}]
=library-loaded,id="/lib64/libc.so.6",target-name="/lib64/libc.so.6",host-name="/lib64/libc.so.6",symbols-loaded="0",thread-group="i1",ranges=[{from="0x00007ffff7ca66b0",to="0x00007ffff7df3c5f"}]
~"\nProgram"
~" received signal SIGSEGV, Segmentation fault.\n"
~"0x0000000000401116 in cond_fail () at /tmp/mi-condbreak-fail.c:24\n"
~"24\t return *p;\t\t\t/* Crash here. */\n"
*stopped,reason="signal-received",signal-name="SIGSEGV",signal-meaning="Segmentation fault",frame={addr="0x0000000000401116",func="cond_fail",args=[],file="/tmp/mi-condbreak-fail.c",fullname="/tmp/mi-condbreak-fail.c",line="24",arch="i386:x86-64"},thread-id="1",stopped-threads="all",core="9"
&"Error in testing condition for breakpoint 1:\n"
&"The program being debugged was signaled while in a function called from GDB.\n"
&"GDB remains in the frame where the signal was received.\n"
&"To change this behavior use \"set unwindonsignal on\".\n"
&"Evaluation of the expression containing the function\n"
&"(cond_fail) will be abandoned.\n"
&"When the function is done executing, GDB will silently stop.\n"
=breakpoint-modified,bkpt={number="1",type="breakpoint",disp="keep",enabled="y",addr="0x000000000040111e",func="foo",file="/tmp/mi-condbreak-fail.c",fullname="/tmp/mi-condbreak-fail.c",line="30",thread-groups=["i1"],cond="cond_fail()",times="1",original-location="foo"}
*stopped
(gdb)
Notice that we still see two '*stopped' lines, the first includes the
full frame information, while the second has no frame information,
this is a result of bpstat::print having been set. Ideally, the
second '*stopped' line should not be present.
By setting bpstat::print I was addressing the problem too late, this
flag really only changes how interp::on_normal_stop prints the stop
event, and interp::on_normal_stop is called (indirectly) from the
normal_stop function in infrun.c. A better solution is to avoid
calling normal_stop at all for the stops which should not be reported
to the user, and this is what I do in this commit.
This commit has 3 parts:
1. In breakpoint.c, revert the above commit,
2. In fetch_inferior_event (infrun.c), capture the stop-id before
calling handle_inferior_event. If, after calling
handle_inferior_event, the stop-id has changed, then this indicates
that somewhere within handle_inferior_event, a stop was announced to
the user. If this is the case then GDB should not call normal_stop,
and we should rely on whoever announced the stop to ensure that we
are in a PROMPT_NEEDED state, which means the prompt will be
displayed once fetch_inferior_event returns. And,
3. In infcall.c, do two things:
(a) In run_inferior_call, after making the inferior call, ensure
that either async_disable_stdin or async_enable_stdin is called
to put the prompt state, and stdin handling into the correct
state based on whether the inferior call completed successfully
or not, and
(b) In call_thread_fsm::should_stop, call async_enable_stdin
rather than changing the prompt state directly. This isn't
strictly necessary, but helped me understand this code more.
This async_enable_stdin call is only reached if normal_stop is
not going to be called, and replaces the async_enable_stdin call
that exists in normal_stop. Though we could just adjust the
prompt state if felt (to me) much easier to understand when I
could see this call and the corresponding call in normal_stop.
With these changes in place now, when the inferior call (from the
breakpoint condition) fails, infcall.c leaves the prompt state as
PROMPT_NEEDED, and leaves stdin registered with the event loop.
Back in fetch_inferior_event GDB notices that the stop-id has changed
and so avoids calling normal_stop.
And on return from fetch_inferior_event GDB will display the prompt
and handle input from stdin.
As normal_stop is not called the MI problem is solved, and the test
added in the earlier mentioned commit still passes just fine, so the
CLI has not regressed.
[1] https://inbox.sourceware.org/gdb-patches/6fd4aa13-6003-2563-5841-e80d5a55d59e@palves.net/
|
|
While working on some of the recent patches relating to vfork handling
I found this debug output very helpful, I'd like to propose adding
this into GDB.
With debug turned off there should be no user visible changes after
this commit.
|
|
Like the last few commit, this fixes yet another vfork related issue.
Like the commit titled:
gdb: don't restart vfork parent while waiting for child to finish
which addressed a case in linux-nat where we would try to resume a
vfork parent, this commit addresses a very similar case, but this time
occurring in infrun.c. Just like with that previous commit, this bug
results in the assert:
x86-linux-dregs.c:146: internal-error: x86_linux_update_debug_registers: Assertion `lwp_is_stopped (lwp)' failed.
In this case the issue occurs when target-non-stop is on, but non-stop
is off, and again, schedule-multiple is on. As with the previous
commit, GDB is in follow-fork-mode child. If you have not done so, it
is worth reading the earlier commit as many of the problems leading to
the failure are the same, they just appear in a different part of GDB.
Here are the steps leading to the assertion failure:
1. The user performs a 'next' over a vfork, GDB stop in the vfork
child,
2. As we are planning to follow the child GDB sets the vfork_parent
and vfork_child member variables in the two inferiors, the
thread_waiting_for_vfork_done member is left as nullptr, that member
is only used when GDB is planning to follow the parent inferior,
3. The user does 'continue', our expectation is that the vfork child
should resume, and once that process has exited or execd, GDB should
detach from the vfork parent. As a result of the 'continue' GDB
eventually enters the proceed function,
4. In proceed we selected a ptid_t to resume, because
schedule-multiple is on we select minus_one_ptid (see
user_visible_resume_ptid),
5. As GDB is running in all-stop on top of non-stop mode, in the
proceed function we iterate over all threads that match the resume
ptid, which turns out to be all threads, and call
proceed_resume_thread_checked. One of the threads we iterate over
is the vfork parent thread,
6. As the thread passed to proceed_resume_thread_checked doesn't
match any of the early return conditions, GDB will set the thread
resumed,
7. As we are resuming one thread at a time, this thread is seen by
the lower layers (e.g. linux-nat) as the "event thread", which means
we don't apply any of the checks, e.g. is this thread a
vfork parent, instead we assume that GDB core knows what it's doing,
and linux-nat will resume the thread, we have now incorrectly set
running the vfork parent thread when this thread should be waiting
for the vfork child to complete,
8. Back in the proceed function GDB continues to iterate over all
threads, and now (correctly) resumes the vfork child thread,
8. As the vfork child is still alive the kernel holds the vfork
parent stopped,
9. Eventually the child performs its exec and GDB is sent and EXECD
event. However, because the parent is resumed, as soon as the child
performs its exec the vfork parent also sends a VFORK_DONE event to
GDB,
10. Depending on timing both of these events might seem to arrive in
GDB at the same time. Normally GDB expects to see the EXECD or
EXITED/SIGNALED event from the vfork child before getting the
VFORK_DONE in the parent. We know this because it is as a result of
the EXECD/EXITED/SIGNALED that GDB detaches from the parent (see
handle_vfork_child_exec_or_exit for details). Further the comment
in target/waitstatus.h on TARGET_WAITKIND_VFORK_DONE indicates that
when we remain attached to the child (not the parent) we should not
expect to see a VFORK_DONE,
11. If both events arrive at the same time then GDB will randomly
choose one event to handle first, in some cases this will be the
VFORK_DONE. As described above, upon seeing a VFORK_DONE GDB
expects that (a) the vfork child has finished, however, in this case
this is not completely true, the child has finished, but GDB has not
processed the event associated with the completion yet, and (b) upon
seeing a VFORK_DONE GDB assumes we are remaining attached to the
parent, and so resumes the parent process,
12. GDB now handles the EXECD event. In our case we are detaching
from the parent, so GDB calls target_detach (see
handle_vfork_child_exec_or_exit),
13. While this has been going on the vfork parent is executing, and
might even exit,
14. In linux_nat_target::detach the first thing we do is stop all
threads in the process we're detaching from, the result of the stop
request will be cached on the lwp_info object,
15. In our case the vfork parent has exited though, so when GDB
waits for the thread, instead of a stop due to signal, we instead
get a thread exited status,
16. Later in the detach process we try to resume the threads just
prior to making the ptrace call to actually detach (see
detach_one_lwp), as part of the process to resume a thread we try to
touch some registers within the thread, and before doing this GDB
asserts that the thread is stopped,
17. An exited thread is not classified as stopped, and so the assert
triggers!
Just like with the earlier commit, the fix is to spot the vfork parent
status of the thread, and not resume such threads. Where the earlier
commit fixed this in linux-nat, in this case I think the fix should
live in infrun.c, in proceed_resume_thread_checked. This function
already has a similar check to not resume the vfork parent in the case
where we are planning to follow the vfork parent, I propose adding a
similar case that checks for the vfork parent when we plan to follow
the vfork child.
This new check will mean that at step #6 above GDB doesn't try to
resume the vfork parent thread, which prevents the VFORK_DONE from
ever arriving. Once GDB sees the EXECD/EXITED/SIGNALLED event from
the vfork child GDB will detach from the parent.
There's no test included in this commit. In a subsequent commit I
will expand gdb.base/foll-vfork.exp which is when this bug would be
exposed.
If you do want to reproduce this failure then you will for certainly
need to run the gdb.base/foll-vfork.exp test in a loop as the failures
are all very timing sensitive. I've found that running multiple
copies in parallel makes the failure more likely to appear, I usually
run ~6 copies in parallel and expect to see a failure after within
10mins.
|
|
Split the thread resuming code from proceed into new function
proceed_resume_thread_checked.
Co-Authored-By: Christina Schimpe <christina.schimpe@intel.com>
|