Age | Commit message (Collapse) | Author | Files | Lines |
|
This changes address_space to use new and delete, and makes some other
small C++-ification changes as well, like changing address_space_num
to be a method.
This patch was needed for the subsequent patch to rewrite the registry
system.
|
|
This changes the parameter of target_ops::async from int to bool.
Regression tested on x86-64 Fedora 34.
|
|
In all-stop mode, when the target is itself in non-stop mode (like
GNU/Linux), if you use the "step N" (or "stepi/next/nexti N") to step
a thread a number of times:
(gdb) help step
step, s
Step program until it reaches a different source line.
Usage: step [N]
Argument N means step N times (or till program stops for another reason).
... GDB prematurely stops all threads after the first step, and
doesn't re-resume them for the subsequent N-1 steps. It's as if for
the 2nd and subsequent steps, the command was running with
scheduler-locking enabled.
This can be observed with the testcase added by this commit, which
looks like this:
static pthread_barrier_t barrier;
static void *
thread_func (void *arg)
{
pthread_barrier_wait (&barrier);
return NULL;
}
int
main ()
{
pthread_t thread;
int ret;
pthread_barrier_init (&barrier, NULL, 2);
/* We run to this line below, and then issue "next 3". That should
step over the 3 lines below and land on the return statement. If
GDB prematurely stops the thread_func thread after the first of
the 3 nexts (and never resumes it again), then the join won't
ever return. */
pthread_create (&thread, NULL, thread_func, NULL); /* set break here */
pthread_barrier_wait (&barrier);
pthread_join (thread, NULL);
return 0;
}
The test hangs and times out without the GDB fix:
(gdb) next 3
[New Thread 0x7ffff7d89700 (LWP 525772)]
FAIL: gdb.threads/step-N-all-progress.exp: non-stop=off: target-non-stop=on: next 3 (timeout)
The problem is a core gdb bug.
When you do "step/stepi/next/nexti N", GDB internally creates a
thread_fsm object and associates it with the stepping thread. For the
stepping commands, the FSM's class is step_command_fsm. That object
is what keeps track of how many steps are left to make. When one step
finishes, handle_inferior_event calls stop_waiting and returns, and
then fetch_inferior_event calls the "should_stop" method of the event
thread's FSM. The implementation of that method decrements the
steps-left counter. If the counter is 0, it returns true and we
proceed to presenting the stop to the user. If it isn't 0 yet, then
the method returns false, indicating to fetch_inferior_event to "keep
going".
Focusing now on when the first step finishes -- we're in "all-stop"
mode, with the target in non-stop mode. When a step finishes,
handle_inferior_event calls stop_waiting, which itself calls
stop_all_threads to stop everything. I.e., after the first step
completes, all threads are stopped, before handle_inferior_event
returns. And after that, now in fetch_inferior_event, we consult the
thread's thread_fsm::should_stop, which as we've seen, for the first
step returns false -- i.e., we need to keep_going for another step.
However, since the target is in non-stop mode, keep_going resumes
_only_ the current thread. All the other threads remain stopped,
inadvertently.
If the target is in non-stop mode, we don't actually need to stop all
threads right after each step finishes, and then re-resume them again.
We can instead defer stopping all threads until all the steps are
completed.
So fix this by delaying the stopping of all threads until after we
called the FSM's "should_stop" method. I.e., move it from
stop_waiting, to handle_inferior_events's callers,
fetch_inferior_event and wait_for_inferior.
New test included. Tested on x86-64 GNU/Linux native and gdbserver.
Change-Id: Iaad50dcfea4464c84bdbac853a89df92ade6ae01
|
|
This patch removes ui_register_input_event_handler and
ui_unregister_input_event_handler, replacing them with methods on
'ui'. It also changes gdb to use these methods everywhere, rather
than sometimes reaching in to the ui to manage the file descriptor
directly.
|
|
While working on this patch:
https://sourceware.org/pipermail/gdb-patches/2022-January/185109.html
I found it really useful to print the executing/resumed status of all
threads (or all threads in a particular inferior) at various
places (e.g. when a new inferior is started, when GDB attaches, etc).
This debug was originally part of the above patch, but I wanted to
rewrite this as a separate patch and move the code into a new function
in infrun.h, which is what this patch does.
Unless 'set debug infrun on' is in effect, then there should be no
user visible changes after this commit.
|
|
Commit 152a1749566 ("gdb: prune inferiors at end of
fetch_inferior_event, fix intermittent failure of
gdb.threads/fork-plus-threads.exp") introduced some follow-fork-related
test failures, such as:
info inferiors^M
Num Description Connection Executable ^M
* 1 process 634972 1 (native) /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M
2 process 634975 1 (native) /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork ^M
(gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: info inferiors
inferior 2^M
[Switching to inferior 2 [process 634975] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]^M
[Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 634975))]^M
#0 0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6^M
(gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: inferior 2
continue^M
Continuing.^M
[Inferior 2 (process 634975) exited normally]^M
[Switching to Thread 0x7ffff7c9a740 (LWP 634972)]^M
(gdb) PASS: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: continue until exit at continue unfollowed inferior to end
break callee^M
Breakpoint 2 at 0x555555555160: file /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c, line 9.^M
(gdb) FAIL: gdb.base/foll-fork.exp: follow-fork-mode=parent: detach-on-fork=off: cmd=next 2: test_follow_fork: break callee
What happens here is:
- inferior 2 is selected
- we continue, leading to inferior 2's exit
- we set breakpoint, expect 2 locations, but only one location is
resolved
Reading between the lines, we understand that inferior 2 got pruned,
when it shouldn't have been.
The issue can be reproduced by hand with:
$ ./gdb -q --data-directory=data-directory testsuite/outputs/gdb.base/foll-fork/foll-fork -ex "set detach-on-fork off" -ex start -ex "next 2" -ex "inferior 2" -ex "set debug infrun"
...
Temporary breakpoint 1, main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:14
14 int v = 5;
[New inferior 2 (process 637627)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/../lib/libthread_db.so.1".
17 if (pid == 0) /* set breakpoint here */
[Switching to inferior 2 [process 637627] (/home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork)]
[Switching to thread 2.1 (Thread 0x7ffff7c9a740 (LWP 637627))]
#0 0x00007ffff7d7abf7 in _Fork () from /usr/lib/libc.so.6
(gdb) continue
Continuing.
[infrun] clear_proceed_status_thread: 637627.637627.0
[infrun] proceed: enter
[infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
[infrun] scoped_disable_commit_resumed: reason=proceeding
[infrun] start_step_over: enter
[infrun] start_step_over: stealing global queue of threads to step, length = 0
[infrun] operator(): step-over queue now empty
[infrun] start_step_over: exit
[infrun] proceed: start: resuming threads, all-stop-on-top-of-non-stop
[infrun] proceed: resuming 637627.637627.0
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [637627.637627.0] at 0x7ffff7d7abf7
[infrun] do_target_resume: resume_ptid=637627.637627.0, step=0, sig=GDB_SIGNAL_0
[infrun] infrun_async: enable=1
[infrun] prepare_to_wait: prepare_to_wait
[infrun] proceed: end: resuming threads, all-stop-on-top-of-non-stop
[infrun] reset: reason=proceeding
[infrun] maybe_set_commit_resumed_all_targets: enabling commit-resumed for target native
[infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native
[infrun] maybe_call_commit_resumed_all_targets: calling commit_resumed for target native
[infrun] proceed: exit
[infrun] fetch_inferior_event: enter
[infrun] scoped_disable_commit_resumed: reason=handling event
[infrun] do_target_wait: Found 2 inferiors, starting at #1
[infrun] random_pending_event_thread: None found.
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: 637627.637627.0 [process 637627],
[infrun] print_target_wait_results: status->kind = EXITED, exit_status = 0
[infrun] handle_inferior_event: status->kind = EXITED, exit_status = 0
[Inferior 2 (process 637627) exited normally]
[infrun] stop_waiting: stop_waiting
[infrun] stop_all_threads: start: reason=presenting stop to user in all-stop, inf=-1
[infrun] stop_all_threads: pass=0, iterations=0
[infrun] stop_all_threads: 637624.637624.0 not executing
[infrun] stop_all_threads: pass=1, iterations=1
[infrun] stop_all_threads: 637624.637624.0 not executing
[infrun] stop_all_threads: done
[infrun] stop_all_threads: end: reason=presenting stop to user in all-stop, inf=-1
[Switching to Thread 0x7ffff7c9a740 (LWP 637624)]
[infrun] infrun_async: enable=0
[infrun] reset: reason=handling event
[infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
(gdb) [infrun] fetch_inferior_event: exit
(gdb) info inferiors
Num Description Connection Executable
* 1 process 637624 1 (native) /home/simark/build/binutils-gdb-one-target/gdb/testsuite/outputs/gdb.base/foll-fork/foll-fork
(gdb) i th
Id Target Id Frame
* 1 Thread 0x7ffff7c9a740 (LWP 637624) "foll-fork" main () at /home/simark/src/binutils-gdb/gdb/testsuite/gdb.base/foll-fork.c:17
After handling the EXITED event for inferior 2, inferior 2 should have
stayed the current inferior, which should have prevented it from getting
pruned. When debugging, we find that when getting at the
prune_inferiors call, the current inferior is inferior 1. Further
debugging shows that prior to the call to
clean_up_just_stopped_threads_fsms, the current inferior is inferior 2,
and after, it's inferior 1. Then, back in fetch_inferior_event, the
restore_thread object is disabled, due to:
/* If we got a TARGET_WAITKIND_NO_RESUMED event, then the
previously selected thread is gone. We have two
choices - switch to no thread selected, or restore the
previously selected thread (now exited). We chose the
later, just because that's what GDB used to do. After
this, "info threads" says "The current thread <Thread
ID 2> has terminated." instead of "No thread
selected.". */
if (!non_stop
&& cmd_done
&& ecs->ws.kind () != TARGET_WAITKIND_NO_RESUMED)
restore_thread.dont_restore ();
So in the end, inferior 1 stays current, and inferior 2 gets wrongfully
pruned.
I'd say clean_up_just_stopped_threads_fsms is the culprit here. It
actually attempts to restore the event_thread to be current at the end,
after the loop (I presume the current thread on entry is always supposed
to be the event thread). But in this case, the event is of kind EXITED,
and ecs->event_thread is not set, so the current inferior isn't
restored.
Fix that by using scoped_restore_current_thread. If there is no current
thread, scoped_restore_current_thread will still restore the current
inferior, and that's what we want.
Random note: the thread_info object for inferior 2's thread is never
freed. It is held (by refcount) by the restore_thread object in
fetch_inferior_event, while the inferior's thread list gets cleared, in
the exit event processing. When the refcount reaches 0 (when the
restore_thread object is destroyed), there's nothing that actually
deletes the thread_info object. And I think that nothing in GDB points
to it anymore, so it leaks. I don't want to fix that in this patch, but
thought it would be good to mention it, in case somebody has an idea for
how to fix that.
Change-Id: Ibc7df543e2c46aad5f3b9250b28c3fb5912be4e8
|
|
Replace with equivalent method.
Change-Id: I0e033095e7358799930775e61028b48246971a7d
|
|
While working in gdb/infrun.c:restart_threads, I was wondering what are
the preconditions to call the function. It seems to me that
!step_over_info_valid_p should be a precondition (i.e. if we are doing
an inline step over breakpoint, we do not want to resume non stepping
threads as one of them might miss the breakpoint which is temporally
disabled).
To convince myself that this is true, I have added an assertion to
enforce this, and got one regression in the testsuite:
FAIL: gdb.base/step-over-syscall.exp: vfork: displaced=off: single step over vfork (GDB internal error)
This call to restart_threads originates from handle_vfork_done which
does not check if a step over is active when restarting other threads:
if (target_is_non_stop_p ())
{
scoped_restore_current_thread restore_thread;
insert_breakpoints ();
restart_threads (event_thread, event_thread->inf);
start_step_over ();
}
In this patch, I propose to:
- Call start_step_over before restart_threads. If a step over is already
in progress (as it is the case in the failing testcase),
start_step_over return immediately, and there is no point in restarting
all threads just to stop them right away for a step over breakpoint.
- Only call restart_threads if no step over is in progress at this
point.
In this patch, I also propose to keep the assertion in restart_threads
to help enforce this precondition, and state it explicitly.
I have also checked all other places which call restart_threads, and
they all seem to check that there is no step over currently active
before doing the call.
As for infrun-related things, I am never completely sure I did not miss
something. So as usual, all feedback and thoughts are very welcome.
Tested on x86_64-linux-gnu.
Change-Id: If5f5f98ec4cf9aaeaabb5e3aa88ae6ffd70d4f37
|
|
When running:
$ make check TESTS="gdb.multi/multi-re-run.exp" RUNTESTFLAGS="--target_board=native-gdbserver"
I get:
target remote localhost:2347^M
Remote debugging using localhost:2347^M
Reading symbols from /lib64/ld-linux-x86-64.so.2...^M
Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.31.so...^M
0x00007ffff7fd0100 in _start () from /lib64/ld-linux-x86-64.so.2^M
(gdb) continue^M
Continuing.^M
Cannot execute this command while the target is running.^M
Use the "interrupt" command to stop the target^M
and then try again.^M
(gdb) FAIL: gdb.multi/multi-re-run.exp: re_run_inf=1: iter=1: runto: run to all_started
The test does:
- Connect to a remote target with inferior 2, continue and stop on the
all_started function
- Connect to a separate remote target / GDBserver instance with inferior 1,
continue and (expect to) stop on the all_started function
The failure seen above happens when trying to continue inferior 1.
What happens is:
- GDB tells inferior 1's remote target to continue
- We go into fetch_inferior_event, try to get an event at random from
the targets
- do_target_wait happens to pick inferior 2's target
- That target return TARGET_WAITKIND_NO_RESUMED, which makes sense:
inferior 2 is stopped, that target has no resumed thread
- handle_no_resumed tries to update the thread list of all targets:
for (auto *target : all_non_exited_process_targets ())
{
switch_to_target_no_thread (target);
update_thread_list ();
}
- When trying to update the thread list of inferior 1's target, it hits
the "Cannot execute this command while the target is running" error.
This target is working in "remote all-stop" mode, and it is currently
waiting for a stop reply, so it can't send packets to update the
thread list at this time.
To handle the problem described in the comment in handle_no_resumed, I
don't think it is necessary to update the thread list of all targets,
but only the event target. That comment describes a kind of race
condition where some target reports a breakpoint hit for a thread and
then its last remaining resumed thread exits, so sends a "no resumed"
event. If we ended up resuming the thread that hit a breakpoint, we
want to ignore the "no resumed" and carry on.
But I don't really see why we need to update the thread list on the
other targets. I can't really articulate this, it's more a gut feeling,
maybe I just fail to imagine the situation where this is needed. But
here is the patch anyway, so we can discuss it. The patch changes
handle_no_resumed to only update the thread list of the event target.
This fixes the test run shown above.
The way I originally tried to fix this was to make
remote_target::update_thread_list return early if the target is
currently awaiting a stop reply, since there's no way it can update the
thread list at that time. But that felt more like papering over the
problem. I then thought that we probably shouldn't be asking the target
to update the thread list unnecessarily.
Change-Id: Ide3df22b4f556478e155ad1c67ad4b4fe7c26a58
|
|
failure of gdb.threads/fork-plus-threads.exp
This test sometimes fail like this:
info threads^M
Id Target Id Frame ^M
11.12 process 2270719 Couldn't get registers: No such process.^M
(gdb) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: no threads left
[Inferior 11 (process 2270719) exited normally]^M
info inferiors^M
Num Description Connection Executable ^M
* 1 <null> /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads ^M
11 <null> /home/smarchi/build/binutils-gdb/gdb/testsuite/outputs/gdb.threads/fork-plus-threads/fork-plus-threads ^M
(gdb) FAIL: gdb.threads/fork-plus-threads.exp: detach-on-fork=off: only inferior 1 left (the program exited)
I can get it to fail quite reliably by pinning it to a core:
$ taskset -c 5 make check TESTS="gdb.threads/fork-plus-threads.exp"
The previous attempt at fixing this was:
https://sourceware.org/pipermail/gdb-patches/2021-October/182846.html
What we see is part due to a possible unfortunate ordering of events
given by the kernel, and what could be considered a bug in GDB.
The test program makes a number of forks, waits them all, then exits.
Most of the time, GDB will get and process the exit event for inferior 1
after the exit events of all the children. But this is not guaranteed.
After the last child exits and is waited by the parent, the parent can
exit quickly, such that GDB collects from the kernel the exit events for
the parent and that child at the same time. It then chooses one event
at random, which can be the event for the parent. This will result in
the parent appearing to exit before its child. There's not much we can
do about it, so I think we have to adjust the test to cope.
After expect has seen the "exited normally" notification for inferior 1,
it immediately does an "info thread" that it expects to come back empty.
But at this point, GDB might not have processed inferior 11's (the last
child) exit event, so it will look like there is still a thread. Of
course that thread is dead, we just don't know it yet. But that makes
the "no thread" test fail. If the test waited just a bit more for the
"exited normally" notification for inferior 11, then the list of threads
would be empty.
So, first change, make the test collect all the "exited normally"
notifications for all inferiors before proceeding, that should ensure we
see an empty thread list. That would fix the first FAIL above.
However, we would still have the second FAIL, as we expect inferior 11
to not be there, it should have been deleted automatically. Inferior 11
is normally deleted when prune_inferiors is called. That is called by
normal_stop, which is only called by fetch_inferior_event only if the
event thread completed an execution command FSM (thread_fsm). But the
FSM for the continue command completed when inferior 1 exited. At that
point inferior 11 was not prunable, as it still had a thread. When
inferior 11 exits, prune_inferiors is not called.
I think that can be considered a GDB bug. From the user point of view,
there's no reason why in one case inferior 11 would be deleted and not
in the other case.
This patch makes the somewhat naive change to call prune_inferiors in
fetch_inferior_event, so that it is called in this case. It is placed
at this particular point in the function so that it is called after the
user inferior / thread selection is restored. If it was called before
that, inferior 11 wouldn't be pruned, because it would still be the
current inferior.
Change-Id: I48a15d118f30b1c72c528a9f805ed4974170484a
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=26272
|
|
Remove all macros related to getting and setting some symbol value:
#define SYMBOL_VALUE(symbol) (symbol)->value.ivalue
#define SYMBOL_VALUE_ADDRESS(symbol) \
#define SET_SYMBOL_VALUE_ADDRESS(symbol, new_value) \
#define SYMBOL_VALUE_BYTES(symbol) (symbol)->value.bytes
#define SYMBOL_VALUE_COMMON_BLOCK(symbol) (symbol)->value.common_block
#define SYMBOL_BLOCK_VALUE(symbol) (symbol)->value.block
#define SYMBOL_VALUE_CHAIN(symbol) (symbol)->value.chain
#define MSYMBOL_VALUE(symbol) (symbol)->value.ivalue
#define MSYMBOL_VALUE_RAW_ADDRESS(symbol) ((symbol)->value.address + 0)
#define MSYMBOL_VALUE_ADDRESS(objfile, symbol) \
#define BMSYMBOL_VALUE_ADDRESS(symbol) \
#define SET_MSYMBOL_VALUE_ADDRESS(symbol, new_value) \
#define MSYMBOL_VALUE_BYTES(symbol) (symbol)->value.bytes
#define MSYMBOL_BLOCK_VALUE(symbol) (symbol)->value.block
Replace them with equivalent methods on the appropriate objects.
Change-Id: Iafdab3b8eefc6dc2fd895aa955bf64fafc59ed50
|
|
The test introduced by this patch would fail in this configuration, with
the native-gdbserver or native-extended-gdbserver boards:
FAIL: gdb.threads/next-fork-other-thread.exp: fork_func=fork: target-non-stop=auto: non-stop=off: displaced-stepping=auto: i=2: next to for loop
The problem is that the step operation is forgotten when handling the
fork/vfork. With "debug infrun" and "debug remote", it looks like this
(some lines omitted for brevity). We do the next:
[infrun] proceed: enter
[infrun] proceed: addr=0xffffffffffffffff, signal=GDB_SIGNAL_DEFAULT
[infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154304.0] at 0x5555555553bf
[infrun] do_target_resume: resume_ptid=4154304.0.0, step=1, sig=GDB_SIGNAL_0
[remote] Sending packet: $vCont;r5555555553bf,5555555553c4:p3f63c0.3f63c0;c:p3f63c0.-1#cd
[infrun] proceed: exit
We then handle a fork event:
[infrun] fetch_inferior_event: enter
[remote] wait: enter
[remote] Packet received: T05fork:p3f63ee.3f63ee;06:0100000000000000;07:b08e59f6ff7f0000;10:bf60e8f7ff7f0000;thread:p3f63c0.3f63c6;core:17;
[remote] wait: exit
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: 4154304.4154310.0 [Thread 4154304.4154310],
[infrun] print_target_wait_results: status->kind = FORKED, child_ptid = 4154350.4154350.0
[infrun] handle_inferior_event: status->kind = FORKED, child_ptid = 4154350.4154350.0
[remote] Sending packet: $D;3f63ee#4b
[infrun] resume_1: step=0, signal=GDB_SIGNAL_0, trap_expected=0, current thread [4154304.4154310.0] at 0x7ffff7e860bf
[infrun] do_target_resume: resume_ptid=4154304.0.0, step=0, sig=GDB_SIGNAL_0
[remote] Sending packet: $vCont;c:p3f63c0.-1#73
[infrun] fetch_inferior_event: exit
In the first snippet, we resume the stepping thread with the range-stepping (r)
vCont command. But after handling the fork (detaching the fork child), we
resumed the whole process freely. The stepping thread, which was paused by
GDBserver while reporting the fork event, was therefore resumed freely, instead
of confined to the addresses of the stepped line. Note that since this
is a "next", it could be that we have entered a function, installed a
step-resume breakpoint, and it's ok to continue freely the stepping
thread, but that's not the case here. The two snippets shown above were
next to each other in the logs.
For the fork case, we can resume stepping right after handling the
event.
However, for the vfork case, where we are waiting for the
external child process to exec or exit, we only resume the thread that
called vfork, and keep the others stopped (see patch "gdb: fix handling of
vfork by multi-threaded program" prior in this series). So we can't
resume the stepping thread right now. Instead, do it after handling the
vfork-done event.
Change-Id: I92539c970397ce880110e039fe92b87480f816bd
|
|
(follow-fork-mode=parent, detach-on-fork=on)
There is a problem with how GDB handles a vfork happening in a
multi-threaded program. This problem was reported to me by somebody not
using vfork directly, but using system(3) in a multi-threaded program,
which may be implemented using vfork.
This patch only deals about the follow-fork-mode=parent,
detach-on-fork=on case, because it would be too much to chew at once to
fix the bugs in the other cases as well (I tried).
The problem
-----------
When a program vforks, the parent thread is suspended by the kernel
until the child process exits or execs. Specifically, in a
multi-threaded program, only the thread that called vfork is suspended,
other threads keep running freely. This is documented in the vfork(2)
man page ("Caveats" section).
Let's suppose GDB is handling a vfork and the user's desire is to detach
from the child. Before detaching the child, GDB must remove the software
breakpoints inserted in the shared parent/child address space, in case
there's a breakpoint in the path the child is going to take before
exec'ing or exit'ing (unlikely, but possible). Otherwise the child could
hit a breakpoint instruction while running outside the control of GDB,
which would make it crash. GDB must also avoid re-inserting breakpoints
in the parent as long as it didn't receive the "vfork done" event (that
is, when the child has exited or execed): since the address space is
shared with the child, that would re-insert breakpoints in the child
process also. So what GDB does is:
1. Receive "vfork" event for the parent
2. Remove breakpoints from the (shared) address space and set
program_space::breakpoints_not_allowed to avoid re-inserting them
3. Detach from the child thread
4. Resume the parent
5. Wait for and receive "vfork done" event for the parent
6. Clean program_space::breakpoints_not_allowed and re-insert
breakpoints
7. Resume the parent
Resuming the parent at step 4 is necessary in order for the kernel to
report the "vfork done" event. The kernel won't report a ptrace event
for a thread that is ptrace-stopped. But the theory behind this is that
between steps 4 and 5, the parent won't actually do any progress even
though it is ptrace-resumed, because the kernel keeps it suspended,
waiting for the child to exec or exit. So it doesn't matter for that
thread if breakpoints are not inserted.
The problem is when the program is multi-threaded. In step 4, GDB
resumes all threads of the parent. The thread that did the vfork stays
suspended by the kernel, so that's fine. But other threads are running
freely while breakpoints are removed, which is a problem because they
could miss a breakpoint that they should have hit.
The problem is present with all-stop and non-stop targets. The only
difference is that with an all-stop targets, the other threads are
stopped by the target when it reports the vfork event and are resumed by
the target when GDB resumes the parent. With a non-stop target, the
other threads are simply never stopped.
The fix
-------
There many combinations of settings to consider (all-stop/non-stop,
target-non-stop on/off, follow-fork-mode parent/child, detach-on-fork
on/off, schedule-multiple on/off), but for this patch I restrict the
scope to follow-fork-mode=parent, detach-on-fork=on. That's the
"default" case, where we detach the child and keep debugging the
parent. I tried to fix them all, but it's just too much to do at once.
The code paths and behaviors for when we don't detach the child are
completely different.
The guiding principle for this patch is that all threads of the vforking
inferior should be stopped as long as breakpoints are removed. This is
similar to handling in-line step-overs, in a way.
For non-stop targets (the default on Linux native), this is what
happens:
- In follow_fork, we call stop_all_threads to stop all threads of the
inferior
- In follow_fork_inferior, we record the vfork parent thread in
inferior::thread_waiting_for_vfork_done
- Back in handle_inferior_event, we call keep_going, which resumes only
the event thread (this is already the case, with a non-stop target).
This is the thread that will be waiting for vfork-done.
- When we get the vfork-done event, we go in the (new) handle_vfork_done
function to restart the previously stopped threads.
In the same scenario, but with an all-stop target:
- In follow_fork, no need to stop all threads of the inferior, the
target has stopped all threads of all its inferiors before returning
the event.
- In follow_fork_inferior, we record the vfork parent thread in
inferior::thread_waiting_for_vfork_done.
- Back in handle_inferior_event, we also call keep_going. However, we
only want to resume the event thread here, not all inferior threads.
In internal_resume_ptid (called by resume_1), we therefore now check
whether one of the inferiors we are about to resume has
thread_waiting_for_vfork_done set. If so, we only resume that
thread.
Note that when resuming multiple inferiors, one vforking and one not
non-vforking, we could resume the vforking thread from the vforking
inferior plus all threads from the non-vforking inferior. However,
this is not implemented, it would require more work.
- When we get the vfork-done event, the existing call to keep_going
naturally resumes all threads.
Testing-wise, add a test that tries to make the main thread hit a
breakpoint while a secondary thread calls vfork. Without the fix, the
main thread keeps going while breakpoints are removed, resulting in a
missed breakpoint and the program exiting.
Change-Id: I20eb78e17ca91f93c19c2b89a7e12c382ee814a1
|
|
This helped me, it shows which ptid we actually call target_resume with.
Change-Id: I2dfd771e83df8c25f39371a13e3e91dc7882b73d
|
|
A following patch will want to stop all threads of a given inferior (as
opposed to all threads of all inferiors) while handling a vfork, and
restart them after. To help with this, add inferior parameters to
stop_all_threads and restart_threads. This is done as a separate patch
to make sure this doesn't cause regressions on its own, and to keep the
following patches more concise.
No visible changes are expected here, since all calls sites pass
nullptr, which should keep the existing behavior.
Change-Id: I4d9ba886ce842042075b4e346cfa64bbe2580dbf
|
|
inferior::thread_waiting_for_vfork_done
The inferior::waiting_for_vfork_done flag indicates that some thread in
that inferior is waiting for a vfork-done event. Subsequent patches
will need to know which thread precisely is waiting for that event.
Replace the boolean flag (waiting_for_vfork_done) with a thread_info
pointer (thread_waiting_for_vfork_done).
I think there is a latent buglet in that waiting_for_vfork_done is
currently not reset on inferior exec or exit. I could imagine that if a
thread in the parent process calls exec or exit while another thread of
the parent process is waiting for its vfork child to exec or exit, we
could end up with inferior::waiting_for_vfork_done without a thread
actually waiting for a vfork-done event anymore. And since that flag is
checked in resume_1, things could misbehave there.
Since the new field points to a thread_info object, and those are
destroyed on exec or exit, it could be worse now since we could try to
access freed memory, if thread_waiting_for_vfork_done were to point to a
stale thread_info. To avoid this, clear the field in
infrun_inferior_exit and infrun_inferior_execd.
Change-Id: I31b847278613a49ba03fc4915f74d9ceb228fdce
|
|
This patch removes gdb's dbx mode. Regression tested on x86-64 Fedora
34.
|
|
Add a "reason" parameter, only used to show in debug messages what is
the reason for stopping all threads. This helped me understand the
debug logs while adding some new uses of stop_all_threads, so I am
proposing to merge it.
Change-Id: I66c8c335ebf41836a7bc3d5fe1db92c195f65e55
|
|
Now that filtered and unfiltered output can be treated identically, we
can unify the printf family of functions. This is done under the name
"gdb_printf". Most of this patch was written by script.
|
|
Now that filtered and unfiltered output can be treated identically, we
can unify the puts family of functions. This is done under the name
"gdb_puts". Most of this patch was written by script.
|
|
A number of spots call printf_unfiltered only because they are in code
that should not be interrupted by the pager. However, I believe these
cases are all handled by infrun's blanket ban on paging, and so can be
converted to the default (_filtered) API.
After this patch, I think all the remaining _unfiltered calls are ones
that really ought to be. A few -- namely in complete_command -- could
be replaced by a scoped assignment to pagination_enabled, but for the
remainder, the code seems simple enough like this.
|
|
If GDB reports a watchpoint hit, and then the next event is not
TARGET_WAITKIND_STOPPED, but instead some event for which there's a
catchpoint, such that GDB calls bpstat_stop_status, GDB mistakenly
thinks the watchpoint triggered. Vis, using foll-fork.c:
(gdb) awatch v
Hardware access (read/write) watchpoint 2: v
(gdb) catch fork
Catchpoint 3 (fork)
(gdb) c
Continuing.
Hardware access (read/write) watchpoint 2: v
Old value = 0
New value = 5
main () at gdb.base/foll-fork.c:16
16 pid = fork ();
(gdb)
Continuing.
Hardware access (read/write) watchpoint 2: v <<<<
<<<< these lines are spurious
Value = 5 <<<<
Catchpoint 3 (forked process 1712369), arch_fork (ctid=0x7ffff7fa4810) at arch-fork.h:49
49 arch-fork.h: No such file or directory.
(gdb)
The problem is that when we handle the fork event, nothing called
watchpoints_triggered before calling bpstat_stop_status. Thus, each
watchpoint's watchpoint_triggered field was still set to
watch_triggered_yes from the previous (real) watchpoint stop.
watchpoint_triggered is only current called in the handle_signal_stop
path, when handling TARGET_WAITKIND_STOPPED.
This fixes it by adding watchpoint_triggered calls in the other events
paths that call bpstat_stop_status. But instead of adding them
explicitly, it adds a new function bpstat_stop_status_nowatch that
wraps bpstat_stop_status and calls watchpoint_triggered, and then
replaces most calls to bpstat_stop_status with calls to
bpstat_stop_status_nowatch.
This required constifying watchpoints_triggered.
New test included, which fails without the fix.
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=28621
Change-Id: I282b38c2eee428d25319af3bc842f9feafed461c
|
|
A stepi in a function without debug info with "set debug infrun 1"
crashes GDB since commit c8353d682f69 (gdb/infrun: some extra infrun
debug print statements), due to a reference to
"tp->current_symtab->filename" when tp->current_symtab is null.
This commit adds the missing null check. The output in this case
becomes:
[infrun] set_step_info: symtab = <null>, line = 0, step_frame_id = {stack=0x7fffffffd980,code=0x0000000000456c30,!special}, step_stack_frame_id = {stack=0x7fffffffd980,code=0x0000000000456c30,!special}
Change-Id: I5171a9d222bc7e15b9dba2feaba7d55c7acd99f8
|
|
Commit b60cea7 (Make target_wait options use enum flags) broke
deprecated_target_wait_hook usage: there's a commit comment telling
this hook has not been converted.
Rather than trying to mend it, this patch replaces the hook by two
target_wait observers:
target_pre_wait (ptid_t ptid)
target_post_wait (ptid_t event_ptid)
Upon target_wait entry, target_pre_wait is notified with the ptid
passed to target_wait. Upon exit, target_post_wait is notified with
the event ptid returned by target_wait. Should an exception occur,
event_ptid is null_ptid.
This change benefits to Insight (out-of-tree): there's no real use of the
late hook in gdb itself.
|
|
infrun.c:handle_one calls find_inferior_ptid unnecessarily, since we
already have a thread pointer handy, and the thread has a pointer to
the inferior. This commit removes the unnecessary lookup.
Change-Id: I2ae18601dd75346c6c91068e9a4f9a6484fb3339
|
|
No kind of internal var uses it remove it. This makes the transition to
using a variant easier, since we don't need to think about where this
should be called (in a destructor or not), if it can throw, etc.
Change-Id: Iebbc867d1ce6716480450d9790410d6684cbe4dd
|
|
Commit 14b3360508b1 ("do_target_wait_1: Clear
TARGET_WNOHANG if the target isn't async.") broke some multi-target
tests, such as gdb.multi/multi-target-info-inferiors.exp. The symptom
is that execution just hangs at some point. What happens is:
1. One remote inferior is started, and now sits stopped at a breakpoint.
It is not "async" at this point (but it "can async").
2. We run a native inferior, the event loop gets woken up by the native
target's fd.
3. In do_target_wait, we randomly choose an inferior to call target_wait
on first, it happens to be the remote inferior.
4. Because the target is currently not "async", we clear
TARGET_WNOHANG, resulting in synchronous wait. We therefore block
here:
#0 0x00007fe9540dbb4d in select () from /usr/lib/libc.so.6
#1 0x000055fc7e821da7 in gdb_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/posix-hdep.c:31
#2 0x000055fc7ddef905 in interruptible_select (n=15, readfds=0x7ffdb77c1fb0, writefds=0x0, exceptfds=0x7ffdb77c2050, timeout=0x7ffdb77c1f90) at /home/simark/src/binutils-gdb/gdb/event-top.c:1134
#3 0x000055fc7eda58e4 in ser_base_wait_for (scb=0x6250002e4100, timeout=1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:240
#4 0x000055fc7eda66ba in do_ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:365
#5 0x000055fc7eda6ff6 in generic_readchar (scb=0x6250002e4100, timeout=-1, do_readchar=0x55fc7eda663c <do_ser_base_readchar(serial*, int)>) at /home/simark/src/binutils-gdb/gdb/ser-base.c:444
#6 0x000055fc7eda718a in ser_base_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/ser-base.c:471
#7 0x000055fc7edb1ecd in serial_readchar (scb=0x6250002e4100, timeout=-1) at /home/simark/src/binutils-gdb/gdb/serial.c:393
#8 0x000055fc7ec48b8f in remote_target::readchar (this=0x617000038780, timeout=-1) at /home/simark/src/binutils-gdb/gdb/remote.c:9446
#9 0x000055fc7ec4da82 in remote_target::getpkt_or_notif_sane_1 (this=0x617000038780, buf=0x6170000387a8, forever=1, expecting_notif=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:9928
#10 0x000055fc7ec4f045 in remote_target::getpkt_or_notif_sane (this=0x617000038780, buf=0x6170000387a8, forever=1, is_notif=0x7ffdb77c24f0) at /home/simark/src/binutils-gdb/gdb/remote.c:10037
#11 0x000055fc7ec354d4 in remote_target::wait_ns (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8147
#12 0x000055fc7ec38aa1 in remote_target::wait (this=0x617000038780, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/remote.c:8337
#13 0x000055fc7f1409ce in target_wait (ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/target.c:2612
#14 0x000055fc7e19da98 in do_target_wait_1 (inf=0x617000038080, ptid=..., status=0x7ffdb77c33c8, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3636
#15 0x000055fc7e19e26b in operator() (__closure=0x7ffdb77c2f90, inf=0x617000038080) at /home/simark/src/binutils-gdb/gdb/infrun.c:3697
#16 0x000055fc7e19f0c4 in do_target_wait (ecs=0x7ffdb77c33a0, options=...) at /home/simark/src/binutils-gdb/gdb/infrun.c:3716
#17 0x000055fc7e1a31f7 in fetch_inferior_event () at /home/simark/src/binutils-gdb/gdb/infrun.c:4061
Before the aforementioned commit, we would not have cleared
TARGET_WNOHANG, the remote target's wait would have returned nothing,
and we would have consumed the native target's event.
After applying this revert, the testsuite state looks as good as before
for me on Ubuntu 20.04 amd64.
Change-Id: Ic17a1642935cabcc16c25cb6899d52e12c2f5c3f
|
|
Previously, TARGET_WNOHANG was cleared if a target supported async
mode even if async mode wasn't currently enabled. This change only
permits TARGET_WNOHANG if async mode is enabled.
|
|
Enabling async mode above the target layer removes duplicate code in
::resume methods of async-capable targets. Commit 5b6d1e4fa4f
("Multi-target support") enabled async mode in do_target_resume after
target_resume returns which is a step in this direction. However,
other callers of target_resume such as target_continue do not enable
async mode. Rather than enabling async mode in each of the callers
after target_resume returns, enable async mode at the end of
target_resume.
|
|
While reviewing a different patch I wanted to know more about what was
going on during GDB's stepping. I added some extra infrun debug print
calls, and I thought these might be useful to others.
|
|
While working on function calls, I realized that the thread_fsm member
of struct thread_info is a raw pointer to a resource it owns. This
commit changes the type of the thread_fsm member to a std::unique_ptr in
order to signify this ownership relationship and slightly ease resource
management (no need to manually call delete).
To ensure consistent use, the field is made a private member
(m_thread_fsm). The setter method (set_thread_fsm) can then check
that it is incorrect to associate a FSM to a thread_info object if
another one is already in place. This is ensured by an assertion.
The function run_inferior_call takes an argument as a pointer to a
call_thread_fsm and installs it in it in a thread_info instance. Also
change this function's signature to accept a unique_ptr in order to
signify that the ownership of the call_thread_fsm is transferred during
the call.
No user visible change expected after this commit.
Tested on x86_64-linux with no regression observed.
Change-Id: Ia1224f72a4afa247801ce6650ce82f90224a9ae8
|
|
Add a getter and a setter for whether a symbol is an argument. Remove
the corresponding macro and adjust all callers.
Change-Id: I71b4f0465f3dfd2ed8b9e140bd3f7d5eb8d9ee81
|
|
This moves the gdb_argv class to a new header in gdbsupport.
|
|
Move the "started" variable to the scope it's needed, and rename it to
"step_over_started".
Change-Id: I56f3384dbd328f55198063bb855edda10f1492a3
|
|
This commit brings all the changes made by running gdb/copyright.py
as per GDB's Start of New Year Procedure.
For the avoidance of doubt, all changes in this commits were
performed by the script.
|
|
A following patch will change targets so that when they detach an
inferior, they also detach any pending fork children this inferior may
have. While doing this, I hit a case where we couldn't differentiate
two cases, where in one we should detach the fork detach but not in the
other.
Suppose we continue past a fork with "follow-fork-mode == child" &&
"detach-on-fork on". follow_fork_inferior calls target_detach to detach
the parent. In that case the target should not detach the fork
child, as we'll continue debugging the child. As of now, the
tp->pending_follow field of the thread who called fork still contains
the details about the fork.
Then, suppose we run to a fork catchpoint and the user types "detach".
In that case, the target should detach the fork child in addition to the
parent. In that case as well, the tp->pending_follow field contains
the details about the fork.
To allow targets to differentiate the two cases, clear
tp->pending_follow a bit earlier, when following a fork. Targets will
then see that tp->pending_follow contains TARGET_WAITKIND_SPURIOUS, and
won't detach the fork child.
As of this patch, no behavior changes are expected.
Change-Id: I537741859ed712cb531baaefc78bb934e2a28153
|
|
While working on target_waitstatus changes, I noticed a few places where
const target_waitstatus objects could be passed by reference instead of
by pointers. And in some cases, places where a target_waitstatus could
be passed as const, but was not. Convert them as much as possible.
Change-Id: Ied552d464be5d5b87489913b95f9720a5ad50c5a
|
|
Make target_waitstatus_to_string a "to_string" method of
target_waitstatus, a bit like we have ptid_t::to_string already. This
will save a bit of typing.
Change-Id: Id261b7a09fa9fa3c738abac131c191a6f9c13905
|
|
Change gdb_assert_not_reached to accept a format string plus
corresponding arguments. This allows giving more precise messages.
Because the format string passed by the caller is prepended with a "%s:"
to add the function name, the callers can no longer pass a translated
string (`_(...)`). Make the gdb_assert_not_reached include the _(),
just like the gdb_assert_fail macro just above.
Change-Id: Id0cfda5a57979df6cdaacaba0d55dd91ae9efee7
|
|
The test-case gdb.base/foll-vfork.exp contains:
...
if [gdb_debug_enabled] {
untested "debug is enabled"
return 0
}
...
To understand what it does, I disabled this bit and ran with GDB_DEBUG=infrun,
like so:
...
$ cd $build/gdb/testsuite
$ make check GDB_DEBUG=infrun RUNTESTFLAGS=gdb.base/foll-vfork.exp
...
and ran into:
...
(gdb) PASS: gdb.base/foll-vfork.exp: exec: \
vfork parent follow, through step: set follow-fork parent
next^M
33 if (pid == 0) {^M
(gdb) FAIL: gdb.base/foll-vfork.exp: exec: \
vfork parent follow, through step: step
...
The problem is that the test-case expects:
...
(gdb) PASS: gdb.base/foll-vfork.exp: exec: \
vfork parent follow, through step: set follow-fork parent
next^M
[Detaching after vfork from child process 28169]^M
33 if (pid == 0) {^M
(gdb) PASS: gdb.base/foll-vfork.exp: exec: \
vfork parent follow, through step: step
...
but the "Detaching" line has been redirected to
$outputs/gdb.base/foll-vfork/gdb.debug.
I looked at the documentation of "set logging debugredirect [on|off]":
...
By default, GDB debug output will go to both the terminal and the logfile.
Set debugredirect if you want debug output to go only to the log file.
...
and my interpretation of it was that "debug output" did not match the
"messages" description of inferior-events:
...
The set print inferior-events command allows you to enable or disable printing
of messages when GDB notices that new inferiors have started or that inferiors
have exited or have been detached.
...
Fix the discrepancy by not using gdb_stdlog for inferior-events.
Update the gdb.base/foll-vfork.exp test-case to not require
gdb_debug_enabled == 0.
Tested on x86_64-linux.
Tested test-case gdb.base/foll-vfork.exp with and without GDB_DEBUG=infrun.
|
|
GDB hangs when doing this:
- launch inferior with multiple threads
- multiple threads hit some breakpoint(s)
- one breakpoint hit is presented as a stop, the rest are saved as
pending wait statuses
- "set scheduler-locking on"
- resume the currently selected thread (because of scheduler-locking,
it's the only one resumed), let it execute until exit
- GDB hangs, not showing the prompt, impossible to interrupt with ^C
When the resumed thread exits, we expect the target to return a
TARGET_WAITKIND_NO_RESUMED event, and that's what we see:
[infrun] fetch_inferior_event: enter
[infrun] scoped_disable_commit_resumed: reason=handling event
[infrun] random_pending_event_thread: None found.
[Thread 0x7ffff7d9c700 (LWP 309357) exited]
[infrun] print_target_wait_results: target_wait (-1.0.0 [process -1], status) =
[infrun] print_target_wait_results: -1.0.0 [process -1],
[infrun] print_target_wait_results: status->kind = no-resumed
[infrun] handle_inferior_event: status->kind = no-resumed
[infrun] handle_no_resumed: TARGET_WAITKIND_NO_RESUMED (ignoring: found resumed)
[infrun] prepare_to_wait: prepare_to_wait
[infrun] reset: reason=handling event
[infrun] maybe_set_commit_resumed_all_targets: not requesting commit-resumed for target native, no resumed threads
[infrun] fetch_inferior_event: exit
The problem is in handle_no_resumed: we check if some other thread is
actually resumed, to see if we should ignore that event (see comments in
that function for more info). If this condition is true:
(thread->executing () || thread->has_pending_waitstatus ())
... then we ignore the event. The problem is that there are some non-resumed
threads with a pending event, which makes us ignore the event. But these
threads are not resumed, so we end up waiting while nothing executes, hence
waiting for ever.
My first fix was to change the condition to:
(thread->executing ()
|| (thread->resumed () && thread->has_pending_waitstatus ()))
... but then it occured to me that we could simply check for:
(thread->resumed ())
Since "executing" implies "resumed", checking simply for "resumed"
covers threads that are resumed and executing, as well as threads that
are resumed with a pending status, which is what we want.
Change-Id: Ie796290f8ae7f34c026ca3a8fcef7397414f4780
|
|
I don't find that the bpstat typedef, which hides a pointer, is
particularly useful. In fact, it confused me many times, and I just see
it as something to remember that adds cognitive load. Also, with C++,
we might want to be able to pass bpstats objects by const-reference, not
necessarily by pointer.
So, remove the bpstat typedef and rename struct bpstats to bpstat (since
it represents one bpstat, it makes sense that it is singular).
Change-Id: I52e763b6e54ee666a9e045785f686d37b4f5f849
|
|
In debug messages, I think it would be more helpful to print ptid using
the simple "pid.lwp.tid" notation in infrun debug messages. I am
currently debugging some fork issues, and find the pid_to_str output not
so useful, as it doesn't tell which process a thread belongs to.
It currently shows up like this:
[infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=0, current thread [Thread 0x7ffff7d95740 (LWP 892942)] at 0x55555555521f
With the patch, it shows up like this:
[infrun] resume_1: step=1, signal=GDB_SIGNAL_0, trap_expected=1, current thread [894072.894077.0] at 0x5555555551d9
Change-Id: I130796d7dfb0d8e763b8358d8a6002701d80c4ea
|
|
The bug fixed by this [1] patch was caused by an out-of-bounds access to
a value's content. The code gets the value's content (just a pointer)
and then indexes it with a non-sensical index.
This made me think of changing functions that return value contents to
return array_views instead of a plain pointer. This has the advantage
that when GDB is built with _GLIBCXX_DEBUG, accesses to the array_view
are checked, making bugs more apparent / easier to find.
This patch changes the return types of these functions, and updates
callers to call .data() on the result, meaning it's not changing
anything in practice. Additional work will be needed (which can be done
little by little) to make callers propagate the use of array_view and
reap the benefits.
[1] https://sourceware.org/pipermail/gdb-patches/2021-September/182306.html
Change-Id: I5151f888f169e1c36abe2cbc57620110673816f3
|
|
I stumbled on a bug caused by the fact that a code path read
target_waitstatus::value::sig (expecting it to contain a gdb_signal
value) while target_waitstatus::kind was TARGET_WAITKIND_FORKED. This
meant that the active union field was in fact
target_waitstatus::value::related_pid, and contained a ptid. The read
signal value was therefore garbage, and that caused GDB to crash soon
after. Or, since that GDB was built with ubsan, this nice error
message:
/home/simark/src/binutils-gdb/gdb/linux-nat.c:1271:12: runtime error: load of value 2686365, which is not a valid value for type 'gdb_signal'
Despite being a large-ish change, I think it would be nice to make
target_waitstatus safe against that kind of bug. As already done
elsewhere (e.g. dynamic_prop), validate that the type of value read from
the union matches what is supposed to be the active field.
- Make the kind and value of target_waitstatus private.
- Make the kind initialized to TARGET_WAITKIND_IGNORE on
target_waitstatus construction. This is what most users appear to do
explicitly.
- Add setters, one for each kind. Each setter takes as a parameter the
data associated to that kind, if any. This makes it impossible to
forget to attach the associated data.
- Add getters, one for each associated data type. Each getter
validates that the data type fetched by the user matches the wait
status kind.
- Change "integer" to "exit_status", "related_pid" to "child_ptid",
just because that's more precise terminology.
- Fix all users.
That last point is semi-mechanical. There are a lot of obvious changes,
but some less obvious ones. For example, it's not possible to set the
kind at some point and the associated data later, as some users did.
But in any case, the intent of the code should not change in this patch.
This was tested on x86-64 Linux (unix, native-gdbserver and
native-extended-gdbserver boards). It was built-tested on x86-64
FreeBSD, NetBSD, MinGW and macOS. The rest of the changes to native
files was done as a best effort. If I forgot any place to update in
these files, it should be easy to fix (unless the change happens to
reveal an actual bug).
Change-Id: I0ae967df1ff6e28de78abbe3ac9b4b2ff4ad03b7
|
|
The "make thread_suspend_state::stop_pc optional" patch caused a
regression on Windows when using shared libraries. I tracked this
down to an unguarded use of stop_pc() in the TARGET_WAITKIND_LOADED
case of handle_inferior_event. This patch fixes the bug by ensuring
that the stop PC is set at this point.
|
|
"follow-fork-mode child"
We found that when handling forks, two inferiors can unexpectedly share
their program space and address space. To reproduce:
1. Using a test program that forks...
2. "set follow-fork-mode child"
3. "set detach-on-fork on" (the default)
4. run to a breakpoint somewhere after the fork
Step 4 should have created a new inferior:
(gdb) info inferiors
Num Description Connection Executable
1 <null> /home/smarchi/build/wt/amd/gdb/fork
* 2 process 251425 1 (native) /home/smarchi/build/wt/amd/gdb/fork
By inspecting the state of GDB, we can see that the two inferiors now
share one program space and one address space:
Inferior 1:
(top-gdb) p inferior_list.m_front.num
$2 = 1
(top-gdb) p inferior_list.m_front.aspace
$3 = (struct address_space *) 0x5595e2520400
(top-gdb) p inferior_list.m_front.pspace
$4 = (struct program_space *) 0x5595e2520440
Inferior 2:
(top-gdb) p inferior_list.m_front.next.num
$5 = 2
(top-gdb) p inferior_list.m_front.next.aspace
$6 = (struct address_space *) 0x5595e2520400
(top-gdb) p inferior_list.m_front.next.pspace
$7 = (struct program_space *) 0x5595e2520440
You can then run inferior 1 again and the two inferiors will still
erroneously share their spaces, but already at this point this is wrong.
The cause of the bad {a,p}space sharing is in follow_fork_inferior.
When following the child and detaching from the parent, we just re-use
the parent's spaces, rather than cloning them. When we switch back to
inferior 1 and run again, we find ourselves with two unrelated inferiors
sharing spaces.
Fix that by creating new spaces for the parent after having moved them
to the child. My initial implementation created new spaces for the
child instead. Doing this breaks doing "next" over fork(). When "next"
start, we record the symtab of the starting location. When the program
stops, we compare that symtab with the symtab the program has stopped
at. If the symtab or the line number has changed, we conclude the
"next" is done. If we create a new program space for the child and copy
the parent's program space to it with clone_program_space, it creates
new symtabs for the child as well. When the child stop, but still on
the fork() line, GDB thinks the "next" is done because the symtab
pointers no longer match. In reality they are two symtab instances that
represent the same file. But moving the spaces to the child and
creating new spaces for the parent, we avoid this problem.
Note that the problem described above happens today with "detach-on-fork
off" and "follow-fork-mode child", because we create new spaces for the
child. This will have to be addressed later.
Test-wise, improve gdb.base/foll-fork.exp to set a breakpoint that is
expected to have a location in each inferiors. Without the fix, when
the two inferiors erroneously share a program space, GDB reports a
single location.
Change-Id: Ifea76e14f87b9f7321fc3a766217061190e71c6e
|
|
This started out as changing thread_info::name to a unique_xmalloc_ptr.
That showed that almost all users of that field had the same logic to
get a thread's name: use thread_info::name if non-nullptr, else ask the
target. Factor out this logic in a new thread_name free function. Make
the field private (rename to m_name) and add some accessors.
Change-Id: Iebdd95f4cd21fbefc505249bd1d05befc466a2fc
|
|
The ptid_t 'tid' member is normally used as an address in gdb -- both
bsd-uthread and ravenscar-thread use it this way. However, because
the type is 'long', this can cause problems with sign extension.
This patch changes the type to ULONGEST to ensure that sign extension
does not occur.
|
|
Currently the stop_pc field of thread_suspect_state is a CORE_ADDR and
when we want to indicate that there is no stop_pc available we set
this field back to a special value.
There are actually two special values used, in post_create_inferior
the stop_pc is set to 0. This is a little unfortunate, there are
plenty of embedded targets where 0 is a valid pc value. The more
common special value for stop_pc though, is set in
thread_info::set_executing, where the value (~(CORE_ADDR) 0) is used.
This commit changes things so that the stop_pc is instead a
gdb::optional. We can now explicitly reset the field to an
uninitialised state, we also have asserts that we don't read the
stop_pc when its in an uninitialised state (both in
gdbsupport/gdb_optional.h, when compiling with _GLIBCXX_DEBUG
defined, and in thread_info::stop_pc).
One situation where a thread will not have a stop_pc value is when the
thread is stopped as a consequence of GDB being in all stop mode, and
some other thread stopped at an interesting event. When GDB brings
all the other threads to a stop those other threads will not have a
stop_pc set (thus avoiding an unnecessary read of the pc register).
Previously, when GDB passed through handle_one (in infrun.c) the
threads executing flag was set to false and the stop_pc field was left
unchanged, i.e. it would (previous) have been left as ~0.
Now, handle_one leaves the stop_pc with no value.
This caused a problem when we later try to set these threads running
again, in proceed() we compare the current pc with the cached stop_pc.
If the thread was stopped via handle_one then the stop_pc would have
been left as ~0, and the compare (in proceed) would (likely) fail.
Now however, this compare tries to read the stop_pc when it has no
value and this would trigger an assert.
To resolve this I've added thread_info::stop_pc_p() which returns true
if the thread has a cached stop_pc. We should only ever call
thread_info::stop_pc() if we know that there is a cached stop_pc,
however, this doesn't mean that every call to thread_info::stop_pc()
needs to be guarded with a call to thread_info::stop_pc_p(), in most
cases we know that the thread we are looking at stopped due to some
interesting event in that thread, and so, we know that the stop_pc is
valid.
After running the testsuite I've seen no other situations where
stop_pc is read uninitialised.
There should be no user visible changes after this commit.
|